ML Odyssey: Part 5 - Exploratory Data Analysis with Pokemon Dataset II

Q5. Are migration patterns different for Team Rocket members?
- 📈 Graph: Violin plot of migration counts per group.
- 🌍 Test: Mann-Whitney U test.
Mann-Whitney U statistic: 1510978.50
p-value: 0.0000
Descriptive Statistics:
- Team Rocket members - Median migrations: 16.0
- Non-Team Rocket members - Median migrations: 12.0
- Team Rocket members - Mean migrations: 16.6
- Non-Team Rocket members - Mean migrations: 12.5
Result: 🏆 Significant migration pattern differences found between Team Rocket and non-Team Rocket members.
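For readers who want to try this kind of comparison, here is a minimal sketch using `scipy.stats.mannwhitneyu`. The data is synthetic and the variable names (`rocket`, `non_rocket`) are assumptions for illustration, so it will not reproduce the exact figures reported above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic migration counts (assumption: stand-ins for the real dataset)
rocket = rng.poisson(lam=16, size=300)
non_rocket = rng.poisson(lam=12, size=700)

# Two-sided test: do the two groups' migration counts differ in distribution?
u_stat, p_value = mannwhitneyu(rocket, non_rocket, alternative="two-sided")

print(f"Mann-Whitney U statistic: {u_stat:.2f}")
print(f"p-value: {p_value:.4g}")
print(f"Median (Rocket): {np.median(rocket):.1f}")
print(f"Median (non-Rocket): {np.median(non_rocket):.1f}")
```

The Mann-Whitney U test is rank-based, so it does not assume the migration counts are normally distributed, which is a sensible default for skewed count data.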
Q6. Do Rocket members tend to avoid charity participation?
- 📊 Graph: Grouped bar chart of charity participation rates.
- 🕵️‍♂️ Test: Fisher’s Exact Test for small sample sizes.
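A minimal sketch of Fisher's Exact Test on a 2x2 table, using `scipy.stats.fisher_exact`. The counts below are made up for illustration and are not taken from the dataset.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency table (made-up counts)
#                 participates   does not participate
table = [
    [12, 38],   # Team Rocket members
    [45, 55],   # non-Team Rocket members
]

# Two-sided test: is charity participation associated with Rocket membership?
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

print(f"Odds ratio: {odds_ratio:.3f}")
print(f"p-value: {p_value:.4f}")
```

An odds ratio below 1 would suggest that Rocket members are less likely to participate than non-members; the p-value indicates whether that difference could plausibly be due to chance.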

Q7. Do Rocket members disguise themselves in certain professions?
- 📊 Graph: Horizontal bar chart of profession frequency per group.
- 🕵️‍♂️ Test: Chi-square test for profession-Team Rocket relationship.
Chart Analysis: Profession Distribution Visualization

Statistical Analysis: Chi-Square Test and Effect Size
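A minimal sketch of a chi-square test of independence with an effect size. The profession names and counts are hypothetical, and Cramér's V is used here as one common effect-size choice (an assumption, since the measure is not named above).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table (made-up counts, not the post's data)
# Rows: professions; columns: [Team Rocket, non-Team Rocket]
professions = ["Nurse", "Researcher", "Fisherman", "Gym Leader"]
observed = np.array([
    [30, 120],
    [55, 95],
    [20, 130],
    [10, 140],
])

# Test whether profession and Rocket membership are independent
chi2, p_value, dof, expected = chi2_contingency(observed)

# Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1))), ranges from 0 to 1
n = observed.sum()
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

print(f"Chi-square: {chi2:.2f}, dof: {dof}, p-value: {p_value:.4g}")
print(f"Cramér's V: {cramers_v:.3f}")
```

The p-value says whether an association exists; Cramér's V says how strong it is, which matters because large samples can make even tiny associations statistically significant.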
Q8. Is there an unusual cluster of Rocket members in specific cities?
- 🗺 Graph: Geographic heatmap of city distributions.
- 📌 Test: Spatial autocorrelation test.
To answer this question properly, we need to go beyond simple statistics and consider the geographic layout of cities. Let’s break this analysis into clear steps:
Q8.1. Initial Data Overview
Let’s start by examining the distribution of Team Rocket members across cities with their actual coordinates:

Q8.2. Standard Statistical Test (Chi-Square)
Now let’s turn to the core question: Are there unusual clusters? As a first pass, let’s use the chi-square test to check whether the number of Rocket members differs significantly between cities.
🤔 Why Can the Chi-square Test Be Misleading?
- At first glance, some cities look way more “infiltrated” by Team Rocket than others.
- So why doesn’t the Chi-square test always confirm this?
👉 Because it only checks the amount in each city, not where those cities are!
- The Chi-square test:
- ✅ Sees if numbers are different between cities
- ❌ Ignores the map—doesn’t care if cities are neighbors or far apart
Example:
- Imagine:
- Scenario 1: Rocket members are scattered randomly
- Scenario 2: They’re all packed into a few neighboring cities
- The Chi-square test treats both the same!
(It can’t “see” the map, just a list of city names.)
⚠️ Bottom line:
If you want to know if there’s a geographic cluster, the Chi-square test might miss it.
It’s a classic trap: using non-spatial tools for spatial questions can hide the real story!
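The two scenarios can be sketched with a toy example. The counts are hypothetical, and the eight cities are assumed to sit along a single line so that "neighbors" simply means adjacent positions in the array.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical Rocket member counts for 8 cities laid out along a line
clustered = np.array([40, 38, 35, 5, 4, 6, 5, 7])   # high counts sit next to each other
scattered = np.array([40, 5, 35, 4, 38, 6, 7, 5])   # same counts, spread across the line

# Goodness-of-fit test against "every city has the same count"
chi2_clustered, p_clustered = chisquare(clustered)
chi2_scattered, p_scattered = chisquare(scattered)

print(f"Clustered: chi2 = {chi2_clustered:.2f}, p = {p_clustered:.4g}")
print(f"Scattered: chi2 = {chi2_scattered:.2f}, p = {p_scattered:.4g}")
```

Both arrangements produce the same statistic, because the test only sees the list of counts and has no notion of which city is next to which.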
Q8.3. Spatial Autocorrelation Analysis (Moran’s I)
🗺️ Why Use a Spatial Algorithm? Meet Moran’s I!
- To really check if Team Rocket is clustering in certain areas, we need a test that understands the map!
- Moran’s I is built for this:
- It checks if high (or low) values are grouped together in space
- In our case: “Are cities with lots of Rocket members close to each other, forming a real ‘hot spot’?”
How it helps:
- Moran’s I can flag clusters that the Chi-square test might miss
- Given a spatial weights matrix that encodes which cities are neighbors, it looks for patterns on the map
✨ In short:
If you want to know if something is spatially clustered, you need a spatial test.
Moran’s I gives you a direct, map-aware answer!
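To make the idea concrete, here is a minimal NumPy sketch of global Moran's I with a permutation-based p-value. It is an illustration under simplifying assumptions: the counts are hypothetical, the cities are placed along a line, and the weights matrix marks only adjacent cities as neighbors. A real analysis would build the weights from actual city coordinates, for example with inverse distances.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values observed at n locations."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()  # deviations from the mean
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p-value: share of random shuffles at least as clustered as observed."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    hits = sum(
        morans_i(rng.permutation(values), weights) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Assumed layout: 8 cities on a line, each city neighbors the next one
n_cities = 8
w = np.zeros((n_cities, n_cities))
idx = np.arange(n_cities - 1)
w[idx, idx + 1] = 1.0
w[idx + 1, idx] = 1.0

# Hypothetical counts: same values, different spatial arrangement
clustered = np.array([40, 38, 35, 5, 4, 6, 5, 7])
scattered = np.array([40, 5, 35, 4, 38, 6, 7, 5])

i_clustered = morans_i(clustered, w)
i_scattered = morans_i(scattered, w)
p_clustered = permutation_p_value(clustered, w)
p_scattered = permutation_p_value(scattered, w)

print(f"Clustered: I = {i_clustered:.3f}, p = {p_clustered:.3f}")
print(f"Scattered: I = {i_scattered:.3f}, p = {p_scattered:.3f}")
```

Positive values of Moran's I indicate that similar values sit near each other, values near zero indicate no spatial pattern, and negative values indicate that high and low values alternate.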
Q8.4. Geographic Visualization
Finally, let’s visualize these patterns on the actual Kanto map:

Q9. How does badge count affect the likelihood of being a Rocket member?
- 📉 Graph: Histogram of gym badge distributions.
- 🏅 Test: Kruskal-Wallis test.
Data Preparation
Visualization: Badge Distribution Analysis

Statistical Test: Kruskal-Wallis Analysis
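A minimal sketch of the Kruskal-Wallis test with `scipy.stats.kruskal`, comparing badge counts between the two groups. The badge counts are synthetic (assumed to range from 0 to 8) and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Synthetic badge counts between 0 and 8 (assumption: not the real dataset)
rocket_badges = rng.binomial(n=8, p=0.3, size=200)
non_rocket_badges = rng.binomial(n=8, p=0.5, size=800)

# Rank-based test: do the badge distributions differ between groups?
h_stat, p_value = kruskal(rocket_badges, non_rocket_badges)

print(f"Kruskal-Wallis H statistic: {h_stat:.2f}")
print(f"p-value: {p_value:.4g}")
print(f"Median badges (Rocket): {np.median(rocket_badges):.1f}")
print(f"Median badges (non-Rocket): {np.median(non_rocket_badges):.1f}")
```

With only two groups this is closely related to the Mann-Whitney U test; the advantage of Kruskal-Wallis is that it extends to three or more groups, for example if trainers were split into several badge tiers.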
What’s Next?
In our next post, we’ll:
- Apply these pandas basics to a real-world dataset
- Learn advanced data cleaning techniques
- Create insightful visualizations
- Prepare data for machine learning
📝 Practice Exercise: Try these tasks to reinforce your learning:
- Create a DataFrame with your own data
- Practice different selection methods
- Try grouping and aggregating data
- Experiment with basic plotting
All code examples are available in our ML Odyssey repository.