In this notebook, I generate different visualizations of the Favorite Pokémon Survey results. The data was collected and made public by Reddit user mamamia1001, and full credit goes to them. I also include a few comments on the results and how I interpret them. If you want a more detailed version (i.e., including the code to generate the plots), you can find it here. For more info, take a look at the README file.
Alright, let's get started.
First, let's take a look at the number of votes that each generation got. We can start by looking at the average:
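As a rough sketch of what "average votes per generation" means here, the snippet below groups a handful of records by generation and takes the mean. The record layout, the `toy_votes` name, and the vote counts are all assumptions made up for illustration; they are not the survey data or the code behind the plot.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (pokemon, generation, votes).
# The vote counts are invented for illustration, NOT survey results.
toy_votes = [
    ("Bulbasaur", 1, 100),
    ("Charizard", 1, 300),
    ("Typhlosion", 2, 120),
    ("Umbreon", 2, 80),
    ("Greninja", 6, 60),
]


def mean_votes_per_generation(records):
    """Return {generation: mean number of votes per Pokémon}."""
    by_generation = defaultdict(list)
    for _name, generation, votes in records:
        by_generation[generation].append(votes)
    return {gen: mean(votes) for gen, votes in sorted(by_generation.items())}


averages = mean_votes_per_generation(toy_votes)
print(averages)
```

Note that the mean is per Pokémon, so a generation with many Pokémon is not favored just because of its size.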
We can get better insight by looking at the boxplot. The horizontal line in each box represents the median, and the diamonds represent outliers, that is, Pokémon with an unusually high number of votes. As we will see in a second, no Pokémon from generations 5, 6, or 7 cracks the top preference spots.
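To make "outlier" concrete, here is a minimal sketch that assumes the common boxplot convention (the default in matplotlib) where whiskers reach 1.5 times the interquartile range past the quartiles, and anything beyond them is drawn as an individual marker. Whether the original plot used exactly this setting is an assumption, and the numbers are invented.

```python
from statistics import quantiles

# Invented vote counts for one hypothetical generation (NOT survey data).
toy_votes = [10, 12, 15, 18, 20, 22, 25, 200]


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying outside the 1.5 * IQR fences."""
    # method="inclusive" interpolates linearly between data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


outliers = boxplot_outliers(toy_votes)
print(outliers)
```

In this toy list only the value 200 falls beyond the upper fence, which is the kind of point that would show up as a diamond.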
It is clear that there were more votes for Pokémon from earlier generations. This is a very interesting result. Does nostalgia play an important role here? It is hard to say. It would be very interesting to have more information on the voters' profile (e.g., age). For instance, older fans could prefer the Pokémon they grew up with. Unfortunately, we don't have that data.
This is probably the plot that most people want to see. Which are the most popular Pokémon?
Now we know who those outliers from earlier were ;). We can also see which Pokémon families are the most popular:
I suppose it is no surprise that the Eevee family is the most popular, given that Eevee is the Pokémon with the most evolutions (plus, let's face it, most of them are quite cool).
We can also take a look at the most popular Pokémon of each generation:
...as well as the top voted Pokémon of each generation:
We can also see Pokémon preference based on their type. Although Pokémon can have several types, for this section we are considering only the main (i.e., first) type of each Pokémon.
We can also take a look at the ranking per type. In this case, all types of a Pokémon were considered (e.g., Charizard is a fire and flying Pokémon, plus one of its Mega Evolutions has the dragon type as well). We might also find some surprises. However, remember that Alolan versions might have different types for the same Pokémon (e.g., Vulpix is fire type, while Alolan Vulpix is ice).
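The difference between the two views (main type only versus all types) can be sketched as follows. The data layout, function names, and vote counts are hypothetical and only illustrate the idea: in the first view each Pokémon counts once, under its first type; in the second, it appears in the ranking of every type it has.

```python
from collections import defaultdict

# Hypothetical entries; the vote counts are invented, NOT survey results.
toy_pokemon = [
    {"name": "Charizard", "types": ["Fire", "Flying"], "votes": 300},
    {"name": "Vulpix", "types": ["Fire"], "votes": 50},
    {"name": "Gyarados", "types": ["Water", "Flying"], "votes": 100},
]


def votes_by_main_type(pokemon):
    """Sum votes per type, counting only each Pokémon's first type."""
    totals = defaultdict(int)
    for entry in pokemon:
        totals[entry["types"][0]] += entry["votes"]
    return dict(totals)


def ranking_per_type(pokemon):
    """Rank Pokémon within every type they have, most votes first."""
    by_type = defaultdict(list)
    for entry in pokemon:
        for type_name in entry["types"]:
            by_type[type_name].append((entry["votes"], entry["name"]))
    return {
        type_name: [name for _votes, name in sorted(rows, reverse=True)]
        for type_name, rows in by_type.items()
    }


main_type_totals = votes_by_main_type(toy_pokemon)
rankings = ranking_per_type(toy_pokemon)
print(main_type_totals)
print(rankings)
```

With this toy data, Flying gets no votes in the main-type view, yet it has its own ranking once all types are considered.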
mamamia1001 also generated a plot of the Pareto principle (also known as the 80/20 rule). It states that roughly 80% of the effects (here, votes) come from 20% of the causes (here, Pokémon). For this survey, the Pareto plot looks like this:
It looks like the Pareto principle more or less holds in this case: 20% of the Pokémon received ~72% of the votes.
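A figure like this comes from sorting the Pokémon by votes and asking what share of all votes the top 20% collected. Below is a minimal sketch of that calculation; the `top_share` helper and the vote counts are made up for illustration and do not reproduce the survey's numbers.

```python
# Invented vote counts for ten hypothetical Pokémon (NOT survey data).
toy_votes = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]


def top_share(votes, fraction=0.2):
    """Share of all votes received by the top `fraction` of entries."""
    ranked = sorted(votes, reverse=True)
    # Always include at least one entry in the "top" group.
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


share = top_share(toy_votes)
print(f"Top 20% received {share:.0%} of the votes")
```

The closer this share is to 80%, the better the data matches the 80/20 rule.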
Finally, we can see how people voted over time, both overall and for each of the top Pokémon:
There is a very interesting spike in the number of votes, which then drops dramatically. I was curious about this, so I contacted mamamia1001, who believes that the first surge happened when the post was featured on Reddit's front page (r/all). The huge decrease was probably because, on the day the survey was posted, Reddit had issues and temporarily lost lots of upvotes around that time. This could, of course, introduce some bias into the data... but there's very little we can do about that.
We can also see how votes for each of the top Pokémon were cast over time:
It looks like votes for each of the top Pokémon were more or less cast in the same way.
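For reference, curves like these can be built by bucketing individual ballots into time bins. The sketch below counts votes per hour, overall or for a single Pokémon. It assumes each ballot carries a timestamp string in the format shown; the ballots, the dates, and the `votes_per_hour` helper are placeholders invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ballots: (timestamp, pokemon). Placeholder dates, NOT survey data.
toy_ballots = [
    ("2000-01-01 10:05", "Charizard"),
    ("2000-01-01 10:40", "Gengar"),
    ("2000-01-01 11:15", "Charizard"),
    ("2000-01-01 11:20", "Charizard"),
    ("2000-01-01 13:02", "Gengar"),
]


def votes_per_hour(ballots, pokemon=None):
    """Count ballots per hour, optionally restricted to one Pokémon."""
    counts = Counter()
    for timestamp, name in ballots:
        if pokemon is not None and name != pokemon:
            continue
        cast_at = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        # Truncate to the start of the hour so ballots share a bin.
        counts[cast_at.replace(minute=0)] += 1
    return dict(sorted(counts.items()))


overall = votes_per_hour(toy_ballots)
charizard_only = votes_per_hour(toy_ballots, pokemon="Charizard")
print(overall)
print(charizard_only)
```

Hours with no ballots simply do not appear in the result, so a plotting step would need to fill those gaps with zeros.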