Who is the best character in Mario Kart? This is actually a non-trivial question, because the characters have widely varying stats across a number of attributes (for the unfamiliar, Mario Kart is a video game where you select characters from the Nintendo universe and race them against each other in cartoonish go-karts). The question is compounded when you consider the modifications introduced by the various karts and tires players can select from. In general it isn't possible to optimize across multiple dimensions simultaneously; however, some setups are undeniably worse than others. The question for an aspiring Mario Kart champion is "How can one pick a character / kart / tire combination that is in some sense optimal, even if there isn't one 'best' option?" To answer this question we turn to one of Mario's compatriots, the nineteenth-century Italian economist Vilfredo Pareto, who introduced the concept of Pareto efficiency and the related Pareto frontier.
The concept of Pareto efficiency applies to situations where a finite pool of resources is being allocated among several competing groups. A particular allocation is said to be Pareto efficient if it is impossible to increase the portion assigned to any group without also decreasing the portion assigned to some other group. The set of Pareto-efficient allocations defines the Pareto frontier. As with many things, this is more easily explained with a picture (courtesy of Wikipedia).
The elements in red lie on the Pareto frontier: for each element in the set an increase along one axis requires a decrease along the other.
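To make the definition concrete, here is a minimal brute-force sketch for points in two dimensions where larger is better in both coordinates (as it is for speed and acceleration). A point is dominated if some other point is at least as good in every coordinate and strictly better in at least one; the frontier is whatever is left. The function names `dominates` and `pareto_front` and the sample points are illustrative only, not part of the original notebook.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as large as b in every coordinate
    and strictly larger in at least one."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Brute-force O(n^2) filter: keep every point that no other point dominates.
    Exact duplicates do not dominate each other, so ties are all kept."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 4), (2, 3), (3, 1), (1, 1), (3, 2)]
print(pareto_front(points))  # [(1, 5), (2, 4), (3, 2)]
```

Moving along the surviving points, every gain along one axis costs something along the other, which is exactly the property of the red points in the figure.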
We can apply this same concept to Mario Kart: the resources are the total stat points and the groups are the individual attributes, for instance speed, acceleration, or traction. (In general, characters in Mario Kart have the same number of total stat points and differ only in how those points are allocated.) Speed and acceleration are generally the two most important attributes of any given setup, so the goal of this post is to identify the configurations that lie on the Pareto frontier for speed and acceleration.
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools as it
from sklearn.cluster import KMeans

sns.set_context('talk')
As usual, we start with a little data wrangling to get things into a form we can use. One particular quirk of Mario Kart is that while there are a couple dozen characters, lots of them have identical stats. We'll start by picking out one character from each stat group to use in this analysis (and then do the same for karts and tires).
# originally from https://github.com/woodnathan/MarioKart8-Stats, added DLC and fixed a few typos
bodies = pd.read_csv('bodies.csv')
chars = pd.read_csv('characters.csv')
gliders = pd.read_csv('gliders.csv')
tires = pd.read_csv('tires.csv')

# use only stock (non-DLC) characters / karts / tires
# (.copy() so the class columns added below go into real frames, not views)
chars = chars.loc[chars['DLC']==0].copy()
bodies = bodies.loc[bodies['DLC']==0].copy()
tires = tires.loc[tires['DLC']==0].copy()
gliders = gliders.loc[gliders['DLC']==0].copy()

stat_cols = bodies.columns[2:-1]
main_cols = ['Weight','Speed','Acceleration','Handling','Traction']

# lots of characters/karts/tires are exactly the same. here we just want one from each stat type
# (DataFrame.sort was removed from pandas; sort_values is the replacement)
chars_unique = chars.drop_duplicates(subset=stat_cols).set_index('Character')[stat_cols].sort_values('Weight')
bodies_unique = bodies.drop_duplicates(subset=stat_cols).set_index('Body')[stat_cols].sort_values('Acceleration')
tires_unique = tires.drop_duplicates(subset=stat_cols).set_index('Tire')[stat_cols].sort_values('Speed')

n_uniq_chars = len(chars_unique)
n_uniq_bodies = len(bodies_unique)
n_uniq_tires = len(tires_unique)

# add a column indicating which category each character/kart/tire is in
chars['char_class'] = KMeans(n_uniq_chars, random_state=0).fit_predict(chars[stat_cols])
bodies['body_class'] = KMeans(n_uniq_bodies, random_state=0).fit_predict(bodies[stat_cols])
tires['tire_class'] = KMeans(n_uniq_tires, random_state=0).fit_predict(tires[stat_cols])

# change the character class labels so that they correspond to weight order
# (hard-coded for the labels KMeans produced with random_state=0 when this was written)
# without DLC
char_class_dict = dict(zip([3, 0, 5, 4, 2, 6, 1], [0, 1, 2, 3, 4, 5, 6]))
# with DLC
# char_class_dict = dict(zip([0, 3, 2, 7, 8, 4, 1, 6, 5], [0, 1, 2, 3, 4, 5, 6, 7, 8]))
chars['char_class'] = chars['char_class'].apply(lambda c: char_class_dict[c])

# only two types of gliders, one of which is pretty clearly just better
glider_best = gliders.loc[gliders['Glider']=='Flower']
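The KMeans step only has to recover groups of rows whose stats are exactly identical, so an exact grouping can do the same job without depending on KMeans' arbitrary label order or a hand-built relabeling dictionary. Below is a minimal sketch of that alternative using pandas' `GroupBy.ngroup` on a toy table; the names `char_a` through `char_e` and all the numbers are made up, not real Mario Kart 8 stats, and this is not what the original notebook does.

```python
import pandas as pd

# Toy table with made-up stats (NOT the real Mario Kart 8 values)
toy = pd.DataFrame({
    'Character':    ['char_a', 'char_b', 'char_c', 'char_d', 'char_e'],
    'Weight':       [2.0, 4.0, 2.0, 3.0, 4.0],
    'Speed':        [2.5, 4.5, 2.5, 3.5, 4.5],
    'Acceleration': [4.0, 3.0, 4.0, 3.5, 3.0],
})
stat_cols = ['Weight', 'Speed', 'Acceleration']

# Rows with identical stats get the same group id. With sort=True the ids follow
# the sorted order of the stat columns, so with Weight listed first they come
# out in weight order.
toy['char_class'] = toy.groupby(stat_cols, sort=True).ngroup()

# One representative (the first member) per class, analogous to chars_unique
representatives = toy.drop_duplicates(subset=stat_cols).set_index('Character')

print(toy['char_class'].tolist())     # [0, 2, 0, 1, 2]
print(list(representatives.index))    # ['char_a', 'char_b', 'char_d']
```

Whether the ids land in weight order on the real data depends on the column order of `stat_cols`, which comes from the CSV layout; if weight is not the first stat column, listing it first in the `groupby` keys would be needed.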
From here on out, I'll refer to the character (or kart, or tire) class by the name of its first member. For example, in the heatmap below the row labelled 'Peach' also describes the stats for Daisy and Yoshi. The complete class memberships are listed at the end of the post in case you want to see where your favorite character lands.
There are seven classes of characters; let's have a look at how their stats compare.
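# plot a heatmap of the stats for each character class
# (seaborn's keyword is linewidths; passing linewidth as well can make matplotlib raise an alias error)
fig, ax = plt.subplots(1,1, figsize=(8,5))
sns.heatmap(chars_unique[main_cols], annot=True, ax=ax, linewidths=1, fmt='.3g')
fig.tight_layout()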
The most obvious trend is the trade-off between speed and acceleration: heavy characters have good speed but poor acceleration, while light characters have snappy acceleration but a low top speed. There are variations in the other stats as well, but to a large extent speed and acceleration dominate the performance of a particular setup, so we'll be ignoring the rest of the stats.
Karts and tires modify the base stats of the characters; the final configuration is a sum of the character's stats and the kart / tire modifiers. As with characters, there are dozens of karts and tires but only a few categories with different stats.
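Because the stats simply add, a setup can be treated as a vector sum. The sketch below uses made-up (Speed, Acceleration) values purely to show the arithmetic; the notebook's `check` function does the same thing with all stat columns and also adds the chosen glider's stats.

```python
import numpy as np

# Hypothetical stat vectors in the order (Speed, Acceleration); made-up numbers
character = np.array([3.5, 2.5])
kart      = np.array([0.5, -0.25])
tire      = np.array([0.0, 0.5])
glider    = np.array([0.0, 0.25])

# The setup's stats are the element-wise sum of its parts
setup = character + kart + tire + glider
print(setup)  # [4. 3.]
```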
# plot a heatmap of the stats for each kart class and each tire class
fig, axes = plt.subplots(2,1, figsize=(8,10))
tables = [bodies_unique, tires_unique]
for ax, table in zip(axes, tables):
    sns.heatmap(table[main_cols], annot=True, ax=ax, linewidths=1, fmt='.3g')
fig.tight_layout()
The trends here are less obvious, but they generally agree with what we saw in the character stats: improvements in speed come at the expense of acceleration, and vice versa.
Our goal is to find all the configurations that have an optimal combination of speed and acceleration, so the next step is to compute the stats for each unique (character, kart, tire) combination.
def check(char_name, body_type, tire_type):
    # find the stats for each element of the configuration
    character = chars.loc[chars['Character']==char_name]
    kart = bodies.loc[bodies['Body']==body_type]
    wheels = tires.loc[tires['Tire']==tire_type]

    # the total stats for the configuration are just the sum of the components
    stats = pd.concat([character[stat_cols], kart[stat_cols], wheels[stat_cols], glider_best[stat_cols]]).sum()

    # index the row by the configuration (character, kart, tire)
    index = pd.MultiIndex.from_tuples([(char_name, body_type, tire_type)], names=['Character', 'Body', 'Tire'])
    df = pd.DataFrame(stats).transpose()
    df.index = index
    return df

# iterate over every possible (character, kart, tire) configuration
config_all = it.product(chars_unique.index, bodies_unique.index, tires_unique.index)

# generate a dataframe with stats for each unique configuration
# (DataFrame.append was removed in pandas 2.0, so build all rows and concatenate once)
config_base = pd.concat([check(c, b, t) for (c, b, t) in config_all])
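The loop above calls `check` once per combination, filtering the full tables each time. A vectorized alternative is to build the Cartesian product with two cross joins (`DataFrame.merge(how='cross')`, available in pandas 1.2 and later) and then add the stat columns. The sketch below runs on small made-up tables; the helper `with_suffix` and all names and numbers are hypothetical stand-ins for `chars_unique`, `bodies_unique` and `tires_unique`. The notebook also adds the Flower glider's stats, which would just be one more constant term per column.

```python
import pandas as pd

stat_cols = ['Speed', 'Acceleration']

# Toy component tables with made-up stats, indexed by name like the *_unique tables
chars_u = pd.DataFrame({'Speed': [2.0, 4.0], 'Acceleration': [4.0, 2.0]},
                       index=pd.Index(['light', 'heavy'], name='Character'))
bodies_u = pd.DataFrame({'Speed': [0.0, 0.5], 'Acceleration': [0.5, 0.0]},
                        index=pd.Index(['kart_x', 'kart_y'], name='Body'))
tires_u = pd.DataFrame({'Speed': [0.0, 0.25], 'Acceleration': [0.25, 0.0]},
                       index=pd.Index(['tire_p', 'tire_q'], name='Tire'))


def with_suffix(df, suffix):
    """Move the name index into a column and tag the stat columns with a suffix
    so the three tables can be joined without column-name collisions."""
    return df.reset_index().rename(columns={c: c + suffix for c in stat_cols})


# Cartesian product of the three tables via two cross joins
combos = (with_suffix(chars_u, '_c')
          .merge(with_suffix(bodies_u, '_b'), how='cross')
          .merge(with_suffix(tires_u, '_t'), how='cross'))

# Each stat of a configuration is the sum over its components
for c in stat_cols:
    combos[c] = combos[c + '_c'] + combos[c + '_b'] + combos[c + '_t']

config = combos.set_index(['Character', 'Body', 'Tire'])[stat_cols]
print(config)
```

With only a few hundred unique configurations the original loop is fast enough; the cross-join version mainly avoids the repeated per-row filtering and makes the "sum of components" structure explicit.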
Equipped with the statistics for each possible combination, we can plot speed against acceleration for every setup and identify those that lie on the Pareto frontier.
# returns True if the row is on the Pareto frontier for variables xlabel and ylabel
def is_pareto_front(row, xlabel, ylabel):
    x = row[xlabel]
    y = row[ylabel]
    # look for points with the same y value but larger x value
    is_max_x = config_base.loc[config_base[ylabel]==y, xlabel].max() <= x
    # look for points with the same x value but larger y value
    is_max_y = config_base.loc[config_base[xlabel]==x, ylabel].max() <= y
    # look for points that are larger in both x and y
    is_double = len(config_base.loc[(config_base[xlabel]>x) & (config_base[ylabel]>y)])==0
    return is_max_x and is_max_y and is_double

# array of True/False indicating whether the corresponding row is on the Pareto frontier
is_pareto = config_base.apply(lambda row: is_pareto_front(row, 'Speed', 'Acceleration'), axis=1)

# just the configurations that are on the Pareto frontier
# (.ix and DataFrame.sort were removed from pandas; .loc and sort_values replace them)
config_pareto = config_base.loc[is_pareto].sort_values('Speed')
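`is_pareto_front` scans the whole table for every row, which is O(n²). For two objectives there is a standard O(n log n) sort-and-sweep alternative: sort by x descending (ties broken by y descending), then walk the list while tracking the best y seen among points with strictly larger x. The sketch below uses the same "not dominated, ties kept" rule as the function above; `pareto_mask_2d` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np


def pareto_mask_2d(x, y):
    """Boolean mask of points that no other point dominates, where larger is
    better in both x and y. Exact ties are kept, matching is_pareto_front."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.lexsort treats the LAST key as primary: x descending, then y descending
    order = np.lexsort((-y, -x))
    mask = np.zeros(len(x), dtype=bool)
    best_y_larger_x = -np.inf  # highest y among points with strictly larger x
    i = 0
    while i < len(order):
        # collect the run of points that share the current x value
        j = i
        while j < len(order) and x[order[j]] == x[order[i]]:
            j += 1
        block = order[i:j]
        top_y = y[block[0]]  # highest y within this x value
        for k in block:
            # kept only if nothing with larger x reaches its y
            # and nothing with the same x has a larger y
            mask[k] = (y[k] > best_y_larger_x) and (y[k] == top_y)
        best_y_larger_x = max(best_y_larger_x, top_y)
        i = j
    return mask


speed = [1, 2, 2, 3, 1, 3]
accel = [5, 4, 3, 1, 1, 2]
print(pareto_mask_2d(speed, accel).tolist())  # [True, True, False, False, False, True]
```

In the notebook's setting this would be used as `config_base.loc[pareto_mask_2d(config_base['Speed'], config_base['Acceleration'])]`, which should select the same rows as `is_pareto`.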
# plot all the configurations
fig, ax = plt.subplots(1,1, figsize=(8,5))
sns.regplot(x='Speed', y='Acceleration', data=config_base, fit_reg=False, ax=ax)

# plot the Pareto frontier
plt.plot(config_pareto['Speed'], config_pareto['Acceleration'], '--', label='Pareto frontier', alpha=0.5)
plt.xlim([0.75,6]);
plt.legend(loc='best');
Looks like the optimal configurations make up a fairly small subset of the total possible setups. In fact, we can quantify this.
# number of possible combinations
print('Possible combinations : ', len(list(it.product(chars.index, bodies.index, tires.index, gliders.index))))

# number of combinations with different statistics
print('Unique stat combinations : ', len(config_base.drop_duplicates(subset=stat_cols)))

# number of optimal combinations (considering only speed and acceleration)
print('Optimal combinations : ', len(config_pareto))
Possible combinations : 149760
Unique stat combinations : 294
Optimal combinations : 15
Let's have a look at what these optimal configurations look like.
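| Character | Body | Tire | Speed | Acceleration |
|---|---|---|---|---|
| Baby Mario | Biddybuggy | Roller | 1.00 | 5.75 |
| Toad | Biddybuggy | Roller | 1.50 | 5.50 |
| Peach | Biddybuggy | Roller | 2.00 | 5.25 |
| Mario | Biddybuggy | Roller | 2.50 | 5.00 |
| Donkey Kong | Biddybuggy | Roller | 3.00 | 4.75 |
| Wario | Biddybuggy | Roller | 3.50 | 4.50 |
| Donkey Kong | Sports Bike | Roller | 3.75 | 4.25 |
| Wario | Sports Bike | Roller | 4.25 | 4.00 |
| Wario | Sports Bike | Wood | 4.50 | 3.25 |
| Wario | Biddybuggy | Slick | 4.50 | 3.25 |
| Donkey Kong | Sports Bike | Slick | 4.75 | 3.00 |
| Wario | Gold Standard | Roller | 4.75 | 3.00 |
| Wario | Sports Bike | Standard | 4.75 | 3.00 |
| Wario | Sports Bike | Slick | 5.25 | 2.75 |
| Wario | Gold Standard | Slick | 5.75 | 1.75 |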
Unless you're going all-in on acceleration, it looks like a heavy character is the way to go; the two heaviest character classes (Wario and Donkey Kong) account for 11/15 of the Pareto-optimal configurations.
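The 11/15 figure can be checked directly from the table above. The sketch below copies those 15 rows into a DataFrame, counts the setups whose character class is Wario or Donkey Kong, and, as a sanity check, confirms that no listed setup dominates another in speed and acceleration.

```python
import pandas as pd

# The 15 Pareto-optimal setups, copied from the table above
rows = [
    ('Baby Mario',  'Biddybuggy',    'Roller',   1.00, 5.75),
    ('Toad',        'Biddybuggy',    'Roller',   1.50, 5.50),
    ('Peach',       'Biddybuggy',    'Roller',   2.00, 5.25),
    ('Mario',       'Biddybuggy',    'Roller',   2.50, 5.00),
    ('Donkey Kong', 'Biddybuggy',    'Roller',   3.00, 4.75),
    ('Wario',       'Biddybuggy',    'Roller',   3.50, 4.50),
    ('Donkey Kong', 'Sports Bike',   'Roller',   3.75, 4.25),
    ('Wario',       'Sports Bike',   'Roller',   4.25, 4.00),
    ('Wario',       'Sports Bike',   'Wood',     4.50, 3.25),
    ('Wario',       'Biddybuggy',    'Slick',    4.50, 3.25),
    ('Donkey Kong', 'Sports Bike',   'Slick',    4.75, 3.00),
    ('Wario',       'Gold Standard', 'Roller',   4.75, 3.00),
    ('Wario',       'Sports Bike',   'Standard', 4.75, 3.00),
    ('Wario',       'Sports Bike',   'Slick',    5.25, 2.75),
    ('Wario',       'Gold Standard', 'Slick',    5.75, 1.75),
]
pareto = pd.DataFrame(rows, columns=['Character', 'Body', 'Tire', 'Speed', 'Acceleration'])

# Count the setups that use one of the two heaviest character classes
counts = pareto['Character'].value_counts()
heavy = int(counts['Wario'] + counts['Donkey Kong'])
print(f'{heavy}/{len(pareto)}')  # 11/15

# Sanity check: no listed setup is dominated by another listed setup
s = pareto['Speed'].to_numpy()
a = pareto['Acceleration'].to_numpy()
dominated = [bool(((s >= s[i]) & (a >= a[i]) & ((s > s[i]) | (a > a[i]))).any())
             for i in range(len(pareto))]
print(any(dominated))  # False
```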
We can also look at the other main stats for each of these configurations.
# heatmap of the main stats for each Pareto-optimal configuration
fig, ax = plt.subplots(1,1, figsize=(8,7))
sns.heatmap(config_pareto[main_cols].sort_values('Speed'), annot=True, ax=ax, linewidths=1, fmt='.3g');
So there it is: if speed and acceleration are your main concerns, then one of these 15 configurations is your best bet.
Sometimes an optimal configuration isn't what you're looking for, though (say, because your roommate threatened to stop playing if there wasn't some sort of handicap, to choose a random example). In that case, we can explore all the possible configurations with a quick interactive Bokeh graphic. I'll omit most of the code here, but you can find it in the notebook for this post.
# note: needs modifications from https://github.com/josherick/bokeh/tree/2715_add_callbacks_to_groups to work
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import HoverTool, CustomJS
from bokeh.models.widgets import CheckboxButtonGroup
from bokeh.models.widgets import Dropdown
from bokeh.io import output_file, show, vform

output_notebook()
output_file('bokeh_plot.html')