Who is the best character in Mario Kart? This is actually a non-trivial question, because the characters have widely varying stats across a number of attributes (for the unfamiliar, Mario Kart is a video game where you select characters from the Nintendo universe and race them against each other in cartoonish go-karts). The question is compounded when you consider the modifications introduced by the various karts and tires players can select from. In general it isn't possible to optimize across multiple dimensions simultaneously; however, some setups are undeniably worse than others. The question for an aspiring Mario Kart champion is "How can one pick a character / kart / tire combination that is in some sense optimal, even if there isn't one 'best' option?" To answer this question we turn to one of Mario's compatriots, the nineteenth-century Italian economist Vilfredo Pareto, who introduced the concept of Pareto efficiency and the related Pareto frontier.

The concept of Pareto efficiency applies to situations where a finite pool of resources is being allocated among several competing groups. A particular allocation is said to be Pareto efficient if it is impossible to increase the portion assigned to any group without also decreasing the portion assigned to some other group. The set of allocations that are Pareto efficient defines the Pareto frontier. As with many things, this is more easily explained with a picture (courtesy of Wikipedia).

The elements in red lie on the Pareto frontier: for each element in the set an increase along one axis requires a decrease along the other.
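
To make the definition concrete, here is a minimal sketch that marks the non-dominated points in a small set. The function name `pareto_mask` and the toy points are made up for illustration; a point counts as dominated when some other point is at least as large along every axis and strictly larger along at least one.

```python
import numpy as np

def pareto_mask(points):
    """Return a boolean mask marking the non-dominated rows of `points`."""
    pts = np.asarray(points, dtype=float)
    # ge[i, j] is True when point j >= point i in every coordinate
    ge = (pts[None, :, :] >= pts[:, None, :]).all(axis=2)
    # gt[i, j] is True when point j > point i in at least one coordinate
    gt = (pts[None, :, :] > pts[:, None, :]).any(axis=2)
    # point i is dominated if any point j satisfies both conditions
    dominated = (ge & gt).any(axis=1)
    return ~dominated

# made-up (x, y) pairs, purely for illustration
toy = np.array([[1, 5], [2, 4], [2, 3], [3, 3], [4, 1], [1, 1]])
mask = pareto_mask(toy)
print(toy[mask])
```

This compares every pair of points, which is fine for a handful of rows but grows quadratically with the size of the set.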

We can apply this same concept to Mario Kart: the resources are total stat points and the groups are the individual attributes, such as speed, acceleration, or traction. (In general, characters in Mario Kart have the same number of total stat points and differ only in their allocation.) Speed and acceleration are generally the two most important attributes of any given setup, so the goal of this post is to identify those configurations that lie on the Pareto frontier for speed and acceleration.

In [82]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools as it

from sklearn.cluster import KMeans

sns.set_context('talk')

As usual, we start with a little data wrangling to get things into a form we can use. One particular quirk of Mario Kart is that while there are a couple dozen characters, lots of them have identical stats. We'll start by picking out one character from each stat group to use in this analysis (and then do the same for karts and tires).

In [57]:
# originally from https://github.com/woodnathan/MarioKart8-Stats, added DLC and fixed a few typos
bodies = pd.read_csv('bodies.csv')
chars = pd.read_csv('characters.csv')
gliders = pd.read_csv('gliders.csv')
tires = pd.read_csv('tires.csv')

# use only stock (non-DLC) characters / karts / tires
chars = chars.loc[chars['DLC']==0]
bodies = bodies.loc[bodies['DLC']==0]
tires = tires.loc[tires['DLC']==0]
gliders = gliders.loc[gliders['DLC']==0]

stat_cols = bodies.columns[2:-1]
main_cols = ['Weight','Speed','Acceleration','Handling','Traction']

# lots of characters/karts/tires are exactly the same. here we just want one from each stat type
# (DataFrame.sort was removed from pandas; sort_values is the replacement)
chars_unique = chars.drop_duplicates(subset=stat_cols).set_index('Character')[stat_cols].sort_values('Weight')
bodies_unique = bodies.drop_duplicates(subset=stat_cols).set_index('Body')[stat_cols].sort_values('Acceleration')
tires_unique = tires.drop_duplicates(subset=stat_cols).set_index('Tire')[stat_cols].sort_values('Speed')

n_uniq_chars = len(chars_unique)
n_uniq_bodies = len(bodies_unique)
n_uniq_tires = len(tires_unique)

# add a column indicating which category each character/kart/tire is in
chars['char_class'] = KMeans(n_uniq_chars, random_state=0).fit_predict(chars[stat_cols])
bodies['body_class'] = KMeans(n_uniq_bodies).fit_predict(bodies[stat_cols])
tires['tire_class'] = KMeans(n_uniq_tires).fit_predict(tires[stat_cols])

# change the character class labels so that they correspond to weight order
# without DLC
char_class_dict = dict(zip([3, 0, 5, 4, 2, 6, 1], [0, 1, 2, 3, 4, 5, 6]))
# with DLC
# char_class_dict = dict(zip([0, 3, 2, 7, 8, 4, 1, 6, 5], [0, 1, 2, 3, 4, 5, 6, 7, 8]))
chars['char_class'] = chars['char_class'].apply(lambda c: char_class_dict[c])

# only two types of gliders, one of which is pretty clearly just better
glider_best = gliders.loc[gliders['Glider']=='Flower']

From here on out, I'll refer to the character (or kart, or tire) class by the name of its first member. For example, in the heatmap below the row labelled 'Peach' also describes the stats for Daisy and Yoshi. The complete class memberships are listed at the end of the post in case you want to see where your favorite character lands.
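
Since the members of a class have exactly identical stats, the class labels could also be assigned without clustering. The following is only a sketch of that alternative, not the approach used in this post: the roster, the stat values, and the names `toy_chars` and `toy_stats` are placeholders rather than game data.

```python
import pandas as pd

# hypothetical miniature roster; the stat values are placeholders, not game data
toy_chars = pd.DataFrame({
    'Character': ['A', 'B', 'C', 'D', 'E'],
    'Speed': [1.0, 1.0, 2.0, 2.0, 3.0],
    'Acceleration': [3.0, 3.0, 2.0, 2.0, 1.0],
})
toy_stats = ['Speed', 'Acceleration']

# rows with identical stats share a class id; with sort=True the ids
# follow the sorted order of the stat values
toy_chars['char_class'] = toy_chars.groupby(toy_stats, sort=True).ngroup()

# list the members of each class and name the class after its first member
members = toy_chars.groupby('char_class')['Character'].agg(list)
class_names = toy_chars.groupby('char_class')['Character'].first()
print(members)
```

Because the ids come out in sorted stat order, a hand-written relabelling dictionary would not be needed in this variant.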

There are seven classes of characters; let's have a look at how their stats compare.

In [70]:
# plot a heatmap of the stats for each component class
fig, ax = plt.subplots(1,1, figsize=(8,5))

sns.heatmap(chars_unique[main_cols], annot=True, ax=ax, linewidth=1, fmt='.3g')
    
fig.tight_layout()

The most obvious trend is the trade-off between speed and acceleration; heavy characters have good speed but bad acceleration, while light characters have snappy acceleration but a low top speed. There are variations in the other stats as well, but to a large extent the speed and acceleration dominate the performance of a particular setup, so we'll be ignoring the rest of the stats.

Karts and tires modify the base stats of the characters; the final configuration is a sum of the character's stats and the kart / tire modifiers. As with characters, there are dozens of karts and tires but only a few categories with different stats.

In [66]:
# plot a heatmap of the stats for each component class
fig, axes = plt.subplots(2,1, figsize=(8,10))

tables = [bodies_unique, tires_unique]
keys = ['Body', 'Tire']

for ax, table, key in zip(axes, tables, keys):
    sns.heatmap(table[main_cols], annot=True, ax=ax, linewidth=1, fmt='.3g')
    
fig.tight_layout()

The trends here are less obvious, but they generally agree with what we saw in the character stats: improvements in speed come at the expense of acceleration, and vice versa.

Our goal is to find all the configurations that have an optimal combination of speed and acceleration, so the next step is to compute the stats for each unique (character, kart, tire) combination.

In [42]:
def check(char_name, body_type, tire_type):
    # find the stats for each element of the configuration
    character = chars.loc[chars['Character']==char_name]
    kart = bodies.loc[bodies['Body']==body_type]
    wheels = tires.loc[tires['Tire']==tire_type]

    # the total stats for the configuration are just the sum of the components
    stats = pd.concat([character[stat_cols], kart[stat_cols], wheels[stat_cols], glider_best[stat_cols]]).sum()
    
    # index the row by the configuration (character, kart, tire)
    index = pd.MultiIndex.from_tuples([(char_name, body_type, tire_type)], names=['Character', 'Body', 'Tire'])
    
    df = pd.DataFrame(stats).transpose()
    df.index = index
    return df

# generate list of tuples for every possible configuration
config_all = it.product(chars_unique.index, bodies_unique.index, tires_unique.index)

# generate a dataframe with stats for each unique configuration
# (DataFrame.append was removed in pandas 2.0, so build the rows first and concatenate once)
config_base = pd.concat([check(c, b, t) for (c, b, t) in config_all])
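
The cell above builds one configuration at a time by calling `check`. Because the total is just a sum of component stats, the same table can be assembled in one vectorized step. This is a rough sketch on made-up numbers: the `toy_*` tables and their values are hypothetical, and the glider is left out for brevity.

```python
import pandas as pd

cols = ['Speed', 'Acceleration']
# hypothetical component tables indexed by name; the values are placeholders
toy_chars = pd.DataFrame({'Speed': [3.0, 1.0], 'Acceleration': [1.0, 3.0]},
                         index=['heavy', 'light'])
toy_bodies = pd.DataFrame({'Speed': [0.5, 0.0], 'Acceleration': [0.0, 0.5]},
                          index=['bike', 'buggy'])
toy_tires = pd.DataFrame({'Speed': [0.25, -0.5], 'Acceleration': [-0.25, 0.5]},
                         index=['slick', 'small'])

# every (character, body, tire) combination as a MultiIndex
idx = pd.MultiIndex.from_product(
    [toy_chars.index, toy_bodies.index, toy_tires.index],
    names=['Character', 'Body', 'Tire'])

# look up each component's stats for every combination and add the arrays
toy_configs = pd.DataFrame(
    toy_chars.loc[idx.get_level_values('Character'), cols].to_numpy()
    + toy_bodies.loc[idx.get_level_values('Body'), cols].to_numpy()
    + toy_tires.loc[idx.get_level_values('Tire'), cols].to_numpy(),
    index=idx, columns=cols)
print(toy_configs)
```

For the few hundred configurations considered here the row-by-row version is perfectly adequate; the vectorized form mainly pays off on larger tables.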

Equipped with the statistics for each possible combination, we can plot the speed vs the acceleration of each possible setup, and identify those that lie on the Pareto frontier.

In [43]:
# returns True if the row is at the pareto frontier for variables xlabel and ylabel
def is_pareto_front(row, xlabel, ylabel):
    
    x = row[xlabel]
    y = row[ylabel]
    
    # look for points with the same y value but larger x value
    is_max_x = config_base.loc[config_base[ylabel]==y].max()[xlabel] <= x
    # look for points with the same x value but larger y value
    is_max_y = config_base.loc[config_base[xlabel]==x].max()[ylabel] <= y
    # look for points that are larger in both x and y
    is_double = len(config_base.loc[(config_base[xlabel]>x) & (config_base[ylabel]>y)])==0
    
    return is_max_x and is_max_y and is_double

# array of True/False indicating whether the corresponding row is on the pareto frontier
is_pareto = config_base.apply(lambda row: is_pareto_front(row, 'Speed', 'Acceleration'), axis=1)

# just the configurations that are on the pareto frontier
# (.ix and DataFrame.sort were removed from pandas; use .loc and sort_values)
config_pareto = config_base.loc[is_pareto].sort_values('Speed')

In [79]:
# plot all the configurations
fig, ax = plt.subplots(1,1, figsize=(8,5))
sns.regplot(x='Speed', y='Acceleration', data=config_base, fit_reg=False, ax=ax)
            
# plot the pareto frontier
plt.plot(config_pareto['Speed'], config_pareto['Acceleration'], '--', label='Pareto frontier', alpha=0.5)

plt.xlim([0.75,6]);
plt.legend(loc='best');
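
The `is_pareto_front` function rescans the whole table for every row. For two variables, the same frontier can be found with one sort followed by a running maximum. The sketch below is an alternative under that assumption, not the code used for the plot; the function name `pareto_front_scan` and the toy data are hypothetical.

```python
import pandas as pd

def pareto_front_scan(df, xlabel, ylabel):
    """Return the rows of `df` that are not dominated in (xlabel, ylabel)."""
    # distinct (x, y) pairs, visited from largest x down;
    # ties in x are visited with the largest y first
    pts = (df[[xlabel, ylabel]].drop_duplicates()
             .sort_values([xlabel, ylabel], ascending=[False, False]))
    # best y among the pairs visited earlier (-inf for the first pair)
    best_y_before = (pts[ylabel].astype(float).cummax()
                     .shift(fill_value=float('-inf')))
    # a pair is on the frontier only if it beats every y seen so far
    front = pts[pts[ylabel] > best_y_before]
    # keep every original row whose (x, y) pair is on the frontier,
    # so configurations with identical stats are all retained
    on_front = pd.MultiIndex.from_frame(df[[xlabel, ylabel]]).isin(
        pd.MultiIndex.from_frame(front))
    return df[on_front].sort_values(xlabel)

# made-up stats; rows 'd' and 'g' are deliberate duplicates
toy = pd.DataFrame(
    {'Speed': [1.0, 2.0, 2.0, 3.0, 4.0, 1.0, 3.0],
     'Acceleration': [5.0, 4.0, 3.0, 3.0, 1.0, 1.0, 3.0]},
    index=list('abcdefg'))
toy_front = pareto_front_scan(toy, 'Speed', 'Acceleration')
print(toy_front)
```

Rows that share a frontier point are all kept, which matches how `is_pareto_front` treats configurations with identical speed and acceleration.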

Looks like the optimal configurations make up a fairly small subset of the total possible setups. In fact, we can quantify this.

In [47]:
# number of possible combinations
print('Possible combinations    : ',len(list(it.product(chars.index, bodies.index, tires.index, gliders.index))))

# number of combinations with different statistics
print('Unique stat combinations : ',len(config_base.drop_duplicates(subset=stat_cols)))

# number of optimal combinations (considering only speed and acceleration)
print('Optimal combinations     : ',len(config_pareto))
Possible combinations    :  149760
Unique stat combinations :  294
Optimal combinations     :  15

Let's have a look at what these optimal configurations look like.

In [81]:
print(config_base.loc[is_pareto][['Speed','Acceleration']].sort_values('Speed'))
                                    Speed  Acceleration
Character   Body          Tire                         
Baby Mario  Biddybuggy    Roller     1.00          5.75
Toad        Biddybuggy    Roller     1.50          5.50
Peach       Biddybuggy    Roller     2.00          5.25
Mario       Biddybuggy    Roller     2.50          5.00
Donkey Kong Biddybuggy    Roller     3.00          4.75
Wario       Biddybuggy    Roller     3.50          4.50
Donkey Kong Sports Bike   Roller     3.75          4.25
Wario       Sports Bike   Roller     4.25          4.00
                          Wood       4.50          3.25
            Biddybuggy    Slick      4.50          3.25
Donkey Kong Sports Bike   Slick      4.75          3.00
Wario       Gold Standard Roller     4.75          3.00
            Sports Bike   Standard   4.75          3.00
                          Slick      5.25          2.75
            Gold Standard Slick      5.75          1.75

Unless you're going all-in on acceleration, it looks like a heavy character is the way to go; the two heaviest character classes (Wario and Donkey Kong) account for 11/15 of the Pareto-optimal configurations.

We can also look at the other main stats for each of these configurations.

In [74]:
fig, ax = plt.subplots(1,1, figsize=(8,7))
sns.heatmap(config_pareto[main_cols].sort_values('Speed'), annot=True, ax=ax, linewidth=1, fmt='.3g');

So there it is: if speed and acceleration are your main concerns, then one of these 15 configurations is your best bet.

Sometimes an optimal configuration isn't what you're looking for though (say, because your roommate threatened to stop playing if there wasn't some sort of handicap, to choose a random example). In that case, we can explore all the possible configurations with a quick bokeh interactive graphic. I'll omit the code here, but you can find it in the notebook for this post.

In [48]:
# note: needs modifications from https://github.com/josherick/bokeh/tree/2715_add_callbacks_to_groups to work
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import HoverTool, CustomJS
from bokeh.models.widgets import CheckboxButtonGroup

from bokeh.models.widgets import Dropdown
from bokeh.io import output_file, show, vform

output_notebook()
output_file('bokeh_plot.html')