#!/usr/bin/env python
# coding: utf-8

# # Banning in League of Legends: does it matter?

# In **League of Legends**, an online battle arena game in which players choose a character (a *champion*) with unique abilities and fight together in matches against an opposing team. It is very popular, and has a competitive player base.
# 
# The mechanics of the game aren't important for understanding this demo, but one mechanic is of interest: before each match, each team can choose to ban up to three champion types from the game. Teams may do this because they perceive particular champions as broken (unfair to use), or to give themselves an advantage in the match. However, it's generally not established what, if any, advantage banning a character gives to a team. We seek to ask questions on this topic here.

# ## About the dataset

# Alyssa Adams has pulled data for each match from Riot Games' public API. For each match, we have a directory named with the match's unique ID, and this contains a couple things. For example:

# In[1]:


get_ipython().run_line_magic('ls', 'matches/2353211982')


# This match contains a CSV file giving some of the most interesting match data to Alyssa, which we can parse using `pandas`:

# In[2]:


import pandas as pd


# In[3]:


pd.read_csv('matches/2353211982/match.csv', index_col=0)


# Each row corresponds to a given player in the match, and gives information such as what champion they used, which side they were on, how many kills, deaths, etc., and whether their team won.

# In addition to this file, we also have a funny looking `Treant.*.json` file. This is a `datreant` state file, and it's pure JSON:

# In[4]:


get_ipython().run_line_magic('cat', 'matches/2353211982/Treant.c7633606-6432-4cb8-a673-1faa37f3a56b.json')


# We've used this to store information about the match as a whole, not player-specific data. This contains several `categories` (key-value pairs), giving things like the map, the game type, the duration, and, importantly for this study, which bans were chosen by each team. We could use any old JSON parser to use this file, such as the built-in `json.load` function:

# In[5]:


import json


# In[6]:


with open('matches/2353211982/Treant.c7633606-6432-4cb8-a673-1faa37f3a56b.json', 'r') as f:
    match = json.load(f)
match


# But instead we'll use this in a more refined way, using `datreant`:

# In[7]:


import datreant.core as dtr


# In[8]:


match = dtr.Treant('matches/2353211982')
match


# This file is a `datreant` state file, which stores the state for a `Treant`. This is a library that makes it easy to treat directories in the filesystem as logical units that can be introspected easily and tagged with metadata:

# In[9]:


match.draw()


# In[10]:


match['match.csv'].read()


# In[11]:


match.categories


# In[12]:


match.categories['Map']


# Having our match data in this form lets us ask lots of questions with very little friction. If you're familiar with relational databases, in a way it lets one work with directories as if they were part of a relational database. And the files inside these directories can be **anything**, not just tables like we have here.

# `Treant`s allow introspection into their own trees to any level of detail required, but we can do powerful things with them in aggregate. Let's gather all our matches up into a `Bundle`:

# In[13]:


matches = dtr.discover('matches/')


# In[14]:


len(matches)


# This `Bundle` has a few thousand `Treant`s. It's important to remember that a `Treant` is just a directory in the filesystem with a state file, which serves as:
# 1. a flag denoting the directory as a `Treant`, and
# 2. a store for identifying metadata that we choose.
# 
# The `Treant` object (that is, the Python object in memory) is itself very light. It only really amounts to a reference to the state file, which gets parsed on access of things like `tags` and `categories`.

# A `Bundle` can be used to work with a collection of `Treant`s as if it was a single logical unit:

# In[15]:


matches[:5].draw()


# We can, for example, get back any category keys present among these `Treant`s:

# In[16]:


matches.categories.any.keys()


# And perhaps get back the values for one:

# In[25]:


matches[:10].categories['Mode']


# ## If Teemo is banned, is his team more or less likely to win (or neither)?

# For starters, we'll ask a simple question. If the champion *Teemo* is banned, does this make his team more likely to win? Does it make any difference? First, we need to extract matches in which Teemo is banned. Let's keep blue and red bans separate for now, since this might make answering the question easier later.

# First, we'll get all the matches in which the blue team banned *Teemo*:

# In[18]:


blueban = dtr.Bundle([matches.categories.groupby(ban)['Teemo'] for ban in ('Blue Ban 1', 'Blue Ban 2', 'Blue Ban 3')])


# In[19]:


blueban


# And we'll do the same for the red team:

# In[20]:


redban = dtr.Bundle([matches.categories.groupby(ban)['Teemo'] for ban in ('Red Ban 1', 'Red Ban 2', 'Red Ban 3')])


# In[21]:


redban


# Because `Bundle`s behave as ordered sets, if we wanted all the matches where Teemo was banned by at least one team, then we could get it with:

# In[26]:


redban + blueban


# And if we wanted all matches where Teemo was banned by *both* teams, we could do an intersection of the two:

# In[27]:


redban & blueban


# This is expected, because both teams can't ban the same champion. This is a nice check. :)

# For red team bans, how many of those matches did the red team lose? Looking at what a given match CSV gives us:

# In[28]:


df = pd.read_csv(redban[0]['match.csv'].abspath, index_col=0)
df


# We could get this by checking if the word `'blue'` is in the `'Win'` column for rows where that column gives `True`:

# In[29]:


def blue_win(match):
    df = pd.read_csv(match['match.csv'].abspath, index_col=0)
    return 'blue' in df[df['Win'] == True]['Side'].iloc[0]


# So we could then get back a list of bools telling us which matches blue won among those where the red team banned *Teemo*:

# In[30]:


redban_blue_wins = redban.map(blue_win)
redban_blue_wins


# Did red win more often than they lost? Here are the matches red won:

# In[31]:


len(redban - redban[redban_blue_wins])


# Here are the number of matches red lost:

# In[32]:


len(redban[redban_blue_wins])


# We used this list of bools as a boolean index for our `redban` `Bundle`, here.

# ### What do we get for the case where blue bans Teemo?

# We'll do the same thing for matches in which the blue team banned *Teemo*:

# In[33]:


blueban_blue_wins = blueban.map(blue_win)
blueban_blue_wins


# Did blue win more often than they lost? Here are the matches blue won:

# In[34]:


len(blueban[blueban_blue_wins])


# Here are the number of matches blue lost:

# In[35]:


len(blueban - blueban[blueban_blue_wins])


# Hmmm...maybe not a signal here, but the number of samples is pretty small anyhow to say anything.

# # Who is the most banned character? Does banning this character make your team more likely to win?

# We want to take to task some conventional wisdom that may be floating around the community about which characters you should ban if you want to increase your chances to win. Let's see if this holds water.

# First we need to find the most banned character. Probably lots of ways to get at this, but one thing we can do is a `groupby` on each ban slot as before, for each building a dictionary giving banned champions as keys and the number of matches they were banned in as values:

# In[36]:


counts = pd.DataFrame({ban: {key: len(value) for key, value in matches.categories.groupby(ban).items()} for ban in ('Blue Ban 1', 'Blue Ban 2', 'Blue Ban 3', 'Red Ban 1', 'Red Ban 2', 'Red Ban 3')})
counts


# We can then sum across columns and sort to get a ranking of banned champions across all matches:

# In[37]:


counts.sum(axis=1).sort_values(ascending=False)


# So we have our answer. The most banned is **Rengar**.

# Out of curiosity, what's the distribution of bans look like across champions?

# In[38]:


get_ipython().run_line_magic('matplotlib', 'inline')


# In[39]:


counts.sum(axis=1).sort_values(ascending=False).plot()


# ## Does banning Rengar make you more likely to win?

# We'll do the same analysis we did for *Teemo*, but now for *Rengar*.

# In[40]:


blueban = dtr.Bundle([matches.categories.groupby(ban)['Rengar'] for ban in ('Blue Ban 1', 'Blue Ban 2', 'Blue Ban 3')])
len(blueban)


# In[41]:


redban = dtr.Bundle([matches.categories.groupby(ban)['Rengar'] for ban in ('Red Ban 1', 'Red Ban 2', 'Red Ban 3')])
len(redban)


# For red team bans, how many of those matches did the red team lose?

# Did red win more often than they lost? Here are the matches red won:

# In[42]:


len(redban - redban[redban.map(blue_win)])


# Here are the number of matches red lost:

# In[43]:


len(redban[redban.map(blue_win)])


# ### What do we get for the case where blue bans Rengar?

# Did blue win more often than they lost? Here are the matches blue won:

# In[44]:


len(blueban[blueban.map(blue_win)])


# Here are the number of matches blue lost:

# In[45]:


len(blueban - blueban[blueban.map(blue_win)])


# Hmmm...maybe not a signal here...might not be much point overall to banning Rengar. However, it may be the case that if you ban Rengar and perhaps another character this improves your chances? That's an interesting question for another time. Could it also be the case that banning Rengar levels the playing field?

# # In a game where Rengar wasn't banned, was he picked? Was the team more likely win?

# We already know that banning *Rengar* doesn't seem to make your team more likely to win, but does **not** banning him make a team with *Rengar* more likely to win? Is he even picked by a player?

# First we need matches where Rengar was not banned. We'll iterate through all ban slots, grab out the matches where the banned champion was *Rengar*, then combine these `Bundle`s into a single one. We can then subtract this `Bundle` from all the matches to get only those where *Rengar* was **not** banned:

# In[46]:


rengar_not_banned = matches - dtr.Bundle([matches.categories.groupby(ban)['Rengar'] for ban in ('Blue Ban 1', 'Blue Ban 2', 'Blue Ban 3', 'Red Ban 1', 'Red Ban 2', 'Red Ban 3')])
len(rengar_not_banned)


# So now, which matches was Rengar present in? We can get this like we did before, by checking each match CSV. this time, we just want to know if `'Rengar'` shows up in `'Champion'` column:

# In[47]:


def has_rengar(match):
    df = pd.read_csv(match['match.csv'].abspath, index_col=0)
    return len(df[df['Champion'] == 'Rengar']) == 1


# In[48]:


matches_has_rengar = rengar_not_banned[rengar_not_banned.map(has_rengar)]
len(matches_has_rengar)


# ## If a team had a Rengar, were they more likely to win?

# So, if a team has a *Rengar*, did they also tend to win? We can check the CSV files again to see if the player using *Rengar* was also on the winning team:

# In[49]:


def rengar_won(match):
    df = pd.read_csv(match['match.csv'].abspath, index_col=0)
    return bool(df[df['Champion'] == 'Rengar']['Win'].iloc[0])


# Number of wins:

# In[50]:


len(matches_has_rengar[matches_has_rengar.map(rengar_won, processes=4)])


# Number of losses:

# In[51]:


len(matches_has_rengar - matches_has_rengar[matches_has_rengar.map(rengar_won, processes=4)])


# Doesn't really look like he confers too much of an advantage, but perhaps a slight one?

# ## Other questions we can ask...

# There are tons of questions we can ask, and even complex ones are surprisingly easy to do so given the way we've structured our data, and how `datreant` allows us to work with it. Here's a smattering:
# 1. Is there *any* champion that seems to confer a systematic advantage to the team that has it? If it appears so, can we isolate the effect from other possible factors?
# 2. Is there any *combination* of champion bans that make a team more likely to win? Does it depend heavily on the team?
# 3. Which teams ban the same champions the most? Can we guess why from the data?

# ## Bonus: does having any particular champion on your team appear to give you an advantage?

# We can do a similar analysis as we did for *Rengar* on all champions, and if it's the case that having a particular champion on your team results in more wins on average, we can begin drilling down into more questions to establish *why*. First let's see what we get.

# We'll gather all dataframes into a single, big dataframe with an extra index giving the uuid of the `Treant` for unique identification. This will make writing code that gets the number of wins/losses for a particular champion faster, since we can operate on a single `DataFrame` instead of ~5000:

# In[52]:


match_dfs = [pd.read_csv(match['match.csv'].abspath, index_col=0) for match in matches]
bigdf = pd.concat(match_dfs, keys=matches.uuids, names=['uuids'])


# In[53]:


bigdf


# We'll then write two quick-and-dirty functions (probably better to combine them, but whatever for now), that tell us how many wins a champion saw or how many losses:

# In[54]:


def match_result_champion(champion, win=True):
    df = bigdf[bigdf['Win'] == win]
    df = df[df['Champion'] == champion]
    
    return len(df)


# For example:

# In[55]:


match_result_champion('Rengar', win=True)


# Let's get a full list of the available champions from our dataframe:

# In[56]:


champions = list(bigdf['Champion'].unique())
len(champions)


# Now we can get the number of wins for every champion:

# In[57]:


champion_won = {champion: match_result_champion(champion, win=True) for champion in champions}


# ...and the number of losses:

# In[58]:


champion_lost = {champion: match_result_champion(champion, win=False) for champion in champions}


# So now we can build `pandas` Series objects directly from these, for easy statistics work:

# In[59]:


num_wins = pd.Series(champion_won)


# In[60]:


num_lost = pd.Series(champion_lost)


# And perhaps we can see which characters correlate with more wins (or losses).

# In[61]:


percent_win = ((num_wins - num_lost)*100.0 / (num_wins + num_lost)).sort_values(ascending=False)
percent_win


# In[62]:


ax = percent_win.plot.bar(figsize=(40, 8))

for item in ax.get_xticklabels():
    item.set_rotation(60)


# From here we can start to ask *why* some champions appear advantageous from these match data.