Peru's Road to the Russia 2018 World Cup

In [1]:
# Load the libraries
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

# Render figures inline at retina resolution for readability
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Preprocessing FIFA Data

Load Data from SOFIFA

In [2]:
# Dataset published on Kaggle in January
FIFA18 = pd.read_csv('CompleteDataset.csv', low_memory=False)
FIFA18.columns
Out[2]:
Index(['Unnamed: 0', 'Name', 'Age', 'Photo', 'Nationality', 'Flag', 'Overall',
       'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',
       'Acceleration', 'Aggression', 'Agility', 'Balance', 'Ball control',
       'Composure', 'Crossing', 'Curve', 'Dribbling', 'Finishing',
       'Free kick accuracy', 'GK diving', 'GK handling', 'GK kicking',
       'GK positioning', 'GK reflexes', 'Heading accuracy', 'Interceptions',
       'Jumping', 'Long passing', 'Long shots', 'Marking', 'Penalties',
       'Positioning', 'Reactions', 'Short passing', 'Shot power',
       'Sliding tackle', 'Sprint speed', 'Stamina', 'Standing tackle',
       'Strength', 'Vision', 'Volleys', 'CAM', 'CB', 'CDM', 'CF', 'CM', 'ID',
       'LAM', 'LB', 'LCB', 'LCM', 'LDM', 'LF', 'LM', 'LS', 'LW', 'LWB',
       'Preferred Positions', 'RAM', 'RB', 'RCB', 'RCM', 'RDM', 'RF', 'RM',
       'RS', 'RW', 'RWB', 'ST'],
      dtype='object')

We select the features we will work with and set the rest aside.

In [3]:
interesting_columns = [
    'Name', 
    'Age',  
    'Nationality', 
    'Overall', 
    'Potential', 
    'Club', 
    'Value', 
    'Wage', 
    'Preferred Positions'
]
FIFA18 = FIFA18[interesting_columns].copy()

Summarize Data

In [4]:
# Preview the data
FIFA18.head()
Out[4]:
   Name               Age  Nationality  Overall  Potential  Club                  Value   Wage  Preferred Positions
0  Cristiano Ronaldo   32  Portugal          94         94  Real Madrid CF       €95.5M  €565K  ST LW
1  L. Messi            30  Argentina         93         93  FC Barcelona          €105M  €565K  RW
2  Neymar              25  Brazil            92         94  Paris Saint-Germain   €123M  €280K  LW
3  L. Suárez           30  Uruguay           92         92  FC Barcelona           €97M  €510K  ST
4  M. Neuer            31  Germany           92         92  FC Bayern Munich       €61M  €230K  GK
In [5]:
# Describe the features
FIFA18.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17981 entries, 0 to 17980
Data columns (total 9 columns):
Name                   17981 non-null object
Age                    17981 non-null int64
Nationality            17981 non-null object
Overall                17981 non-null int64
Potential              17981 non-null int64
Club                   17733 non-null object
Value                  17981 non-null object
Wage                   17981 non-null object
Preferred Positions    17981 non-null object
dtypes: int64(3), object(6)
memory usage: 1.2+ MB
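
The `info()` output above reports 17733 non-null `Club` entries out of 17981 rows, so 248 players have no club recorded. The sketch below shows one way such gaps could be counted and inspected. It runs on a small hypothetical frame (`toy`) rather than the real dataset, and the names `missing_per_column` and `no_club` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for FIFA18; the real CSV is not loaded here.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Club": ["FC Barcelona", np.nan, "Chelsea"],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows whose Club is missing.
no_club = toy[toy["Club"].isna()]
print(no_club)
```

On the real frame the same two calls would apply to `FIFA18` directly.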

Preprocess Data

We convert the "Value" and "Wage" strings into numbers by expanding the K (thousands) and M (millions) suffixes. The numeric results are stored in the new features 'ValueNum' and 'WageNum'.

In [6]:
# Supporting function for converting string values into numbers
def str2number(amount):
    if amount[-1] == 'M':
        return float(amount[1:-1])*1000000
    elif amount[-1] == 'K':
        return float(amount[1:-1])*1000
    else:
        return float(amount[1:])
    
FIFA18['ValueNum'] = FIFA18['Value'].apply(str2number)
FIFA18['WageNum'] = FIFA18['Wage'].apply(str2number)
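
As an alternative to the row-by-row `apply`, the same conversion could be sketched with pandas string methods. This is not the notebook's method, only a vectorized variant under the assumption that every entry starts with `€` and ends with `M`, `K`, or a digit. The helper name `money_to_number` and the sample values are hypothetical.

```python
import pandas as pd

def money_to_number(series):
    """Vectorized variant of str2number (hypothetical helper)."""
    stripped = series.str.lstrip("€")        # drop the currency symbol
    suffix = stripped.str[-1]                # 'M', 'K', or a digit
    # Map the suffix to a multiplier; anything else counts as 1.
    multiplier = suffix.map({"M": 1_000_000, "K": 1_000}).fillna(1).astype(float)
    digits = stripped.str.rstrip("MK")       # numeric part without the suffix
    return digits.astype(float) * multiplier

sample = pd.Series(["€95.5M", "€565K", "€0"])
converted = money_to_number(sample)
print(converted.tolist())
```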

We condense the 'Preferred Positions' feature by keeping only the first listed position, which discards any secondary positions the player has.

In [7]:
FIFA18['Position'] = FIFA18['Preferred Positions'].str.split().str[0]
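
To illustrate what `.str.split().str[0]` does, here is a minimal sketch on a few hand-written strings (the sample values are made up for the example). Splitting on whitespace and taking element 0 keeps the first position code, whether the player has one position or several.

```python
import pandas as pd

# Hypothetical sample of 'Preferred Positions' strings.
positions = pd.Series(["ST LW", "RW", "CDM CM "])

# Split on whitespace, then keep the first token of each list.
first_position = positions.str.split().str[0]
print(first_position.tolist())  # ['ST', 'RW', 'CDM']
```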

We select the group that Peru was drawn into, Group C: ['France', 'Denmark', 'Peru', 'Australia'].

In [8]:
GroupC = FIFA18[FIFA18.Nationality.isin(['France','Denmark','Peru','Australia'])]

We compute the mean market value and mean wage per nationality for Group C.

In [9]:
GroupC_Average = GroupC.groupby('Nationality').agg({'ValueNum': 'mean', 'WageNum': 'mean'})
GroupC_Average = GroupC_Average.reindex(['France','Denmark','Peru','Australia'])
GroupC_Average
Out[9]:
                 ValueNum       WageNum
Nationality
France       3.340557e+06  14279.141104
Denmark      1.596257e+06   7202.312139
Peru         2.178167e+06   6500.000000
Australia    7.339868e+05   4114.537445
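
A mean says nothing about how many players it is based on, and the squads in this dataset differ a lot in size (the nationality counts in this notebook list France with 978 players and Denmark with 346). The sketch below shows one way to report the player count next to the means, using pandas named aggregation. The frame `toy`, the list `group`, and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the FIFA18 frame.
toy = pd.DataFrame({
    "Nationality": ["France", "France", "Peru", "Brazil"],
    "ValueNum": [4_000_000.0, 2_000_000.0, 1_000_000.0, 9_000_000.0],
    "WageNum": [20_000.0, 10_000.0, 5_000.0, 50_000.0],
})
group = ["France", "Peru"]

summary = (
    toy[toy["Nationality"].isin(group)]      # keep only the group's teams
    .groupby("Nationality")
    .agg(
        players=("ValueNum", "size"),        # rows per nationality
        mean_value=("ValueNum", "mean"),
        mean_wage=("WageNum", "mean"),
    )
    .reindex(group)                          # fix the display order
)
print(summary)
```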

Data Visualization

Age

We visualize the distribution of ages with a bar chart of the number of players at each age.

In [10]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
plt.title('Grouping players by Age', fontsize=30, fontweight='bold', y=1.05,)
plt.xlabel('Players Age', fontsize=25)
plt.ylabel('Number of players', fontsize=25)
sns.countplot(x="Age", data=FIFA18, palette="hls");
plt.show()
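
The height of each bar in a count plot is the number of rows that share a given value. The sketch below tabulates the same quantity without drawing anything, which can help when checking the axes of the chart. The `ages` series is a made-up sample, not the real data.

```python
import pandas as pd

# Hypothetical sample of player ages.
ages = pd.Series([25, 30, 25, 32, 30, 25])

# Rows per age, ordered by age as on the x-axis of the count plot.
age_counts = ages.value_counts().sort_index()
print(age_counts)
```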

Overall

In [13]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
plt.title('Grouping players by Overall', fontsize=30, fontweight='bold', y=1.05,)
plt.xlabel('Overall', fontsize=25)
plt.ylabel('Number of players', fontsize=25)
sns.countplot(x="Overall", data=FIFA18, palette="hls");
plt.show()

Preferred Position

In [14]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
plt.title('Grouping players by Preferred Position', fontsize=30, fontweight='bold', y=1.05,)
plt.xlabel('Preferred Position', fontsize=25)
plt.ylabel('Number of players', fontsize=25)
sns.countplot(x="Position", data=FIFA18, palette="hls");
plt.show()

Nationality

In [15]:
FIFA18["Nationality"].value_counts().head(25)
Out[15]:
England                1630
Germany                1140
Spain                  1019
France                  978
Argentina               965
Brazil                  812
Italy                   799
Colombia                592
Japan                   469
Netherlands             429
Republic of Ireland     417
United States           381
Chile                   375
Sweden                  368
Portugal                367
Mexico                  360
Denmark                 346
Poland                  337
Norway                  333
Korea Republic          330
Saudi Arabia            329
Russia                  306
Scotland                300
Turkey                  291
Belgium                 272
Name: Nationality, dtype: int64

A large share of the players is concentrated in Europe, especially in England, Germany, Spain, and France.
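
To put these counts in proportion, they can be divided by the 17981 rows reported by `info()`. The sketch below does this for the four largest counts taken from the output above. The variable names are illustrative. Together these four nationalities account for roughly a quarter of all players.

```python
import pandas as pd

# The four largest counts, copied from the value_counts() output above.
counts = pd.Series({"England": 1630, "Germany": 1140, "Spain": 1019, "France": 978})
total_players = 17981  # number of rows reported by FIFA18.info()

# Fraction of all players per nationality.
share = counts / total_players
print((share * 100).round(1))
print(round(share.sum() * 100, 1))
```

On the full frame, `FIFA18["Nationality"].value_counts(normalize=True)` returns these fractions directly.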

Value

We list the 20 players with the highest market value.

In [16]:
sorted_players = FIFA18.sort_values(["ValueNum"], ascending=False).head(20)
sorted_players[["Name" ,"Age" ,"Nationality" ,"Club" ,"Position" ,"Value"]].reset_index(drop=True)
Out[16]:
    Name               Age  Nationality  Club                 Position   Value
 0  Neymar              25  Brazil       Paris Saint-Germain  LW         €123M
 1  L. Messi            30  Argentina    FC Barcelona         RW         €105M
 2  L. Suárez           30  Uruguay      FC Barcelona         ST          €97M
 3  Cristiano Ronaldo   32  Portugal     Real Madrid CF       ST        €95.5M
 4  R. Lewandowski      28  Poland       FC Bayern Munich     ST          €92M
 5  E. Hazard           26  Belgium      Chelsea              LW        €90.5M
 6  K. De Bruyne        26  Belgium      Manchester City      RM          €83M
 7  T. Kroos            27  Germany      Real Madrid CF       CDM         €79M
 8  P. Dybala           23  Argentina    Juventus             ST          €79M
 9  G. Higuaín          29  Argentina    Juventus             ST          €77M
10  A. Griezmann        26  France       Atlético Madrid      LW          €75M
11  Thiago              26  Spain        FC Bayern Munich     CDM       €70.5M
12  G. Bale             27  Wales        Real Madrid CF       RW        €69.5M
13  A. Sánchez          28  Chile        Arsenal              RM        €67.5M
14  S. Agüero           29  Argentina    Manchester City      ST        €66.5M
15  P. Pogba            24  France       Manchester United    CDM       €66.5M
16  C. Eriksen          25  Denmark      Tottenham Hotspur    LM          €65M
17  De Gea              26  Spain        Manchester United    GK        €64.5M
18  M. Verratti         24  Italy        Paris Saint-Germain  CDM       €64.5M
19  M. Neuer            31  Germany      FC Bayern Munich     GK          €61M
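
Sorting the whole frame and then taking `head(20)` works, and pandas also offers `nlargest` for the same "top N by column" task. A minimal sketch on made-up data follows. The frame `toy` and the name `top2` are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with a numeric value column.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "ValueNum": [5.0, 20.0, 10.0, 1.0],
})

# Two rows with the largest ValueNum, highest first.
top2 = toy.nlargest(2, "ValueNum").reset_index(drop=True)
print(top2)
```

With the notebook's data this would read `FIFA18.nlargest(20, "ValueNum")`.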

We draw a scatter plot of Age against Overall, with each marker's area scaled by the player's market value:

In [11]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
plt.title('Players Value according to their Age and Overall', fontsize=30, fontweight='bold', y=1.05,)
plt.xlabel('Age', fontsize=25)
plt.ylabel('Overall', fontsize=25)

age = FIFA18["Age"].values
overall = FIFA18["Overall"].values
value = FIFA18["ValueNum"].values

# Scale the values down so the marker areas stay readable
plt.scatter(age, overall, s = value/100000, edgecolors='black')
plt.show()
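
In matplotlib's `scatter`, the `s` argument is the marker area in points squared, so dividing the value by 100000 turns a €123M player into a marker of area 1230. The sketch below shows that mapping on three made-up points. The arrays are hypothetical, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical players: age, overall rating, and value in euros.
age = np.array([25, 31, 19])
overall = np.array([92, 92, 60])
value = np.array([123_000_000, 61_000_000, 500_000], dtype=float)

# Marker areas in points squared, using the same divisor as the cell above.
sizes = value / 100_000

fig, ax = plt.subplots(figsize=(8, 4))
points = ax.scatter(age, overall, s=sizes, edgecolors="black")
ax.set_xlabel("Age")
ax.set_ylabel("Overall")
print(sizes.tolist())  # [1230.0, 610.0, 5.0]
```

Because area grows linearly with value, a player worth twice as much gets a marker with twice the area, not twice the diameter.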

Wage

We list the 20 players with the highest wages.

In [18]:
sorted_players = FIFA18.sort_values(["WageNum"], ascending=False).head(20)
sorted_players[["Name" ,"Age" ,"Nationality" ,"Club" ,"Position" ,"Wage"]].reset_index(drop=True)
Out[18]:
    Name               Age  Nationality  Club                 Position   Wage
 0  Cristiano Ronaldo   32  Portugal     Real Madrid CF       ST        €565K
 1  L. Messi            30  Argentina    FC Barcelona         RW        €565K
 2  L. Suárez           30  Uruguay      FC Barcelona         ST        €510K
 3  G. Bale             27  Wales        Real Madrid CF       RW        €370K
 4  R. Lewandowski      28  Poland       FC Bayern Munich     ST        €355K
 5  L. Modrić           31  Croatia      Real Madrid CF       CDM       €340K
 6  T. Kroos            27  Germany      Real Madrid CF       CDM       €340K
 7  S. Agüero           29  Argentina    Manchester City      ST        €325K
 8  Sergio Ramos        31  Spain        Real Madrid CF       CB        €310K
 9  E. Hazard           26  Belgium      Chelsea              LW        €295K
10  K. Benzema          29  France       Real Madrid CF       ST        €295K
11  K. De Bruyne        26  Belgium      Manchester City      RM        €285K
12  Neymar              25  Brazil       Paris Saint-Germain  LW        €280K
13  I. Rakitić          29  Croatia      FC Barcelona         CM        €275K
14  G. Higuaín          29  Argentina    Juventus             ST        €275K
15  A. Sánchez          28  Chile        Arsenal              RM        €265K
16  M. Özil             28  Germany      Arsenal              RW        €265K
17  Iniesta             33  Spain        FC Barcelona         LM        €260K
18  Marcelo             29  Brazil       Real Madrid CF       LB        €250K
19  J. Rodríguez        25  Colombia     FC Bayern Munich     RM        €250K

We draw a scatter plot of Age against Overall, with each marker's area scaled by the player's wage:

In [19]:
plt.figure(figsize=(16,8))
sns.set_style("whitegrid")
plt.title('Players Wage according to their Age and Overall', fontsize=30, fontweight='bold', y=1.05,)
plt.xlabel('Age', fontsize=25)
plt.ylabel('Overall', fontsize=25)

age = FIFA18["Age"].values
overall = FIFA18["Overall"].values
wage = FIFA18["WageNum"].values

# Scale the wages down so the marker areas stay readable
plt.scatter(age, overall, s = wage/500, edgecolors='black', color="red")
plt.show()