Fantasy Football

So, this is my first year participating in a fantasy football league. I enjoy football, but I typically only keep up with a few teams, so drafting an actual team was a bit daunting. As with most things, I relied on data to help me out. I spent some time researching strategy, looking at projections, and even simulating some drafts. Draft day came and I felt pretty good about my team... but I am currently 0-3 on the season. Ha.

I started to get a bit curious about why I was doing so poorly. Typically my "projected" points were pretty good each week, but my team just never seemed to deliver.

Thus, here is my first post looking at the biggest outperformers and the biggest misses relative to ESPN projections. All data are from ESPN. So, let's get to it!

In [1]:

from __future__ import division  # keep first; makes / always return a float on Python 2
import psycopg2
import pandas.io.sql as psql
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline

In [12]:

db = psycopg2.connect("dbname='fantasyfootball' host='localhost'")

def get_combined_df():
    actual_points = psql.read_sql("""
    select name, team, position, sum(total_points) as points_actual 
    from scoring_leaders_weekly
    group by name, team, position;""", con=db)
    predicted_points = psql.read_sql("""
        select name, team, position, sum(total_points) as points_predicted
        from next_week_projections
        group by name, team, position;""", con=db)
    combined_df = actual_points.merge(predicted_points, 
                                      on=['name', 'team', 'position'], how='left')
    combined_df = combined_df.dropna()
    combined_df = combined_df[combined_df['points_predicted'] > 0]
    combined_df['points_diff'] = combined_df.points_actual - combined_df.points_predicted
    combined_df['points_diff_pct'] = (combined_df.points_actual - combined_df.points_predicted) / combined_df.points_predicted
    return combined_df

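def get_top_bottom(df):
    # Collect the 5 largest and 5 smallest point differences per position.
    group = df.groupby('position')
    top_list = []
    bottom_list = []
    for name, data in group:
        # DataFrame.sort() was removed from pandas; sort_values() replaces it.
        top = data.sort_values('points_diff', ascending=False)
        top_list.append(top.head())
        tail = top.tail()
        tail = tail.sort_values('points_diff')  # worst miss first
        bottom_list.append(tail)
    top_df = pd.concat(top_list)
    bottom_df = pd.concat(bottom_list)
    return top_df, bottom_df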

def run_analysis():
    combined_df = get_combined_df()
    top, bottom = get_top_bottom(combined_df)
    return combined_df, top, bottom
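
To make the merge and difference logic above easier to follow without a Postgres database, here is a minimal sketch on two small in-memory frames. The `actual` and `predicted` frames are hypothetical stand-ins for the two queries. The Travis Benjamin and Anthony Fasano numbers are copied from the results tables in this post, and the third player is made up purely to show what happens to a row with no projection.

```python
import pandas as pd

# Hypothetical stand-ins for the two SQL queries in get_combined_df().
actual = pd.DataFrame({
    "name": ["Travis Benjamin", "Anthony Fasano", "Unprojected Player"],
    "team": ["Cle", "Ten", "XYZ"],  # "XYZ" is a made-up team
    "position": ["WR", "TE", "RB"],
    "points_actual": [51.0, 17.0, 9.0],
})
predicted = pd.DataFrame({
    "name": ["Travis Benjamin", "Anthony Fasano"],
    "team": ["Cle", "Ten"],
    "position": ["WR", "TE"],
    "points_predicted": [17.4, 0.3],
})

# Same steps as get_combined_df(): left merge, drop unmatched rows,
# keep positive projections, then compute both difference columns.
combined = actual.merge(predicted, on=["name", "team", "position"], how="left")
combined = combined.dropna()
combined = combined[combined["points_predicted"] > 0].copy()
combined["points_diff"] = combined["points_actual"] - combined["points_predicted"]
combined["points_diff_pct"] = combined["points_diff"] / combined["points_predicted"]

print(combined[["name", "points_diff", "points_diff_pct"]].round(2))
```

The made-up player has no projection, so the `dropna()` step removes him. Fasano's tiny 0.3-point projection turns a 16.7-point difference into a percentage of roughly 55.67, which is one reason the absolute difference is the steadier ranking.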

RESULTS

For the results, I decided to show the top 5 outperformers and the top 5 underperformers at each position for the cumulative season, ranked by the absolute point difference (not the percentage). First, here are the outperformers. This is pretty interesting. Travis Benjamin, for example, has produced an extra 33.6 fantasy points relative to expectations in the first 3 weeks. Not too bad.
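
As a rough illustration of that per-position ranking, here is a small sketch using `nlargest` and `nsmallest`. The helper `top_bottom_n` is hypothetical (the notebook itself uses `get_top_bottom`), and the sample rows are a handful of `points_diff` values copied from the tables in this post rather than the full data set.

```python
import pandas as pd

# A few points_diff values copied from the result tables in this post.
sample = pd.DataFrame({
    "name": ["Tom Brady", "Andy Dalton", "Drew Brees", "Andrew Luck",
             "Travis Benjamin", "Larry Fitzgerald", "Alshon Jeffery",
             "Calvin Johnson"],
    "position": ["QB"] * 4 + ["WR"] * 4,
    "points_diff": [24.8, 17.5, -46.2, -28.5, 33.6, 27.9, -30.4, -24.7],
})

def top_bottom_n(df, n):
    """Hypothetical helper: n best and n worst points_diff per position."""
    tops, bottoms = [], []
    for _, grp in df.groupby("position"):
        tops.append(grp.nlargest(n, "points_diff"))
        bottoms.append(grp.nsmallest(n, "points_diff"))
    return pd.concat(tops), pd.concat(bottoms)

top, bottom = top_bottom_n(sample, 1)
print(top)
print(bottom)
```

Ranking on `points_diff` rather than `points_diff_pct` keeps players with near-zero projections from dominating the lists.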

In [13]:

combined_df, top_1, bottom_1 = run_analysis()

In [14]:

top_1

Out[14]:

	name	team	position	points_actual	points_predicted	points_diff	points_diff_pct
384	Broncos	Broncos	D	55	36.5	18.5	0.506849
281	Cardinals	Cardinals	D	48	31.4	16.6	0.528662
490	Cowboys	Cowboys	D	23	15.9	7.1	0.446541
248	Titans	Titans	D	23	16.1	6.9	0.428571
25	Jets	Jets	D	34	30.7	3.3	0.107492
255	Stephen Gostkowski	NE	K	40	26.2	13.8	0.526718
309	Josh Brown	NYG	K	39	32.4	6.6	0.203704
247	Brandon McManus	Den	K	34	30.0	4.0	0.133333
307	Steven Hauschka	Sea	K	34	30.7	3.3	0.107492
496	Mason Crosby	GB	K	32	32.1	-0.1	-0.003115
375	Tom Brady	NE	QB	77	52.2	24.8	0.475096
204	Andy Dalton	Cin	QB	69	51.5	17.5	0.339806
13	Marcus Mariota	Ten	QB	57	45.4	11.6	0.255507
488	Tyrod Taylor	Buf	QB	64	53.1	10.9	0.205273
107	Ryan Mallett	Hou	QB	37	28.6	8.4	0.293706
265	Karlos Williams	Buf	RB	37	16.9	20.1	1.189349
295	Marcel Reece	Oak	RB	20	3.3	16.7	5.060606
457	Dion Lewis	NE	RB	40	24.8	15.2	0.612903
99	DeAngelo Williams	Pit	RB	38	24.2	13.8	0.570248
134	Devonta Freeman	Atl	RB	51	37.2	13.8	0.370968
171	Rob Gronkowski	NE	TE	54	34.0	20.0	0.588235
241	Anthony Fasano	Ten	TE	17	0.3	16.7	55.666667
219	Austin Seferian-Jenkins	TB	TE	23	9.0	14.0	1.555556
319	Gary Barnidge	Cle	TE	20	6.7	13.3	1.985075
272	Travis Kelce	KC	TE	37	27.9	9.1	0.326165
258	Travis Benjamin	Cle	WR	51	17.4	33.6	1.931034
414	Larry Fitzgerald	Ari	WR	62	34.1	27.9	0.818182
174	Rishard Matthews	Mia	WR	43	15.9	27.1	1.704403
250	James Jones	GB	WR	44	26.2	17.8	0.679389
394	Julian Edelman	NE	WR	39	23.8	15.2	0.638655

Now here are the underperformers. These are the people you don't want to be playing... For example, C.J. Anderson comes in at a whopping 42 points under expectation. Who is my running back, you ask... Drew Brees takes the cake, though, underperforming by 46.2 points on the season, although he was injured.

In [17]:

bottom_1

Out[17]:

	name	team	position	points_actual	points_predicted	points_diff	points_diff_pct
45	Colts	Colts	D	12	34.4	-22.4	-0.651163
223	Dolphins	Dolphins	D	14	36.3	-22.3	-0.614325
213	Texans	Texans	D	12	32.1	-20.1	-0.626168
470	Lions	Lions	D	14	27.4	-13.4	-0.489051
141	Chargers	Chargers	D	10	23.1	-13.1	-0.567100
366	Adam Vinatieri	Ind	K	5	31.5	-26.5	-0.841270
24	Matt Prater	Det	K	8	30.9	-22.9	-0.741100
461	Phil Dawson	SF	K	13	35.8	-22.8	-0.636872
399	Andrew Franks	Mia	K	14	33.6	-19.6	-0.583333
193	Josh Scobee	Pit	K	16	32.1	-16.1	-0.501558
21	Drew Brees	NO	QB	28	74.2	-46.2	-0.622642
251	Andrew Luck	Ind	QB	41	69.5	-28.5	-0.410072
426	Sam Bradford	Phi	QB	27	55.5	-28.5	-0.513514
477	Teddy Bridgewater	Min	QB	28	55.5	-27.5	-0.495495
428	Peyton Manning	Den	QB	43	67.7	-24.7	-0.364845
20	C.J. Anderson	Den	RB	6	48.0	-42.0	-0.875000
276	Marshawn Lynch	Sea	RB	19	55.0	-36.0	-0.654545
184	DeMarco Murray	Phi	RB	18	50.3	-32.3	-0.642147
311	Jeremy Hill	Cin	RB	20	50.0	-30.0	-0.600000
416	Lamar Miller	Mia	RB	15	42.8	-27.8	-0.649533
16	Zach Ertz	Phi	TE	8	22.0	-14.0	-0.636364
386	Benjamin Watson	NO	TE	4	15.4	-11.4	-0.740260
226	Mychal Rivera	Oak	TE	1	12.0	-11.0	-0.916667
136	Jeff Cumberland	NYJ	TE	1	10.8	-9.8	-0.907407
227	Martellus Bennett	Chi	TE	16	25.0	-9.0	-0.360000
296	Alshon Jeffery	Chi	WR	7	37.4	-30.4	-0.812834
123	Calvin Johnson	Det	WR	24	48.7	-24.7	-0.507187
32	Andre Johnson	Ind	WR	4	27.9	-23.9	-0.856631
87	Demaryius Thomas	Den	WR	30	52.5	-22.5	-0.428571
105	Mike Evans	TB	WR	10	32.4	-22.4	-0.691358

Next, I wanted to take a look at the distribution of point differences by position. The chart below shows that the median player at every position is underperforming, except for TE, which is pretty close to zero. There are a few breakout WRs and quite a few underperforming running backs. The spread is also pretty wide for most positions.

In [21]:

# seaborn dropped the old `groupby` argument; pass x/y column names instead.
ax = sns.boxplot(x="position", y="points_diff", data=combined_df)
plt.title("Distribution of Point Differences by Position")
sns.despine()

I also looked at the distribution of actual points by position. One thing you hear in fantasy is to select RBs early because they are high-variance players, meaning that you suffer more by getting a lower-ranked RB than a lower-ranked QB. This is also because a lot more RBs get drafted than QBs. Below is the distribution for all players, which provides a general sense.

In [23]:

# seaborn dropped the old `groupby` argument; pass x/y column names instead.
ax = sns.boxplot(x="position", y="points_actual", data=combined_df)
plt.title("Distribution of Actual Points by Position")
sns.despine()

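To see whether that variance difference is playing out, the idea is to compare the top 12 quarterbacks with the top 36 running backs so far this season (assuming a 12-team league with 1 starting QB and 3 starting RBs). One caveat: the second cell below also filters on "QB", so it actually summarizes the top 36 quarterbacks (its maximum of 77 points is Tom Brady's total, and its minimum difference of -46.2 is Drew Brees). The standard deviation there is about 7 points higher than for the top 12 QBs (17.8 vs. 10.7), but that compares a deeper QB pool with a shallower one, not RBs with QBs, so the RB claim is still untested here.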

In [34]:

combined_df[combined_df['position'] == "QB"].sort_values('points_actual', ascending=False).head(n=12).describe()

Out[34]:

	points_actual	points_predicted	points_diff	points_diff_pct
count	12.000000	12.000000	12.000000	12.000000
mean	60.500000	58.425000	2.075000	0.052784
std	10.706837	8.424761	12.753048	0.227619
min	48.000000	45.400000	-17.800000	-0.258721
25%	50.500000	51.475000	-5.175000	-0.089687
50%	60.500000	57.500000	0.400000	0.006114
75%	67.500000	65.750000	11.075000	0.217831
max	77.000000	69.900000	24.800000	0.475096

In [35]:

# As run, this filters on "QB" (top 36 QBs); use "RB" for the running-back comparison.
combined_df[combined_df['position'] == "QB"].sort_values('points_actual', ascending=False).head(n=36).describe()

Out[35]:

	points_actual	points_predicted	points_diff	points_diff_pct
count	36.000000	36.000000	3.600000e+01	36.000000
mean	41.083333	50.894444	-9.811111e+00	-0.186386
std	17.762521	15.940549	1.436052e+01	0.272579
min	9.000000	14.600000	-4.620000e+01	-0.625000
25%	28.000000	44.325000	-1.732500e+01	-0.366491
50%	41.500000	54.750000	-1.295000e+01	-0.259642
75%	49.500000	62.650000	-1.776357e-15	-0.002554
max	77.000000	74.200000	2.480000e+01	0.475096
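
Because both cells above filter on `"QB"`, here is a hedged sketch of how the intended comparison could be written so that the position and pool size are explicit. `top_n_spread` is a hypothetical helper, not part of the original notebook. The sample frame holds only a few season totals copied from the outperformer table earlier in this post, so whatever it prints is illustrative and not the real top-12 versus top-36 result.

```python
import pandas as pd

def top_n_spread(df, position, n):
    """Hypothetical helper: std. dev. of points_actual for the top-n scorers."""
    pool = df[df["position"] == position]
    top = pool.sort_values("points_actual", ascending=False).head(n)
    return top["points_actual"].std()

# Season totals copied from the outperformer table; NOT the full player pool.
sample = pd.DataFrame({
    "name": ["Tom Brady", "Andy Dalton", "Tyrod Taylor", "Marcus Mariota",
             "Devonta Freeman", "Dion Lewis", "DeAngelo Williams",
             "Karlos Williams", "Marcel Reece"],
    "position": ["QB"] * 4 + ["RB"] * 5,
    "points_actual": [77, 69, 64, 57, 51, 40, 38, 37, 20],
})

qb_std = top_n_spread(sample, "QB", 4)
rb_std = top_n_spread(sample, "RB", 5)
print("QB std: %.2f, RB std: %.2f" % (qb_std, rb_std))
```

With the full data, the calls would be `top_n_spread(combined_df, "QB", 12)` and `top_n_spread(combined_df, "RB", 36)`.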

In conclusion

These were just some quick analyses I did to try and get a sense of which players are doing well/poorly and how various positions are performing.

If people find this interesting, I can try and update the data as the season goes on.

I am hoping to find time to investigate the ESPN projections to see how sensible they really are. Based on the chart above, they seem to aim high, leading to many underperformers. I would like to try and build my own projection model to see how well it compares. Now that I think about it, here are the overall summary statistics below. It looks like on average ESPN is overprojecting by about 4 points, with a standard deviation of 10.5 points.

In [38]:

combined_df.describe()

Out[38]:

	points_actual	points_predicted	points_diff	points_diff_pct
count	408.000000	408.000000	408.000000	408.000000
mean	17.424020	21.504167	-4.080147	1.497506
std	15.399643	16.295564	10.514605	10.662482
min	0.000000	0.100000	-46.200000	-1.000000
25%	5.000000	7.875000	-10.125000	-0.500389
50%	14.000000	19.550000	-3.550000	-0.250522
75%	25.000000	30.725000	1.500000	0.180199
max	77.000000	74.200000	33.600000	119.000000
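
One thing worth noting in the summary above: the mean `points_diff` is negative (about -4.08) while the mean `points_diff_pct` is positive (about 1.50), with a maximum of 119. A few players with very small projections can pull the percentage mean up a lot. The sketch below shows that effect on six rows copied from the tables in this post. It is only an illustration on a hand-picked sample, not a re-analysis of the full data.

```python
import pandas as pd

# Six rows copied from the outperformer and underperformer tables.
rows = pd.DataFrame({
    "name": ["Anthony Fasano", "Marcel Reece", "C.J. Anderson",
             "Drew Brees", "Alshon Jeffery", "Andrew Luck"],
    "points_actual": [17, 20, 6, 28, 7, 41],
    "points_predicted": [0.3, 3.3, 48.0, 74.2, 37.4, 69.5],
})
rows["points_diff_pct"] = (
    (rows["points_actual"] - rows["points_predicted"]) / rows["points_predicted"]
)

# The mean is dominated by the two tiny projections; the median is not.
mean_pct = rows["points_diff_pct"].mean()
median_pct = rows["points_diff_pct"].median()
print("mean pct: %.2f, median pct: %.2f" % (mean_pct, median_pct))
```

This is consistent with the full summary, where the median `points_diff_pct` (about -0.25) tells the overprojection story better than the mean does.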