In October 2015, FiveThirtyEight published an analysis by data journalist Walt Hickey presenting evidence that the movie ticketing and ratings site Fandango used a biased and dishonest ratings system. At the time, Fandango blamed a bug in its software and promised to fix it. In this analysis, we're going to see whether they made the promised fixes, or whether there are still inconsistencies and bias.
Load libraries and configure notebook environment
# Load libraries
import ipywidgets as widgets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown
# Configure notebook environment
%matplotlib inline
%config InlineBackend.figure_format='retina'
#pd.options.display.max_rows = 200
#pd.options.display.max_columns = 30
`fandango_score_comparison.csv` (from Walt Hickey's GitHub) contains every film that has a Rotten Tomatoes rating, an RT User rating, a Metacritic score, a Metacritic User score, an IMDb score, and at least 30 fan reviews on Fandango. The data from Fandango was pulled on Aug. 24, 2015.
Column | Definition |
---|---|
FILM | The film in question |
RottenTomatoes | The Rotten Tomatoes Tomatometer score for the film |
RottenTomatoes_User | The Rotten Tomatoes user score for the film |
Metacritic | The Metacritic critic score for the film |
Metacritic_User | The Metacritic user score for the film |
IMDB | The IMDb user score for the film |
Fandango_Stars | The number of stars the film had on its Fandango movie page |
Fandango_Ratingvalue | The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained. |
RT_norm | The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system |
RT_user_norm | The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system |
Metacritic_norm | The Metacritic critic score for the film, normalized to a 0 to 5 point system |
Metacritic_user_nom | The Metacritic user score for the film, normalized to a 0 to 5 point system |
IMDB_norm | The IMDb user score for the film, normalized to a 0 to 5 point system |
RT_norm_round | The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star |
RT_user_norm_round | The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star |
Metacritic_norm_round | The Metacritic critic score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star |
Metacritic_user_norm_round | The Metacritic user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star |
IMDB_norm_round | The IMDb user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star |
Metacritic_user_vote_count | The number of user votes the film had on Metacritic |
IMDB_user_vote_count | The number of user votes the film had on IMDb |
Fandango_votes | The number of user votes the film had on Fandango |
Fandango_Difference | The difference between the presented Fandango_Stars and the actual Fandango_Ratingvalue |
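The normalized and rounded columns can be reproduced by hand. The sketch below is one plausible way to do it, not necessarily how the dataset was built: the helper names `to_five_point` and `round_half_star` are made up for illustration, and rounding ties upward is an assumption inferred from sample rows such as a Tomatometer score of 85 (4.25 normalized) appearing as 4.5.

```python
import math

def to_five_point(score, scale_max):
    """Rescale a score from a 0..scale_max scale to a 0..5 scale."""
    return score * 5 / scale_max

def round_half_star(value):
    """Round to the nearest half star, with ties going up (assumed behavior)."""
    return math.floor(value * 2 + 0.5) / 2

# Scores taken from the first rows of the 2015 dataset.
examples = [
    ("RottenTomatoes", 74, 100),
    ("RottenTomatoes", 85, 100),
    ("IMDB", 7.8, 10),
]
for column, score, scale_max in examples:
    normalized = to_five_point(score, scale_max)
    print(column, score, round(normalized, 2), round_half_star(normalized))
```

Python's built-in `round` rounds ties to the nearest even number, so it would turn 4.25 into 4.0 rather than 4.5; that is why the sketch uses `math.floor` instead.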
data_2015 = pd.read_csv('fandango_score_comparison.csv')
data_2015.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 22 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   FILM                        146 non-null    object
 1   RottenTomatoes              146 non-null    int64
 2   RottenTomatoes_User         146 non-null    int64
 3   Metacritic                  146 non-null    int64
 4   Metacritic_User             146 non-null    float64
 5   IMDB                        146 non-null    float64
 6   Fandango_Stars              146 non-null    float64
 7   Fandango_Ratingvalue        146 non-null    float64
 8   RT_norm                     146 non-null    float64
 9   RT_user_norm                146 non-null    float64
 10  Metacritic_norm             146 non-null    float64
 11  Metacritic_user_nom         146 non-null    float64
 12  IMDB_norm                   146 non-null    float64
 13  RT_norm_round               146 non-null    float64
 14  RT_user_norm_round          146 non-null    float64
 15  Metacritic_norm_round       146 non-null    float64
 16  Metacritic_user_norm_round  146 non-null    float64
 17  IMDB_norm_round             146 non-null    float64
 18  Metacritic_user_vote_count  146 non-null    int64
 19  IMDB_user_vote_count        146 non-null    int64
 20  Fandango_votes              146 non-null    int64
 21  Fandango_Difference         146 non-null    float64
dtypes: float64(15), int64(6), object(1)
memory usage: 25.2+ KB
data_2015.head()
index | FILM | RottenTomatoes | RottenTomatoes_User | Metacritic | Metacritic_User | IMDB | Fandango_Stars | Fandango_Ratingvalue | RT_norm | RT_user_norm | ... | IMDB_norm | RT_norm_round | RT_user_norm_round | Metacritic_norm_round | Metacritic_user_norm_round | IMDB_norm_round | Metacritic_user_vote_count | IMDB_user_vote_count | Fandango_votes | Fandango_Difference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Avengers: Age of Ultron (2015) | 74 | 86 | 66 | 7.1 | 7.8 | 5.0 | 4.5 | 3.70 | 4.3 | ... | 3.90 | 3.5 | 4.5 | 3.5 | 3.5 | 4.0 | 1330 | 271107 | 14846 | 0.5 |
1 | Cinderella (2015) | 85 | 80 | 67 | 7.5 | 7.1 | 5.0 | 4.5 | 4.25 | 4.0 | ... | 3.55 | 4.5 | 4.0 | 3.5 | 4.0 | 3.5 | 249 | 65709 | 12640 | 0.5 |
2 | Ant-Man (2015) | 80 | 90 | 64 | 8.1 | 7.8 | 5.0 | 4.5 | 4.00 | 4.5 | ... | 3.90 | 4.0 | 4.5 | 3.0 | 4.0 | 4.0 | 627 | 103660 | 12055 | 0.5 |
3 | Do You Believe? (2015) | 18 | 84 | 22 | 4.7 | 5.4 | 5.0 | 4.5 | 0.90 | 4.2 | ... | 2.70 | 1.0 | 4.0 | 1.0 | 2.5 | 2.5 | 31 | 3136 | 1793 | 0.5 |
4 | Hot Tub Time Machine 2 (2015) | 14 | 28 | 29 | 3.4 | 5.1 | 3.5 | 3.0 | 0.70 | 1.4 | ... | 2.55 | 0.5 | 1.5 | 1.5 | 1.5 | 2.5 | 88 | 19560 | 1021 | 0.5 |
5 rows × 22 columns
This data is in good shape. There are no missing values and column types for numeric columns appear to have been identified correctly.
We should separate the title and release year before we move on.
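# Split FILM into a title and a four-digit year; \s* keeps trailing whitespace out of the title
data_2015[['title','year']] = data_2015['FILM'].str.extract(r"(.*?)\s*\((\d{4})\)", expand=True)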
data_2015.year = data_2015.year.astype(int)
# reorganize and clean up columns
cols = ['title', 'year']
cols.extend(data_2015.columns.values[1:-2])
data_2015 = data_2015[cols]
data_2015[['title', 'year']].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   title   146 non-null    object
 1   year    146 non-null    int64
dtypes: int64(1), object(1)
memory usage: 2.4+ KB
It appears that all rows of the `FILM` column parsed into a `title` and a `year`, so we've got that going for us.
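To see what the title and year split does to a single value, here is a small standard-library sketch. The pattern is in the same spirit as the one applied to the `FILM` column, and the helper name `split_film` is invented for this example. It uses `re.search`, which finds the first match anywhere in the string.

```python
import re

# Lazy title capture, optional whitespace, then a four-digit year in parentheses.
FILM_PATTERN = re.compile(r"(.*?)\s*\((\d{4})\)")

def split_film(film):
    """Return (title, year) for strings like 'Ant-Man (2015)', else None."""
    match = FILM_PATTERN.search(film)
    if match is None:
        return None
    title, year = match.groups()
    return title, int(year)

print(split_film("Avengers: Age of Ultron (2015)"))
print(split_film("Hot Tub Time Machine 2 (2015)"))
print(split_film("A title with no year"))
```

A value that does not match returns `None` here; in the pandas version the equivalent symptom would be missing values in the new columns, which is what the `info()` call above rules out.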
`movie_ratings_16_17.csv` contains movie ratings data for 214 of the most popular movies (with a significant number of votes) released in 2016 and 2017. As of March 22, 2017, the ratings were up to date.
Column | Description |
---|---|
movie | the name of the movie |
year | the release year of the movie |
metascore | the Metacritic rating of the movie (the "metascore" - critic score) |
imdb | the IMDB rating of the movie (user score) |
tmeter | the Rotten Tomatoes rating of the movie (the "tomatometer" - critic score) |
audience | the Rotten Tomatoes rating of the movie (user score) |
fandango | the Fandango rating of the movie (user score) |
n_metascore | the metascore normalized to a 0-5 scale |
n_imdb | the IMDB rating normalized to a 0-5 scale |
n_tmeter | the tomatometer normalized to a 0-5 scale |
n_audience | the Rotten Tomatoes user score normalized to a 0-5 scale |
nr_metascore | the metascore normalized to a 0-5 scale and rounded to the nearest 0.5 |
nr_imdb | the IMDB rating normalized to a 0-5 scale and rounded to the nearest 0.5 |
nr_tmeter | the tomatometer normalized to a 0-5 scale and rounded to the nearest 0.5 |
nr_audience | the Rotten Tomatoes user score normalized to a 0-5 scale and rounded to the nearest 0.5 |
data_2016 = pd.read_csv('movie_ratings_16_17.csv')
data_2016.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   movie         214 non-null    object
 1   year          214 non-null    int64
 2   metascore     214 non-null    int64
 3   imdb          214 non-null    float64
 4   tmeter        214 non-null    int64
 5   audience      214 non-null    int64
 6   fandango      214 non-null    float64
 7   n_metascore   214 non-null    float64
 8   n_imdb        214 non-null    float64
 9   n_tmeter      214 non-null    float64
 10  n_audience    214 non-null    float64
 11  nr_metascore  214 non-null    float64
 12  nr_imdb       214 non-null    float64
 13  nr_tmeter     214 non-null    float64
 14  nr_audience   214 non-null    float64
dtypes: float64(10), int64(4), object(1)
memory usage: 25.2+ KB
data_2016.head()
index | movie | year | metascore | imdb | tmeter | audience | fandango | n_metascore | n_imdb | n_tmeter | n_audience | nr_metascore | nr_imdb | nr_tmeter | nr_audience |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10 Cloverfield Lane | 2016 | 76 | 7.2 | 90 | 79 | 3.5 | 3.80 | 3.60 | 4.50 | 3.95 | 4.0 | 3.5 | 4.5 | 4.0 |
1 | 13 Hours | 2016 | 48 | 7.3 | 50 | 83 | 4.5 | 2.40 | 3.65 | 2.50 | 4.15 | 2.5 | 3.5 | 2.5 | 4.0 |
2 | A Cure for Wellness | 2016 | 47 | 6.6 | 40 | 47 | 3.0 | 2.35 | 3.30 | 2.00 | 2.35 | 2.5 | 3.5 | 2.0 | 2.5 |
3 | A Dog's Purpose | 2017 | 43 | 5.2 | 33 | 76 | 4.5 | 2.15 | 2.60 | 1.65 | 3.80 | 2.0 | 2.5 | 1.5 | 4.0 |
4 | A Hologram for the King | 2016 | 58 | 6.1 | 70 | 57 | 3.0 | 2.90 | 3.05 | 3.50 | 2.85 | 3.0 | 3.0 | 3.5 | 3.0 |
This dataset also appears to be clean, with no missing values.
# Improve consistency in column names between the two datasets
data_2016 = data_2016.rename({'movie':'title'}, axis=1)
These two datasets appear to have been selected with similar but not identical criteria. Both seem to be biased towards movies with more user ratings. The FiveThirtyEight dataset required movies to have at least 30 Fandango user reviews. The second dataset selected the movies with the largest number of ratings. The samples are likely to be biased, and in different ways from each other.
Collecting more data is outside of the scope of this project, so, we're going to press on with the data we have and see whether it casts any light on whether Fandango changed their ratings system after Hickey's analysis.
Before continuing, let's explore the idea that both datasets contain movies with over 30 fan ratings on Fandango.
Fortunately, the first dataset recorded this information.
data_2015.Fandango_votes.describe()
count      146.000000
mean      3848.787671
std       6357.778617
min         35.000000
25%        222.250000
50%       1446.000000
75%       4439.500000
max      34846.000000
Name: Fandango_votes, dtype: float64
We see that the movies do indeed have at least 30 user ratings; the minimum is 35. There is quite a range, though: one film has almost 35K ratings.
Based on the stated criteria, I'd expect the second dataset to be skewed in favor of movies with a larger number of ratings. We could check this by picking a random sample of movies and determining how many ratings each has.
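The sampling step could be sketched as follows. This is a standard-library illustration only: the list holds just the five titles shown in the `head()` output above, not the full 214-film sample, and the sample size of 3 is arbitrary. With the full DataFrame, pandas' `DataFrame.sample(n=..., random_state=...)` serves the same purpose.

```python
import random

# Stand-in for the title column; only the five rows shown by head() above.
titles = [
    "10 Cloverfield Lane",
    "13 Hours",
    "A Cure for Wellness",
    "A Dog's Purpose",
    "A Hologram for the King",
]

# A fixed seed makes the draw reproducible, so the same films are checked on every run.
rng = random.Random(1)
sample = rng.sample(titles, k=3)

for title in sample:
    # Each sampled film's Fandango rating count would then be looked up by hand.
    print(title)
```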
Unfortunately we've hit a dead end. It appears that Fandango no longer shows ratings from its users. Instead it sources them from Rotten Tomatoes. Ideally we'd validate that our understanding of the data collected in 2016 is correct, but since we can't, we'll press on under our unverified assumptions.
data_2016.year.value_counts()
2016    191
2017     23
Name: year, dtype: int64
data_2015.year.value_counts()
2015    129
2014     17
Name: year, dtype: int64
Let's clean up our data a little and focus only on movies released in either 2015 or 2016.
data_2015 = data_2015[data_2015.year == 2015]
data_2016 = data_2016[data_2016.year == 2016]
In his original analysis, Hickey found that while Fandango presented averaged ratings in increments of 1/2 star, they appeared to round aggregated ratings up by 1/2 a star. He also noted that Fandango's ratings seemed to be very skewed, with no movies receiving less than three stars.
If Fandango made changes to improve the truthfulness of their ratings, we would expect the distribution of 2016 ratings to shift downwards by approximately 1/2 star.
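The size of the gap between displayed stars and actual ratings can be read straight off the 2015 data. The sketch below recomputes it for the five rows shown in the `head()` output earlier, using the definition of `Fandango_Difference` from the column table. These five rows are not the whole dataset, so this illustrates the calculation rather than summarizing all 146 films.

```python
# (title, Fandango_Stars, Fandango_Ratingvalue) copied from the head() output.
head_rows = [
    ("Avengers: Age of Ultron", 5.0, 4.5),
    ("Cinderella", 5.0, 4.5),
    ("Ant-Man", 5.0, 4.5),
    ("Do You Believe?", 5.0, 4.5),
    ("Hot Tub Time Machine 2", 3.5, 3.0),
]

# Displayed stars minus the rating value found in the page HTML.
gaps = [round(stars - rating_value, 1) for _, stars, rating_value in head_rows]

for (title, stars, rating_value), gap in zip(head_rows, gaps):
    print(f"{title}: shown {stars}, actual {rating_value}, gap {gap}")
```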
plt.style.use('fivethirtyeight')
data_2015.Fandango_Stars.plot.kde(label='2015', legend =True, figsize=(8,8))
data_2016.fandango.plot.kde(label='2016', legend=True)
plt.title('Comparing Distribution of Fandango Star Ratings between 2015 and 2016', y = 1.03)
plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
plt.xlim(0,5)
plt.xlabel('Stars')
plt.show()
When we visualize the distributions for both years, we find that both the peak and the leftmost value of the 2016 distribution are indeed 1/2 star lower than they were in 2015.
We also see that while both distributions are left skewed, the distribution of 2016's ratings seems to be more symmetrical than that of 2015's ratings.
Assuming that audience expectations and movie quality remained the same between 2015 and 2016, this suggests that Fandango was manipulating ratings in 2015 and changed their practices in 2016, after the FiveThirtyEight exposé was published.
# Build the frequency table with concat so it does not depend on the name
# pandas gives the value_counts() result (it differs between pandas versions).
freq_table = pd.concat(
    [data_2015.Fandango_Stars.value_counts(normalize=True),
     data_2016.fandango.value_counts(normalize=True)],
    axis=1, keys=['2015', '2016']).sort_index()
freq_table.apply(lambda x: round(100*x,1))
Stars | 2015 (%) | 2016 (%) |
---|---|---|
2.5 | NaN | 3.1 |
3.0 | 8.5 | 7.3 |
3.5 | 17.8 | 24.1 |
4.0 | 28.7 | 40.3 |
4.5 | 38.0 | 24.6 |
5.0 | 7.0 | 0.5 |
Comparing Fandango's 2015 & 2016 movie ratings side-by-side in a frequency table supports the observations we made based on the density plots shown above.
The distribution of Fandango's 2016 movie ratings shifted 1/2 star lower compared to 2015's ratings. Moreover, 2016's ratings are more symmetrical around their peak.
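One simple way to check the claim about symmetry around the peak is to compare the share of films half a star below the most common rating with the share half a star above it. The sketch below does that with the percentages from the frequency table; the helper name `neighbours_of_peak` is made up for illustration, and this is an informal check rather than a formal measure of skewness.

```python
# Percentages copied from the frequency table (stars -> share of films).
freq_2015 = {3.0: 8.5, 3.5: 17.8, 4.0: 28.7, 4.5: 38.0, 5.0: 7.0}
freq_2016 = {2.5: 3.1, 3.0: 7.3, 3.5: 24.1, 4.0: 40.3, 4.5: 24.6, 5.0: 0.5}

def neighbours_of_peak(freq):
    """Return (peak rating, share half a star below, share half a star above)."""
    peak = max(freq, key=freq.get)
    below = freq.get(peak - 0.5, 0.0)
    above = freq.get(peak + 0.5, 0.0)
    return peak, below, above

for year, freq in (("2015", freq_2015), ("2016", freq_2016)):
    peak, below, above = neighbours_of_peak(freq)
    print(f"{year}: peak {peak}, {below}% just below, {above}% just above")
```

In 2015 the two neighbours of the peak differ by more than 20 percentage points, while in 2016 they are within a point of each other.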
stats_2015 = [data_2015.Fandango_Stars.mean(),
data_2015.Fandango_Stars.median(),
data_2015.Fandango_Stars.mode()[0]]
stats_2016 = [data_2016.fandango.mean(),
data_2016.fandango.median(),
data_2016.fandango.mode()[0]]
stats = pd.DataFrame()
stats[2015] = stats_2015
stats[2016] = stats_2016
stats.index = ['mean', 'median', 'mode']
stats
Statistic | 2015 | 2016 |
---|---|---|
mean | 4.085271 | 3.887435 |
median | 4.000000 | 4.000000 |
mode | 4.500000 | 4.000000 |
In 2015, the mean, median and mode differed over a range of 0.5 stars. In 2016 the values of these sample statistics were much more closely spaced, differing by just 0.11 stars. This indicates that 2016's distribution was more "normal."
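The spread between the three statistics is the largest value minus the smallest for each year. A minimal sketch, using the figures from the summary table above (the dictionary layout is just a convenient stand-in for the `stats` DataFrame):

```python
# Summary statistics copied from the table above.
summary = {
    2015: {"mean": 4.085271, "median": 4.0, "mode": 4.5},
    2016: {"mean": 3.887435, "median": 4.0, "mode": 4.0},
}

# Range covered by mean, median and mode within each year.
spread = {
    year: max(values.values()) - min(values.values())
    for year, values in summary.items()
}

for year, value in spread.items():
    print(year, round(value, 3))
```

With the DataFrame itself, `stats.max() - stats.min()` would give the same per-year ranges.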
stats.plot.bar(rot=0, figsize=(8,6))
plt.ylabel("Stars")
plt.ylim(0,5.5)
plt.yticks([x*0.5 for x in range(10)])
plt.legend(loc='upper center')
site_labels = ['Fandango', 'IMDB', 'Rotten Tomatoes', 'RT Audience', 'Metacritic', 'Metacritic User']
cols_2015=['Fandango_Stars', 'IMDB_norm_round', 'RT_norm_round', 'RT_user_norm_round',
'Metacritic_norm_round', 'Metacritic_user_norm_round' ]
data_2015[cols_2015].plot.kde(figsize=(8,8))
plt.title('Comparing Distribution of Star Ratings for Movie Review Sites (2015)', y = 1.03)
plt.legend(site_labels)
plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
plt.xlim(0,5)
plt.xlabel('Stars')
plt.show()
Comparing the distribution of Fandango's star ratings for our sample of 2015's movies with ratings on other sites shows some interesting features:
cols_2016 = ['fandango', 'nr_imdb', 'nr_tmeter', 'nr_audience', 'nr_metascore' ]
data_2016[cols_2016].plot.kde(figsize=(8,8))
plt.title('Comparing Distribution of Star Ratings for Movie Review Sites (2016)', y = 1.03)
plt.legend(site_labels[:-1])
plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
plt.xlim(0,5)
plt.xlabel('Stars')
plt.show()
Comparing the distribution of Fandango's star ratings for our sample of 2016's movies with ratings on other sites shows that Fandango's ratings are still higher than those on other sites, but their distribution has become more symmetrical.
column_pairs = zip(
    cols_2015,
    cols_2016,
    site_labels
)
# seems to help ensure that the graph is shown on all iterations
plt.show()
# skip the first pair (Fandango), which was already compared above
for a, b, rating in list(column_pairs)[1:]:
    data_2015[a].plot.kde(label='2015', legend=True, figsize=(6,6))
    data_2016[b].plot.kde(label='2016', legend=True, figsize=(6,6))
    plt.title('Comparing {} ratings distributions 2015 to 2016'.format(rating), y = 1.03)
    plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
    plt.xlim(0,5)
    plt.xlabel('Stars')
    plt.show()
    # concat avoids relying on the name pandas gives the value_counts() result
    freq_table = pd.concat(
        [data_2015[a].value_counts(normalize=True),
         data_2016[b].value_counts(normalize=True)],
        axis=1, keys=['2015', '2016']).sort_index()
    freq_table = freq_table.apply(lambda x: round(100*x,1))
    stats_2015 = [data_2015[a].mean(),
                  data_2015[a].median(),
                  data_2015[a].mode()[0]]
    stats_2016 = [data_2016[b].mean(),
                  data_2016[b].median(),
                  data_2016[b].mode()[0]]
    sample_stats = pd.DataFrame()
    sample_stats[2015] = stats_2015
    sample_stats[2016] = stats_2016
    sample_stats.index = ['mean', 'median', 'mode']
    # show the frequency table and the summary statistics side by side
    widget_1 = widgets.Output()
    widget_2 = widgets.Output()
    with widget_1:
        display(freq_table.style.set_table_attributes("style='display:inline'"))
    with widget_2:
        display(sample_stats.style.set_table_attributes("style='display:inline'"))
    hbox = widgets.HBox([widget_1, widget_2])
    display(hbox)
If we compare distributions of 2015 and 2016 ratings for each site, a few things stand out.
It's hard to say what underlies these year-to-year differences. However, given the population and sample sizes we are dealing with, differences in the critical and audience reception of just a handful of films from year to year could explain them.
The fact that all sites gave lower ratings, on average, to movies in 2016 vs 2015 casts our earlier findings about the differences in Fandango's ratings between the two years in a new light. If Fandango's ratings are lower in 2016 than in 2015 for the same reasons other sites' ratings are lower, then Fandango may not have corrected their ratings bias after all.
Fandango's 2016 reviews, in aggregate, are roughly 1/2 star lower than their 2015 reviews. This is what one would expect if Fandango addressed the issues in the FiveThirtyEight exposé, which found that Fandango seemed to be inflating films' average reviews by approximately 1/2 star. Or so it would seem if one looked only at the differences in the distribution of Fandango's ratings in 2015 vs 2016.
If we look at the differences in the distributions of ratings on other film review sites, we see that their aggregated reviews are also roughly 1/2 star lower in 2016 than they were in 2015. This suggests that 2016's films were not as well received as 2015's films, in general, and therefore that the change in Fandango's ratings was due to outside factors, rather than reforms in Fandango's ratings system.