Are Fandango Star Ratings Still 'Cooked'?

In October 2015, FiveThirtyEight published an analysis by data journalist Walt Hickey presenting evidence that Fandango, a movie ticket and ratings site, used a biased and dishonest ratings system. At the time, Fandango blamed a bug in its software and promised to fix it. In this analysis, we're going to check whether Fandango made the promised fixes, or whether its ratings still show inconsistencies and bias.

Preparation

Load libraries and configure notebook environment

In [1]:

# Load libraries
import ipywidgets as widgets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown

# Configure notebook environment
%matplotlib inline
%config InlineBackend.figure_format='retina'
#pd.options.display.max_rows = 200
#pd.options.display.max_columns = 30

Prepare 2015 Data

fandango_score_comparison.csv, from Walt Hickey's GitHub, contains every film that has a Rotten Tomatoes rating, an RT user rating, a Metacritic score, a Metacritic user score, an IMDb score, and at least 30 fan reviews on Fandango. The data from Fandango was pulled on Aug. 24, 2015.

Column	Definition
FILM	The film in question
RottenTomatoes	The Rotten Tomatoes Tomatometer score for the film
RottenTomatoes_User	The Rotten Tomatoes user score for the film
Metacritic	The Metacritic critic score for the film
Metacritic_User	The Metacritic user score for the film
IMDB	The IMDb user score for the film
Fandango_Stars	The number of stars the film had on its Fandango movie page
Fandango_Ratingvalue	The Fandango ratingValue for the film, as pulled from the HTML of each page. This is the actual average score the movie obtained.
RT_norm	The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system
RT_user_norm	The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system
Metacritic_norm	The Metacritic critic score for the film, normalized to a 0 to 5 point system
Metacritic_user_nom	The Metacritic user score for the film, normalized to a 0 to 5 point system
IMDB_norm	The IMDb user score for the film, normalized to a 0 to 5 point system
RT_norm_round	The Rotten Tomatoes Tomatometer score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
RT_user_norm_round	The Rotten Tomatoes user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_norm_round	The Metacritic critic score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_user_norm_round	The Metacritic user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
IMDB_norm_round	The IMDb user score for the film, normalized to a 0 to 5 point system and rounded to the nearest half-star
Metacritic_user_vote_count	The number of user votes the film had on Metacritic
IMDB_user_vote_count	The number of user votes the film had on IMDb
Fandango_votes	The number of user votes the film had on Fandango
Fandango_Difference	The difference between the presented Fandango_Stars and the actual Fandango_Ratingvalue
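Reading the definitions above, the _norm columns appear to be simple rescalings to five stars (the Tomatometer and Metacritic critic scores are out of 100, the IMDb and Metacritic user scores out of 10), and the _round columns appear to round those to the nearest half star. That reading is our assumption, so here is a minimal sketch that checks it against the Cinderella (2015) row from the head() output further down. The helpers normalize and round_half_star are hypothetical names. One detail matters: Cinderella's Tomatometer of 85 normalizes to exactly 4.25, which the dataset rounds to 4.5, whereas Python's built-in round() rounds ties to even and would give 4.0.

```python
import math

def normalize(score, max_score):
    """Rescale a score on a 0..max_score scale to 0-5 stars (hypothetical helper)."""
    return score * 5 / max_score

def round_half_star(stars):
    """Round to the nearest 1/2 star, exact ties rounding up (hypothetical helper)."""
    return math.floor(stars * 2 + 0.5) / 2

# Cinderella (2015): raw score and the maximum of its scale, from data_2015.head()
raw = {'RottenTomatoes': (85, 100), 'RottenTomatoes_User': (80, 100),
       'Metacritic': (67, 100), 'Metacritic_User': (7.5, 10), 'IMDB': (7.1, 10)}

rounded = {}
for column, (score, max_score) in raw.items():
    stars = normalize(score, max_score)
    rounded[column] = round_half_star(stars)
    print(f'{column:20} {stars:.2f} -> {rounded[column]}')

# Built-in round() rounds ties to even, so it would give 4.0 for Cinderella's 4.25
print(round(4.25 * 2) / 2)  # 4.0
```

The rounded values (4.5, 4.0, 3.5, 4.0, 3.5) match that row's RT_norm_round, RT_user_norm_round, Metacritic_norm_round, Metacritic_user_norm_round and IMDB_norm_round.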

In [2]:

data_2015 = pd.read_csv('fandango_score_comparison.csv')
data_2015.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 22 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   FILM                        146 non-null    object 
 1   RottenTomatoes              146 non-null    int64  
 2   RottenTomatoes_User         146 non-null    int64  
 3   Metacritic                  146 non-null    int64  
 4   Metacritic_User             146 non-null    float64
 5   IMDB                        146 non-null    float64
 6   Fandango_Stars              146 non-null    float64
 7   Fandango_Ratingvalue        146 non-null    float64
 8   RT_norm                     146 non-null    float64
 9   RT_user_norm                146 non-null    float64
 10  Metacritic_norm             146 non-null    float64
 11  Metacritic_user_nom         146 non-null    float64
 12  IMDB_norm                   146 non-null    float64
 13  RT_norm_round               146 non-null    float64
 14  RT_user_norm_round          146 non-null    float64
 15  Metacritic_norm_round       146 non-null    float64
 16  Metacritic_user_norm_round  146 non-null    float64
 17  IMDB_norm_round             146 non-null    float64
 18  Metacritic_user_vote_count  146 non-null    int64  
 19  IMDB_user_vote_count        146 non-null    int64  
 20  Fandango_votes              146 non-null    int64  
 21  Fandango_Difference         146 non-null    float64
dtypes: float64(15), int64(6), object(1)
memory usage: 25.2+ KB

In [3]:

data_2015.head()

Out[3]:

	FILM	RottenTomatoes	RottenTomatoes_User	Metacritic	Metacritic_User	IMDB	Fandango_Stars	Fandango_Ratingvalue	RT_norm	RT_user_norm	...	IMDB_norm	RT_norm_round	RT_user_norm_round	Metacritic_norm_round	Metacritic_user_norm_round	IMDB_norm_round	Metacritic_user_vote_count	IMDB_user_vote_count	Fandango_votes	Fandango_Difference
0	Avengers: Age of Ultron (2015)	74	86	66	7.1	7.8	5.0	4.5	3.70	4.3	...	3.90	3.5	4.5	3.5	3.5	4.0	1330	271107	14846	0.5
1	Cinderella (2015)	85	80	67	7.5	7.1	5.0	4.5	4.25	4.0	...	3.55	4.5	4.0	3.5	4.0	3.5	249	65709	12640	0.5
2	Ant-Man (2015)	80	90	64	8.1	7.8	5.0	4.5	4.00	4.5	...	3.90	4.0	4.5	3.0	4.0	4.0	627	103660	12055	0.5
3	Do You Believe? (2015)	18	84	22	4.7	5.4	5.0	4.5	0.90	4.2	...	2.70	1.0	4.0	1.0	2.5	2.5	31	3136	1793	0.5
4	Hot Tub Time Machine 2 (2015)	14	28	29	3.4	5.1	3.5	3.0	0.70	1.4	...	2.55	0.5	1.5	1.5	1.5	2.5	88	19560	1021	0.5

5 rows × 22 columns

This data is in good shape. There are no missing values, and the numeric columns appear to have been assigned the correct types.

We should separate the title and release year before we move on.

In [4]:

# Split FILM into title and year; \s* keeps the space before "(" out of the title
data_2015[['title','year']] = data_2015['FILM'].str.extract(r"(.*?)\s*\((\d+)\)", expand=True)
data_2015['year'] = data_2015['year'].astype(int)
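A quick way to sanity-check an extraction pattern is to run it on a couple of the FILM values shown in the head() output above. In this sketch, the first pattern, r"(.*?)\((\d*)\)", is the one the notebook originally used: its lazy title group stops at the opening parenthesis, so it keeps the space in front of it. Moving \s* outside the group (and requiring at least one digit with \d+) gives clean titles. The tighter pattern is our suggestion, not something taken from the source data.

```python
import pandas as pd

films = pd.Series(['Avengers: Age of Ultron (2015)', 'Do You Believe? (2015)'])

# Original pattern: the title group keeps the trailing space
loose = films.str.extract(r"(.*?)\((\d*)\)", expand=True)
# Tighter pattern: \s* swallows the space, \d+ requires at least one digit
tight = films.str.extract(r"(.*?)\s*\((\d+)\)", expand=True)

print(loose[0].tolist())  # ['Avengers: Age of Ultron ', 'Do You Believe? ']
print(tight[0].tolist())  # ['Avengers: Age of Ultron', 'Do You Believe?']
```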

In [5]:

# Reorganize columns: put title and year first and drop the original FILM column
other_cols = [c for c in data_2015.columns if c not in ('FILM', 'title', 'year')]
data_2015 = data_2015[['title', 'year'] + other_cols]
data_2015[['title', 'year']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   146 non-null    object
 1   year    146 non-null    int64 
dtypes: int64(1), object(1)
memory usage: 2.4+ KB

It appears that all rows of the FILM column parsed into a title and a year, so we've got that going for us.

Prepare 2016 Data

movie_ratings_16_17.csv (Source) contains movie ratings data for 214 of the most popular movies (with a significant number of votes) released in 2016 and 2017. As of March 22, 2017, the ratings were up to date.

Column	Description
movie	the name of the movie
year	the release year of the movie
metascore	the Metacritic rating of the movie (the "metascore" - critic score)
imdb	the IMDB rating of the movie (user score)
tmeter	the Rotten Tomatoes rating of the movie (the "tomatometer" - critic score)
audience	the Rotten Tomatoes rating of the movie (user score)
fandango	the Fandango rating of the movie (user score)
n_metascore	the metascore normalized to a 0-5 scale
n_imdb	the IMDB rating normalized to a 0-5 scale
n_tmeter	the tomatometer normalized to a 0-5 scale
n_audience	the Rotten Tomatoes user score normalized to a 0-5 scale
nr_metascore	the metascore normalized to a 0-5 scale and rounded to the nearest 0.5
nr_imdb	the IMDB rating normalized to a 0-5 scale and rounded to the nearest 0.5
nr_tmeter	the tomatometer normalized to a 0-5 scale and rounded to the nearest 0.5
nr_audience	the Rotten Tomatoes user score normalized to a 0-5 scale and rounded to the nearest 0.5

In [6]:

data_2016 = pd.read_csv('movie_ratings_16_17.csv')
data_2016.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   movie         214 non-null    object 
 1   year          214 non-null    int64  
 2   metascore     214 non-null    int64  
 3   imdb          214 non-null    float64
 4   tmeter        214 non-null    int64  
 5   audience      214 non-null    int64  
 6   fandango      214 non-null    float64
 7   n_metascore   214 non-null    float64
 8   n_imdb        214 non-null    float64
 9   n_tmeter      214 non-null    float64
 10  n_audience    214 non-null    float64
 11  nr_metascore  214 non-null    float64
 12  nr_imdb       214 non-null    float64
 13  nr_tmeter     214 non-null    float64
 14  nr_audience   214 non-null    float64
dtypes: float64(10), int64(4), object(1)
memory usage: 25.2+ KB

In [7]:

data_2016.head()

Out[7]:

	movie	year	metascore	imdb	tmeter	audience	fandango	n_metascore	n_imdb	n_tmeter	n_audience	nr_metascore	nr_imdb	nr_tmeter	nr_audience
0	10 Cloverfield Lane	2016	76	7.2	90	79	3.5	3.80	3.60	4.50	3.95	4.0	3.5	4.5	4.0
1	13 Hours	2016	48	7.3	50	83	4.5	2.40	3.65	2.50	4.15	2.5	3.5	2.5	4.0
2	A Cure for Wellness	2016	47	6.6	40	47	3.0	2.35	3.30	2.00	2.35	2.5	3.5	2.0	2.5
3	A Dog's Purpose	2017	43	5.2	33	76	4.5	2.15	2.60	1.65	3.80	2.0	2.5	1.5	4.0
4	A Hologram for the King	2016	58	6.1	70	57	3.0	2.90	3.05	3.50	2.85	3.0	3.0	3.5	3.0

This dataset also appears to be clean, with no missing values.

In [8]:

#let's improve consistency in column names between the two datasets
data_2016 = data_2016.rename({'movie':'title'}, axis=1)

Analysis

These two datasets appear to have been selected using similar but not identical criteria. Both seem to be biased towards movies with more user ratings: the FiveThirtyEight dataset required movies to have at least 30 Fandango user reviews, while the second dataset selected the movies with the largest number of ratings. Both samples are therefore likely to be biased, and in different ways.

Collecting more data is outside the scope of this project, so we're going to press on with the data we have and see whether it sheds any light on whether Fandango changed its ratings system after Hickey's analysis.

Before continuing, let's check the assumption that both datasets contain only movies with at least 30 fan ratings on Fandango.

Fortunately, the first dataset records this information in its Fandango_votes column.

In [9]:

data_2015.Fandango_votes.describe()

Out[9]:

count      146.000000
mean      3848.787671
std       6357.778617
min         35.000000
25%        222.250000
50%       1446.000000
75%       4439.500000
max      34846.000000
Name: Fandango_votes, dtype: float64

The minimum is 35, so every movie does indeed have at least 30 user ratings. There is quite a range, though: the median film has 1,446 ratings, and the most-rated one has almost 35K.

Based on its stated criteria, I'd expect the second dataset to be skewed in favor of movies with a larger number of ratings. We could check this by picking a random sample of movies and determining how many ratings each has.

Unfortunately, we've hit a dead end: Fandango no longer appears to show ratings from its own users, and instead sources them from Rotten Tomatoes. Ideally we'd validate our understanding of the data collected in 2016, but since we can't, we'll press on under our unverified assumptions.

In [11]:

data_2016.year.value_counts()

Out[11]:

2016    191
2017     23
Name: year, dtype: int64

In [12]:

data_2015.year.value_counts()

Out[12]:

2015    129
2014     17
Name: year, dtype: int64

Let's clean up our data a little and focus only on movies released in 2015 (in the 2015 dataset) and in 2016 (in the 2016 dataset).

In [13]:

data_2015 = data_2015[data_2015.year == 2015]
data_2016 = data_2016[data_2016.year == 2016]

Visualizing Distributions

In his original analysis, Hickey found that although Fandango displays ratings in increments of 1/2 star, it appeared to round aggregated ratings up by 1/2 star rather than to the nearest half star. He also noted that Fandango's ratings were heavily skewed, with no movies in his sample receiving fewer than three stars.

If Fandango made changes to improve the truthfulness of its ratings, we would expect the distribution of 2016 ratings to shift downward by approximately 1/2 star.
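One way to make Hickey's rounding observation concrete: if Fandango rounded each average to the nearest half star, the displayed stars could never differ from the underlying rating by more than 1/4 star. The sketch below applies that test to the five 2015 rows shown by data_2015.head() earlier, as a stand-in for the full dataset; the names gap and beyond_rounding are ours.

```python
import pandas as pd

# The five 2015 rows shown by data_2015.head()
sample = pd.DataFrame({
    'FILM': ['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)',
             'Do You Believe? (2015)', 'Hot Tub Time Machine 2 (2015)'],
    'Fandango_Stars': [5.0, 5.0, 5.0, 5.0, 3.5],
    'Fandango_Ratingvalue': [4.5, 4.5, 4.5, 4.5, 3.0],
})

# Displayed stars minus the underlying average rating
gap = sample['Fandango_Stars'] - sample['Fandango_Ratingvalue']
# Rounding to the nearest half star can move a rating by at most 0.25 either way
beyond_rounding = gap > 0.25

print(sample.assign(gap=gap, beyond_rounding=beyond_rounding))
print('Share of films inflated beyond rounding:', beyond_rounding.mean())
```

The dataset's own Fandango_Difference column already stores this gap, so on the full data the same comparison reduces to data_2015.Fandango_Difference > 0.25.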

In [21]:

plt.style.use('fivethirtyeight')
data_2015.Fandango_Stars.plot.kde(label='2015', legend =True, figsize=(8,8))
data_2016.fandango.plot.kde(label='2016', legend=True)
plt.title('Comparing Distribution of Fandango Star Ratings between 2015 and 2016', y = 1.03)
plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
plt.xlim(0,5)
plt.xlabel('Stars')
plt.show()

When we visualize the distributions for both years, we find that both the peak and the leftmost value of the 2016 distribution are indeed 1/2 star lower than they were in 2015.

We also see that while both distributions are left-skewed, the distribution of 2016's ratings seems to be more symmetrical than that of 2015's ratings.

Assuming that audience expectations and movie quality remained the same between 2015 and 2016, this suggests that Fandango was manipulating ratings in 2015 and changed its practices in 2016, after the FiveThirtyEight exposé was published.

In [15]:

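# Share of films at each star rating, side by side (outer-joined on the rating).
# pd.concat with keys avoids the column-name clash that join() hits in pandas >= 2.0,
# where both value_counts() results are named 'proportion'.
freq_table = pd.concat([data_2015.Fandango_Stars.value_counts(normalize=True),
                        data_2016.fandango.value_counts(normalize=True)],
                       axis=1, keys=['2015', '2016']).sort_index()
(100 * freq_table).round(1)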

Out[15]:

	2015	2016
2.5	NaN	3.1
3.0	8.5	7.3
3.5	17.8	24.1
4.0	28.7	40.3
4.5	38.0	24.6
5.0	7.0	0.5

Comparing Fandango's 2015 & 2016 movie ratings side-by-side in a frequency table supports the observations we made based on the density plots shown above.

The distribution of Fandango's 2016 movie ratings shifted 1/2 star lower compared to 2015's ratings: the most common rating fell from 4.5 to 4.0 stars. Moreover, 2016's ratings are more symmetrical around their peak.
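The shift can also be summarized in a single number per year: the share of films rated 4.5 stars or higher. This sketch re-enters the percentages from the frequency table above, so it inherits their rounding to one decimal place.

```python
import pandas as pd

# Percentages copied from the frequency table above (NaN = no films at that rating)
pct = pd.DataFrame(
    {'2015': [float('nan'), 8.5, 17.8, 28.7, 38.0, 7.0],
     '2016': [3.1, 7.3, 24.1, 40.3, 24.6, 0.5]},
    index=[2.5, 3.0, 3.5, 4.0, 4.5, 5.0])

# Add up the 4.5- and 5.0-star rows; sum() skips NaN by default
share_high = pct.loc[pct.index >= 4.5].sum()
print(share_high.round(1))  # 2015: 45.0, 2016: 25.1
```

By this measure, the share of films at 4.5 or 5 stars fell from about 45% to about 25%.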

In [22]:

stats_2015 = [data_2015.Fandango_Stars.mean(), 
              data_2015.Fandango_Stars.median(), 
              data_2015.Fandango_Stars.mode()[0]]
stats_2016 = [data_2016.fandango.mean(), 
              data_2016.fandango.median(), 
              data_2016.fandango.mode()[0]]
stats = pd.DataFrame()
stats[2015] = stats_2015
stats[2016] = stats_2016
stats.index = ['mean', 'median', 'mode']
stats

Out[22]:

	2015	2016
mean	4.085271	3.887435
median	4.000000	4.000000
mode	4.500000	4.000000

In 2015, the mean, median and mode spanned a range of 0.5 stars. In 2016 these sample statistics were much more closely spaced, differing by just 0.11 stars. This suggests that 2016's distribution was more symmetrical around its peak.
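The spread quoted above is the largest minus the smallest of the three statistics in each column. Re-entering the values from the Out[22] table (under a new name, summary, so the notebook's stats table isn't overwritten if this is pasted into it):

```python
import pandas as pd

# Mean, median and mode copied from the stats table above
summary = pd.DataFrame({2015: [4.085271, 4.0, 4.5],
                        2016: [3.887435, 4.0, 4.0]},
                       index=['mean', 'median', 'mode'])

spread = summary.max() - summary.min()
print(spread.round(2))  # 2015: 0.50, 2016: 0.11
```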

In [17]:

stats.plot.bar(rot=0, figsize=(8,6))
plt.ylabel("Stars")
plt.ylim(0,5.5)
plt.yticks([x*0.5 for x in range(12)])  # ticks every 1/2 star, from 0 to 5.5
plt.legend(loc='upper center')
plt.show()

Multi-site Ratings Comparisons

In [18]:

site_labels = ['Fandango', 'IMDB', 'Rotten Tomatoes', 'RT Audience', 'Metacritic', 'Metacritic User']

cols_2015=['Fandango_Stars', 'IMDB_norm_round', 'RT_norm_round', 'RT_user_norm_round', 
          'Metacritic_norm_round', 'Metacritic_user_norm_round' ]
data_2015[cols_2015].plot.kde(figsize=(8,8))
plt.title('Comparing Distribution of Star Ratings for Movie Review Sites (2015)', y = 1.03)
plt.legend(site_labels)
plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
plt.xlim(0,5)
plt.xlabel('Stars')
plt.show()

Comparing the distribution of Fandango's star ratings for our sample of 2015 movies with the ratings on other sites reveals some interesting features:

Fandango's ratings tend to be noticeably higher than those on other sites.
Fandango's ratings distribution is less symmetrical than those for other sites.
Aggregated critic and user ratings on both Rotten Tomatoes and Metacritic span a wide range of values, whereas IMDB's (user) ratings start at about 2 stars.
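"Higher" can be quantified as the average per-film gap between Fandango's stars and each other site's rounded rating. The sketch below runs that calculation on the five 2015 films visible in the data_2015.head() output near the top of the notebook. Five films only illustrate the method; on the real data you would start from data_2015[cols_2015] instead of the hand-entered frame.

```python
import pandas as pd

# Rounded ratings for the five films shown by data_2015.head()
ratings = pd.DataFrame({
    'Fandango_Stars':             [5.0, 5.0, 5.0, 5.0, 3.5],
    'IMDB_norm_round':            [4.0, 3.5, 4.0, 2.5, 2.5],
    'RT_norm_round':              [3.5, 4.5, 4.0, 1.0, 0.5],
    'RT_user_norm_round':         [4.5, 4.0, 4.5, 4.0, 1.5],
    'Metacritic_norm_round':      [3.5, 3.5, 3.0, 1.0, 1.5],
    'Metacritic_user_norm_round': [3.5, 4.0, 4.0, 2.5, 1.5],
})

# For each film, compute Fandango's stars minus every other site's rating
# (rsub reverses the subtraction), then average each site's gaps over the films
others = ratings.drop(columns='Fandango_Stars')
gap = others.rsub(ratings['Fandango_Stars'], axis=0).mean()
print(gap.round(2))  # every gap is positive for these five films
```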

In [19]:

cols_2016 = ['fandango', 'nr_imdb', 'nr_tmeter', 'nr_audience', 'nr_metascore' ]
data_2016[cols_2016].plot.kde(figsize=(8,8))
plt.title('Comparing Distribution of Star Ratings for Movie Review Sites (2016)', y = 1.03)
plt.legend(site_labels[:-1])
plt.xticks([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
plt.xlim(0,5)
plt.xlabel('Stars')
plt.show()

Comparing the distribution of Fandango's star ratings for our sample of 2016 movies with the ratings on other sites shows that Fandango's ratings are still higher than those on other sites, but that their distribution has become more symmetrical.

Site-by-site Comparison of Ratings Between 2015 and 2016

In [20]:

# Pair each 2015 column with its 2016 counterpart. zip() stops at the shorter list,
# so 'Metacritic User' (absent from the 2016 data) is dropped; [1:] skips Fandango,
# which we've already compared above.
column_pairs = list(zip(cols_2015, cols_2016, site_labels))[1:]

for a, b, rating in column_pairs:
    # Overlay the 2015 and 2016 density estimates on a fresh set of axes
    fig, ax = plt.subplots(figsize=(6,6))
    data_2015[a].plot.kde(ax=ax, label='2015', legend=True)
    data_2016[b].plot.kde(ax=ax, label='2016', legend=True)
    ax.set_title('Comparing {} ratings distributions 2015 to 2016'.format(rating), y=1.03)
    ax.set_xticks([x*0.5 for x in range(11)])
    ax.set_xlim(0,5)
    ax.set_xlabel('Stars')
    plt.show()

    # Percentage of films at each rounded star rating in each year
    freq_table = pd.concat([data_2015[a].value_counts(normalize=True),
                            data_2016[b].value_counts(normalize=True)],
                           axis=1, keys=['2015', '2016']).sort_index()
    freq_table = (100 * freq_table).round(1)

    # Mean, median and mode for each year
    sample_stats = pd.DataFrame(
        {2015: [data_2015[a].mean(), data_2015[a].median(), data_2015[a].mode()[0]],
         2016: [data_2016[b].mean(), data_2016[b].median(), data_2016[b].mode()[0]]},
        index=['mean', 'median', 'mode'])

    # Show the two tables side by side
    widget_1 = widgets.Output()
    widget_2 = widgets.Output()
    with widget_1:
        display(freq_table.style.set_table_attributes("style='display:inline'"))
    with widget_2:
        display(sample_stats.style.set_table_attributes("style='display:inline'"))
    display(widgets.HBox([widget_1, widget_2]))

When we compare the 2015 and 2016 rating distributions for each site, a few things stand out:

IMDB's average ratings are about 1/2 star lower in 2016 than in 2015, while the minimum and maximum ratings have remained roughly the same.
Rotten Tomatoes' critic ratings are also lower in 2016, by about 1/4 star, though in this case much of the distribution has shifted, rather than just the peak.
Rotten Tomatoes' audience ratings are roughly 1/2 star lower in 2016, across the whole distribution.
Metacritic's 2016 ratings distribution seems to have shifted about 1/4 star lower on the right side of the peak compared with 2015. The position of the peak and the left side of the distribution remain roughly the same as in 2015.

It's hard to say what underlies these year-to-year differences. However, given the sample sizes we're dealing with (129 films from 2015 and 191 from 2016), differences in the critical and audience reception of just a handful of films could explain them.
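One way to probe whether a year-to-year difference is larger than what reshuffling a handful of films would produce is a permutation test on the difference in means. This technique is our addition, not part of the original analysis, and because neither dataset is a random sample, its p-value should be treated as a rough guide only. The sketch demonstrates the mechanics on the rounded IMDb ratings visible in each dataset's head() output (leaving out A Dog's Purpose, a 2017 film that the year filter removes). In the notebook you would pass, for example, data_2015['IMDB_norm_round'] and data_2016['nr_imdb']. The function name is hypothetical.

```python
import numpy as np

def permutation_test_mean_diff(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Returns the observed difference mean(x) - mean(y) and the share of random
    relabelings whose absolute difference is at least as large.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(x)].mean() - shuffled[len(x):].mean()
        # Small tolerance so exact ties aren't lost to floating-point noise
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_permutations

# Rounded IMDb ratings from the head() outputs (2015 films; 2016 films only)
imdb_2015 = [4.0, 3.5, 4.0, 2.5, 2.5]
imdb_2016 = [3.5, 3.5, 3.5, 3.0]

observed, p_value = permutation_test_mean_diff(imdb_2015, imdb_2016)
print(f'observed difference: {observed:.3f}, p-value: {p_value:.3f}')
```

With so few films per year, this run only shows how the test works; a small p-value on the full columns would indicate a difference unlikely to come from reshuffling alone.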

The fact that all sites gave lower ratings, on average, to movies in 2016 than in 2015 casts our earlier findings about Fandango's ratings in a new light. If Fandango's 2016 ratings are lower for the same reasons other sites' ratings are lower, then Fandango may not have corrected its ratings bias after all.

Conclusion

Fandango's 2016 ratings, in aggregate, are roughly 1/2 star lower than its 2015 ratings. This is what one would expect if Fandango had addressed the issues raised in the FiveThirtyEight exposé, which found that Fandango seemed to be inflating films' average ratings by approximately 1/2 star. Or so it would seem if one only looked at the differences between the distributions of Fandango's ratings in 2015 and 2016.

If we look at the distributions of ratings on other film review sites, we see that their aggregated ratings are also lower in 2016 than they were in 2015, by roughly 1/4 to 1/2 star. This suggests that 2016's films were, in general, not as well received as 2015's, and therefore that the change in Fandango's ratings may have been due to outside factors rather than to reforms in Fandango's ratings system.
