Fandango's Movie Ratings: an Update
We are revisiting Fandango's movie rating system. The company was criticized after Hickey found that Fandango's ratings were biased high. Fandango blamed an underlying bug for the inflated ratings and promised to fix it. Let's see what has changed since then.
import pandas as pd
fandango_before = pd.read_csv("fandango_score_comparison.csv")
fandango_after = pd.read_csv("movie_ratings_16_17.csv")
fandango_before.head(5)
| | FILM | RottenTomatoes | RottenTomatoes_User | Metacritic | Metacritic_User | IMDB | Fandango_Stars | Fandango_Ratingvalue | RT_norm | RT_user_norm | ... | IMDB_norm | RT_norm_round | RT_user_norm_round | Metacritic_norm_round | Metacritic_user_norm_round | IMDB_norm_round | Metacritic_user_vote_count | IMDB_user_vote_count | Fandango_votes | Fandango_Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avengers: Age of Ultron (2015) | 74 | 86 | 66 | 7.1 | 7.8 | 5.0 | 4.5 | 3.70 | 4.3 | ... | 3.90 | 3.5 | 4.5 | 3.5 | 3.5 | 4.0 | 1330 | 271107 | 14846 | 0.5 |
| 1 | Cinderella (2015) | 85 | 80 | 67 | 7.5 | 7.1 | 5.0 | 4.5 | 4.25 | 4.0 | ... | 3.55 | 4.5 | 4.0 | 3.5 | 4.0 | 3.5 | 249 | 65709 | 12640 | 0.5 |
| 2 | Ant-Man (2015) | 80 | 90 | 64 | 8.1 | 7.8 | 5.0 | 4.5 | 4.00 | 4.5 | ... | 3.90 | 4.0 | 4.5 | 3.0 | 4.0 | 4.0 | 627 | 103660 | 12055 | 0.5 |
| 3 | Do You Believe? (2015) | 18 | 84 | 22 | 4.7 | 5.4 | 5.0 | 4.5 | 0.90 | 4.2 | ... | 2.70 | 1.0 | 4.0 | 1.0 | 2.5 | 2.5 | 31 | 3136 | 1793 | 0.5 |
| 4 | Hot Tub Time Machine 2 (2015) | 14 | 28 | 29 | 3.4 | 5.1 | 3.5 | 3.0 | 0.70 | 1.4 | ... | 2.55 | 0.5 | 1.5 | 1.5 | 1.5 | 2.5 | 88 | 19560 | 1021 | 0.5 |
5 rows × 22 columns
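In every row shown, Fandango_Difference equals Fandango_Stars minus Fandango_Ratingvalue: 5.0 − 4.5 or 3.5 − 3.0, which is 0.5 each time. It therefore appears to measure how far the displayed stars sit above the underlying rating value. That relationship is inferred from these five rows, not taken from the dataset's documentation. The sketch below recomputes it on the displayed values only.

```python
import pandas as pd

# Values copied from the five rows displayed above.
head = pd.DataFrame({
    "FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)",
             "Do You Believe? (2015)", "Hot Tub Time Machine 2 (2015)"],
    "Fandango_Stars": [5.0, 5.0, 5.0, 5.0, 3.5],
    "Fandango_Ratingvalue": [4.5, 4.5, 4.5, 4.5, 3.0],
    "Fandango_Difference": [0.5, 0.5, 0.5, 0.5, 0.5],
})

# Assumed relationship: displayed stars minus the underlying rating value.
recomputed = head["Fandango_Stars"] - head["Fandango_Ratingvalue"]
print((recomputed == head["Fandango_Difference"]).all())

# Share of films whose displayed stars sit above their rating value.
print((recomputed > 0).mean())
```

The same two lines could be run on the full `fandango_before` DataFrame to check whether the relationship holds for all 146 films.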
fandango_after.head(5)
| | movie | year | metascore | imdb | tmeter | audience | fandango | n_metascore | n_imdb | n_tmeter | n_audience | nr_metascore | nr_imdb | nr_tmeter | nr_audience |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 Cloverfield Lane | 2016 | 76 | 7.2 | 90 | 79 | 3.5 | 3.80 | 3.60 | 4.50 | 3.95 | 4.0 | 3.5 | 4.5 | 4.0 |
| 1 | 13 Hours | 2016 | 48 | 7.3 | 50 | 83 | 4.5 | 2.40 | 3.65 | 2.50 | 4.15 | 2.5 | 3.5 | 2.5 | 4.0 |
| 2 | A Cure for Wellness | 2016 | 47 | 6.6 | 40 | 47 | 3.0 | 2.35 | 3.30 | 2.00 | 2.35 | 2.5 | 3.5 | 2.0 | 2.5 |
| 3 | A Dog's Purpose | 2017 | 43 | 5.2 | 33 | 76 | 4.5 | 2.15 | 2.60 | 1.65 | 3.80 | 2.0 | 2.5 | 1.5 | 4.0 |
| 4 | A Hologram for the King | 2016 | 58 | 6.1 | 70 | 57 | 3.0 | 2.90 | 3.05 | 3.50 | 2.85 | 3.0 | 3.0 | 3.5 | 3.0 |
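Both tables carry rescaled copies of each site's score. The rows shown suggest how they are built: RottenTomatoes 74 becomes RT_norm 3.70, IMDB 7.8 becomes IMDB_norm 3.90, and metascore 48 becomes n_metascore 2.40. So 0-100 scores appear to be divided by 20 and 0-10 scores by 2, putting everything on Fandango's 0-5 star scale. The `_round` and `nr_` columns appear to round to the nearest half star, with exact halves rounded up: RT_norm 4.25 becomes 4.5 for Cinderella. The sketch below reproduces this rule. It is reverse-engineered from the displayed rows, and the helper names are hypothetical.

```python
import math

import pandas as pd


def to_five_star(score, scale_max):
    """Rescale a score from 0..scale_max onto 0..5 stars (rule inferred from the rows)."""
    return score * 5 / scale_max


def round_half_star(value):
    """Round to the nearest 0.5, rounding exact halves up (rule inferred from the rows).

    Python's built-in round() uses banker's rounding, so round(4.25 * 2) / 2 gives 4.0,
    not the 4.5 seen for Cinderella; floor(x * 2 + 0.5) rounds halves up instead.
    """
    return math.floor(value * 2 + 0.5) / 2


# A few raw scores copied from the tables above.
scores = pd.DataFrame({
    "film": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "13 Hours"],
    "score_100": [74, 85, 48],  # 0-100 scores: RottenTomatoes, RottenTomatoes, metascore
    "score_10": [7.8, 7.1, 7.3],  # 0-10 IMDB scores
})

scores["score_100_norm"] = to_five_star(scores["score_100"], 100)
scores["score_10_norm"] = to_five_star(scores["score_10"], 10)
scores["score_100_round"] = scores["score_100_norm"].map(round_half_star)
scores["score_10_round"] = scores["score_10_norm"].map(round_half_star)
print(scores)
```

The rounded values match the corresponding RT_norm_round, IMDB_norm_round, nr_metascore and nr_imdb entries in the tables.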
fandango_before.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 22 columns):
FILM                          146 non-null object
RottenTomatoes                146 non-null int64
RottenTomatoes_User           146 non-null int64
Metacritic                    146 non-null int64
Metacritic_User               146 non-null float64
IMDB                          146 non-null float64
Fandango_Stars                146 non-null float64
Fandango_Ratingvalue          146 non-null float64
RT_norm                       146 non-null float64
RT_user_norm                  146 non-null float64
Metacritic_norm               146 non-null float64
Metacritic_user_nom           146 non-null float64
IMDB_norm                     146 non-null float64
RT_norm_round                 146 non-null float64
RT_user_norm_round            146 non-null float64
Metacritic_norm_round         146 non-null float64
Metacritic_user_norm_round    146 non-null float64
IMDB_norm_round               146 non-null float64
Metacritic_user_vote_count    146 non-null int64
IMDB_user_vote_count          146 non-null int64
Fandango_votes                146 non-null int64
Fandango_Difference           146 non-null float64
dtypes: float64(15), int64(6), object(1)
memory usage: 25.2+ KB
fandango_after.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 15 columns):
movie           214 non-null object
year            214 non-null int64
metascore       214 non-null int64
imdb            214 non-null float64
tmeter          214 non-null int64
audience        214 non-null int64
fandango        214 non-null float64
n_metascore     214 non-null float64
n_imdb          214 non-null float64
n_tmeter        214 non-null float64
n_audience      214 non-null float64
nr_metascore    214 non-null float64
nr_imdb         214 non-null float64
nr_tmeter       214 non-null float64
nr_audience     214 non-null float64
dtypes: float64(10), int64(4), object(1)
memory usage: 25.2+ KB
fan_before = fandango_before.loc[:, ['FILM', 'Fandango_Stars', 'Fandango_Ratingvalue', 'Fandango_votes', 'Fandango_Difference']]
fan_after = fandango_after.loc[:, ['movie', 'year', 'fandango']]
fan_before.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 5 columns):
FILM                    146 non-null object
Fandango_Stars          146 non-null float64
Fandango_Ratingvalue    146 non-null float64
Fandango_votes          146 non-null int64
Fandango_Difference     146 non-null float64
dtypes: float64(3), int64(1), object(1)
memory usage: 5.8+ KB
fan_before.head(3)
| | FILM | Fandango_Stars | Fandango_Ratingvalue | Fandango_votes | Fandango_Difference |
|---|---|---|---|---|---|
| 0 | Avengers: Age of Ultron (2015) | 5.0 | 4.5 | 14846 | 0.5 |
| 1 | Cinderella (2015) | 5.0 | 4.5 | 12640 | 0.5 |
| 2 | Ant-Man (2015) | 5.0 | 4.5 | 12055 | 0.5 |
fan_after.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 3 columns):
movie       214 non-null object
year        214 non-null int64
fandango    214 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 5.1+ KB
fan_after.head(3)
| | movie | year | fandango |
|---|---|---|---|
| 0 | 10 Cloverfield Lane | 2016 | 3.5 |
| 1 | 13 Hours | 2016 | 4.5 |
| 2 | A Cure for Wellness | 2016 | 3.0 |
Since we want to know whether Fandango's ratings have changed since Hickey's analysis, our population of interest is all of Fandango's movie ratings, both before and after his analysis.
The 2015 (before) data contains only movies that met three conditions:
- rated on Metacritic, Rotten Tomatoes, IMDB, and Fandango;
- at least 30 fan reviews on Fandango;
- tickets on sale in 2015.

Any movie that failed one of these conditions was excluded. The films were therefore not chosen at random, and the 2015 data is not representative of the population.
The 2016/2017 (after) data covers only the most popular movies released in 2016 and 2017, where "popular" means having a significant number of votes. This selection is not representative of the population either.
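Selection rules like these can be expressed as a boolean filter. In the sketch below, the toy catalogue, its column names and its numbers are all invented for illustration. It shows that films are dropped because of their own characteristics (site coverage, vote count, release timing), not by chance. This is why the rows that survive are not a random sample.

```python
import pandas as pd

# Hypothetical toy catalogue: titles, columns and values are invented for illustration.
catalogue = pd.DataFrame({
    "film": ["Film A", "Film B", "Film C", "Film D"],
    "rated_on_all_four_sites": [True, True, False, True],  # Metacritic, RT, IMDB, Fandango
    "fandango_votes": [1200, 12, 450, 800],
    "tickets_on_sale_2015": [True, True, True, False],
})

# A film is kept only if it meets every condition.
meets_criteria = (
    catalogue["rated_on_all_four_sites"]
    & (catalogue["fandango_votes"] >= 30)
    & catalogue["tickets_on_sale_2015"]
)
kept = catalogue[meets_criteria]
excluded = catalogue[~meets_criteria]

print("kept:", kept["film"].tolist())
print("excluded:", excluded["film"].tolist())
```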
Because neither dataset is representative of films in general, we will narrow our goal to determining whether Fandango's rating system has improved for popular films.
Hickey set 30 fan reviews as a benchmark for including a film in his analysis. We will use this amount as the minimum required for a film to be deemed popular.
print(sum(fan_before['Fandango_votes'] >= 30))
146
In the before data, all 146 films have at least 30 reviews.
Unfortunately, the 2016/2017 (after) data does not include the number of reviews. Instead, we randomly select 20 films and look up their review counts on Fandango. If at least 90% of the sampled films have 30 or more reviews, we will consider the movies in this dataset popular.
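Passing a fixed `random_state` to pandas' `sample` makes the draw reproducible. Anyone rerunning the notebook therefore gets the same twenty films. The sketch below checks this on a hypothetical stand-in: a Series of 214 placeholder titles, the same length as `fan_after`. It does not use the real movie titles.

```python
import pandas as pd

# Hypothetical stand-in for fan_after['movie']: 214 placeholder titles.
movies = pd.Series([f"movie_{i}" for i in range(214)], name="movie")

first_draw = movies.sample(n=20, random_state=1)
second_draw = movies.sample(n=20, random_state=1)

# Same seed, same draw: index labels and titles match exactly.
assert first_draw.equals(second_draw)
# sample() draws without replacement by default, so no title is picked twice.
assert first_draw.index.is_unique
print(len(first_draw))
```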
sample = fan_after['movie'].sample(n=20, random_state=1)
print(sample)
108    Mechanic: Resurrection
206    Warcraft
106    Max Steel
107    Me Before You
51     Fantastic Beasts and Where to Find Them
33     Cell
59     Genius
152    Sully
4      A Hologram for the King
31     Captain America: Civil War
118    Mr. Church
39     Crouching Tiger, Hidden Dragon: Sword of Destiny
93     Kung Fu Panda 3
69     Hidden Figures
161    The Autopsy of Jane Doe
112    Misconduct
94     La La Land
97     Live by Night
151    Suicide Squad
38     Criminal
Name: movie, dtype: object
print(fan_after.iloc[106])
movie       Max Steel
year             2016
fandango          3.5
Name: 106, dtype: object
The review counts shown on Fandango are actually supplied by Rotten Tomatoes. The counts collected for the twenty sampled films, in the order printed above, are listed below.
sample_ratings = [25630, 31612, 6835, 30684, 87952, 3832,
7, 48533, 10228, 180138, 4649, 12095,
99917, 58476, 12250, 1148, 71277, 13332,
145910, 20594]
# `sample` is a Series of titles; convert it to a DataFrame so it can hold a second column
sample = sample.to_frame()
# Attach the collected counts, assumed to be listed in the same order as the sampled titles
sample['num_reviews'] = sample_ratings
print(sample.head(3))
                      movie  num_reviews
108  Mechanic: Resurrection        25630
206                Warcraft        31612
106               Max Steel         6835
# Count the sampled films with fewer than 30 reviews
print((sample['num_reviews'] < 30).sum())
1
Only 1 of the 20 sampled films (5%) had fewer than 30 reviews, so 95% met the threshold, above our 90% cutoff. Based on this random sample, we treat the 2016/2017 dataset as representative of popular films.
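The decision rule used here can be written as two small functions and checked against the collected counts. The function names are hypothetical. The 30-review and 90% thresholds are the ones chosen in this analysis, not values defined by Fandango.

```python
def share_meeting_threshold(review_counts, min_reviews=30):
    """Return the fraction of films with at least `min_reviews` reviews."""
    if not review_counts:
        raise ValueError("review_counts must not be empty")
    meeting = sum(1 for count in review_counts if count >= min_reviews)
    return meeting / len(review_counts)


def is_popular_sample(review_counts, min_reviews=30, required_share=0.9):
    """Treat the dataset as popular if at least `required_share` of films meet the threshold."""
    return share_meeting_threshold(review_counts, min_reviews) >= required_share


# Review counts for the twenty sampled films, copied from the list above.
counts = [25630, 31612, 6835, 30684, 87952, 3832,
          7, 48533, 10228, 180138, 4649, 12095,
          99917, 58476, 12250, 1148, 71277, 13332,
          145910, 20594]

print(share_meeting_threshold(counts))  # 0.95
print(is_popular_sample(counts))        # True
```

Keeping the thresholds as parameters makes it easy to see how the conclusion would change under a stricter definition of "popular".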
movies_15 = fan_before[fan_before["FILM"].str.contains("2015")]
print(movies_15.head(3))
                             FILM  Fandango_Stars  Fandango_Ratingvalue  \
0  Avengers: Age of Ultron (2015)             5.0                   4.5   
1               Cinderella (2015)             5.0                   4.5   
2                  Ant-Man (2015)             5.0                   4.5   

   Fandango_votes  Fandango_Difference  
0           14846                  0.5  
1           12640                  0.5  
2           12055                  0.5  
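Matching "2015" anywhere in the title works as long as no title contains those digits somewhere other than the release year. A slightly stricter sketch extracts only the year in the trailing parentheses with `Series.str.extract` and compares that. The third title below is invented to show where the two approaches differ. It is not from the dataset.

```python
import pandas as pd

films = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Hypothetical 2015 Reunion (2014)",  # invented: '2015' in the name, released 2014
    ],
})

# str.contains matches the digits anywhere in the title.
loose = films[films["FILM"].str.contains("2015")]

# Extract only the "(YYYY)" suffix at the end of the title, then compare that.
films["year"] = films["FILM"].str.extract(r"\((\d{4})\)\s*$", expand=False)
strict = films[films["year"] == "2015"]

print(len(loose), len(strict))
print(strict["FILM"].tolist())
```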