A sentiment(al) analysis of why Red Dwarf is no longer funny (to me)

A Jupyter [Data] Mining Corp Investigation

What follows is a faux analysis of the humour in Red Dwarf and why it seems to be waning in recent series. While undoubtedly tongue-in-cheek, I have endeavoured to bring modern methods, tools and techniques to bear on the question of whether Red Dwarf is less amusing now than when it started.

This investigation uses F# as the central language for data analysis and employs the following packages:

In [1]:
#load "Paket.fsx"

Paket.Dependencies.Install """
frameworks: net45
source https://nuget.org/api/v2
nuget FSharp.Data
nuget XPlot.GoogleCharts
"""

Paket.LoadingScripts.ScriptGeneration.generateScriptsForRootFolder 
   Paket.LoadingScripts.ScriptGeneration.FSharp
   (Paket.FrameworkIdentifier.DotNetFramework Paket.FrameworkVersion.V4_5)
   (System.IO.DirectoryInfo __SOURCE_DIRECTORY__)
   
#load "paket-files/include-scripts/net45/include.main.group.fsx"
//#load "XPlot.Plotly.Paket.fsx"
//#load "XPlot.Plotly.fsx"

open System
open System.IO
open FSharp.Data
open FSharp.Data.JsonExtensions
open XPlot
open XPlot.GoogleCharts

I also (temporarily) need a custom display printer to support XPlot.GoogleCharts (pending the resolution of an issue).

In [2]:
open IfSharp.Kernel.App

@"<script src=""https://www.google.com/jsapi""></script>" |> Util.Html |> Display

type XPlot.GoogleCharts.GoogleChart with
  member __.GetContentHtml() =
    let html = __.GetInlineHtml()
    html
      .Replace ("google.setOnLoadCallback(drawChart);", "google.load('visualization', '1.0', { packages: ['corechart'], callback: drawChart })")

type XPlot.GoogleCharts.Chart with
  static member Content (chart : GoogleChart) =
    { ContentType = "text/html"; Data = chart.GetContentHtml() }

AddDisplayPrinter (fun (plot: XPlot.GoogleCharts.GoogleChart) -> { ContentType = "text/html"; Data = plot.GetContentHtml() })
Out[2]:
<null>

Observation

When I read that there would be a new series of Red Dwarf I was filled with both excitement and dread: excitement at the thought of new episodes of one of my all-time favourite shows, and dread that it would be as bad as, if not worse than, series 8, 9 and (shiver) "X".

And so it was that on 23rd September 2016, I sat with bated breath as the opening credits of Red Dwarf XI began to roll. Half an hour and just a couple of chuckles later, the best I could say about it was "Meh".

Unfortunately, the opening show seemed to be the highlight of the series and, as I ground my way through episode 4, I formed the central observation of this investigation, namely: "Red Dwarf just isn't as funny as it used to be".

Formulation

But then it occurred to me: Red Dwarf's crew weren't the only ones to have aged. I am also older and my tastes have changed significantly since I was the spotty pre-teen who first started watching the show back in '88. Perhaps Red Dwarf is as funny now as it always was and I'm the one who is no longer funny.

So: "Is Red Dwarf still funny and has my transformation into an adult and parent simply robbed me of my ability to appreciate it? Or has it truly become as humourless as I found it?"

Amused by the idea of trying to objectively answer this question, I added a card to the "Ideas" column of my public Trello board and promptly forgot about it. That was until recently, when I encountered Project Jupyter and, consequently, Azure Notebooks. The plays on words I could use to form the lead of this investigation were just too alluring, and so it was that I moved my Trello card to "in-progress" and began the process of answering this question.

Information

As I form a central part of the question, I needed to remove myself from the equation and find an objective measure (well, as objective as possible where humour is involved) of the humorousness of each episode of Red Dwarf. Imdb was the obvious first port of call and, sure enough, they provide a wealth of information about each episode of Red Dwarf including release date and rating. As pretty much every other source of information about TV and movies can be queried by Imdb id, I defined the following collection:

In [3]:
type EpisodeInformation = {
    Id : string;
    Season : int
    Episode : int
}

let episodeInformation = [
    { Id = "tt0684181"; Season = 1; Episode = 1 };
    { Id = "tt0684157"; Season = 1; Episode = 2 };
    { Id = "tt0684145"; Season = 1; Episode = 3 };
    { Id = "tt0684186"; Season = 1; Episode = 4 };
    { Id = "tt0684151"; Season = 1; Episode = 5 };
    { Id = "tt0684165"; Season = 1; Episode = 6 };
    { Id = "tt0684161"; Season = 2; Episode = 1 };
    { Id = "tt0684146"; Season = 2; Episode = 2 };
    { Id = "tt0684180"; Season = 2; Episode = 3 };
    { Id = "tt0684177"; Season = 2; Episode = 4 };
    { Id = "tt0684175"; Season = 2; Episode = 5 };
    { Id = "tt0684169"; Season = 2; Episode = 6 };
    { Id = "tt0684144"; Season = 3; Episode = 1 };
    { Id = "tt0767232"; Season = 3; Episode = 2 };
    { Id = "tt0684172"; Season = 3; Episode = 3 };
    { Id = "tt0684148"; Season = 3; Episode = 4 };
    { Id = "tt0684185"; Season = 3; Episode = 5 };
    { Id = "tt0684183"; Season = 3; Episode = 6 };
    { Id = "tt0684149"; Season = 4; Episode = 1 };
    { Id = "tt0684152"; Season = 4; Episode = 2 };
    { Id = "tt0684160"; Season = 4; Episode = 3 };
    { Id = "tt0684187"; Season = 4; Episode = 4 };
    { Id = "tt0684153"; Season = 4; Episode = 5 };
    { Id = "tt0684164"; Season = 4; Episode = 6 };
    { Id = "tt0684159"; Season = 5; Episode = 1 };
    { Id = "tt0684182"; Season = 5; Episode = 2 };
    { Id = "tt0684179"; Season = 5; Episode = 3 };
    { Id = "tt0684174"; Season = 5; Episode = 4 };
    { Id = "tt0756588"; Season = 5; Episode = 5 };
    { Id = "tt0684143"; Season = 5; Episode = 6 };
    { Id = "tt0684173"; Season = 6; Episode = 1 };
    { Id = "tt0684163"; Season = 6; Episode = 2 };
    { Id = "tt0684158"; Season = 6; Episode = 3 };
    { Id = "tt0684155"; Season = 6; Episode = 4 };
    { Id = "tt0684176"; Season = 6; Episode = 5 };
    { Id = "tt0756589"; Season = 6; Episode = 6 };
    { Id = "tt0684184"; Season = 7; Episode = 1 };
    { Id = "tt0684178"; Season = 7; Episode = 2 };
    { Id = "tt0684168"; Season = 7; Episode = 3 };
    { Id = "tt0684154"; Season = 7; Episode = 4 };
    { Id = "tt0756587"; Season = 7; Episode = 5 };
    { Id = "tt0684147"; Season = 7; Episode = 6 };
    { Id = "tt0684156"; Season = 7; Episode = 7 };
    { Id = "tt0684166"; Season = 7; Episode = 8 };
    { Id = "tt0684140"; Season = 8; Episode = 1 };
    { Id = "tt0684141"; Season = 8; Episode = 2 };
    { Id = "tt0684142"; Season = 8; Episode = 3 };
    { Id = "tt0684150"; Season = 8; Episode = 4 };
    { Id = "tt0684162"; Season = 8; Episode = 5 };
    { Id = "tt0684170"; Season = 8; Episode = 6 };
    { Id = "tt0684171"; Season = 8; Episode = 7 };
    { Id = "tt0684167"; Season = 8; Episode = 8 };
    { Id = "tt1365540"; Season = 9; Episode = 1 };
    { Id = "tt1371606"; Season = 9; Episode = 2 };
    { Id = "tt1400975"; Season = 9; Episode = 3 };
    { Id = "tt1997038"; Season = 10; Episode = 1 };
    { Id = "tt1999714"; Season = 10; Episode = 2 };
    { Id = "tt1999715"; Season = 10; Episode = 3 };
    { Id = "tt1999716"; Season = 10; Episode = 4 };
    { Id = "tt1999717"; Season = 10; Episode = 5 };
    { Id = "tt1999718"; Season = 10; Episode = 6 };
    { Id = "tt5218244"; Season = 11; Episode = 1 };
    { Id = "tt5218254"; Season = 11; Episode = 2 };
    { Id = "tt5218266"; Season = 11; Episode = 3 };
    { Id = "tt5218284"; Season = 11; Episode = 4 };
    { Id = "tt5218308"; Season = 11; Episode = 5 };
    { Id = "tt5218316"; Season = 11; Episode = 6 }
]

Unfortunately, Imdb doesn't provide any public API for accessing information from their website (apart from some patchy flat file "reports") so I looked at using Omdb and WeMakeSites's IMDB api but found that both these sources returned patchy data. Finally I turned to The Movie Db (TMDB) which, once you're registered and have an API key, provides a much better API and very consistent show information.

Using the code below, I retrieved all information Tmdb had available for each episode of Red Dwarf and saved it locally for future use.

open System.IO;

let writeFile path id (json : string) =
  let fileName = "./Data/" + path + "/" + id + ".json"
  use streamWriter = new StreamWriter(fileName, false)
  streamWriter.WriteLine(json)

episodeInformation
|> Seq.map (fun es -> System.Threading.Thread.Sleep(1000); es)
|> Seq.map (fun es -> (es.Id, Http.RequestString ("https://api.themoviedb.org/3/tv/326/season/" + es.Season.ToString() + "/episode/" + es.Episode.ToString(), query=["api_key", "<API KEY>"])))
|> Seq.iter (fun (id, json) -> writeFile "TheMovieDb" id json)
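
The request URL in the script above follows a simple pattern, which can be pulled out into a small helper for readability. This is only a sketch: the name tmdbEpisodeUrl is my own invention for illustration and is not part of FSharp.Data or the TMDB API, while 326 is the show id already used in the request above.

```fsharp
// Hypothetical helper (not in the original notebook): builds the TMDB
// episode endpoint from the show id, season number and episode number.
let tmdbEpisodeUrl (showId : int) (season : int) (episode : int) =
  sprintf "https://api.themoviedb.org/3/tv/%d/season/%d/episode/%d" showId season episode

// 326 is the show id used in the request above.
let firstEpisodeUrl = tmdbEpisodeUrl 326 1 1
printfn "%s" firstEpisodeUrl
```

The helper only builds the address; the api_key would still be passed separately through the query argument of Http.RequestString, as in the script above.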

I thought I'd be able to save this data locally within this library, but neither the above script nor a manual upload of the data to this library succeeded. Having read the guidance here, I decided to host the data on GitHub instead.

With the Tmdb data hosted on GitHub, I could then use the JsonValue parser from FSharp.Data to dynamically query the content. For example, with the code below:

In [4]:
let loadTmdbJson id =
  let fileName = "https://raw.githubusercontent.com/ibebbs/RedDwarfAnalysis/master/TheMovieDb/" + id + ".json"
  JsonValue.Load(fileName)
  
type TmdbEpisode = {
  Name : string;
  Season : int;
  Episode : int;
  AirDate : DateTime;
  Overview : string;
}

let parseTmdbJson (json : JsonValue) = {
  Name = json?name.AsString(); 
  Season = json?season_number.AsInteger(); 
  Episode = json?episode_number.AsInteger(); 
  AirDate = json?air_date.AsDateTime();
  Overview = json?overview.AsString()
}

I can parse any episode's information as shown here:

In [5]:
"tt0684181" |> loadTmdbJson |> parseTmdbJson
Out[5]:
{Name = "The End";
 Season = 1;
 Episode = 1;
 AirDate = 02/15/1988 00:00:00;
 Overview =
  "Third technician Dave Lister wakes from stasis to find himself alone aboard the mining ship Red Dwarf, three million years after the end of humanity. His new existence isn’t entirely lonely: also aboard are cowardly hologram Arnold J Rimmer, the ship’s senile computer Holly, and an evolved, self-absorbed descendant of his pet cat. Unfortunately, there’s little left to do in the universe but bicker with his unlikely new friends, make a pig sty out of his bunk and, unwittingly, eat the powdered remains of his former crewmates.";}

While Tmdb provides some rating information, to answer the fundamental question of this investigation I needed to find detailed rating information, at a show level, ideally with a demographic breakdown. In this it seemed that Imdb was the only game in town so, although against their terms of service, I decided to scrape the information from the site.

As before, I wrote the code below to download all the rating information Imdb had for each episode of Red Dwarf into a local store:

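// writeHtml is not shown in the original; assumed to mirror writeFile with an .html extension
let writeHtml path id (html : string) =
  File.WriteAllText("./Data/" + path + "/" + id + ".html", html)

episodeInformation
|> Seq.map (fun es -> System.Threading.Thread.Sleep(1000); es)
|> Seq.map (fun es -> (es.Id, Http.RequestString ("http://www.imdb.com/title/" + es.Id + "/ratings")))
|> Seq.iter (fun (id, html) -> writeHtml "Ratings" id html)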

I then wrote the following to parse the vote/rating information for each category of demographic and for a specific episode:

In [6]:
let ratingCategoryNames = [
  "Males";
  "Females";
  "Aged under 18";
  "Males under 18";
  "Aged 18-29";
  "Males Aged 18-29";
  "Females Aged 18-29";
  "Aged 30-44";
  "Males Aged 30-44";
  "Females Aged 30-44";
  "Aged 45+";
  "Males Aged 45+";
  "Females Aged 45+";
  "Top 1000 voters";
  "US users";
  "Non-US users";
]

type RatingCategory =
  | ``Males`` = 0
  | ``Females`` = 1
  | ``Aged under 18`` = 2
  | ``Males under 18`` = 3
  | ``Aged 18-29`` = 4
  | ``Males Aged 18-29`` = 5
  | ``Females Aged 18-29`` = 6
  | ``Aged 30-44`` = 7
  | ``Males Aged 30-44`` = 8
  | ``Females Aged 30-44`` = 9
  | ``Aged 45`` = 10
  | ``Males Aged 45`` = 11
  | ``Females Aged 45`` = 12
  | ``Top 1000 voters`` = 13
  | ``US users`` = 14
  | ``Non-US users`` = 15

type EpisodeRatings = {
    Id : string;
    Category : RatingCategory;
    Votes : int;
    Rating : decimal
}

let parseCategory c =
  let index = Seq.tryFindIndex (fun cn -> cn = c) ratingCategoryNames
  match index with
  | Some x -> Some (enum<RatingCategory>(x))
  | None -> None

let parseRatings id =
  let title (node : HtmlNode) =
      node.Descendants["a"]
      |> Seq.map (fun d -> d.InnerText())
  
  let votes (node : HtmlNode) =
      [ node.InnerText() ]
  
  let rating (node : HtmlNode) =
      [ node.InnerText() ]
  let document = HtmlDocument.Load("https://raw.githubusercontent.com/ibebbs/RedDwarfAnalysis/master/Ratings/" + id + ".html")
  let content = document.CssSelect("#tn15content").[0]
  let tables = 
    content.Descendants["table"]
    |> Seq.toArray
  let rows =
    tables.[1].Descendants["tr"]
    |> Seq.map (fun row -> (row, row.Descendants["td"] |> Seq.toArray))
    |> Seq.where (fun (row, data) -> data.Length = 3)
    |> Seq.map (fun (row, data) -> ( (title data.[0]), (votes data.[1]), (rating data.[2])))
    |> Seq.collect (fun (t, v, r) -> Seq.zip3 t v r)
    |> Seq.map (fun (t, v, r) -> ((parseCategory t), System.Int32.Parse(v.Trim()), System.Decimal.Parse(r.Trim())))
    |> Seq.where (fun (t, v, r) -> t.IsSome)
    |> Seq.map (fun (t, v, r) -> { Id = id; Category = t.Value; Votes = v; Rating = r })
  rows
  
type EpisodeRating = {
  ``Males`` : decimal option;
  ``Females`` : decimal option;
  ``Aged under 18`` : decimal option;
  ``Males under 18`` : decimal option;
  ``Aged 18-29`` : decimal option;
  ``Males Aged 18-29`` : decimal option;
  ``Females Aged 18-29`` : decimal option;
  ``Aged 30-44`` : decimal option;
  ``Males Aged 30-44`` : decimal option;
  ``Females Aged 30-44`` : decimal option;
  ``Aged 45`` : decimal option;
  ``Males Aged 45`` : decimal option;
  ``Females Aged 45`` : decimal option;
  ``Top 1000 voters`` : decimal option;
  ``US users`` : decimal option;
  ``Non-US users`` : decimal option;
}

let tryFind (dict : System.Collections.Generic.IDictionary<'a,'b>) (key : 'a) =
  let containsKey = dict.ContainsKey(key)
  match containsKey with
  | true -> Some dict.[key]
  | false -> None

let pivotRatings (ratings : EpisodeRatings seq) =
   let dictionary = 
     ratings
     |> Seq.map (fun r -> (r.Category, r.Rating))
     |> dict
   let rating = {
     ``Males`` = (tryFind dictionary RatingCategory.``Males``);
     ``Females`` = (tryFind dictionary RatingCategory.``Females``);
     ``Aged under 18`` = (tryFind dictionary RatingCategory.``Aged under 18``);
     ``Males under 18`` = (tryFind dictionary RatingCategory.``Males under 18``);
     ``Aged 18-29`` = (tryFind dictionary RatingCategory.``Aged 18-29``);
     ``Males Aged 18-29`` = (tryFind dictionary RatingCategory.``Males Aged 18-29``);
     ``Females Aged 18-29`` = (tryFind dictionary RatingCategory.``Females Aged 18-29``);
     ``Aged 30-44`` = (tryFind dictionary RatingCategory.``Aged 30-44``);
     ``Males Aged 30-44`` = (tryFind dictionary RatingCategory.``Males Aged 30-44``);
     ``Females Aged 30-44`` = (tryFind dictionary RatingCategory.``Females Aged 30-44``);
     ``Aged 45`` = (tryFind dictionary RatingCategory.``Aged 45``);
     ``Males Aged 45`` = (tryFind dictionary RatingCategory.``Males Aged 45``);
     ``Females Aged 45`` = (tryFind dictionary RatingCategory.``Females Aged 45``);
     ``Top 1000 voters`` = (tryFind dictionary RatingCategory.``Top 1000 voters``);
     ``US users`` = (tryFind dictionary RatingCategory.``US users``);
     ``Non-US users`` = (tryFind dictionary RatingCategory.``Non-US users``)
   }
   rating

let loadRatings id =
  let ratings = parseRatings id
  let rating = pivotRatings ratings
  rating

This lets me retrieve ratings for each demographic per episode. For example, "The End" (Series 1, Episode 1) gives:

In [7]:
loadRatings "tt0684181"
Out[7]:
{Males = Some 8.0M;
 Females = Some 8.3M;
 Aged under 18 = Some 8.0M;
 Males under 18 = Some 8.0M;
 Aged 18-29 = Some 8.1M;
 Males Aged 18-29 = Some 8.2M;
 Females Aged 18-29 = Some 7.2M;
 Aged 30-44 = Some 8.1M;
 Males Aged 30-44 = Some 8.0M;
 Females Aged 30-44 = Some 8.8M;
 Aged 45 = Some 7.8M;
 Males Aged 45 = Some 7.8M;
 Females Aged 45 = Some 7.4M;
 Top 1000 voters = Some 7.5M;
 US users = Some 7.9M;
 Non-US users = Some 8.1M;}
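
The pivot step boils down to a dictionary lookup that yields None whenever a category had no row on the page. Here is a minimal, self-contained sketch of that idea. The cut-down Category enum and the ratings are made up for illustration, and Map stands in for the dict and tryFind pair used above.

```fsharp
// A cut-down stand-in for the RatingCategory enum above (assumption).
type Category =
  | Males = 0
  | Females = 1
  | TopVoters = 2

// One row per (category, rating) pair; the values are made up.
let rows = [ (Category.Males, 8.0M); (Category.TopVoters, 7.5M) ]

// Map.ofList plays the role of `dict`, Map.tryFind the role of `tryFind`.
let lookup = Map.ofList rows

let males = Map.tryFind Category.Males lookup      // Some 8.0M
let females = Map.tryFind Category.Females lookup  // None: no row was parsed
```

Keeping each field as an option means an episode with a missing demographic still produces a record, and the charts later on can simply filter out the None values.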

Finally I'll combine these two data sources into a tuple so they can be used together:

In [8]:
let loadData id =
  let episode = id |> loadTmdbJson |> parseTmdbJson
  let ratings = id |> loadRatings
  (episode, ratings)

With these two sources in hand I can finally start digging into the data.

Exploration

To evaluate whether it's Red Dwarf or me who has changed, I will look at the "Top 1000 voters" rating for each episode to see if there is a trend.

In [9]:
let ratingsByDate = 
  episodeInformation
  |> Seq.map (fun ei -> loadData ei.Id)
  |> Seq.map (fun (episode, ratings) -> (episode.AirDate, ratings.``Top 1000 voters``))
  |> Seq.where (fun (date, rating) -> rating.IsSome)
  |> Seq.map (fun (date, rating) -> (date, rating.Value))
  |> Seq.sortBy (fun (date, rating) -> date)
  |> Seq.toList
In [10]:
let options = Options(pointSize=3, colors=[|"#3B8FCC"|], trendlines=[|Trendline(opacity=0.5,lineWidth=5,color="#C0D9EA")|], hAxis=Axis(title="Date"), vAxis=Axis(title="Rating"))
Chart.Scatter(ratingsByDate) |> Chart.WithOptions (options)
Out[10]:

As we can see, there is a definite downward trend in rating which suggests that the show is getting less funny, but it's not as stark as I was expecting. Indeed, a rating of 7.184 (based on the trend value at the time of the last show) would still place Red Dwarf in IMDB's Top 1000 TV Shows which, to me, seems incongruous given how poor the most recent episodes have been.
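
For anyone wanting to check a trend value by hand, here is a rough sketch of an ordinary least-squares line fitted to (air date, rating) pairs. It assumes the chart's trendline is a straight-line fit, and the three data points are made up for illustration rather than taken from the Imdb data; linearFit and yearsSince are hypothetical helper names.

```fsharp
open System

// Ordinary least-squares fit of y = intercept + slope * x.
let linearFit (points : (float * float) list) =
  let meanX = points |> List.averageBy fst
  let meanY = points |> List.averageBy snd
  let sxy = points |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY))
  let sxx = points |> List.sumBy (fun (x, _) -> (x - meanX) * (x - meanX))
  let slope = sxy / sxx
  (meanY - slope * meanX, slope)

// Made-up (air date, rating) pairs for illustration, not the real Imdb data.
let sample = [
  (DateTime(1988, 1, 1), 8.0M);
  (DateTime(1998, 1, 1), 7.5M);
  (DateTime(2008, 1, 1), 7.0M) ]

// Express each date as fractional years since the first episode.
let firstDate = sample |> List.map fst |> List.min
let yearsSince (date : DateTime) = (date - firstDate).TotalDays / 365.25

let (intercept, slope) =
  sample
  |> List.map (fun (date, rating) -> (yearsSince date, float rating))
  |> linearFit

// Trend value at the time of the last show.
let lastDate = sample |> List.map fst |> List.max
let trendAtLast = intercept + slope * yearsSince lastDate
printfn "slope per year: %.3f, trend at last show: %.3f" slope trendAtLast
```

Feeding ratingsByDate through the same steps should give a figure comparable to the one read off the chart, provided the trendline really is linear.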

I therefore wonder whether reviews are being buoyed up by younger viewers rating the show highly without an appreciation of how inferior the recent episodes are (in my opinion) to the earlier series.

I'll therefore add a few more datapoints to the chart here based on each age category's rating:

In [11]:
let ratingsByDateAndAgeCategory = 
  episodeInformation
  |> Seq.map (fun ei -> loadData ei.Id)
  |> Seq.collect (fun (episode, ratings) -> [| (episode.AirDate, RatingCategory.``Aged under 18``, ratings.``Aged under 18``); (episode.AirDate, RatingCategory.``Aged 18-29``, ratings.``Aged 18-29``); (episode.AirDate, RatingCategory.``Aged 30-44``, ratings.``Aged 30-44``); (episode.AirDate, RatingCategory.``Aged 45``, ratings.``Aged 45``)|])
  |> Seq.where (fun (date, category, rating) -> rating.IsSome)
  |> Seq.map (fun (date, category, rating) -> (date, category, rating.Value))
  |> Seq.groupBy (fun (date, category, rating) -> category)
  |> Seq.map (fun (key, values) -> values |> Seq.map (fun (date, category, rating) -> (date, rating)) |> Seq.sortBy (fun (date, rating) -> date))
  |> Seq.toList

let options = 
  Options(
    pointSize=3, 
    colors=[|"#6AA590"; "#7DE6C1"; "#57E6B3"; "#60A6D0"; "#3B8FCC"|], 
    trendlines=[|
      Trendline(opacity=0.5,lineWidth=5,color="#6AA590");
      Trendline(opacity=0.5,lineWidth=5,color="#7DE6C1");
      Trendline(opacity=0.5,lineWidth=5,color="#57E6B3");
      Trendline(opacity=0.5,lineWidth=5,color="#60A6D0");
      Trendline(opacity=0.5,lineWidth=5,color="#3B8FCC")|],      
    hAxis=Axis(title="Date"),
    vAxis=Axis(title="Rating"))

Chart.Scatter(ratingsByDateAndAgeCategory, [|"Aged under 18"; "Aged 18-29"; "Aged 30-44"; "Aged 45+"|])
|> Chart.WithOptions(options) 
|> Chart.WithLegend(true)
Out[11]:

And here's the kicker. Turns out that, while there is a general downward trend across all age groups, it's actually the oldest demographic that is rating the new series the highest. The younger generation are observing the biggest deterioration in rating (albeit from a lower initial rating) which is completely opposite to my expectations.
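
One way to put a number on "biggest deterioration" without reading it off the trendlines is to compare each group's average rating in the early series with its average in the later ones. The sketch below does that on made-up rows; the ratings, the split at series 6 and the helper name meanFor are all assumptions for illustration.

```fsharp
// Made-up (season, category, rating) rows for illustration only.
let rows = [
  (1, "Aged 18-29", 8.2); (2, "Aged 18-29", 8.0);
  (10, "Aged 18-29", 6.9); (11, "Aged 18-29", 6.7);
  (1, "Aged 45+", 7.8); (2, "Aged 45+", 7.6);
  (10, "Aged 45+", 7.3); (11, "Aged 45+", 7.1) ]

// Average rating of a category within an inclusive range of seasons.
let meanFor category (fromSeason, toSeason) =
  rows
  |> List.filter (fun (s, c, _) -> c = category && s >= fromSeason && s <= toSeason)
  |> List.averageBy (fun (_, _, r) -> r)

// Drop between the early (1-6) and later (7-11) seasons for each category.
let drops =
  rows
  |> List.map (fun (_, c, _) -> c)
  |> List.distinct
  |> List.map (fun c -> (c, meanFor c (1, 6) - meanFor c (7, 11)))

// The category whose average rating fell the furthest.
let worst = drops |> List.maxBy snd |> fst
drops |> List.iter (fun (c, d) -> printfn "%s dropped by %.2f" c d)
```

With the real data, the rows would come from ratingsByDateAndAgeCategory, using the season from the Tmdb record instead of the air date.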

Conclusion

This investigation has been quite interesting. While most demographics (including my own) show a slight decline in rating, the decline has been nowhere near what I was expecting, and the later series have been rated much higher than I would place them.

I guess the only conclusion I can draw from this is that either I'm extremely young at heart or my missus is correct: I'm a miserable old git.