I like watching the Phillies. I do not have cable. Some Phillies games are broadcast on national television. This is how I made a list of those games.
Pandas is a data analysis tool for the Python programming language. It can do a tremendous amount of really powerful data analysis and visualization. It's a gun in this CSV knife fight.
import pandas as pd
A downloadable CSV schedule is available from mlb.com. Here is a direct link to the Phillies schedule.
The CSV schedule will be used to instantiate a Pandas DataFrame object.
# Use the first column as the index and parse it as dates.
schedule = pd.read_csv("phillies.csv", index_col=0, parse_dates=True)
schedule.info()
162 games and 16 columns of data for each game.
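For anyone without the file on hand, the loading step can be tried on an in-memory sample. This is a minimal sketch, assuming `pd.read_csv` as the loader; the column name `SUBJECT`, the dates, and the station names are made up for illustration, and the real mlb.com file has more columns.

```python
import io

import pandas as pd

# Hypothetical two-game excerpt; values are invented, not taken from mlb.com.
sample_csv = io.StringIO(
    "START_DATE,SUBJECT,DESCRIPTION\n"
    "2000-01-01,Game 1,Local TV: CSN ----- Local Radio: WIP\n"
    "2000-01-02,Game 2,Local TV: NBC 10 -- MLBN ----- Local Radio: WIP\n"
)

# The first column becomes the index; the remaining columns are data.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=True)
print(sample.shape)
```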
schedule.head()
The DESCRIPTION column contains the broadcast information. Less interesting columns can be removed.
schedule.drop(["REMINDER_OFF",
               "REMINDER_ON",
               "START_TIME_ET",
               "END_DATE",
               "END_DATE_ET",
               "END_TIME",
               "END_TIME_ET",
               "REMINDER_TIME",
               "REMINDER_TIME_ET",
               "SHOWTIMEAS_FREE",
               "SHOWTIMEAS_BUSY",
               "REMINDER_DATE"], axis=1, inplace=True)
schedule.head()
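Dropping columns can also be tried on a toy frame. The sketch below assumes a frame with only a few of the schedule's column names (`SUBJECT` is a hypothetical one) and uses `errors="ignore"`, which is not part of the call above, so that a name missing from the frame does not raise.

```python
import pandas as pd

# Hypothetical one-row frame; values are invented for illustration.
frame = pd.DataFrame({
    "SUBJECT": ["Game 1"],
    "DESCRIPTION": ["Local TV: CSN ----- Local Radio: WIP"],
    "END_DATE": ["2000-01-01"],
    "REMINDER_ON": [False],
})

unwanted = ["END_DATE", "REMINDER_ON", "REMINDER_OFF"]

# errors="ignore" skips names that are absent instead of raising KeyError.
# drop returns a new frame here, so the original is left untouched.
trimmed = frame.drop(columns=unwanted, errors="ignore")
print(list(trimmed.columns))
```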
The DESCRIPTION column is nice because it mentions the stations that games are broadcast on. Sometimes a game is broadcast on two channels at once. There is also radio broadcast information that I'm not interested in right now.
schedule.DESCRIPTION.head()
DESCRIPTION
Thankfully, the DESCRIPTION column data is parseable. Getting a list of television broadcast stations for each game is not difficult. Picking a game that is broadcast on multiple channels should cover all cases.
description = schedule.DESCRIPTION.iloc[2]
print(description)
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    return [station.strip() for station in description.split(":")[1].split("-----")[0].split("--")]
result = tv_stations_from_description(description)
print(result)
assert len(result) == 2
Next, a game broadcast on a single channel tests the parsing function against the simpler case.
description = schedule.DESCRIPTION.iloc[0]
print(description)
result = tv_stations_from_description(description)
print(result)
assert len(result) == 1
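The text of a description is not reproduced here, so the following is only a sketch of how the parser behaves on made-up strings. The `Local TV:` and `Local Radio:` labels and the station names `CSN`, `MLBN`, and `WIP` are assumptions inferred from the delimiters the function splits on, not values copied from the schedule.

```python
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    tv_and_rest = description.split(":")[1]   # text after the first colon
    tv_only = tv_and_rest.split("-----")[0]   # keep only the part before the long dash run
    return [station.strip() for station in tv_only.split("--")]


# Hypothetical description strings in the assumed format.
two_channels = "Local TV: NBC 10 -- MLBN ----- Local Radio: WIP"
one_channel = "Local TV: CSN ----- Local Radio: WIP"

print(tv_stations_from_description(two_channels))
print(tv_stations_from_description(one_channel))
```

Because the function indexes `[1]` after splitting on the colon, a description with no colon at all raises an `IndexError`, which is worth guarding against if the feed ever contains such rows.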
Applying this function to the DESCRIPTION column yields a Series that holds the list of television stations for each Phillies game this season.
stations_series = schedule.DESCRIPTION.apply(tv_stations_from_description)
stations_series
Creating a set of stations from that Series will yield a concise list of distinct television broadcast stations.
{station for stations in stations_series for station in stations}
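The same flattening can be sketched with `Series.explode`, which turns each list element into its own row. This is an alternative to the comprehension, not the approach used above, and the station lists are invented sample data.

```python
import pandas as pd

# Hypothetical per-game station lists, shaped like the Series built above.
stations = pd.Series([["CSN"], ["NBC 10", "MLBN"], ["CSN"]])

# explode() yields one row per station, so set() collects the distinct names.
flat = stations.explode()
distinct = set(flat)

# value_counts() also shows how many games each station carries.
counts = flat.value_counts()
print(sorted(distinct))
```

The counts are a useful side effect: they show at a glance which stations carry most of the season.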
The 162 regular season Phillies games are broadcast on 7 television channels. Unfortunately only 2 of those 7 stations are available without a cable television subscription. This means that I can only watch games on NBC and FOX.
Filtering the DESCRIPTION column to national television broadcast stations yields only the games which I can watch over the air with my HD antenna.
schedule[(schedule.DESCRIPTION.str.contains("NBC 10")) |
(schedule.DESCRIPTION.str.contains("FOX"))]
This means that I can watch 13 of the 162 regular season Phillies games this season, which is roughly 8%.
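The filter and the percentage can be checked on a toy frame. In this sketch the descriptions and the station names `CSN`, `MLBN`, and `ESPN` are made up, and the two `str.contains` calls are folded into one regular expression with `|`, which is an assumption about style rather than the filter used above.

```python
import pandas as pd

# Hypothetical four-game schedule; only NBC 10 and FOX count as over the air.
games = pd.DataFrame({"DESCRIPTION": [
    "Local TV: CSN ----- Local Radio: WIP",
    "Local TV: NBC 10 -- MLBN ----- Local Radio: WIP",
    "Local TV: FOX ----- Local Radio: WIP",
    "Local TV: ESPN ----- Local Radio: WIP",
]})

# str.contains treats the pattern as a regex by default, so "|" means "or".
# na=False makes rows with a missing description count as non-matches.
over_the_air = games[games.DESCRIPTION.str.contains("NBC 10|FOX", na=False)]

share = len(over_the_air) / len(games)
print("{} of {} games ({:.0%})".format(len(over_the_air), len(games), share))

# The real season's figure: 13 of 162 games.
season_share = 13 / 162
print("{:.1%}".format(season_share))
```

A plain substring match on `FOX` also matches any station whose name merely contains those letters, so it is worth checking the distinct station list before trusting the count.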