I like watching the Phillies. I do not have cable. Some Phillies games are broadcast on national television. This is how I made a list of those games.
Pandas is a data analysis tool for the Python programming language. It can do a tremendous amount of really powerful data analysis and visualization. It's a gun in this CSV knife fight.
import pandas as pd
A downloadable CSV schedule is available from mlb.com. Here is a direct link to the Phillies schedule.
The CSV schedule will be used to instantiate a Pandas DataFrame object.
schedule = pd.read_csv("phillies-2015.csv", index_col=0, parse_dates=True)
schedule.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 197 entries, 2015-03-01 00:00:00 to 2015-10-04 00:00:00
Data columns (total 16 columns):
START_TIME          197 non-null object
START_TIME_ET       197 non-null object
SUBJECT             197 non-null object
LOCATION            197 non-null object
DESCRIPTION         193 non-null object
END_DATE            197 non-null object
END_DATE_ET         197 non-null object
END_TIME            197 non-null object
END_TIME_ET         197 non-null object
REMINDER_OFF        197 non-null bool
REMINDER_ON         197 non-null bool
REMINDER_DATE       197 non-null object
REMINDER_TIME       197 non-null object
REMINDER_TIME_ET    197 non-null object
SHOWTIMEAS_FREE     197 non-null object
SHOWTIMEAS_BUSY     197 non-null object
dtypes: bool(2), object(14)
197 games and 16 columns of data for each game.
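To try the loading step without downloading anything, here is a minimal sketch that reads a small in-memory CSV the same way. The two sample rows and the ISO date format are assumptions made up for illustration; they are not the real mlb.com file.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mimics a few of the schedule's columns.
SAMPLE_CSV = """START_DATE,START_TIME,SUBJECT,LOCATION,DESCRIPTION
2015-03-01,01:05 PM,Spartans at Phillies,Bright House Field,
2015-03-03,01:05 PM,Yankees at Phillies,Bright House Field,Local TV: MLBN (delay) -- CSN
"""

# index_col picks the date column as the index; parse_dates=True parses that index.
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="START_DATE", parse_dates=True)

print(sample.shape)
print(sample.index)
```

The empty DESCRIPTION field in the first row is read as NaN, which is why the real schedule reports only 193 non-null descriptions.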
schedule.head()
START_DATE | START_TIME | START_TIME_ET | SUBJECT | LOCATION | DESCRIPTION | END_DATE | END_DATE_ET | END_TIME | END_TIME_ET | REMINDER_OFF | REMINDER_ON | REMINDER_DATE | REMINDER_TIME | REMINDER_TIME_ET | SHOWTIMEAS_FREE | SHOWTIMEAS_BUSY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2015-03-01 | 01:05 PM | 01:05 PM | Spartans at Phillies | Bright House Field | NaN | 03/01/15 | 03/01/15 | 04:05 PM | 04:05 PM | False | True | 03/01/15 | 12:05 PM | 12:05 PM | FREE | BUSY |
2015-03-03 | 01:05 PM | 01:05 PM | Yankees at Phillies | Bright House Field | Local TV: MLBN (delay) -- CSN ----- Local Radi... | 03/03/15 | 03/03/15 | 04:05 PM | 04:05 PM | False | True | 03/03/15 | 12:05 PM | 12:05 PM | FREE | BUSY |
2015-03-04 | 01:05 PM | 01:05 PM | Phillies at Yankees | George M. Steinbrenner Field | Local TV: MLBN ----- Local Radio: MLB.com | 03/04/15 | 03/04/15 | 04:05 PM | 04:05 PM | False | True | 03/04/15 | 12:05 PM | 12:05 PM | FREE | BUSY |
2015-03-05 | 01:05 PM | 01:05 PM | Phillies at Astros | Osceola County Stadium | Local Radio: MLB.com | 03/05/15 | 03/05/15 | 04:05 PM | 04:05 PM | False | True | 03/05/15 | 12:05 PM | 12:05 PM | FREE | BUSY |
2015-03-06 | 01:05 PM | 01:05 PM | Yankees at Phillies | Bright House Field | Local TV: TCN -- MLBN | 03/06/15 | 03/06/15 | 04:05 PM | 04:05 PM | False | True | 03/06/15 | 12:05 PM | 12:05 PM | FREE | BUSY |
5 rows × 16 columns
The DESCRIPTION column contains the broadcast information. Less interesting columns can be removed.
schedule.drop(["REMINDER_OFF",
"REMINDER_ON",
"START_TIME_ET",
"END_DATE",
"END_DATE_ET",
"END_TIME",
"END_TIME_ET",
"REMINDER_TIME",
"REMINDER_TIME_ET",
"SHOWTIMEAS_FREE",
"SHOWTIMEAS_BUSY",
"REMINDER_DATE"], axis=1, inplace=True)
schedule.head()
START_DATE | START_TIME | SUBJECT | LOCATION | DESCRIPTION |
---|---|---|---|---|
2015-03-01 | 01:05 PM | Spartans at Phillies | Bright House Field | NaN |
2015-03-03 | 01:05 PM | Yankees at Phillies | Bright House Field | Local TV: MLBN (delay) -- CSN ----- Local Radi... |
2015-03-04 | 01:05 PM | Phillies at Yankees | George M. Steinbrenner Field | Local TV: MLBN ----- Local Radio: MLB.com |
2015-03-05 | 01:05 PM | Phillies at Astros | Osceola County Stadium | Local Radio: MLB.com |
2015-03-06 | 01:05 PM | Yankees at Phillies | Bright House Field | Local TV: TCN -- MLBN |
5 rows × 4 columns
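An alternative to dropping twelve columns by name is to select the four that matter. The sketch below uses a hypothetical one-row frame with a handful of the schedule's column names; it is only meant to show the selection idiom.

```python
import pandas as pd

# Hypothetical one-row frame with a few of the schedule's column names.
frame = pd.DataFrame({
    "START_TIME": ["01:05 PM"],
    "SUBJECT": ["Yankees at Phillies"],
    "LOCATION": ["Bright House Field"],
    "DESCRIPTION": ["Local TV: TCN -- MLBN"],
    "REMINDER_ON": [True],
    "SHOWTIMEAS_BUSY": ["BUSY"],
})

# Selecting by a list of names returns a new frame; `frame` itself is untouched.
keep = ["START_TIME", "SUBJECT", "LOCATION", "DESCRIPTION"]
trimmed = frame[keep]

print(list(trimmed.columns))
```

Selecting is shorter when few columns are kept, while `drop` reads better when only a few are removed.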
The DESCRIPTION column is useful because it lists the stations each game is broadcast on. Sometimes a game is broadcast on two channels at once. There is also radio broadcast information that I'm not interested in right now.
schedule.DESCRIPTION.head()
START_DATE
2015-03-01                                                  NaN
2015-03-03    Local TV: MLBN (delay) -- CSN ----- Local Radi...
2015-03-04            Local TV: MLBN ----- Local Radio: MLB.com
2015-03-05                                 Local Radio: MLB.com
2015-03-06                                Local TV: TCN -- MLBN
Name: DESCRIPTION, dtype: object
DESCRIPTION
Thankfully, the DESCRIPTION column data is parseable. Getting a list of television broadcast stations for each game is not too difficult.
description = schedule.DESCRIPTION.iloc[1]
print(description)
Local TV: MLBN (delay) -- CSN ----- Local Radio: 94 WIP -- 1210 WPHT
def tv_stations_from_description(description):
    """Return a list of television stations embedded in the given description."""
    if str(description) == "nan":  # no broadcast information for this game
        return []
    # Keep the text between the first colon and the "-----" TV/radio separator.
    return [station.strip() for station in description.split(":")[1].split("-----")[0].split("--")]
result = tv_stations_from_description(description)
print(result)
assert len(result) == 2
['MLBN (delay)', 'CSN']
Picking a game broadcast on a single channel gives a second test of the parsing function.
description = schedule.DESCRIPTION.iloc[2]
print(description)
result = tv_stations_from_description(description)
print(result)
assert len(result) == 1
Local TV: MLBN ----- Local Radio: MLB.com
['MLBN']
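One caveat: the function takes whatever follows the first colon, so a radio-only description such as `Local Radio: MLB.com` comes back as `['MLB.com']`. A stricter variant could check the section label first. The sketch below is one possible way to do that; the name `tv_only_stations` is hypothetical and not part of the original notebook.

```python
def tv_only_stations(description):
    """Hypothetical stricter parser: return only stations listed under 'Local TV'."""
    if not isinstance(description, str):  # NaN is a float, meaning no broadcast info
        return []
    # Sections are separated by "-----"; each looks like "<label>: a -- b".
    for section in description.split("-----"):
        label, _, listing = section.partition(":")
        if label.strip() == "Local TV":
            return [s.strip() for s in listing.split("--") if s.strip()]
    return []  # no television section found


print(tv_only_stations("Local TV: MLBN (delay) -- CSN ----- Local Radio: 94 WIP -- 1210 WPHT"))
print(tv_only_stations("Local Radio: MLB.com"))
```

With this variant, radio-only games contribute an empty list instead of radio call signs.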
Applying this function to the DataFrame yields a Series of all television stations on which the Phillies are broadcast this season.
stations_series = schedule.DESCRIPTION.apply(tv_stations_from_description)
stations_series
START_DATE
2015-03-01                    []
2015-03-03    [MLBN (delay), CSN]
2015-03-04                [MLBN]
2015-03-05             [MLB.com]
2015-03-06           [TCN, MLBN]
2015-03-07           [TCN, MLBN]
2015-03-08   [94 WIP, 1210 WPHT]
2015-03-09             [MLB.com]
2015-03-10           [TCN, MLBN]
2015-03-11                 [TCN]
2015-03-12             [MLB.com]
2015-03-13    [MLBN (delay), TCN]
2015-03-14   [94 WIP, 1210 WPHT]
2015-03-15                    []
2015-03-15                 [CSN]
...
2015-09-18                 [CSN]
2015-09-19                 [CSN]
2015-09-20                 [CSN]
2015-09-22                 [CSN]
2015-09-23                 [CSN]
2015-09-24                 [CSN]
2015-09-25                 [CSN]
2015-09-26                 [CSN]
2015-09-27                 [CSN]
2015-09-29                 [CSN]
2015-09-30                 [CSN]
2015-10-01                 [CSN]
2015-10-02                 [CSN]
2015-10-03                 [CSN]
2015-10-04                 [CSN]
Name: DESCRIPTION, Length: 197
Creating a set of stations from that Series will yield a concise list of distinct television broadcast stations.
set([station for stations in stations_series.values for station in stations])
{'1210 WPHT', '94 WIP', 'CSN', 'ESPN', 'MLB.com', 'MLBN', 'MLBN (delay)', 'NBC 10', 'SBP 1480', 'TCN'}
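Beyond the distinct names, it can be handy to know how many games each station carries. Here is a minimal sketch using `Series.explode` and `value_counts`; the five-game Series is a hypothetical stand-in for `stations_series`, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for stations_series: one list of stations per game.
stations = pd.Series([[], ["MLBN (delay)", "CSN"], ["MLBN"], ["TCN", "MLBN"], ["CSN"]])

# explode() gives each station its own row; an empty list becomes NaN and is dropped.
games_per_station = stations.explode().dropna().value_counts()

# The index of the counts is the same set of distinct stations.
distinct_stations = set(games_per_station.index)

print(games_per_station)
```

Run against the real Series, this would show at a glance which stations carry most of the season.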
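The 197 Phillies games yield 10 distinct stations, though four of them (94 WIP, 1210 WPHT, SBP 1480, and MLB.com) are radio outlets picked up from descriptions that list no television broadcast. Unfortunately, of the television stations, only NBC 10 is available without a cable television subscription. The only networks I can watch are NBC and FOX, and FOX does not appear in the set above.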
Filtering the DESCRIPTION column to national television broadcast stations yields only the games which I can watch over the air with my HD antenna.
schedule[schedule.DESCRIPTION.str.contains("NBC 10", na=False) |
         schedule.DESCRIPTION.str.contains("FOX", na=False)]
START_DATE | START_TIME | SUBJECT | LOCATION | DESCRIPTION |
---|---|---|---|---|
2015-04-13 | 01:10 PM | Phillies at Mets | Citi Field | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-05-16 | 07:05 PM | D-backs at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-05-22 | 07:05 PM | Phillies at Nationals | Nationals Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-05-29 | 07:05 PM | Rockies at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-06-05 | 07:05 PM | Giants at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-06-19 | 07:05 PM | Cardinals at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-06-24 | 01:05 PM | Phillies at Yankees | Yankee Stadium | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-07-18 | 07:05 PM | Marlins at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-08-01 | 07:05 PM | Braves at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-08-15 | 07:10 PM | Phillies at Brewers | Miller Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
2015-08-29 | 07:05 PM | Padres at Phillies | Citizens Bank Park | Local TV: NBC 10 ----- Local Radio: SBP 1480 |
11 rows × 4 columns
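The same filter can be written as a single `str.contains` call with a regular-expression alternation. The sketch below runs on a hypothetical three-game frame, and `na=False` treats games with a missing description as non-matches.

```python
import pandas as pd

# Hypothetical three-game frame; the first game has no broadcast description.
games = pd.DataFrame({
    "SUBJECT": ["Spartans at Phillies", "Phillies at Mets", "Yankees at Phillies"],
    "DESCRIPTION": [
        float("nan"),
        "Local TV: NBC 10 ----- Local Radio: SBP 1480",
        "Local TV: TCN -- MLBN",
    ],
})

# "NBC 10|FOX" is a regex alternation; na=False maps NaN descriptions to False.
over_the_air = games[games["DESCRIPTION"].str.contains("NBC 10|FOX", na=False)]

print(over_the_air["SUBJECT"].tolist())
```

Note that a plain substring match on "FOX" would also catch any longer station name that contains those letters, so the pattern may need tightening for other schedules.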
This means that I can watch 11 of the 197 Phillies games this season, which is roughly 6%.
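The filtered table above has 11 rows, so the share of watchable games can be checked with a couple of lines:

```python
watchable_games = 11  # rows in the filtered table
total_games = 197     # entries in the full schedule

share = watchable_games / total_games
print(f"{share:.1%}")  # 5.6%
```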