Slides | YouTube-Data-API Package | NB-Viewer
To import packages for this demo:
pip install -r requirements.txt
#These are the packages we will use in the demonstration
import os
import json
import pandas as pd
import datetime
from youtube_api import YoutubeDataApi
key = os.environ.get('YT_KEY')
yt = YoutubeDataApi(key)
#Formats json items to print for readability
def dump(doc):
def default_handler(o):
if isinstance(o, datetime.datetime):
return o.isoformat()
print(json.dumps(doc, sort_keys=True, indent=4, default=default_handler))
You may start with a channel name like 'LastWeekTonight' or 'TheNewYorkTimes'. Any data collected about channels must be collected using the channel ID, not the channel name.
The channel ID can be pulled by running yt.get_channel_id_from_user(CHANNEL_ID)
# get the channel ID for the TV show Last Week Tonight
channel_id = yt.get_channel_id_from_user('LastWeekTonight')
channel_id
From this chnanel ID, we can get a wide variety of data including (but not limited to):
# get channel metadata for the channel ID we pulled for Last Week Tonight
channel_meta = yt.get_channel_metadata(channel_id)
dump(channel_meta)
In addition to channel metadata, we can pull relational data for channels like the people they subscribe to or feature
# Get the channels that Last Week Tonight subscribes to
subscriptions = yt.get_subscriptions(channel_id)
dump(subscriptions[:2])
YouTube is consructed such that the video uploads by a user are stored in a playlist based on the user's channel ID. We can use this detail to generate the Upload Playlist ID for a given user and collect all videos posted by them.
# Install some utility functions from the YouTube package
from youtube_api import youtube_api_utils as utils
# get the playlist ID for Last Week Tonight's uploads
playlist_id = utils.get_upload_playlist_id(channel_id)
playlist_id
playlist_id
¶The function yt.get_videos_from_playlist_id(PLAYLIST_ID)
returns a list of videos from the playlist ID, in this case, the uploads. This returns a list of videos, their channels, and the publishing date.
# get the videos posted by Last Week Tonight and display the last 5
videos = yt.get_videos_from_playlist_id(playlist_id)
df = pd.DataFrame(videos[:5])
df
From the list of videos from the uploads playlist, we can pull more detailed information about videos using the video IDs using the function yt.get_video_metadata(VIDEO_ID)
. This function can handle a single video ID or a list of video IDs.
For the video IDs passed, the package gets:
# Get the video metadata for Last Week Tonight's videos
video_meta = yt.get_video_metadata(df.video_id.tolist())
dump(video_meta[:1])
This is the functional equivalent of going onto YouTube and typing a phrase into the search bar and seeing the results
# pull search results for 'John Oliver'
searches = yt.search('john oliver', max_results=5)
dump(searches[:2])
These are the videos listed to the side or below YouTube videos while they play. They are YouTube's best guess as to what you may like based on the video you are currently watching. You can get the recommended
# get recommended videos for Last Week Tonight's video on the WWE
recommendations = yt.get_recommended_videos('m8UQ4O7UiDs')
dump(recommendations[:2])