Visualizations for 20 movie scripts. The same analysis could be run on any of the 1,000+ movie scripts I segmented; I randomly chose 20 scripts from the segmented movies.
1. Web scraping of the movie scripts (over 1,000 movie scripts were scraped from the IMSDb website).
2. Segmentation of each movie into scenes --> scene location, scene action/description, scene dialogues, scene characters. All the scraped movies were segmented except those that do not follow the screenplay format (i.e. INT./EXT. scene headings).
3. Character extraction and appearance plots --> characters were plotted based on how many times they appear and speak in each scene and across the whole movie.
4. Character interaction mapping --> we mapped the connections between all the characters in the movie, and the interactions among the top 10 characters.
5. Character mentions --> the most mentioned characters in the scene dialogues, and the characters each character mentions most in their conversations.
6. Similar to item 5, we looked at who a specific character talks with most in the movie.
7. Emotion and sentiment analysis across the whole movie and for individual characters (for this project, limited to the top 10 characters) --> this shows a character's emotions whenever he or she appears in the movie.
8. Additional scene information --> exact scene locations, scenes with and without dialogue, scenes set during the day or at night, and indoor versus outdoor scenes.
9. Gender distribution in the movie.
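The scraper module itself is not shown in this notebook. As a minimal standard-library sketch of the scraping step, assuming each IMSDb script page lives at `https://imsdb.com/scripts/<Title>.html` (the film names below, such as `Social-Network,-The`, look like that slug) and that the script text sits inside a `<pre>` element, the extraction could look like this. `script_url`, `extract_script_text` and `PreTextExtractor` are hypothetical names, not functions from `imsbd_moviescript_scraper_AND_Scene_Segmentation.py`.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text that appears inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self._pre_depth = 0  # number of currently open <pre> tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth > 0:
            self._pre_depth -= 1

    def handle_data(self, data):
        # Keep text only while inside a <pre>, including nested tags such as <b>
        if self._pre_depth > 0:
            self.chunks.append(data)


def script_url(title):
    """Hypothetical helper: assumes the imsdb.com/scripts/<Title>.html pattern."""
    return f"https://imsdb.com/scripts/{title}.html"


def extract_script_text(html):
    """Return the screenplay text found in the <pre> blocks of an HTML page."""
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

To fetch a page, something like `urllib.request.urlopen(script_url("Social-Network,-The")).read().decode("utf-8", errors="replace")` could be passed to `extract_script_text`; the URL pattern, the page encoding and the `<pre>` layout are all assumptions to check against the live site.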
*(Python code) Modules for this project: imsbd_moviescript_scraper_AND_Scene_Segmentation.py, dialogue_appearance.py, characters_extract.py, xter_interaction.py, characters_mt.py, emotions.py, movie_info.py, gend_distribution_plot.py*
Tools: Python libraries (pandas, Plotly, cufflinks, NetworkX, NLTK)
```python
# Import all the modules needed for this analysis
# Project modules
from characters_extract import extract_characters
from dialogue_appearance import scene_dialogues
from xter_interaction import interaction
from emotions import emotions_sentiments
from characters_mt import character_mentions
from gend_distribution_plot import gender
from movie_info import scene_info_plots

# Standard library
import glob
import itertools
import random
import re
import secrets

# Third-party
import cufflinks as cf
import networkx as net
import pandas as pd

# plotly (graph_objects replaces the deprecated graph_objs alias)
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
```
```python
from pathlib import Path

# Film names are the pickle file names without folder or extension,
# e.g. 'Films/Social-Network,-The.pkl' -> 'Social-Network,-The' (works on any OS)
films = [Path(f).stem for f in glob.glob('Films/*.pkl')]

# Number of films we segmented into scenes, scene actions, characters, and character dialogue
print(f'Number of films available for Analysis: {len(films)} Movies')
```
Number of films available for Analysis: 1037 Movies
```python
# Randomly pick 10 of the ~1,000 segmented films
films_list = random.sample(films, 10)
# Randomly select one of them to analyze
film = secrets.choice(films_list)
print(film)
```
Social-Network,-The
```python
# Load the scenes, dialogues, and characters into DataFrames
df_film = pd.read_pickle('Films/' + film + '.pkl')
df_film_dialogue = pd.read_pickle('Dialogues/' + film + '.pkl')
df_film_characters = pd.read_pickle('Characters/' + film + '.pkl')

# Randomly sample 10 scenes from the movie script
df_film.sample(10)
```
| | Scene_Names | Scene_action | Scene_Characters | Scene_Dialogue | Contents |
|---|---|---|---|---|---|
| 115 | INT. CLASSROOM NIGHT | where'60 or so STUDENTS are in a semicircle, f... | [EDUARDO, EDUARDO, MARK, EDUARDO, MARK, EDUARD... | [Mark? MARY, Yeah., What's goin' on?, They hav... | where'60 or so STUDENTS are in a semicircle, ... |
| 88 | INT. MARK'S DORM ROOM NIGHT | | [EDUARDO] | [the lack of hardware we had to deal with, the... | EDUARDO the lack of hardware we had to deal w... |
| 71 | INT. EDUARDO'S DORM ROOM NIGHT | EDUARDO's studying at his desk but this time w... | None | None | EDUARDO's studying at his desk but this time ... |
| 82 | INT. SECOND DEPOSITION ROOM DAY | | [GAGE, MARK, GAGE, MARK, GAGE, MARK, GAGE, SY,... | [During the time when you say you had this ide... | GAGE During the time when you say you had thi... |
| 109 | INT. FIRST DEPOSITION ROOM DAY | | [EDUARDO, MARK, EDUARDO, SY, ARTICLE, MARK, DU... | [I had gotten into the Phoenix. I had been acc... | EDUARDO I had gotten into the Phoenix. I had ... |
| 59 | INT. SECOND DEPOSITION ROOM DAY | EDUARDO's in different.clothes and being quest... | [GAGE, EDUARDO, GAGE, EDUARDO, GAGE, EDUARDO, ... | [We recognize that you are a plaintiff in one ... | EDUARDO's in different.clothes and being ques... |
| 16 | INT. ANOTHER DORM ROOM NIGHT | THREE MALE STUDENTS AT A COMPUTER | [ALL] | [On the right. ] | THREE MALE STUDENTS AT A COMPUTER ALL On the ... |
| 55 | INT. FIRST DEPOSITION ROOM DAY | | [GRETCHEN, EDUARDO, GRETCHEN, EDUARDO] | [And you said?, I said Let's do it., Okay. Did... | GRETCHEN And you said? EDUARDO I said Let's d... |
| 147 | INT. BEDROOM CONTINUOUS | leaving the door open. | [SEAN, SEAN, SEAN, POLICEMAN, SEAN, SEAN, POLI... | [It's the cops. And they all spring into actio... | leaving the door open. SEAN It's the cops. An... |
| 140 | INT. CONFERENCE ROOM CONTINUOUS | | [EDUARDO, EDUARDO, LAWYER, EDUARDO, LAWYER] | [At first I thought he was joking, giving me m... | EDUARDO At first I thought he was joking, gi... |
```python
# Check how many scenes the movie script has (rows = scenes, columns = fields)
df_film.shape
```
(151, 5)
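The 151 rows come from splitting the script on its INT./EXT. scene headings (the Scene_Names column in the table above). The segmentation code is not shown here; as a minimal sketch, assuming a heading is any line that begins with `INT.`, `EXT.` or `INT./EXT.`, the split into (heading, body) pairs could look like this. `split_scenes` and `HEADING` are hypothetical names, and further splitting each body into action, characters and dialogue is not covered.

```python
import re

# Assumed scene-heading pattern: a line starting with INT./EXT., INT. or EXT.
HEADING = re.compile(r"^[ \t]*(?:INT\./EXT\.|INT\.|EXT\.)[ \t].*$", re.MULTILINE)


def split_scenes(script):
    """Split raw screenplay text into (heading, body) pairs.

    Text before the first heading (title page, credits) is dropped, and a
    script with no INT./EXT. headings yields an empty list.
    """
    matches = list(HEADING.finditer(script))
    scenes = []
    for i, match in enumerate(matches):
        # A scene's body runs until the next heading (or the end of the script)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        body = script[match.end():end].strip()
        scenes.append((match.group(0).strip(), body))
    return scenes
```

An empty result is one simple way to flag the scripts that do not follow the screenplay format and therefore cannot be segmented.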
```python
# Randomly sample 10 rows of characters and their corresponding dialogue
df_film_dialogue.sample(10)
```
| | characters | Character_dialogue |
---|---|---|
1456 | MARK | Okay. MARK heads into the office building. SEA... |
258 | EDUARDO | It's alright. |
337 | GRETCHEN | What's that? |
1173 | MARK | Hang on. MARK's scratching something out on a ... |
591 | GAGE | It's not important that you be sure why I am a... |
877 | AMY | YEAH |
410 | WE | TNT. MARK'S DORM ROOM NIGHT Every available w... |
1454 | MARK | You sure aboutthis? |
164 | ALL | On the left. |
33 | MARK | Hm. |
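```python
# Extract the movie's characters and plot their appearances
ext = extract_characters(df_film, df_film_dialogue, df_film_characters, film)
movie_characters = ext.extract_character_plot()

# Character appearances across the movie
dia = scene_dialogues(df_film, film)
df_xter_app = dia.character_appearances(movie_characters)

# Movie characters. Some extracted names are not actual characters
# (e.g. 'WE', 'CAN', 'AND WE', 'COMMON PEOPLE').
print('Movie Characters: \n', movie_characters)
```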
Movie Characters: ['MARK', 'EDUARDO', 'SEAN', 'CAMERON', 'DIVYA', 'ERICA', 'TYLER', 'GRETCHEN', 'GAGE', 'AMY', 'JENNY', 'SUMMERS', 'MARYLIN', 'SY', 'DUSTIN', 'GIRL', 'LAWYER', 'ASHLEIGH', 'POLICEMAN', 'COX', 'MARX', 'ALL', 'MARY', 'ADMINISTRATOR', 'SENIOR', 'MR. WINKLEVOSS', 'CHRIS', 'SPEAKER', 'PRINCE ALBERT', 'PAUL YOUNG', 'GUY', 'STUDENTS', 'PROFESSOR', 'BOB', 'SECRETARY', 'KENWRIGHT', 'MART', 'STUART', 'WE', 'COMMON PEOPLE', 'CAN', 'GRAD STUDENT', 'MARE', 'AND WE', 'SOPHOMORE', 'ALICE', 'ANNE', 'WAITRESS', 'CANDIDATE', 'INTERN', 'ERIC', 'SHARON', 'WOMAN', 'THIEL']
```python
# Dialogue count per scene for the 1st, 2nd, and 3rd characters
df_1st_count, df_1st_dialogue = dia.xter_count_perscene(movie_characters[0])
dia.scene_dialogue_plot(df_1st_count)
df_2nd_count, df_2nd_dialogue = dia.xter_count_perscene(movie_characters[1])
dia.scene_dialogue_plot(df_2nd_count)
df_third_count, df_third_dialogue = dia.xter_count_perscene(movie_characters[2])
dia.scene_dialogue_plot(df_third_count)

# Two characters in the same plot: 1st and 2nd characters
df_2_count, df_2_dialogue = dia.xter_count_perscene(movie_characters[:2])
dia.scene_dialogue_plot(df_2_count)
# 2nd and 3rd characters
df_3_count, df_3_dialogue = dia.xter_count_perscene(movie_characters[1:3])
dia.scene_dialogue_plot(df_3_count)
```
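In the Scene_Characters column shown earlier, a name appears to be listed once per line of dialogue (e.g. `[GRETCHEN, EDUARDO, GRETCHEN, EDUARDO]` next to four dialogue lines), so counting a name inside a scene's list gives how often that character speaks there. This is a sketch of that idea, not the code in `dialogue_appearance.py`; `speaking_turns_per_scene` and `rank_characters` are hypothetical names. Ranking by total speaking turns is one plausible way to obtain an ordering like `movie_characters` (MARK first), but that is an assumption.

```python
from collections import Counter


def speaking_turns_per_scene(scene_characters, names):
    """For each scene, count how many times each of `names` speaks.

    scene_characters: one list of speaker names per scene (None for scenes
    without dialogue), like the Scene_Characters column.
    Returns one dict per scene mapping name -> number of speaking turns.
    """
    rows = []
    for speakers in scene_characters:
        counts = Counter(speakers or [])
        rows.append({name: counts[name] for name in names})
    return rows


def rank_characters(scene_characters):
    """Rank all speakers by total speaking turns across the movie."""
    total = Counter()
    for speakers in scene_characters:
        total.update(speakers or [])
    return [name for name, _ in total.most_common()]
```

With pandas, the same per-scene counts could be built by applying `speaking_turns_per_scene` to `df_film['Scene_Characters']` and wrapping the result in a DataFrame for plotting.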
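```python
def build_interaction_graph(graph_list):
    """Build a character multigraph: each pair of characters appearing
    together in a scene gets an edge (pairs from several scenes give parallel edges)."""
    G = net.MultiGraph()
    for scene in graph_list:
        G.add_edges_from(itertools.combinations(scene, 2))
    return G

interact = interaction(df_film, film)

# Interaction network of all the characters in the movie
graph_list = interact.character_interaction()
G = build_interaction_graph(graph_list)
# pagerank_numpy was removed in NetworkX 3.0; net.pagerank(G, alpha=0.95) replaces it
page_ranked_nodes = net.pagerank(G, alpha=0.95)
between_nodes = net.betweenness_centrality(G, normalized=True, endpoints=True)
interact.character_interaction_plot(G, page_ranked_nodes)

# Interaction network of the top 10 characters: rebuild the graph and
# PageRank scores from the new graph_list before plotting
graph_list = interact.top10_character_interaction(movie_characters[:10])
G = build_interaction_graph(graph_list)
page_ranked_nodes = net.pagerank(G, alpha=0.95)
interact.character_interaction_plot(G, page_ranked_nodes)
```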
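The interaction network links characters who share a scene. A multigraph stores each shared scene as a parallel edge; an equivalent and often easier-to-inspect view collapses those into a weight. The sketch below is a standard-library illustration of that idea, not the code in `xter_interaction.py`; `cooccurrence_weights` is a hypothetical name, and the `keep` argument mimics restricting the network to the top 10 characters.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(scenes, keep=None):
    """Count how many scenes each pair of characters shares.

    scenes: one list of character names per scene (repeats and None allowed).
    keep:   optional collection of names to restrict the network to.
    Returns a Counter mapping alphabetically ordered (a, b) pairs to counts.
    """
    keep = set(keep) if keep is not None else None
    weights = Counter()
    for scene in scenes:
        # Deduplicate so a pair counts once per scene, however often each speaks
        names = set(scene or [])
        if keep is not None:
            names &= keep
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return weights
```

Deduplicating per scene is a design choice: if a scene's list contains repeated names, feeding it straight to `itertools.combinations` also counts repeated speaking turns and produces self-pairs. The weights could be loaded into a plain `networkx.Graph` with `G.add_edge(a, b, weight=w)`, since `networkx.pagerank` reads the `weight` edge attribute by default.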
```python
# Most mentioned characters in the scene dialogues
xtr = character_mentions(df_film, movie_characters, film)
xter_mentions = xtr.most_mentioned()
xtr.top_xters_mentions(xter_mentions, 5)
```
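Counting mentions amounts to searching each line of dialogue for each character's name. A minimal sketch, assuming case-insensitive whole-word matching (so "Mark" and "Mark's" count, but "market" does not); `count_mentions` is a hypothetical name and may differ from what `characters_mt.py` does. Noise entries in the character list such as WE or CAN would match ordinary words, so that list is best cleaned first.

```python
import re
from collections import Counter


def count_mentions(dialogue_lines, characters):
    """Count how often each character name is mentioned in dialogue.

    Matching is case-insensitive on whole words: "Mark" and "Mark's"
    count as mentions of MARK, "Marks" and "market" do not.
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in characters
    }
    counts = Counter()
    for line in dialogue_lines:
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(line))
    return counts
```

To see whom one character talks about, pass only that character's lines, for example the `Character_dialogue` values of `df_film_dialogue` where `characters == 'MARK'`.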
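```python
# Characters that MARK, EDUARDO, and SEAN mention most in their own dialogue
df_mk_mentions = xtr.talk_about_xters(df_film_dialogue, 'MARK')
df_ed_mentions = xtr.talk_about_xters(df_film_dialogue, 'EDUARDO')
df_sean_mentions = xtr.talk_about_xters(df_film_dialogue, 'SEAN')

# Who MARK and EDUARDO talk with most (separate names, so the results
# above are not overwritten)
df_mk_talked_with = xtr.most_talked_with('MARK')
df_ed_talked_with = xtr.most_talked_with('EDUARDO')
```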
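How `most_talked_with` defines "talking with" is not shown. One plausible definition, assumed here, is that two adjacent dialogue turns in the same scene by different speakers count as one exchange between them. `talked_with` is a hypothetical name, and the input is the per-scene speaker lists in dialogue order.

```python
from collections import Counter


def talked_with(scene_speakers, character):
    """Count whom `character` exchanges consecutive lines with.

    scene_speakers: one list of speakers per scene, in dialogue order.
    An exchange is two adjacent turns in the same scene by two different
    speakers, one of whom is `character`.
    """
    partners = Counter()
    for speakers in scene_speakers:
        speakers = speakers or []
        for prev, curr in zip(speakers, speakers[1:]):
            if prev == curr:
                continue  # the same person continuing is not an exchange
            if prev == character:
                partners[curr] += 1
            elif curr == character:
                partners[prev] += 1
    return partners
```

`talked_with(scenes, 'EDUARDO').most_common(1)` would then give EDUARDO's most frequent conversation partner under this definition.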
```python
# Sentiment across the whole movie (the argument is the plot color)
etn = emotions_sentiments(df_film, film)
df_film_sentiment = etn.film_sentiment('darkslategray')

# Emotional arc across the movie's scenes
df_film_emotion = etn.film_emotional_arc()

# Emotional content of the top characters' dialogue
df_top10_emotions = etn.emotional_content_plot(df_film_dialogue, movie_characters, 11)

# Emotional arcs of individual characters
df_mark_emotions = etn.emotional_arc_xter_plot(df_film_emotion, 'MARK')
df_ed_emotions = etn.emotional_arc_xter_plot(df_film_emotion, 'EDUARDO')
df_sean_emotions = etn.emotional_arc_xter_plot(df_film_emotion, 'SEAN')
```
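An emotional arc is essentially a sequence of per-scene (or per-line) emotion scores. A common approach is lexicon-based counting; the sketch below shows that idea with a tiny hypothetical lexicon (`LEXICON`, `emotion_counts`, `emotional_arc` and `character_arc` are all illustrative names). A real analysis would use a full word-emotion lexicon, for example the NRC Emotion Lexicon, and `emotions.py` may use a different method altogether.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping words to emotions (illustration only)
LEXICON = {
    "happy": {"joy"},
    "great": {"joy"},
    "lawsuit": {"fear", "anger"},
    "betrayed": {"anger", "sadness"},
    "sorry": {"sadness"},
}


def emotion_counts(text, lexicon=LEXICON):
    """Count emotion-bearing words in one piece of text."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    return counts


def emotional_arc(scene_texts, lexicon=LEXICON):
    """Emotion counts for each scene, in scene order."""
    return [emotion_counts(text, lexicon) for text in scene_texts]


def character_arc(turns, character, lexicon=LEXICON):
    """Emotion counts for each (speaker, line) turn spoken by one character."""
    return [emotion_counts(line, lexicon) for speaker, line in turns if speaker == character]
```

Raw counts grow with scene length, so normalizing by the number of words per scene is a reasonable refinement before plotting an arc.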
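```python
# Scene locations, scenes with/without dialogue, day/night, and indoor/outdoor breakdowns
info = scene_info_plots(df_film, film)
info.extract_scene_locations()
info.pie_plots()
```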
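Scene headings such as `INT. CLASSROOM NIGHT` in the scene table already encode the indoor/outdoor split, the location, and the time of day. A sketch of parsing them, assuming a heading ends with a time word from a small hypothetical list (`TIMES`); the labels, including "Indoor/Outdoor" for `INT./EXT.`, and the names `parse_heading` and `summarize` are choices for this sketch and not necessarily what `movie_info.py` produces.

```python
import re
from collections import Counter

# Assumed trailing time-of-day words
TIMES = {"DAY", "NIGHT", "MORNING", "EVENING", "CONTINUOUS", "LATER"}
SETTINGS = {"INT.": "Indoor", "EXT.": "Outdoor", "INT./EXT.": "Indoor/Outdoor"}


def parse_heading(heading):
    """Split e.g. 'INT. CLASSROOM NIGHT' into setting, location and time."""
    match = re.match(r"\s*(INT\./EXT\.|INT\.|EXT\.)\s*(.*)", heading.upper())
    if match is None:
        return None  # not a screenplay-style heading
    prefix, rest = match.group(1), match.group(2)
    words = rest.split()
    time = None
    if words and words[-1] in TIMES:
        time = words.pop()
    location = " ".join(words).strip(" -")
    return {"setting": SETTINGS[prefix], "location": location, "time": time}


def summarize(headings):
    """Count scenes by setting (indoor/outdoor) and by time of day."""
    parsed = [p for p in map(parse_heading, headings) if p is not None]
    settings = Counter(p["setting"] for p in parsed)
    times = Counter(p["time"] for p in parsed if p["time"] is not None)
    return settings, times
```

Counters like these map directly onto pie-chart inputs, for example `px.pie(names=list(settings), values=list(settings.values()))`.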
```python
# Gender distribution of the movie's characters
gd = gender(movie_characters, film)
df_gender = gd.gender_types(px.colors.sequential.Viridis)
```
```
[nltk_data] Downloading package names to C:\Users\Adeboye
[nltk_data]     Adeniyi\AppData\Roaming\nltk_data...
[nltk_data]   Package names is already up-to-date!
```
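The NLTK download log suggests that `gend_distribution_plot.py` looks character names up in NLTK's `names` corpus. A minimal sketch of that kind of first-name lookup, assuming the name lists are passed in and that titles like MR./MRS./MS. decide gender on their own; `gender_distribution` and `HONORIFICS` are hypothetical names, not the module's API.

```python
from collections import Counter

# Titles that indicate gender regardless of the name that follows
HONORIFICS = {"MR.": "male", "MRS.": "female", "MS.": "female"}


def gender_distribution(characters, male_names, female_names):
    """Guess each character's gender from the first word of their name.

    Names found in both name lists, or in neither, count as 'unknown'.
    """
    male = {name.upper() for name in male_names}
    female = {name.upper() for name in female_names}
    counts = Counter()
    for character in characters:
        words = character.upper().split()
        if not words:
            counts["unknown"] += 1
            continue
        first = words[0]
        if first in HONORIFICS:
            counts[HONORIFICS[first]] += 1
        elif first in male and first not in female:
            counts["male"] += 1
        elif first in female and first not in male:
            counts["female"] += 1
        else:
            counts["unknown"] += 1
    return counts
```

With NLTK installed, the lists could come from `nltk.corpus.names.words('male.txt')` and `nltk.corpus.names.words('female.txt')` after `nltk.download('names')`. Role names such as POLICEMAN or WAITRESS fall into "unknown" here, even though a word-based rule could classify some of them.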