"A Crisp analysis over available data from students registrations and attendees information from GoToMeeting"
The following notebook is an analysis of an online webinar organised by Sathyabama Coding Club
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
gotom_data = pd.read_excel("Developer 101_ Data Science Uncovered Attendees.xls")
reg_data = pd.read_excel("Developer 101 _ Data Science Uncovered (Responses).xlsx")
gotom_data.head(5)
| | Developer 101: Data Science Uncovered Attendees | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | GoToMeeting |
---|---|---|---|---|---|---|---|
0 | Summary | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Meeting Date | Meeting Duration | Number of Attendees | Meeting ID | NaN | NaN | NaN |
2 | May 9, 2020 7:06 AM PDT | 90 minutes | 54 | 905-245-013 | NaN | NaN | NaN |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | Details | NaN | NaN | NaN | NaN | NaN | NaN |
# Attendee records start at row 6; row 5 holds their column names.
attn_data = gotom_data.iloc[6:, :5].copy()
attn_data.columns = gotom_data.iloc[5, :5].tolist()
attn_data.reset_index(drop=True, inplace=True)
attn_data.head()
| | Name | Email Address | Join Time | Leave Time | Time in Session (minutes) |
---|---|---|---|---|---|
0 | #ReligiousCorona | lol@gmail.com | 7:51 AM | 8:07 AM | 15 |
1 | #ReligiousCorona | lol@gmail.com | 8:07 AM | 8:33 AM | 25 |
2 | AJ | NaN | 7:30 AM | 7:30 AM | 0 |
3 | AJ | NaN | 7:30 AM | 8:33 AM | 62 |
4 | ANKITKUMAR SINGH | ankitk.as51@gmail.com | 7:30 AM | 7:31 AM | 1 |
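The header-promotion step above can be sketched on a small synthetic frame. This is only an illustration under stated assumptions: the `raw` frame and the helper name `promote_header` are made up for the example and are not part of the original notebook or the GoToMeeting export.

```python
import pandas as pd

# Synthetic stand-in for a raw export: summary rows first,
# then a header row, then the actual records (all values are illustrative).
raw = pd.DataFrame([
    ["Summary", None, None],
    ["Meeting Date", "Meeting Duration", "Number of Attendees"],
    ["May 9, 2020", "90 minutes", 54],
    ["Details", None, None],
    ["Name", "Join Time", "Time in Session (minutes)"],
    ["A", "7:30 AM", 62],
    ["B", "7:31 AM", 1],
])

def promote_header(df, header_row, n_cols):
    """Use row `header_row` as column names and keep only the rows below it.

    Hypothetical helper written for this sketch.
    """
    body = df.iloc[header_row + 1:, :n_cols].copy()  # copy avoids chained-assignment warnings later
    body.columns = df.iloc[header_row, :n_cols].tolist()
    return body.reset_index(drop=True)

attendees = promote_header(raw, header_row=4, n_cols=3)
print(attendees)
```

Taking a `.copy()` of the slice matters when columns are reassigned afterwards, as the notebook does when it converts the minutes column to numeric.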
# Longest single attendance record, used as a proxy for the session length
SESSION_DURATION = int(attn_data['Time in Session (minutes)'].max())
print("Session Duration in Minutes: ", SESSION_DURATION)
Session Duration in Minutes: 90
no_of_regs = len(reg_data)
REG_COUNT = no_of_regs
print("No of Registrations : ", REG_COUNT)
No of Registrations : 186
No of registrations: 186
reg_data.Batch.value_counts()
2021            76
Professional    69
2022            36
2023             5
Name: Batch, dtype: int64
sns.countplot(x="Batch", data=reg_data)
plt.title("Barplot on Academic year participation")
Text(0.5, 1.0, 'Barplot on Academic year participation')
The 2021 batch has the most registrations, followed closely by professionals; 2022 and 2023 trail well behind: 2021 > Professional >> 2022 >> 2023
sns.countplot(x="Have you ever worked with Data Science before?", data=reg_data)
plt.title("Barplot over the background of participants in Data Science")
print(reg_data['Have you ever worked with Data Science before?'].value_counts())
No     115
Yes     71
Name: Have you ever worked with Data Science before?, dtype: int64
ds_counts = reg_data["Have you ever worked with Data Science before?"].value_counts()
# Look the counts up by label; positional indexing depends on the sort order
web_mob_yes = ds_counts["Yes"]
web_mob_no = ds_counts["No"]
print("Percentage of registrants who have worked with Data Science before :",
      (web_mob_yes/no_of_regs)*100)
print("Percentage of registrants who have never worked with Data Science before :",
      (web_mob_no/no_of_regs)*100)
Percentage of registrants who have worked with Data Science before : 38.17204301075269
Percentage of registrants who have never worked with Data Science before : 61.82795698924731
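The same percentages can be obtained in one step with `value_counts(normalize=True)`. The sketch below rebuilds the answer column from the counts reported above (115 "No", 71 "Yes"); the series name `worked_with_ds` is an assumption made for the example.

```python
import pandas as pd

# Rebuild the survey answers from the reported counts (115 "No", 71 "Yes").
answers = pd.Series(["No"] * 115 + ["Yes"] * 71, name="worked_with_ds")

# normalize=True returns fractions of the total instead of raw counts.
shares = answers.value_counts(normalize=True).mul(100).round(2)

print(shares)
print("Yes share:", shares["Yes"], "| No share:", shares["No"])
```

Indexing the result by label (`shares["Yes"]`) keeps the lookup correct even if the order of the categories changes.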
sns.countplot(x="Knowledge of Python Programming language ",
data=reg_data)
<matplotlib.axes._subplots.AxesSubplot at 0x1a0f9995108>
sns.countplot(x="Do you think Math is Required for Machine Learning?",
data=reg_data)
<matplotlib.axes._subplots.AxesSubplot at 0x1a0fb994ac8>
reg_data["Where do you wish to use Data Science skills?"].value_counts().plot(kind='barh', figsize=(10,10))
<matplotlib.axes._subplots.AxesSubplot at 0x1a0fd177408>
Observation:
attn_data.head()
| | Name | Email Address | Join Time | Leave Time | Time in Session (minutes) |
---|---|---|---|---|---|
0 | #ReligiousCorona | lol@gmail.com | 7:51 AM | 8:07 AM | 15 |
1 | #ReligiousCorona | lol@gmail.com | 8:07 AM | 8:33 AM | 25 |
2 | AJ | NaN | 7:30 AM | 7:30 AM | 0 |
3 | AJ | NaN | 7:30 AM | 8:33 AM | 62 |
4 | ANKITKUMAR SINGH | ankitk.as51@gmail.com | 7:30 AM | 7:31 AM | 1 |
# Unique display names; rejoins under the same name count once
ATTENDEES_COUNT = attn_data['Name'].nunique()
ATTENDEES_COUNT
54
Number of attendees without duplicates: 54
# Total join records, including rejoins by the same attendee
len(attn_data)
65
attn_data.groupby(['Name', 'Time in Session (minutes)']).sum().iloc[:,:0].head(20)
Name | Time in Session (minutes) |
---|---|
#ReligiousCorona | 15 |
#ReligiousCorona | 25 |
AJ | 0 |
AJ | 62 |
ANKITKUMAR SINGH | 1 |
Abhiram | 1 |
Abhiram | 64 |
Abhishek's Mac Book Pro | 15 |
Aditya | 18 |
Aditya Gowrish Menti | 68 |
Akash M | 60 |
Alok Kumar | 61 |
Amit | 1 |
Anand | 50 |
Anonymous | 4 |
Anonymous | 5 |
BVN PRANEETH | 53 |
Bhavesh | 89 |
Chetan | 9 |
Deepansh | 0 |
# Converting the 'Time in Session (minutes)' column values to int
attn_data['Time in Session (minutes)'] = pd.to_numeric(attn_data['Time in Session (minutes)'])
type(attn_data['Time in Session (minutes)'].iloc[0])
numpy.int64
(attn_data['Time in Session (minutes)'] == attn_data['Time in Session (minutes)'].iloc[0]).all()
False
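def time_agg(group_series):
    """Collapse one attendee's repeated rows into a single value."""
    # Identical values (e.g. the same join time on every row) are kept once
    if (group_series == group_series.iloc[0]).all():
        return group_series.iloc[0]
    else:
        # Otherwise sum: minutes add up, while strings are concatenated
        return group_series.sum()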
attn_data.groupby('Name', as_index=False).agg(time_agg)[['Name', 'Join Time', 'Leave Time', 'Time in Session (minutes)']]
| | Name | Join Time | Leave Time | Time in Session (minutes) |
---|---|---|---|---|
0 | #ReligiousCorona | 7:51 AM8:07 AM | 8:07 AM8:33 AM | 40 |
1 | AJ | 7:30 AM | 7:30 AM8:33 AM | 62 |
2 | ANKITKUMAR SINGH | 7:30 AM | 7:31 AM | 1 |
3 | Abhiram | 7:31 AM7:30 AM | 8:36 AM7:31 AM | 65 |
4 | Abhishek's Mac Book Pro | 8:01 AM | 8:17 AM | 15 |
5 | Aditya | 7:30 AM | 7:49 AM | 18 |
6 | Aditya Gowrish Menti | 7:29 AM | 8:37 AM | 68 |
7 | Akash M | 7:22 AM | 8:23 AM | 60 |
8 | Alok Kumar | 7:36 AM | 8:37 AM | 61 |
9 | Amit | 7:49 AM | 7:51 AM | 1 |
10 | Anand | 7:47 AM | 8:37 AM | 50 |
11 | Anonymous | 7:07 AM7:23 AM | 7:13 AM7:28 AM | 9 |
12 | BVN PRANEETH | 7:44 AM | 8:37 AM | 53 |
13 | Bhavesh | 7:07 AM | 8:37 AM | 89 |
14 | Chetan | 7:38 AM | 7:47 AM | 9 |
15 | Deepansh | 7:35 AM | 8:37 AM7:35 AM | 62 |
16 | Devyash Bordia | 7:27 AM | 8:11 AM | 43 |
17 | Dikshita Basu | 7:19 AM | 8:37 AM | 78 |
18 | Dinesh L | 7:30 AM | 8:32 AM | 61 |
19 | Dr Zakir Naik | 7:43 AM | 7:51 AM | 7 |
20 | Fireflies.ai Notetaker | 7:28 AM | 8:30 AM | 61 |
21 | Gaurav | 7:27 AM | 8:29 AM | 61 |
22 | Gupta, Anuj | 7:32 AM | 7:42 AM | 9 |
23 | HK | 7:45 AM | 8:14 AM | 29 |
24 | Hardik Gupta | 7:40 AM7:38 AM | 8:37 AM7:39 AM | 58 |
25 | Himanshu Tamboli | 7:10 AM7:28 AM | 7:16 AM7:50 AM | 26 |
26 | Kajjal | 7:19 AM | 8:37 AM | 78 |
27 | Kamal Sharma | 8:00 AM | 8:37 AM | 37 |
28 | Kav | 7:47 AM | 8:03 AM | 16 |
29 | Mohammed Faraz | 7:52 AM | 8:24 AM | 32 |
30 | Mostlyinsane | 7:27 AM | 8:20 AM | 53 |
31 | Mugunthan | 7:33 AM | 8:33 AM | 60 |
32 | NIKHIL | 7:39 AM7:45 AM | 7:44 AM7:53 AM | 12 |
33 | Neeraj Jayaram | 7:33 AM | 8:34 AM | 61 |
34 | Pruthvi Shetty | 8:04 AM | 8:37 AM | 32 |
35 | Ranjith | 7:26 AM | 7:34 AM | 8 |
36 | Rehan Razak | 7:15 AM7:18 AM7:31 AM | 7:17 AM7:18 AM8:37 AM | 68 |
37 | Revanth | 7:10 AM | 7:15 AM | 5 |
38 | Roshan Pandey | 7:21 AM | 7:56 AM | 34 |
39 | Sagar Parida | 7:26 AM | 8:09 AM | 42 |
40 | Sanjana Birari | 7:39 AM | 7:41 AM | 1 |
41 | Santhosh | 7:29 AM | 7:46 AM | 16 |
42 | Santosh Kumar | 7:36 AM | 8:37 AM | 61 |
43 | Sourav Kumar | 7:26 AM7:31 AM | 7:31 AM8:37 AM | 70 |
44 | Sri Harish | 7:06 AM | 8:37 AM | 90 |
45 | Suryanshu Singh | 7:16 AM | 8:22 AM | 65 |
46 | Teja Kummarikuntla | 7:06 AM | 8:37 AM | 90 |
47 | keshav | 7:40 AM | 8:37 AM | 57 |
48 | reconnecting.... | 7:30 AM | 8:37 AM | 67 |
49 | sahib pratap singh | 7:29 AM | 8:37 AM | 68 |
50 | sateesh sabbineni | 8:18 AM | 8:32 AM | 13 |
51 | sneha gupta | 7:23 AM | 8:37 AM | 74 |
52 | sourabh kumar_KOLKATA_ID_3420 | 7:32 AM | 7:43 AM | 10 |
53 | user | 7:36 AM | 8:18 AM | 41 |
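Because `time_agg` sums strings, attendees who rejoined end up with concatenated times such as `7:30 AM8:33 AM`. One possible alternative, sketched here under assumptions, is to parse the clock strings and aggregate each column with its own rule: earliest join, latest leave, total minutes. The output column names `first_join`, `last_leave`, and `total_minutes` are invented for this example; the sample rows are taken from the tables above.

```python
import pandas as pd

# A few attendance records copied from the tables above.
log = pd.DataFrame({
    "Name": ["AJ", "AJ", "Bhavesh"],
    "Join Time": ["7:30 AM", "7:30 AM", "7:07 AM"],
    "Leave Time": ["7:30 AM", "8:33 AM", "8:37 AM"],
    "Time in Session (minutes)": [0, 62, 89],
})

# Parse clock strings so min/max compare times rather than text.
for col in ["Join Time", "Leave Time"]:
    log[col] = pd.to_datetime(log[col], format="%I:%M %p")

# Named aggregation: one rule per output column.
summary = log.groupby("Name", as_index=False).agg(
    first_join=("Join Time", "min"),
    last_leave=("Leave Time", "max"),
    total_minutes=("Time in Session (minutes)", "sum"),
)

# Convert back to readable clock strings for display.
for col in ["first_join", "last_leave"]:
    summary[col] = summary[col].dt.strftime("%I:%M %p")

print(summary)
```

Always summing the minutes also differs from `time_agg` in one edge case: two sessions of exactly equal length are added together rather than counted once.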
atten_group_df = attn_data[['Name', 'Time in Session (minutes)', 'Email Address']].groupby('Name', as_index=False).agg(time_agg)
atten_group_df.sort_values(by=['Time in Session (minutes)'],ascending=False, inplace=True)
# factorplot was renamed catplot, and `size` became `height`, in seaborn 0.9
sns.catplot(x="Name", y="Time in Session (minutes)",
            data=atten_group_df, kind="bar",
            height=15, aspect=2,
            palette="muted")
# for value in plot:
# height = value.get_height()
# plt.text(value.get_x() + value.get_width()/2.,
# 1.002*height,'%d' % int(height), ha='center', va='bottom')
plt.xticks(rotation=45);
Individual time-spent analysis of attendees
# Only attendees who stayed for at least half of the session
sns.catplot(x="Name", y="Time in Session (minutes)",
            data=atten_group_df[atten_group_df["Time in Session (minutes)"] >= SESSION_DURATION//2],
            kind="bar",
            height=8, aspect=2,
            palette="muted")
plt.xticks(rotation=45);
atten_group_df[atten_group_df["Time in Session (minutes)"] >= SESSION_DURATION//2][['Name', 'Time in Session (minutes)']].set_index('Name')
Name | Time in Session (minutes) |
---|---|
Sri Harish | 90 |
Teja Kummarikuntla | 90 |
Bhavesh | 89 |
Dikshita Basu | 78 |
Kajjal | 78 |
sneha gupta | 74 |
Sourav Kumar | 70 |
Rehan Razak | 68 |
sahib pratap singh | 68 |
Aditya Gowrish Menti | 68 |
reconnecting.... | 67 |
Suryanshu Singh | 65 |
Abhiram | 65 |
AJ | 62 |
Deepansh | 62 |
Neeraj Jayaram | 61 |
Dinesh L | 61 |
Alok Kumar | 61 |
Fireflies.ai Notetaker | 61 |
Gaurav | 61 |
Santosh Kumar | 61 |
Mugunthan | 60 |
Akash M | 60 |
Hardik Gupta | 58 |
keshav | 57 |
Mostlyinsane | 53 |
BVN PRANEETH | 53 |
Anand | 50 |
atten_group_df
| | Name | Time in Session (minutes) |
---|---|---|
44 | Sri Harish | 90 |
46 | Teja Kummarikuntla | 90 |
13 | Bhavesh | 89 |
17 | Dikshita Basu | 78 |
26 | Kajjal | 78 |
51 | sneha gupta | 74 |
43 | Sourav Kumar | 70 |
36 | Rehan Razak | 68 |
49 | sahib pratap singh | 68 |
6 | Aditya Gowrish Menti | 68 |
48 | reconnecting.... | 67 |
45 | Suryanshu Singh | 65 |
3 | Abhiram | 65 |
1 | AJ | 62 |
15 | Deepansh | 62 |
33 | Neeraj Jayaram | 61 |
18 | Dinesh L | 61 |
8 | Alok Kumar | 61 |
20 | Fireflies.ai Notetaker | 61 |
21 | Gaurav | 61 |
42 | Santosh Kumar | 61 |
31 | Mugunthan | 60 |
7 | Akash M | 60 |
24 | Hardik Gupta | 58 |
47 | keshav | 57 |
30 | Mostlyinsane | 53 |
12 | BVN PRANEETH | 53 |
10 | Anand | 50 |
16 | Devyash Bordia | 43 |
39 | Sagar Parida | 42 |
53 | user | 41 |
0 | #ReligiousCorona | 40 |
27 | Kamal Sharma | 37 |
38 | Roshan Pandey | 34 |
29 | Mohammed Faraz | 32 |
34 | Pruthvi Shetty | 32 |
23 | HK | 29 |
25 | Himanshu Tamboli | 26 |
5 | Aditya | 18 |
28 | Kav | 16 |
41 | Santhosh | 16 |
4 | Abhishek's Mac Book Pro | 15 |
50 | sateesh sabbineni | 13 |
32 | NIKHIL | 12 |
52 | sourabh kumar_KOLKATA_ID_3420 | 10 |
14 | Chetan | 9 |
11 | Anonymous | 9 |
22 | Gupta, Anuj | 9 |
35 | Ranjith | 8 |
19 | Dr Zakir Naik | 7 |
37 | Revanth | 5 |
9 | Amit | 1 |
40 | Sanjana Birari | 1 |
2 | ANKITKUMAR SINGH | 1 |
len(atten_group_df[atten_group_df["Time in Session (minutes)"] >= SESSION_DURATION//2].set_index('Name')['Time in Session (minutes)'])
28
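In the notebook, 28 of the 54 attendees (roughly 52%) stayed for at least half of the session. The count and the share can be computed together from a boolean mask, as in this minimal sketch; the `minutes` values are an illustrative subset, not the full attendance table.

```python
import pandas as pd

SESSION_DURATION = 90  # minutes, as reported in the notebook

# Illustrative minutes per attendee (a made-up subset, not the full table).
minutes = pd.Series([90, 89, 78, 45, 44, 9, 1], name="Time in Session (minutes)")

# True for attendees who stayed at least half of the session.
stayed_half = minutes >= SESSION_DURATION // 2
count = int(stayed_half.sum())    # True counts as 1
share = stayed_half.mean() * 100  # mean of a boolean mask is the fraction of True values

print(f"{count} of {len(minutes)} attendees stayed at least "
      f"{SESSION_DURATION // 2} minutes ({share:.1f}%)")
```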
# Unique attendee names as a percentage of registrations
attendance_pct = (ATTENDEES_COUNT/REG_COUNT) * 100
print("Percentage of registrants who attended the session: {}".format(attendance_pct))
Percentage of registrants who attended the session: 29.03225806451613
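The figure above divides the number of unique attendee names by the number of registrations, which implicitly assumes that every attendee had registered. A stricter check, sketched here under assumptions, would match the two lists on email address. The registration column name `Email` is hypothetical (the notebook never shows the registration columns), and all addresses below are made up.

```python
import pandas as pd

# Made-up data; "Email" is a hypothetical registration column name.
registrations = pd.DataFrame(
    {"Email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]}
)
attendees = pd.DataFrame(
    {"Email Address": ["A@example.com ", "c@example.com", None, "x@example.com"]}
)

def normalise(emails):
    """Drop missing values, trim whitespace, lower-case, and de-duplicate."""
    return emails.dropna().str.strip().str.lower().drop_duplicates()

reg_emails = normalise(registrations["Email"])
att_emails = normalise(attendees["Email Address"])

# True for each registrant whose email also appears in the attendance log.
attended = reg_emails.isin(att_emails)
rate = attended.mean() * 100

print(f"{int(attended.sum())} of {len(reg_emails)} registrants attended ({rate:.1f}%)")
```

Several attendance rows in the notebook have no email (for example `AJ`), so an email-based match would likely undercount attendance; it is best read as a lower bound next to the name-based figure.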
Registration Data Analysis
Name of the Event: Developer 101 | Data Science Uncovered
No of registrations: 186
Registration Count with Batch filter
No of registrations without prior knowledge of Data Science : 115 [61.82795698924731%]
No of registrations with prior knowledge of Data Science : 71 [38.17204301075269%]
No of registrations who are Beginners in Python Programming language : 76
No of registrations who are Intermediate in Python Programming language : 98
No of registrations who are Advanced in Python Programming language: 12
Registrations wish to use Data Science in
Webinar Attendees Data Analysis