Who came, with what level, and how many stayed to the end.
import pandas as pd
import matplotlib.pyplot as plt
Set some default parameters for the matplotlib figures: use the seaborn style sheet with larger text, give all figures the same size, and use a white (instead of transparent) background. Also use a higher DPI so that saved figures have better image quality.
plt.style.use(['seaborn', 'seaborn-talk'])
plt.rc('figure', facecolor='#ffffff')
plt.rc('figure', figsize=(10, 5))
plt.rc('figure', dpi=150)
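One caveat about the style sheet names: matplotlib 3.6 renamed the seaborn styles to use a `seaborn-v0_8` prefix and later releases dropped the old names, so the `plt.style.use` call above may fail depending on the installed version. Below is a minimal sketch of a fallback. The `pick_styles` helper is my own hypothetical function, not part of matplotlib.

```python
import matplotlib.pyplot as plt


def pick_styles(candidates, available):
    """Return the first group of style names that are all available."""
    for group in candidates:
        if all(name in available for name in group):
            return list(group)
    return []


candidates = [
    ('seaborn-v0_8', 'seaborn-v0_8-talk'),  # names used since matplotlib 3.6
    ('seaborn', 'seaborn-talk'),            # names used by older releases
]
styles = pick_styles(candidates, plt.style.available)
if styles:
    plt.style.use(styles)
```

If neither group is available, the sketch simply keeps the matplotlib defaults.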
Data from the Google Forms sign-up. I exported the data (without names and emails) to the file demographics.csv in this repository. I'll use the pandas library to load and manipulate the data.
data = pd.read_csv('demographics.csv')
data
| | os | experience | languages | position | field |
---|---|---|---|---|---|
0 | Mac (OSX) | Did a little bit of shell scripting | Python, Matlab/Octave | Graduate student | Geophysics |
1 | Mac (OSX) | Took a basic programming course, Self-taught w... | C/C++, Fortran, Matlab/Octave | Faculty | Plate Tectonics |
2 | Mac (OSX) | Took a basic programming course, Took intermed... | C/C++, Matlab/Octave, JAVA, VRML, MEL, UnrealS... | Graduate student | GG (Hydrology) |
3 | Windows | Took a basic programming course | Matlab/Octave | Staff | Geology and Geophysics |
4 | Mac (OSX) | Took a basic programming course | C/C++, Matlab/Octave | Graduate student | Geochemistry |
5 | Mac (OSX) | Took a basic programming course | Matlab/Octave, I use matlab from time to time ... | Graduate student | Coastal Geology & sea-level rise |
6 | Windows | Self-taught with moderate/advanced experience | Fortran, Matlab/Octave, R | Postdoc | Watershed hydrology |
7 | Windows | Did a little bit of shell scripting | Matlab/Octave | Graduate student | Mineral Physics |
8 | Mac (OSX) | Took a basic programming course, Took intermed... | Python, C/C++, Matlab/Octave | Faculty | Mineral Physics |
9 | Mac (OSX) | Took intermediate/advanced programming course | Matlab/Octave, R | Graduate student | geophysics |
10 | Windows | Self-taught with little experience | Python | Staff | Geomorphology |
11 | Mac (OSX) | Never programmed before | None (I told I've never programmed before) | Graduate student | Hydrology modeling |
12 | Mac (OSX) | Took intermediate/advanced programming course,... | C/C++, Matlab/Octave, R, Java | Faculty | Coastal Geology |
13 | Mac (OSX) | Never programmed before, Self-taught with litt... | Javascript | Undergraduate student | Geophysics |
14 | Windows | Took a basic programming course, Self-taught w... | Matlab/Octave, IDL, google engine | Graduate student | volcanology |
15 | Windows | Took a basic programming course, Self-taught w... | Matlab/Octave | Faculty | Volcanology |
16 | Mac (OSX) | Took a basic programming course | Python, Matlab/Octave | Graduate student | Coastal Hydrology |
17 | Windows | Never programmed before | None (I told I've never programmed before) | Graduate student | Hydrology |
18 | Mac (OSX) | Took a basic programming course, Self-taught w... | Python | Graduate student | Voclanology/Remote Sensing |
19 | Mac (OSX) | Took a basic programming course | javascript, html | Undergraduate student | Geology and Geophysics |
20 | Mac (OSX) | Self-taught with moderate/advanced experience | Matlab/Octave, R | Graduate student | Planetary sciences |
21 | Mac (OSX) | Never programmed before, Did a little bit of s... | Python | Undergraduate student | Conservation |
22 | Windows | Took a basic programming course, Self-taught w... | C/C++, R | Graduate student | Environmental science |
23 | Windows | Self-taught with little experience | Python, Matlab/Octave | Graduate student | Electrical Engineering |
24 | Windows | Took a basic programming course | Fortran, Matlab/Octave, R | Graduate student | natural resources and environmental management |
25 | GNU/Linux | Self-taught with moderate/advanced experience | Python, Matlab/Octave, R, Bash Shell(?) | Marine Geophysical Data Tech | Marine Geophysics |
26 | Windows | Self-taught with moderate/advanced experience | Python, Matlab/Octave, R | Graduate student | ecohydrology (in the Geography Department) |
27 | Mac (OSX) | Never programmed before | None (I told I've never programmed before) | Interested citizen | Film production and education |
28 | Mac (OSX) | Self-taught with little experience | Python | Graduate student | Geography |
29 | GNU/Linux | Took intermediate/advanced programming course | Python, C/C++, Fortran | Na Kupuna student | civil engineering |
30 | Mac (OSX) | Did a little bit of shell scripting, Took a ba... | Fortran, BASIC (yes, really) | Faculty | Geo Oceanography -- a lot of seafloor mapping |
31 | Windows | Never programmed before | None (I told I've never programmed before) | Graduate student | Geochemistry |
32 | Mac (OSX) | Took a basic programming course | Matlab/Octave | Graduate student | Mineral physics |
33 | Windows | Did a little bit of shell scripting | R, bit of awk/sed for GMT tasks | gov't researcher | marine geology |
34 | Mac (OSX) | Never programmed before | None (I told I've never programmed before) | Undergraduate student | Geology |
First, let's see the number of people who signed up:
len(data)
35
From the shared notes, I also know how many people showed up on each day.
attendance = pd.Series([len(data), 31, 21, 21], index=['Signed up', 'Day 1', 'Day 2', 'Day 3'])
plt.figure()
ax = plt.subplot(111)
attendance.plot.barh(ax=ax)
ax.invert_yaxis()
ax.set_title('Attendance for the workshop')
ax.set_xlabel('Participants')
plt.tight_layout()
plt.savefig('figures/attendance.jpg')
plt.show()
We had a bit of a drop-off after the first day. The people who came on the second day also came on the third. Not too bad.
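To put a number on the drop-off, here is a small sketch that expresses each day's attendance as a percentage of the sign-ups. The `retention` name is just for illustration and the numbers are copied from the cell above.

```python
import pandas as pd

# Attendance numbers copied from the cell above
attendance = pd.Series([35, 31, 21, 21],
                       index=['Signed up', 'Day 1', 'Day 2', 'Day 3'])

# Percentage of the people who signed up that were present each day
retention = (100 * attendance / attendance['Signed up']).round(1)
print(retention)
```

This gives roughly 89% of the sign-ups on day 1 and 60% on days 2 and 3.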
Let's first look at the programming experience declared by the participants. I'll keep only the highest level declared and count how many people declared that level.
# Keep only the last (highest) element in the declared experience
data['highest_experience'] = pd.Series([e.split(', ')[-1] for e in data['experience']])
plt.figure()
ax = plt.subplot(111)
data.highest_experience.value_counts().plot.barh(ax=ax)
ax.invert_yaxis()
ax.set_title('Highest experience declared')
ax.set_xlabel('Participants')
plt.tight_layout()
plt.savefig('figures/education.jpg')
plt.show()
The workshop participants have very mixed experience levels, from people who have never programmed before to self-taught programmers with moderate/advanced experience. That was expected, and it's also not surprising that very few people indicated that they reached their level by taking an advanced course.
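The "keep the last element" step can also be written with the pandas string methods instead of a list comprehension. This is only a sketch on a few made-up entries in the same format as the `experience` column. Like the original, it assumes the form lists the options in increasing order of experience, so the last item is the highest.

```python
import pandas as pd

# Made-up sample in the same format as the 'experience' column
experience = pd.Series([
    'Took a basic programming course',
    'Never programmed before, Did a little bit of shell scripting',
    'Took a basic programming course, Self-taught with little experience',
])

# Split each entry on ', ' and keep the last item of every list
highest = experience.str.split(', ').str[-1]
print(highest.tolist())
```

Applied to the real data, this would be `data['experience'].str.split(', ').str[-1]`.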
Let's look at the programming language experience. I'll focus on a few main languages and group the rest into "Other".
main_languages = ['Python', 'Matlab/Octave', 'C/C++', 'Fortran', 'R', "None (I told I've never programmed before)"]
Make a function to mark any other language as "Other" and simplify the "None" entry.
def only_main(lang):
    if lang not in main_languages:
        lang = 'Other'
    if lang == "None (I told I've never programmed before)":
        lang = 'None'
    return lang
languages = pd.Series([only_main(lang)
                       for entry in data['languages']
                       for lang in entry.split(', ')]).value_counts()
languages
Matlab/Octave    20
Other            15
Python           11
R                 9
C/C++             7
Fortran           5
None              5
dtype: int64
plt.figure()
ax = plt.subplot(111)
languages.plot.barh(ax=ax)
ax.invert_yaxis()
ax.set_title('Programming language experience')
ax.set_xlabel('Participants')
ax.set_xticks(range(0, 22, 3))
plt.tight_layout()
plt.savefig('figures/programming-languages.jpg')
plt.show()
The dominant language in the department is clearly Matlab (20 mentions), though quite a few people also know Python (11) and R (9). Fortran use is apparently very low here.
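One thing to keep in mind is that the counts above are mentions, not people: someone who listed two languages that fall under "Other" is counted twice in that bar. Here is a sketch of a variant that counts each participant at most once per category, using a few made-up entries in the same format as the `languages` column. The `per_person` name is only for illustration.

```python
import pandas as pd

main_languages = ['Python', 'Matlab/Octave', 'C/C++', 'Fortran', 'R',
                  "None (I told I've never programmed before)"]


def only_main(lang):
    """Mark non-main languages as 'Other' and shorten the 'None' entry."""
    if lang not in main_languages:
        lang = 'Other'
    if lang == "None (I told I've never programmed before)":
        lang = 'None'
    return lang


# Made-up sample in the same format as the 'languages' column
sample = pd.Series([
    'Python, Matlab/Octave',
    'C/C++, Fortran, Matlab/Octave',
    'javascript, html',
    "None (I told I've never programmed before)",
])

# Use a set per participant so that repeated categories collapse into one
per_person = sample.apply(
    lambda entry: sorted({only_main(lang) for lang in entry.split(', ')}))

# One row per (participant, category) pair, then count the categories
counts = per_person.explode().value_counts()
print(counts)
```

In this sample the entry `'javascript, html'` contributes a single "Other" instead of two.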
Finally, we can look at what the participants do.
plt.figure()
ax = plt.subplot(111)
data['position'].value_counts().plot.barh(ax=ax)
ax.invert_yaxis()
ax.set_title('Position')
ax.set_xlabel('Participants')
ax.set_xticks(range(0, 22, 3))
plt.tight_layout()
plt.savefig('figures/position.jpg')
plt.show()
Not surprisingly, a majority (19 of 35) are grad students. I was very pleased to have someone from the Na Kupuna program and a not insignificant number of faculty (5).
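The four plots in this notebook repeat the same few lines, so they could be wrapped in a small helper. This is just a sketch: `plot_counts` is a hypothetical function of my own, and creating the output folder is an extra step that the cells above don't do (they assume `figures/` already exists).

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def plot_counts(counts, title, filename, xlabel='Participants'):
    """Draw a horizontal bar chart of `counts` and save it to `filename`."""
    fig, ax = plt.subplots()
    counts.plot.barh(ax=ax)
    ax.invert_yaxis()  # put the first entry at the top
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    fig.tight_layout()
    # Create the output folder if it is missing so that savefig doesn't fail
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(filename)
    return fig, ax
```

For example, the last figure would become `plot_counts(data['position'].value_counts(), 'Position', 'figures/position.jpg')`, followed by `plt.show()`.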