By R. Stuart Geiger (@staeiou), Berkeley Institute for Data Science
This notebook analyzes the 2017 Open Source Survey, conducted by staff at GitHub, Inc. and other collaborators (see https://opensourcesurvey.org/2017 and https://github.com/github/open-source-survey). The survey was run in 2017, asking over 50 questions on a variety of topics. The survey's designers explain the motivation, design, and distribution of the survey:
"In collaboration with researchers from academia, industry, and the community, GitHub designed a survey to gather high quality and novel data on open source software development practices and communities. We collected responses from 5,500 randomly sampled respondents sourced from over 3,800 open source repositories on GitHub.com, and over 500 responses from a non-random sample of communities that work on other platforms. The results are an open data set about the attitudes, experiences, and backgrounds of those who use, build, and maintain open source software."
The GitHub survey team presented analyses of some questions when releasing the survey, but many more questions were asked that are relevant to researchers and community members. This report is an exploratory analysis of all questions asked in the survey, providing a basic summary of the responses to each question. It presents and plots summary statistics for each question: mostly frequency counts and proportions, followed by a frequency or proportion bar graph. Most questions are presented individually, with panel questions grouped together as appropriate. There are no correlations, regressions, or descriptive breakouts between subgroups. Likert-style questions (e.g. strongly agree <-> strongly disagree) have not been recoded to numerical, scalar values. There are no discussions or interpretations of results; these are left for future work.
The purpose of this notebook is to facilitate future research on this dataset by giving an overview of the kinds of questions asked in the survey, and to serve as the basis for a PDF report published on SocArXiv and OSF at https://osf.io/preprints/socarxiv/qps53/. The notebook is public on GitHub at https://github.com/staeiou/github-survey-analysis and others are encouraged to extend it as they see fit.
!pip install pandas seaborn
Requirement already satisfied: pandas in /home/staeiou/conda/lib/python3.5/site-packages Requirement already satisfied: seaborn in /home/staeiou/conda/lib/python3.5/site-packages Requirement already satisfied: python-dateutil>=2 in /home/staeiou/conda/lib/python3.5/site-packages (from pandas) Requirement already satisfied: pytz>=2011k in /home/staeiou/conda/lib/python3.5/site-packages (from pandas) Requirement already satisfied: numpy>=1.7.0 in /home/staeiou/conda/lib/python3.5/site-packages (from pandas) Requirement already satisfied: six>=1.5 in /home/staeiou/conda/lib/python3.5/site-packages (from python-dateutil>=2->pandas)
import pandas as pd
import matplotlib, matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline
pd.options.display.float_format = '{:.2f}%'.format # add % to all floats, all floats here are percentages
## For making pretty tables when nbconverting to latex
pd.set_option('display.notebook_repr_html', True)
def _repr_latex_(self):
    # raw string so the backslash in \centering is not read as an escape sequence
    return r"\centering{%s}" % self.to_latex()
pd.DataFrame._repr_latex_ = _repr_latex_ # monkey patch pandas DataFrame
!unzip -o data_for_public_release.zip
Archive: data_for_public_release.zip creating: data_for_public_release/ inflating: data_for_public_release/negative_incidents.csv inflating: __MACOSX/data_for_public_release/._negative_incidents.csv inflating: data_for_public_release/notes.txt inflating: data_for_public_release/questionnaire.txt inflating: __MACOSX/data_for_public_release/._questionnaire.txt inflating: data_for_public_release/README.txt inflating: __MACOSX/data_for_public_release/._README.txt inflating: data_for_public_release/survey_data.csv inflating: __MACOSX/data_for_public_release/._survey_data.csv
!ls data_for_public_release/
negative_incidents.csv questionnaire.txt survey_data.csv notes.txt README.txt
pd.options.display.max_rows = 500
survey_df = pd.read_csv("data_for_public_release/survey_data.csv")
print("survey_data.csv length:", len(survey_df))
survey_data.csv length: 6029
survey_complete_df = survey_df.query("STATUS == 'Complete'")
print("survey_data.csv completed responses:", len(survey_complete_df))
survey_data.csv completed responses: 3746
survey_complete_df[0:3].transpose()
| | 3 | 4 | 6 |
---|---|---|---|
RESPONSE.ID | 48 | 49 | 51 |
DATE.SUBMITTED | 3/21/17 15:42 | 3/21/17 15:38 | 3/21/17 15:41 |
STATUS | Complete | Complete | Complete |
PARTICIPATION.TYPE.FOLLOW | 1 | 1 | 1 |
PARTICIPATION.TYPE.USE.APPLICATIONS | 1 | 1 | 1 |
PARTICIPATION.TYPE.USE.DEPENDENCIES | 1 | 1 | 1 |
PARTICIPATION.TYPE.CONTRIBUTE | 1 | 1 | 0 |
PARTICIPATION.TYPE.OTHER | 0 | 0 | 0 |
CONTRIBUTOR.TYPE.CONTRIBUTE.CODE | Frequently | Occasionally | NaN |
CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS | Rarely | Rarely | NaN |
CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE | Frequently | Rarely | NaN |
CONTRIBUTOR.TYPE.FILE.BUGS | Frequently | Frequently | NaN |
CONTRIBUTOR.TYPE.FEATURE.REQUESTS | Frequently | Frequently | NaN |
CONTRIBUTOR.TYPE.COMMUNITY.ADMIN | Never | Occasionally | NaN |
EMPLOYMENT.STATUS | Employed full time | Full time student | Employed full time |
PROFESSIONAL.SOFTWARE | Frequently | NaN | Frequently |
FUTURE.CONTRIBUTION.INTEREST | Very interested | Very interested | Very interested |
FUTURE.CONTRIBUTION.LIKELIHOOD | Very likely | Very likely | Somewhat unlikely |
OSS.USER.PRIORITIES.LICENSE | Very important to have | Very important to have | Very important to have |
OSS.USER.PRIORITIES.CODE.OF.CONDUCT | Somewhat important not to have | Somewhat important to have | Not important either way |
OSS.USER.PRIORITIES.CONTRIBUTING.GUIDE | Somewhat important to have | Very important to have | Somewhat important to have |
OSS.USER.PRIORITIES.CLA | Not important either way | Very important to have | Don't know what this is |
OSS.USER.PRIORITIES.ACTIVE.DEVELOPMENT | Somewhat important to have | Very important to have | Very important to have |
OSS.USER.PRIORITIES.RESPONSIVE.MAINTAINERS | Somewhat important to have | Very important to have | Very important to have |
OSS.USER.PRIORITIES.WELCOMING.COMMUNITY | Very important to have | Very important to have | Somewhat important to have |
OSS.USER.PRIORITIES.WIDESPREAD.USE | Somewhat important to have | Not important either way | Somewhat important to have |
OSS.CONTRIBUTOR.PRIORITIES.LICENSE | Not important either way | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.CODE.OF.CONDUCT | Somewhat important not to have | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.CONTRIBUTING.GUIDE | Not important either way | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.CLA | Not important either way | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.ACTIVE.DEVELOPMENT | Somewhat important to have | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.RESPONSIVE.MAINTAINERS | Somewhat important to have | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.WELCOMING.COMMUNITY | Somewhat important to have | NaN | NaN |
OSS.CONTRIBUTOR.PRIORITIES.WIDESPREAD.USE | Somewhat important to have | NaN | NaN |
SEEK.OPEN.SOURCE | Sometimes | Always | Always |
OSS.UX | Generally easier to use | About the same | Generally easier to use |
OSS.SECURITY | Generally more secure | Generally more secure | About the same |
OSS.STABILITY | About the same | Generally less stable | About the same |
INTERNAL.EFFICACY | Strongly agree | Strongly agree | Strongly agree |
EXTERNAL.EFFICACY | Strongly agree | Strongly agree | Neither agree nor disagree |
OSS.IDENTIFICATION | Neither agree nor disagree | Strongly agree | Neither agree nor disagree |
USER.VALUES.STABILITY | Moderately important | Extremely important | Extremely important |
USER.VALUES.INNOVATION | Not at all important | Very important | Moderately important |
USER.VALUES.REPLICABILITY | Very important | Very important | Moderately important |
USER.VALUES.COMPATIBILITY | Very important | Very important | Extremely important |
USER.VALUES.SECURITY | Very important | Very important | Extremely important |
USER.VALUES.COST | Very important | Not at all important | Very important |
USER.VALUES.TRANSPARENCY | Very important | Extremely important | Extremely important |
USER.VALUES.USER.EXPERIENCE | Extremely important | Moderately important | Very important |
USER.VALUES.CUSTOMIZABILITY | Extremely important | Very important | Extremely important |
USER.VALUES.SUPPORT | Slightly important | Moderately important | Not at all important |
USER.VALUES.TRUSTED.PRODUCER | Very important | Slightly important | Moderately important |
TRANSPARENCY.PRIVACY.BELIEFS | People should be able to contribute code witho... | People should be able to contribute code witho... | People should be able to contribute code witho... |
INFO.AVAILABILITY | A lot of information about me | A lot of information about me | A little information about me |
INFO.JOB | Yes | No | No |
TRANSPARENCY.PRIVACY.PRACTICES.GENERAL | I include my real name. | I include my real name. | I don't publish this kind of content online. |
TRANSPARENCY.PRIVACY.PRACTICES.OSS | I include my real name. | I include my real name. | NaN |
RECEIVED.HELP | Yes | Yes | Yes |
FIND.HELPER | Other - Please describe | I asked for help in a public forum (e.g. in a ... | I asked a specific person for help. |
HELPER.PRIOR.RELATIONSHIP | We knew each other well. | Total strangers, I didn't know of them previou... | We knew each other well. |
RECEIVED.HELP.TYPE | Writing code or otherwise implementing ideas. | Installing or using an application. | Installing or using an application. |
PROVIDED.HELP | Yes | Yes | Yes |
FIND.HELPEE | I reached out to them to offer unsolicited help. | They asked for help in a public forum (e.g. in... | They asked me directly for help. |
HELPEE.PRIOR.RELATIONSHIP | Total strangers, I didn't know of them previou... | Total strangers, I didn't know of them previou... | We knew each other well. |
PROVIDED.HELP.TYPE | Writing code or otherwise implementing ideas. | Installing or using an application. | Installing or using an application. |
DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE | Yes | Yes | Yes |
DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION | Yes | No | No |
DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE | Yes | Yes | Yes |
DISCOURAGING.BEHAVIOR.BAD.DOCS | Yes | Yes | Yes |
DISCOURAGING.BEHAVIOR.CONFLICT | Yes | Yes | No |
DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE | No | No | No |
OSS.AS.JOB | Yes, directly- some or all of my work duties ... | NaN | NaN |
OSS.AT.WORK | Frequently | NaN | Frequently |
OSS.IP.POLICY | I am free to contribute without asking for per... | NaN | I'm not sure. |
EMPLOYER.POLICY.APPLICATIONS | Use of open source applications is acceptable ... | NaN | Use of open source applications is encouraged. |
EMPLOYER.POLICY.DEPENDENCIES | Use of open source dependencies is acceptable ... | NaN | Use of open source dependencies is encouraged. |
OSS.HIRING | Very important | NaN | NaN |
IMMIGRATION | No, I live in the country where I was born. | No, I live in the country where I was born. | Yes, and I intend to stay permanently. |
MINORITY.HOMECOUNTRY | NaN | NaN | No |
MINORITY.CURRENT.COUNTRY | No | No | No |
GENDER | Man | Man | Man |
TRANSGENDER.IDENTITY | No | No | No |
SEXUAL.ORIENTATION | No | Yes | No |
WRITTEN.ENGLISH | Very well | Very well | Very well |
AGE | 35 to 44 years | 17 or younger | 35 to 44 years |
FORMAL.EDUCATION | Bachelor's degree | Secondary (high) school graduate or equivalent | Vocational/trade program or apprenticeship |
PARENTS.FORMAL.EDUCATION | Bachelor's degree | Master's degree | Bachelor's degree |
AGE.AT.FIRST.COMPUTER.INTERNET | 13 - 17 years old | Younger than 13 years old | 13 - 17 years old |
LOCATION.OF.FIRST.COMPUTER.INTERNET | At home (belonging to me or a family member) | At home (belonging to me or a family member) | At home (belonging to me or a family member) |
PARTICIPATION.TYPE.ANY.REPONSE | 1 | 1 | 1 |
POPULATION | github | github | github |
OFF.SITE.ID | NaN | NaN | NaN |
TRANSLATED | 0 | 0 | 0 |
participation_type_vars = ['PARTICIPATION.TYPE.FOLLOW',
'PARTICIPATION.TYPE.USE.APPLICATIONS',
'PARTICIPATION.TYPE.USE.DEPENDENCIES', 'PARTICIPATION.TYPE.CONTRIBUTE',
'PARTICIPATION.TYPE.OTHER']
contrib_type_vars = ['CONTRIBUTOR.TYPE.CONTRIBUTE.CODE',
'CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS',
'CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE', 'CONTRIBUTOR.TYPE.FILE.BUGS',
'CONTRIBUTOR.TYPE.FEATURE.REQUESTS', 'CONTRIBUTOR.TYPE.COMMUNITY.ADMIN']
contrib_other_vars = ['EMPLOYMENT.STATUS', 'PROFESSIONAL.SOFTWARE',
'FUTURE.CONTRIBUTION.INTEREST', 'FUTURE.CONTRIBUTION.LIKELIHOOD']
contrib_ident_vars = participation_type_vars + contrib_type_vars + contrib_other_vars
user_pri_vars = ['OSS.USER.PRIORITIES.LICENSE', 'OSS.USER.PRIORITIES.CODE.OF.CONDUCT',
'OSS.USER.PRIORITIES.CONTRIBUTING.GUIDE', 'OSS.USER.PRIORITIES.CLA',
'OSS.USER.PRIORITIES.ACTIVE.DEVELOPMENT',
'OSS.USER.PRIORITIES.RESPONSIVE.MAINTAINERS',
'OSS.USER.PRIORITIES.WELCOMING.COMMUNITY',
'OSS.USER.PRIORITIES.WIDESPREAD.USE']
contrib_pri_vars = ['OSS.CONTRIBUTOR.PRIORITIES.LICENSE',
'OSS.CONTRIBUTOR.PRIORITIES.CODE.OF.CONDUCT',
'OSS.CONTRIBUTOR.PRIORITIES.CONTRIBUTING.GUIDE',
'OSS.CONTRIBUTOR.PRIORITIES.CLA',
'OSS.CONTRIBUTOR.PRIORITIES.ACTIVE.DEVELOPMENT',
'OSS.CONTRIBUTOR.PRIORITIES.RESPONSIVE.MAINTAINERS',
'OSS.CONTRIBUTOR.PRIORITIES.WELCOMING.COMMUNITY',
'OSS.CONTRIBUTOR.PRIORITIES.WIDESPREAD.USE']
oss_values_vars = [ 'SEEK.OPEN.SOURCE',
'OSS.UX', 'OSS.SECURITY', 'OSS.STABILITY', 'INTERNAL.EFFICACY',
'EXTERNAL.EFFICACY', 'OSS.IDENTIFICATION']
user_values_vars = ['USER.VALUES.STABILITY',
'USER.VALUES.INNOVATION', 'USER.VALUES.REPLICABILITY',
'USER.VALUES.COMPATIBILITY', 'USER.VALUES.SECURITY', 'USER.VALUES.COST',
'USER.VALUES.TRANSPARENCY', 'USER.VALUES.USER.EXPERIENCE',
'USER.VALUES.CUSTOMIZABILITY', 'USER.VALUES.SUPPORT',
'USER.VALUES.TRUSTED.PRODUCER']
values_pri_vars = user_pri_vars + contrib_pri_vars + user_values_vars + oss_values_vars
privacy_transp_vars = ['TRANSPARENCY.PRIVACY.BELIEFS',
'INFO.AVAILABILITY', 'INFO.JOB',
'TRANSPARENCY.PRIVACY.PRACTICES.GENERAL',
'TRANSPARENCY.PRIVACY.PRACTICES.OSS']
help_vars = ['RECEIVED.HELP', 'FIND.HELPER',
'HELPER.PRIOR.RELATIONSHIP', 'RECEIVED.HELP.TYPE', 'PROVIDED.HELP',
'FIND.HELPEE', 'HELPEE.PRIOR.RELATIONSHIP', 'PROVIDED.HELP.TYPE']
paid_work_vars = ['OSS.AS.JOB',
'OSS.AT.WORK', 'OSS.IP.POLICY', 'EMPLOYER.POLICY.APPLICATIONS',
'EMPLOYER.POLICY.DEPENDENCIES', 'OSS.HIRING']
discouraging_vars = ['DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE',
'DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION',
'DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE',
'DISCOURAGING.BEHAVIOR.BAD.DOCS', 'DISCOURAGING.BEHAVIOR.CONFLICT',
'DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE']
demographic_vars = ['IMMIGRATION',
'MINORITY.HOMECOUNTRY', 'MINORITY.CURRENT.COUNTRY', 'GENDER',
'TRANSGENDER.IDENTITY', 'SEXUAL.ORIENTATION', 'WRITTEN.ENGLISH', 'AGE',
'FORMAL.EDUCATION', 'PARENTS.FORMAL.EDUCATION',
'AGE.AT.FIRST.COMPUTER.INTERNET', 'LOCATION.OF.FIRST.COMPUTER.INTERNET',
'PARTICIPATION.TYPE.ANY.REPONSE', 'POPULATION', 'OFF.SITE.ID',
'TRANSLATED']
survey_vars = [contrib_ident_vars, values_pri_vars, privacy_transp_vars, \
help_vars, paid_work_vars, discouraging_vars, demographic_vars]
neg_df = pd.read_csv("data_for_public_release/negative_incidents.csv")
print("negative_incidents.csv length:", len(neg_df))
negative_incidents.csv length: 6029
neg_df[0:3].transpose()
| | 0 | 1 | 2 |
---|---|---|---|
NEGATIVE.WITNESS.RUDENESS | 1 | 1 | 0 |
NEGATIVE.WITNESS.NAME.CALLING | 1 | 0 | 0 |
NEGATIVE.WITNESS.THREATS | 0 | 0 | 0 |
NEGATIVE.WITNESS.IMPERSONATION | 0 | 0 | 1 |
NEGATIVE.WITNESS.SUSTAINED.HARASSMENT | 0 | 0 | 0 |
NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT | 0 | 0 | 0 |
NEGATIVE.WITNESS.STALKING | 0 | 0 | 0 |
NEGATIVE.WITNESS.SEXUAL.ADVANCES | 0 | 0 | 0 |
NEGATIVE.WITNESS.STEREOTYPING | 0 | 0 | 0 |
NEGATIVE.WITNESS.DOXXING | 0 | 0 | 1 |
NEGATIVE.WITNESS.OTHER | 0 | 0 | 0 |
NEGATIVE.WITNESS.NONE.OF.THE.ABOVE | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.RUDENESS | 0 | 1 | 0 |
NEGATIVE.EXPERIENCE.NAME.CALLING | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.THREATS | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.IMPERSONATION | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.STALKING | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.STEREOTYPING | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.DOXXING | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.OTHER | 0 | 0 | 0 |
NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE | 1 | 0 | 1 |
NEGATIVE.RESPONSE.ASKED.USER.TO.STOP | 0 | 0 | 0 |
NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT | 0 | 0 | 0 |
NEGATIVE.RESPONSE.BLOCKED.USER | 0 | 0 | 0 |
NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS | 0 | 0 | 0 |
NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP | 0 | 0 | 0 |
NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL | 0 | 0 | 0 |
NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT | 0 | 0 | 0 |
NEGATIVE.RESPONSE.OTHER | 0 | 0 | 0 |
NEGATIVE.RESPONSE.IGNORED | 0 | 1 | 0 |
RESPONSE.EFFECTIVENESS.ASKED.USER.TO.STOP | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.SOLICITED.COMMUNITY.SUPPORT | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.BLOCKED.USER | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.REPORTED.TO.MAINTAINERS | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.REPORTED.TO.HOST.OR.ISP | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.CONSULTED.LEGAL.COUNSEL | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.CONTACTED.LAW.ENFORCEMENT | NaN | NaN | NaN |
RESPONSE.EFFECTIVENESS.OTHER | NaN | NaN | NaN |
NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING | 0 | 0 | 1 |
NEGATIVE.CONSEQUENCES.PSEUDONYM | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.CHANGE.USERNAME | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.SUGGEST.COC | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION | 0 | 1 | 0 |
NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.OTHER | 0 | 0 | 0 |
NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE | 1 | 0 | 0 |
NEGATIVE.WITNESS.ANY.RESPONSE | 1 | 1 | 1 |
NEGATIVE.EXPERIENCE.ANY.RESPONSE | 1 | 1 | 1 |
NEGATIVE.RESPONSE.ANY.RESPONSE | 0 | 1 | 0 |
NEGATIVE.CONSEQUENCES.ANY.RESPONSE | 1 | 1 | 1 |
POPULATION | github | github | github |
neg_witness_vars = ['NEGATIVE.WITNESS.RUDENESS', 'NEGATIVE.WITNESS.NAME.CALLING',
'NEGATIVE.WITNESS.THREATS', 'NEGATIVE.WITNESS.IMPERSONATION',
'NEGATIVE.WITNESS.SUSTAINED.HARASSMENT',
'NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT',
'NEGATIVE.WITNESS.STALKING', 'NEGATIVE.WITNESS.SEXUAL.ADVANCES',
'NEGATIVE.WITNESS.STEREOTYPING', 'NEGATIVE.WITNESS.DOXXING',
'NEGATIVE.WITNESS.OTHER', 'NEGATIVE.WITNESS.NONE.OF.THE.ABOVE', 'NEGATIVE.WITNESS.ANY.RESPONSE']
neg_exp_vars = ['NEGATIVE.EXPERIENCE.RUDENESS', 'NEGATIVE.EXPERIENCE.NAME.CALLING',
'NEGATIVE.EXPERIENCE.THREATS', 'NEGATIVE.EXPERIENCE.IMPERSONATION',
'NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT',
'NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT',
'NEGATIVE.EXPERIENCE.STALKING', 'NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES',
'NEGATIVE.EXPERIENCE.STEREOTYPING', 'NEGATIVE.EXPERIENCE.DOXXING',
'NEGATIVE.EXPERIENCE.OTHER', 'NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE', 'NEGATIVE.EXPERIENCE.ANY.RESPONSE']
neg_resp_vars = ['NEGATIVE.RESPONSE.ASKED.USER.TO.STOP',
'NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT',
'NEGATIVE.RESPONSE.BLOCKED.USER',
'NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS',
'NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP',
'NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL',
'NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT',
'NEGATIVE.RESPONSE.OTHER', 'NEGATIVE.RESPONSE.IGNORED', 'NEGATIVE.RESPONSE.ANY.RESPONSE']
neg_effect_vars = ['RESPONSE.EFFECTIVENESS.ASKED.USER.TO.STOP',
'RESPONSE.EFFECTIVENESS.SOLICITED.COMMUNITY.SUPPORT',
'RESPONSE.EFFECTIVENESS.BLOCKED.USER',
'RESPONSE.EFFECTIVENESS.REPORTED.TO.MAINTAINERS',
'RESPONSE.EFFECTIVENESS.REPORTED.TO.HOST.OR.ISP',
'RESPONSE.EFFECTIVENESS.CONSULTED.LEGAL.COUNSEL',
'RESPONSE.EFFECTIVENESS.CONTACTED.LAW.ENFORCEMENT',
'RESPONSE.EFFECTIVENESS.OTHER']
neg_conseq_vars = ['NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING',
'NEGATIVE.CONSEQUENCES.PSEUDONYM',
'NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE',
'NEGATIVE.CONSEQUENCES.CHANGE.USERNAME',
'NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE',
'NEGATIVE.CONSEQUENCES.SUGGEST.COC',
'NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION',
'NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION',
'NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES', 'NEGATIVE.CONSEQUENCES.OTHER',
'NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE', 'NEGATIVE.CONSEQUENCES.ANY.RESPONSE']
neg_anyresp_vars = ['NEGATIVE.WITNESS.ANY.RESPONSE', 'NEGATIVE.EXPERIENCE.ANY.RESPONSE',
'NEGATIVE.RESPONSE.ANY.RESPONSE', 'NEGATIVE.CONSEQUENCES.ANY.RESPONSE']
sns.set(font_scale=1.5)
Choose all that apply.
participation_type_resp= survey_df[participation_type_vars].apply(pd.Series.value_counts).transpose()
participation_type_resp.columns = ["No", "Yes"]
participation_type_resp
| | No | Yes |
---|---|---|
PARTICIPATION.TYPE.FOLLOW | 1287 | 4742 |
PARTICIPATION.TYPE.USE.APPLICATIONS | 454 | 5575 |
PARTICIPATION.TYPE.USE.DEPENDENCIES | 946 | 5083 |
PARTICIPATION.TYPE.CONTRIBUTE | 1722 | 4307 |
PARTICIPATION.TYPE.OTHER | 5742 | 287 |
participation_type_prop = survey_df[participation_type_vars].mean() * 100
participation_type_prop = participation_type_prop.sort_values()
pd.DataFrame(participation_type_prop, columns=["percent"])
| | percent |
---|---|
PARTICIPATION.TYPE.OTHER | 4.76% |
PARTICIPATION.TYPE.CONTRIBUTE | 71.44% |
PARTICIPATION.TYPE.FOLLOW | 78.65% |
PARTICIPATION.TYPE.USE.DEPENDENCIES | 84.31% |
PARTICIPATION.TYPE.USE.APPLICATIONS | 92.47% |
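The percentages above can be read directly off the No/Yes counts in the preceding table: each participation column holds 0/1 indicators, so the column mean is the share of 1s. As a small arithmetic check (a sketch only; `percent_yes` is a hypothetical helper, not part of the notebook), the FOLLOW row works out as:

```python
def percent_yes(yes_count, no_count):
    """Share of 'Yes' (1) answers as a percentage of all answers."""
    return 100 * yes_count / (yes_count + no_count)

# Counts for PARTICIPATION.TYPE.FOLLOW taken from the No/Yes table above
follow_pct = percent_yes(4742, 1287)
print(round(follow_pct, 2))  # 78.65
```

This equivalence holds only because the columns contain no values other than 0 and 1.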
ax = participation_type_prop.plot(kind='barh')
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[19:].replace(".", " ") # cut off "PARTICIPATION.TYPE."
    labels.append(title_text)
plt.xlim(0,100)
ax.set_yticklabels(labels)
ax.set_xlabel("Percent of respondents")
t = plt.title("% of people who participate in the following activities:")
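The tick labels in this notebook are shortened by slicing off a fixed number of characters (19 here, other values in later cells), which depends on counting the prefix length correctly. One possible alternative, shown only as a sketch with a hypothetical helper named `clean_label`, is to strip the prefix by name:

```python
def clean_label(column_name, prefix):
    """Drop a known prefix from a survey column name and replace dots with spaces."""
    if column_name.startswith(prefix):
        column_name = column_name[len(prefix):]
    return column_name.replace(".", " ")

print(clean_label("PARTICIPATION.TYPE.USE.APPLICATIONS", "PARTICIPATION.TYPE."))
# USE APPLICATIONS
```

A name that does not start with the given prefix is left intact apart from the dot replacement, so a wrong prefix shows up as an unshortened label instead of a silently truncated one.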
contrib_type_responses = survey_df[contrib_type_vars].apply(pd.Series.value_counts).transpose()
# Order the response columns from most to least frequent engagement
contrib_type_responses = contrib_type_responses[["Frequently", "Occasionally", "Rarely", "Never"]]
contrib_type_responses = contrib_type_responses.sort_values(by='Frequently')
contrib_type_responses
| | Frequently | Occasionally | Rarely | Never |
---|---|---|---|---|
CONTRIBUTOR.TYPE.COMMUNITY.ADMIN | 287 | 417 | 867 | 2412 |
CONTRIBUTOR.TYPE.CONTRIBUTE.DOCS | 460 | 1214 | 1665 | 661 |
CONTRIBUTOR.TYPE.FEATURE.REQUESTS | 573 | 1625 | 1346 | 451 |
CONTRIBUTOR.TYPE.PROJECT.MAINTENANCE | 996 | 944 | 974 | 1090 |
CONTRIBUTOR.TYPE.FILE.BUGS | 1067 | 2073 | 768 | 106 |
CONTRIBUTOR.TYPE.CONTRIBUTE.CODE | 1160 | 1383 | 1301 | 189 |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
contrib_type_responses.plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[17:].replace(".", " ") # cut off "CONTRIBUTOR.TYPE."
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("How often do you engage in each of the following activities?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
EMPLOYMENT.STATUS
prop_df = pd.DataFrame((survey_df['EMPLOYMENT.STATUS'].value_counts()))
prop_df.columns=["count"]
prop_df
| | count |
---|---|
Employed full time | 3615 |
Full time student | 1048 |
Employed part time | 349 |
Temporarily not working | 314 |
Other - please describe | 184 |
Retired or permanently not working (e.g. due to disability) | 90 |
prop_df = pd.DataFrame((survey_df['EMPLOYMENT.STATUS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
| | percent |
---|---|
Employed full time | 64.55% |
Full time student | 18.71% |
Employed part time | 6.23% |
Temporarily not working | 5.61% |
Other - please describe | 3.29% |
Retired or permanently not working (e.g. due to disability) | 1.61% |
ax = pd.DataFrame(survey_df['EMPLOYMENT.STATUS'].value_counts()).plot(kind='barh')
plt.suptitle("Employment status")
t = ax.set_xlabel("Count of responses")
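The count and percent tables for single-choice questions follow the same pattern throughout. A minimal sketch of how the two could be produced together is shown below; `summarize_question` and the toy data are hypothetical and not part of the notebook. Because `value_counts` drops missing values by default, the percentages are shares of respondents who answered the question, not of all rows.

```python
import pandas as pd

def summarize_question(df, column):
    """Return counts and percentages for one question, ignoring missing answers."""
    counts = df[column].value_counts()             # NaN answers are dropped by default
    percents = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percents})

# Toy data standing in for survey_df (illustrative values only)
toy_df = pd.DataFrame({"EMPLOYMENT.STATUS": ["Employed full time",
                                             "Employed full time",
                                             "Full time student",
                                             None]})
summary = summarize_question(toy_df, "EMPLOYMENT.STATUS")
print(summary)
```

In the toy data the missing answer is excluded, so the two "Employed full time" answers count as two out of three.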
PROFESSIONAL.SOFTWARE
prop_df = pd.DataFrame((survey_df['PROFESSIONAL.SOFTWARE'].value_counts()))
prop_df.columns=["count"]
prop_df
| | count |
---|---|
Frequently | 2747 |
Occasionally | 542 |
Rarely | 339 |
Never | 279 |
prop_df = pd.DataFrame((survey_df['PROFESSIONAL.SOFTWARE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
| | percent |
---|---|
Frequently | 70.31% |
Occasionally | 13.87% |
Rarely | 8.68% |
Never | 7.14% |
ax = pd.DataFrame(survey_df['PROFESSIONAL.SOFTWARE'].value_counts()).plot(kind='barh')
plt.title("In your main job, how often do you write or\notherwise directly contribute to producing software?")
t = ax.set_xlabel("Count of responses")
FUTURE.CONTRIBUTION.INTEREST
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.INTEREST'].value_counts()))
prop_df.columns=["count"]
prop_df
| | count |
---|---|
Very interested | 3929 |
Somewhat interested | 1430 |
Not too interested | 125 |
Not at all interested | 24 |
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.INTEREST'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
| | percent |
---|---|
Very interested | 71.33% |
Somewhat interested | 25.96% |
Not too interested | 2.27% |
Not at all interested | 0.44% |
ax = pd.DataFrame(survey_df['FUTURE.CONTRIBUTION.INTEREST'].value_counts()).plot(kind='barh')
plt.title("How interested are you in contributing\nto open source projects in the future?")
t = ax.set_xlabel("Count of responses")
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.LIKELIHOOD'].value_counts()))
prop_df.columns=["count"]
prop_df
| | count |
---|---|
Very likely | 3271 |
Somewhat likely | 1719 |
Somewhat unlikely | 440 |
Very unlikely | 81 |
prop_df = pd.DataFrame((survey_df['FUTURE.CONTRIBUTION.LIKELIHOOD'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
| | percent |
---|---|
Very likely | 59.35% |
Somewhat likely | 31.19% |
Somewhat unlikely | 7.98% |
Very unlikely | 1.47% |
ax = pd.DataFrame(survey_df['FUTURE.CONTRIBUTION.LIKELIHOOD'].value_counts()).plot(kind='barh')
plt.title("How likely are you to contribute to\nopen source projects in the future?")
t = ax.set_xlabel("Count of responses")
OSS.USER.PRIORITIES.*
user_pri_responses = survey_df[user_pri_vars].apply(pd.Series.value_counts).transpose()
#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
user_pri_responses = user_pri_responses[["Very important to have",
"Somewhat important to have",
"Not important either way",
"Somewhat important not to have",
"Very important not to have",
"Don't know what this is"]]
user_pri_responses = user_pri_responses.sort_values(by="Very important to have")
idx = []
for i in user_pri_responses.index:
    idx.append(i[20:])
idx = pd.Series(idx)
user_pri_responses.set_index(idx)
| | Very important to have | Somewhat important to have | Not important either way | Somewhat important not to have | Very important not to have | Don't know what this is |
---|---|---|---|---|---|---|
CLA | 490 | 1024 | 2282 | 336 | 157 | 488 |
CODE.OF.CONDUCT | 848 | 1461 | 1993 | 166 | 120 | 209 |
WIDESPREAD.USE | 984 | 2067 | 1576 | 114 | 47 | 28 |
CONTRIBUTING.GUIDE | 1212 | 1866 | 1516 | 95 | 62 | 62 |
WELCOMING.COMMUNITY | 2062 | 1822 | 812 | 67 | 33 | 18 |
RESPONSIVE.MAINTAINERS | 2575 | 1850 | 302 | 31 | 35 | 20 |
ACTIVE.DEVELOPMENT | 2768 | 1722 | 267 | 30 | 31 | 16 |
LICENSE | 3125 | 1160 | 435 | 31 | 33 | 47 |
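Panel tables like the one above are built by counting each column's answers, transposing, and then selecting the response columns in a fixed order. Selecting with a list of labels raises a `KeyError` if a response option was never chosen in any column; a hedged alternative is `reindex`, which adds the missing option as an empty column. The helper name `tabulate_panel` and the toy data below are hypothetical.

```python
import pandas as pd

def tabulate_panel(df, columns, order):
    """Count answers per question and present them in a fixed response order."""
    counts = df[columns].apply(pd.Series.value_counts).transpose()
    # reindex tolerates options nobody chose; fill those cells with 0
    return counts.reindex(columns=order).fillna(0).astype(int)

# Toy panel with two questions; nobody answered "Maybe"
toy_df = pd.DataFrame({"Q.A": ["Yes", "No", "Yes"],
                       "Q.B": ["Yes", "Yes", "Yes"]})
panel = tabulate_panel(toy_df, ["Q.A", "Q.B"], ["Yes", "No", "Maybe"])
print(panel)
```

The zero-filling also replaces the `NaN` cells that appear when an option was chosen for some questions in the panel but not others.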
user_pri_responses_prop = survey_df[user_pri_vars].apply(pd.Series.value_counts, normalize=True).round(4).transpose()
#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
user_pri_responses_prop = user_pri_responses_prop[["Very important to have",
"Somewhat important to have",
"Not important either way",
"Somewhat important not to have",
"Very important not to have",
"Don't know what this is"]]
user_pri_responses_prop = user_pri_responses_prop.sort_values(by="Very important to have")
user_pri_responses_prop = user_pri_responses_prop * 100
idx = []
for i in user_pri_responses_prop.index:
    idx.append(i[20:])
idx = pd.Series(idx)
user_pri_responses_prop.set_index(idx)
| | Very important to have | Somewhat important to have | Not important either way | Somewhat important not to have | Very important not to have | Don't know what this is |
---|---|---|---|---|---|---|
CLA | 10.26% | 21.44% | 47.77% | 7.03% | 3.29% | 10.22% |
CODE.OF.CONDUCT | 17.68% | 30.46% | 41.55% | 3.46% | 2.50% | 4.36% |
WIDESPREAD.USE | 20.43% | 42.92% | 32.72% | 2.37% | 0.98% | 0.58% |
CONTRIBUTING.GUIDE | 25.18% | 38.77% | 31.50% | 1.97% | 1.29% | 1.29% |
WELCOMING.COMMUNITY | 42.83% | 37.85% | 16.87% | 1.39% | 0.69% | 0.37% |
RESPONSIVE.MAINTAINERS | 53.50% | 38.44% | 6.27% | 0.64% | 0.73% | 0.42% |
ACTIVE.DEVELOPMENT | 57.26% | 35.62% | 5.52% | 0.62% | 0.64% | 0.33% |
LICENSE | 64.69% | 24.01% | 9.00% | 0.64% | 0.68% | 0.97% |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.coolwarm
colors = ["xkcd:darkblue", "xkcd:lightblue", "xkcd:beige", "xkcd:salmon", "xkcd:crimson", "xkcd:green"]
user_pri_responses.plot.barh(stacked=True, ax=ax, figsize=[12,8], color=colors)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[20:].replace(".", " ") # cut off "OSS.USER.PRIORITIES."
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("When thinking about whether to *use* open source software,\n how important are the following things?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.1), ncol=2, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
OSS.CONTRIBUTOR.PRIORITIES.*
contrib_pri_responses = survey_df[contrib_pri_vars].apply(pd.Series.value_counts).transpose()
#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
contrib_pri_responses = contrib_pri_responses[["Very important to have",
"Somewhat important to have",
"Not important either way",
"Somewhat important not to have",
"Very important not to have",
"Don't know what this is"]]
contrib_pri_responses = contrib_pri_responses.sort_values(by="Very important to have")
idx = []
for i in contrib_pri_responses.index:
    idx.append(i[27:])
idx = pd.Series(idx)
contrib_pri_responses.set_index(idx)
| | Very important to have | Somewhat important to have | Not important either way | Somewhat important not to have | Very important not to have | Don't know what this is |
---|---|---|---|---|---|---|
WIDESPREAD.USE | 387 | 1016 | 1666 | 70 | 30 | 12 |
CLA | 419 | 712 | 1266 | 327 | 166 | 280 |
CODE.OF.CONDUCT | 655 | 1145 | 1085 | 119 | 84 | 96 |
CONTRIBUTING.GUIDE | 1198 | 1396 | 500 | 41 | 18 | 24 |
ACTIVE.DEVELOPMENT | 1368 | 1333 | 448 | 21 | 18 | 5 |
WELCOMING.COMMUNITY | 1533 | 1199 | 411 | 21 | 15 | 7 |
RESPONSIVE.MAINTAINERS | 1994 | 1022 | 138 | 7 | 16 | 7 |
LICENSE | 2199 | 610 | 337 | 16 | 15 | 18 |
contrib_pri_responses_prop = survey_df[contrib_pri_vars].apply(pd.Series.value_counts, normalize=True).round(4).transpose()
#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
contrib_pri_responses_prop = contrib_pri_responses_prop[["Very important to have",
"Somewhat important to have",
"Not important either way",
"Somewhat important not to have",
"Very important not to have",
"Don't know what this is"]]
contrib_pri_responses_prop = contrib_pri_responses_prop.sort_values(by="Very important to have")
contrib_pri_responses_prop = contrib_pri_responses_prop * 100
idx = []
for i in contrib_pri_responses_prop.index:
    idx.append(i[27:]) # cut off "OSS.CONTRIBUTOR.PRIORITIES."
idx = pd.Series(idx)
contrib_pri_responses_prop.set_index(idx)
| | Very important to have | Somewhat important to have | Not important either way | Somewhat important not to have | Very important not to have | Don't know what this is |
---|---|---|---|---|---|---|
WIDESPREAD.USE | 12.17% | 31.94% | 52.37% | 2.20% | 0.94% | 0.38% |
CLA | 13.22% | 22.46% | 39.94% | 10.32% | 5.24% | 8.83% |
CODE.OF.CONDUCT | 20.57% | 35.96% | 34.08% | 3.74% | 2.64% | 3.02% |
CONTRIBUTING.GUIDE | 37.71% | 43.94% | 15.74% | 1.29% | 0.57% | 0.76% |
ACTIVE.DEVELOPMENT | 42.84% | 41.75% | 14.03% | 0.66% | 0.56% | 0.16% |
WELCOMING.COMMUNITY | 48.12% | 37.63% | 12.90% | 0.66% | 0.47% | 0.22% |
RESPONSIVE.MAINTAINERS | 62.63% | 32.10% | 4.33% | 0.22% | 0.50% | 0.22% |
LICENSE | 68.83% | 19.09% | 10.55% | 0.50% | 0.47% | 0.56% |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.coolwarm
colors = ["xkcd:darkblue", "xkcd:lightblue", "xkcd:beige", "xkcd:salmon", "xkcd:crimson", "xkcd:green"]
contrib_pri_responses.plot.barh(stacked=True, ax=ax, figsize=[12,8], color=colors)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[27:].replace(".", " ") # cut off "OSS.CONTRIBUTOR.PRIORITIES."
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("When thinking about whether to *contribute* to an open source project,\nhow important are the following things?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.1), ncol=2, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
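The slice offsets used in these cells (`[20:]` and `[27:]`) depend on the exact length of each variable prefix, so a comment or offset can silently drift out of sync. As a minimal sketch that is not part of the original notebook, the prefix can be stripped by name instead. `clean_label` is a hypothetical helper introduced here for illustration only.

```python
def clean_label(name, prefix):
    """Strip a known variable prefix and replace dots with spaces."""
    if not name.startswith(prefix):
        # Fail loudly rather than slicing off the wrong characters.
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return name[len(prefix):].replace(".", " ")


PREFIX = "OSS.CONTRIBUTOR.PRIORITIES."  # 27 characters, matching the [27:] slice
names = [
    "OSS.CONTRIBUTOR.PRIORITIES.CODE.OF.CONDUCT",
    "OSS.CONTRIBUTOR.PRIORITIES.LICENSE",
]
labels = [clean_label(n, PREFIX) for n in names]
print(labels)
```

Passing the prefix explicitly keeps the tick labels correct even if a variable family with a different prefix length is plotted with the same code.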
SEEK.OPEN.SOURCE
count_df = pd.DataFrame(data=survey_df['SEEK.OPEN.SOURCE'].value_counts())
count_df.columns = ["count"]
count_df
count | |
---|---|
Always | 3407 |
Sometimes | 1111 |
Rarely | 100 |
Never | 25 |
prop_df = pd.DataFrame((survey_df['SEEK.OPEN.SOURCE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Always | 73.38% |
Sometimes | 23.93% |
Rarely | 2.15% |
Never | 0.54% |
ax = pd.DataFrame(survey_df['SEEK.OPEN.SOURCE'].value_counts()).plot(kind='barh')
plt.title("How often do you try to find open\nsource options over other kinds of software?")
t = ax.set_xlabel("Count of responses")
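Most single-choice questions in this notebook follow the same pattern: a count table, a percent table, then a bar plot. The sketch below shows one way the two tables could be produced in a single step. It assumes pandas is available and uses a small toy series rather than the survey data; `summarize` is a hypothetical helper, not a function from the original notebook.

```python
import pandas as pd


def summarize(series):
    """Return counts and percentages of non-missing responses, most common first."""
    counts = series.value_counts()  # missing values are dropped by default
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})


# Toy responses; None stands in for a skipped question.
toy = pd.Series(["Always", "Always", "Sometimes", "Always", None, "Rarely"])
summary = summarize(toy)
print(summary)
```

Computing the percentages from the counts keeps both columns on the same index, so the rows cannot fall out of order between the two tables.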
OSS.UX: Do you believe that open source software is generally easier to use than closed source (proprietary) software, harder to use, or about the same?
count_df = pd.DataFrame(data=survey_df['OSS.UX'].value_counts())
count_df.columns = ["count"]
count_df
count | |
---|---|
About the same | 2027 |
Generally easier to use | 1597 |
Generally harder to use | 897 |
prop_df = pd.DataFrame((survey_df['OSS.UX'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
About the same | 44.84% |
Generally easier to use | 35.32% |
Generally harder to use | 19.84% |
ax = pd.DataFrame(survey_df['OSS.UX'].value_counts()).plot(kind='barh')
plt.title("Do you believe that open source software is generally\neasier to use than closed source (proprietary)\nsoftware, harder to use, or about the same?")
t = ax.set_xlabel("Count of responses")
OSS.SECURITY: Do you believe that open source software is generally more secure than closed source (proprietary) software, less secure, or about the same?
count_df = pd.DataFrame(data=survey_df['OSS.SECURITY'].value_counts())
count_df.columns = ["count"]
count_df
count | |
---|---|
Generally more secure | 2688 |
About the same | 1537 |
Generally less secure | 295 |
prop_df = pd.DataFrame((survey_df['OSS.SECURITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Generally more secure | 59.47% |
About the same | 34.00% |
Generally less secure | 6.53% |
ax = pd.DataFrame(survey_df['OSS.SECURITY'].value_counts()).plot(kind='barh')
plt.title("Do you believe that open source software is\ngenerally more secure than closed source (proprietary)\nsoftware, less secure, or about the same?")
t = ax.set_xlabel("Count of responses")
OSS.STABILITY: Do you believe that open source software is generally more stable than closed source (proprietary) software, less stable, or about the same?
count_df = pd.DataFrame(data=survey_df['OSS.STABILITY'].value_counts())
count_df.columns = ["count"]
count_df
count | |
---|---|
About the same | 2240 |
Generally more stable | 1399 |
Generally less stable | 877 |
prop_df = pd.DataFrame((survey_df['OSS.STABILITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
About the same | 49.60% |
Generally more stable | 30.98% |
Generally less stable | 19.42% |
ax = pd.DataFrame(survey_df['OSS.STABILITY'].value_counts()).plot(kind='barh')
plt.title("Do you believe that open source software is\ngenerally more stable than closed source\n(proprietary), less stable, or about the same?")
t = ax.set_xlabel("Count of responses")
How much do you agree or disagree with the following statements:
oss_id_vars = ["INTERNAL.EFFICACY", "EXTERNAL.EFFICACY", "OSS.IDENTIFICATION"]
oss_id_responses = survey_df[oss_id_vars].apply(pd.Series.value_counts).transpose()
#contrib_type_responses.columns = ["Not at all important", "Slightly important","Don't know", "Somewhat important", "Very important"]
oss_id_responses = oss_id_responses[["Strongly agree",
"Somewhat agree",
"Neither agree nor disagree",
"Somewhat disagree",
"Strongly disagree"]]
oss_id_responses = oss_id_responses.sort_values(by="Strongly agree")
oss_id_responses
| | Strongly agree | Somewhat agree | Neither agree nor disagree | Somewhat disagree | Strongly disagree |
---|---|---|---|---|---|
EXTERNAL.EFFICACY | 1518 | 1610 | 1116 | 150 | 58 |
OSS.IDENTIFICATION | 1579 | 1513 | 863 | 351 | 150 |
INTERNAL.EFFICACY | 2052 | 1685 | 418 | 240 | 62 |
oss_id_responses_prop = survey_df[oss_id_vars].apply(pd.Series.value_counts, normalize=True).round(4) * 100
oss_id_responses_prop.transpose()
| | Neither agree nor disagree | Somewhat agree | Somewhat disagree | Strongly agree | Strongly disagree |
---|---|---|---|---|---|
INTERNAL.EFFICACY | 9.38% | 37.81% | 5.38% | 46.04% | 1.39% |
EXTERNAL.EFFICACY | 25.07% | 36.16% | 3.37% | 34.10% | 1.30% |
OSS.IDENTIFICATION | 19.37% | 33.95% | 7.88% | 35.44% | 3.37% |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.coolwarm
colors = ["xkcd:darkblue", "xkcd:lightblue", "xkcd:beige", "xkcd:salmon", "xkcd:crimson"]
oss_id_responses.plot.barh(stacked=True, ax=ax, figsize=[12,5], cmap=matplotlib.cm.coolwarm, edgecolor='black', linewidth=1)
#print(str(ax.get_yticklabels()))
ax.set_yticklabels(["The open source community values\ncontributions from people like me.",
"I consider myself to be a member\nof the open source (and/or the\nFree/Libre software) community.",
"I have the skills and understanding\nnecessary to make meaningful\ncontributions to open source projects."])
plt.title("How much do you agree or disagree with the following statements:")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.25), ncol=2, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
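The count table for these statements is ordered from "Strongly agree" to "Strongly disagree", while the proportion table falls back to alphabetical column order. As a hedged sketch on toy data (the column names `Q1` and `Q2` are placeholders, not survey variables), one shared list of response labels can drive the column order of both tables:

```python
import pandas as pd

# Response labels in the order they should appear, left to right.
LIKERT = ["Strongly agree", "Somewhat agree", "Neither agree nor disagree",
          "Somewhat disagree", "Strongly disagree"]

toy = pd.DataFrame({
    "Q1": ["Strongly agree", "Somewhat agree", "Strongly agree", "Somewhat disagree"],
    "Q2": ["Strongly disagree", "Strongly agree", "Neither agree nor disagree", "Somewhat agree"],
})

# One row per question, one column per response label.
counts = toy.apply(pd.Series.value_counts).transpose()
counts = counts.reindex(columns=LIKERT).fillna(0).astype(int)

# Row-wise percentages that reuse the same column order.
props = counts.div(counts.sum(axis=1), axis=0).mul(100).round(2)
print(counts)
print(props)
```

Deriving the proportions from the already-ordered counts means the two tables can be read side by side without remapping columns.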
TRANSPARENCY.PRIVACY.BELIEFS: Which of the following statements is closest to your beliefs about attribution in software development?
counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.BELIEFS'].value_counts())
counts_df.columns=["count"]
counts_df
count | |
---|---|
People should be able to contribute code without attribution, if they wish to remain anonymous. | 2454 |
Records of authorship should be required so that end users know who created the source code they are working with. | 1594 |
prop_df = pd.DataFrame((survey_df['TRANSPARENCY.PRIVACY.BELIEFS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
People should be able to contribute code without attribution, if they wish to remain anonymous. | 60.62% |
Records of authorship should be required so that end users know who created the source code they are working with. | 39.38% |
ax = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.BELIEFS'].value_counts()).plot(kind='barh', figsize=[10,6])
plt.title("Which of the following statements is closest to your\nbeliefs about attribution in software development?")
ax.set_yticklabels(["People should be able to contribute\ncode without attribution, if\nthey wish to remain anonymous.",
"Records of authorship should be\nrequired so that end users know\nwho created the source code they are working with."])
t = ax.set_xlabel("Count of responses")
INFO.AVAILABILITY
count_df = pd.DataFrame(survey_df['INFO.AVAILABILITY'].value_counts())
count_df.columns=["count"]
count_df
count | |
---|---|
Some information about me | 1776 |
A little information about me | 1133 |
A lot of information about me | 1011 |
No information at all about me | 140 |
prop_df = pd.DataFrame((survey_df['INFO.AVAILABILITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Some information about me | 43.74% |
A little information about me | 27.91% |
A lot of information about me | 24.90% |
No information at all about me | 3.45% |
ax = pd.DataFrame(survey_df['INFO.AVAILABILITY'].value_counts()).plot(kind='barh')
plt.title("In general, how much information about\nyou is publicly available online?")
t = ax.set_xlabel("Count of responses")
INFO.JOB
count_df = pd.DataFrame(survey_df['INFO.JOB'].value_counts())
count_df.columns = ["count"]
count_df
count | |
---|---|
Yes | 2327 |
No | 1638 |
prop_df = pd.DataFrame((survey_df['INFO.JOB'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Yes | 58.69% |
No | 41.31% |
ax = pd.DataFrame(survey_df['INFO.JOB'].value_counts()).plot(kind='barh', figsize=[10,6])
plt.title("Do you feel that you need to make information available\nabout yourself online for professional reasons?")
t = ax.set_xlabel("Count of responses")
TRANSPARENCY.PRIVACY.PRACTICES.GENERAL
"Which of the following best describes your practices around publishing content online, such as posts on social media (e.g. Facebook, Instagram, Twitter, etc.), blogs, and other platforms (not including contributions to open source projects)?" (single choice)
counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.GENERAL'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
I include my real name. | 1718 |
I usually use a consistent pseudonym that is easily linked to my real name online. | 1141 |
I don't publish this kind of content online. | 517 |
I usually use a consistent pseudonym that is not linked anywhere with my real name online. | 363 |
I take precautions to use different pseudonymns on different platforms. | 270 |
prop_df = pd.DataFrame((survey_df['TRANSPARENCY.PRIVACY.PRACTICES.GENERAL'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
I include my real name. | 42.85% |
I usually use a consistent pseudonym that is easily linked to my real name online. | 28.46% |
I don't publish this kind of content online. | 12.90% |
I usually use a consistent pseudonym that is not linked anywhere with my real name online. | 9.05% |
I take precautions to use different pseudonymns on different platforms. | 6.73% |
plot_counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.GENERAL'].value_counts())
idx = ['I include my real name.',
'I usually use a consistent pseudonym that\nis easily linked to my real name online.',
'I don\'t publish this kind of content online.',
'I usually use a consistent pseudonym that\nis not linked anywhere with my real name online.',
'I take precautions to use different\npseudonyms on different platforms.']
plot_counts_df.index = idx
ax = plot_counts_df.plot(kind='barh', figsize=[12,6])
plt.title("Which of the following best describes your\npractices around publishing content online [...] \nnot including contributions to open source projects?")
t = ax.set_xlabel("Count of responses")
TRANSPARENCY.PRIVACY.PRACTICES.OSS
"Which of the following best describes your practices when making open source contributions?"
counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.OSS'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
I include my real name. | 1845 |
I usually contribute using a consistent pseudonym that is easily linked to my real name online. | 766 |
I usually contribute using a consistent pseudonym that is not linked anywhere with my real name online. | 273 |
I take precautions to use different usernames in different projects. | 42 |
prop_df = pd.DataFrame((survey_df['TRANSPARENCY.PRIVACY.PRACTICES.OSS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
I include my real name. | 63.06% |
I usually contribute using a consistent pseudonym that is easily linked to my real name online. | 26.18% |
I usually contribute using a consistent pseudonym that is not linked anywhere with my real name online. | 9.33% |
I take precautions to use different usernames in different projects. | 1.44% |
plot_counts_df = pd.DataFrame(survey_df['TRANSPARENCY.PRIVACY.PRACTICES.OSS'].value_counts())
idx = ['I include my real name.',
'I usually contribute using a consistent pseudonym\nthat is easily linked to my real name online.',
'I usually contribute using a consistent pseudonym\nthat is not linked anywhere with my real name online.',
'I take precautions to use different\nusernames in different projects.']
plot_counts_df.index = idx
ax = plot_counts_df.plot(kind='barh', figsize=[12,6])
plt.title("Which of the following best describes your\npractices when making open source contributions?")
t = ax.set_xlabel("Count of responses")
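The long response labels in these plots are wrapped by retyping each string with manual `\n` breaks, which makes it easy for a label to drift from the actual response text. A minimal sketch of an alternative, using only the standard library's `textwrap` module, is shown below. `wrap_label` and the width of 45 characters are assumptions chosen for illustration.

```python
import textwrap


def wrap_label(text, width=45):
    """Insert line breaks so that no line of the label exceeds `width` characters."""
    return textwrap.fill(text, width=width)


label = ("I usually contribute using a consistent pseudonym that is "
         "easily linked to my real name online.")
wrapped = wrap_label(label)
print(wrapped)
```

The wrapped labels could then be generated directly from the index of the counts table, for example with a list comprehension, so the tick labels always match the responses being plotted.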
RECEIVED.HELP
counts_df = pd.DataFrame(survey_df['RECEIVED.HELP'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Yes | 2845 |
No | 1064 |
prop_df = pd.DataFrame((survey_df['RECEIVED.HELP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Yes | 72.78% |
No | 27.22% |
ax = pd.DataFrame(survey_df['RECEIVED.HELP'].value_counts()).plot(kind='barh')
plt.title("Have you ever received any kind of help from other people\nrelated to using or contributing to an open source project?")
t = ax.set_xlabel("Count of responses")
FIND.HELPER
counts_df = pd.DataFrame(survey_df['FIND.HELPER'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
I asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and someone responded. | 2057 |
I asked a specific person for help. | 403 |
Someone offered me unsolicited help. | 272 |
Other - Please describe | 64 |
prop_df = pd.DataFrame((survey_df['FIND.HELPER'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
I asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and someone responded. | 73.57% |
I asked a specific person for help. | 14.41% |
Someone offered me unsolicited help. | 9.73% |
Other - Please describe | 2.29% |
ax = pd.DataFrame(survey_df['FIND.HELPER'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
plt.title("How did you find someone to help you?")
t = ax.set_yticklabels(['I asked for help in a public forum\n(e.g. in a GitHub Issue, project mailing list, etc.)\nand someone responded.',
'I asked a specific person for help.',
'Someone offered me unsolicited help.',
'Other - Please describe'])
HELPER.PRIOR.RELATIONSHIP
counts_df = pd.DataFrame(survey_df['HELPER.PRIOR.RELATIONSHIP'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Total strangers, I didn't know of them previously. | 1565 |
I knew of them through their contributions to projects, but didn't know them personally. | 809 |
We knew each other a little. | 211 |
We knew each other well. | 208 |
prop_df = pd.DataFrame((survey_df['HELPER.PRIOR.RELATIONSHIP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Total strangers, I didn't know of them previously. | 56.03% |
I knew of them through their contributions to projects, but didn't know them personally. | 28.97% |
We knew each other a little. | 7.55% |
We knew each other well. | 7.45% |
ax = pd.DataFrame(survey_df['HELPER.PRIOR.RELATIONSHIP'].value_counts()).plot(kind='barh')
plt.title("Which best describes your prior\nrelationship with the person who helped you?")
t = ax.set_xlabel("Count of responses")
RECEIVED.HELP.TYPE
counts_df = pd.DataFrame(survey_df['RECEIVED.HELP.TYPE'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Writing code or otherwise implementing ideas. | 1633 |
Installing or using an application. | 820 |
Understanding community norms (e.g. how to submit a contribution, how to communicate effectively). | 181 |
Other (please describe) | 142 |
Introductions to other people | 13 |
prop_df = pd.DataFrame((survey_df['RECEIVED.HELP.TYPE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Writing code or otherwise implementing ideas. | 58.55% |
Installing or using an application. | 29.40% |
Understanding community norms (e.g. how to submit a contribution, how to communicate effectively). | 6.49% |
Other (please describe) | 5.09% |
Introductions to other people | 0.47% |
ax = pd.DataFrame(survey_df['RECEIVED.HELP.TYPE'].value_counts()).plot(kind='barh')
plt.title("What kind of problem did they help you with?")
t = ax.set_xlabel("Count of responses")
t = ax.set_yticklabels(['Writing code or otherwise implementing ideas.',
'Installing or using an application.',
'Understanding community norms (e.g. how to submit\na contribution, how to communicate effectively).',
'Other (please describe)', 'Introductions to other people'])
PROVIDED.HELP
counts_df = pd.DataFrame(survey_df['PROVIDED.HELP'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Yes | 2891 |
No | 1013 |
prop_df = pd.DataFrame((survey_df['PROVIDED.HELP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Yes | 74.05% |
No | 25.95% |
ax = pd.DataFrame(survey_df['PROVIDED.HELP'].value_counts()).plot(kind='barh')
plt.title("Have you ever provided help for another\nperson on an open source project?")
t = ax.set_xlabel("Count of responses")
FIND.HELPEE
counts_df = pd.DataFrame(survey_df['FIND.HELPEE'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
They asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and I responded. | 1839 |
They asked me directly for help. | 566 |
I reached out to them to offer unsolicited help. | 405 |
Other (please describe) | 28 |
prop_df = pd.DataFrame((survey_df['FIND.HELPEE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
They asked for help in a public forum (e.g. in a GitHub Issue, project mailing list, etc.) and I responded. | 64.80% |
They asked me directly for help. | 19.94% |
I reached out to them to offer unsolicited help. | 14.27% |
Other (please describe) | 0.99% |
ax = pd.DataFrame(survey_df['FIND.HELPEE'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How did you come to help this person?")
t = ax.set_yticklabels(['They asked for help in a public forum\n(e.g. in a GitHub Issue, project mailing list, etc.)\n and I responded.',
'They asked me directly for help.',
'I reached out to them to offer unsolicited help.',
'Other (please describe)'])
HELPEE.PRIOR.RELATIONSHIP
counts_df = pd.DataFrame(survey_df['HELPEE.PRIOR.RELATIONSHIP'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Total strangers, I didn't know of them previously. | 1984 |
We knew each other well. | 292 |
I knew of them through their contributions to projects, but didn't know them personally. | 288 |
We knew each other a little. | 275 |
prop_df = pd.DataFrame((survey_df['HELPEE.PRIOR.RELATIONSHIP'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Total strangers, I didn't know of them previously. | 69.88% |
We knew each other well. | 10.29% |
I knew of them through their contributions to projects, but didn't know them personally. | 10.14% |
We knew each other a little. | 9.69% |
ax = pd.DataFrame(survey_df['HELPEE.PRIOR.RELATIONSHIP'].value_counts()).plot(kind='barh')
plt.title("Which best describes your prior\nrelationship with the person you helped?")
t = ax.set_xlabel("Count of responses")
PROVIDED.HELP.TYPE
counts_df = pd.DataFrame(survey_df['PROVIDED.HELP.TYPE'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Writing code or otherwise implementing ideas. | 1602 |
Installing or using an application. | 1028 |
Other (please describe) | 101 |
Understanding community norms (e.g. how to submit a contribution, how to communicate effectively). | 99 |
Introductions to other people. | 8 |
prop_df = pd.DataFrame((survey_df['PROVIDED.HELP.TYPE'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Writing code or otherwise implementing ideas. | 56.45% |
Installing or using an application. | 36.22% |
Other (please describe) | 3.56% |
Understanding community norms (e.g. how to submit a contribution, how to communicate effectively). | 3.49% |
Introductions to other people. | 0.28% |
ax = pd.DataFrame(survey_df['PROVIDED.HELP.TYPE'].value_counts()).plot(kind='barh')
plt.title("What kind of problem did you help them with?")
t = ax.set_xlabel("Count of responses")
t = ax.set_yticklabels(['Writing code or otherwise implementing ideas.',
'Installing or using an application.', 'Other (please describe)',
'Understanding community norms (e.g. how to\nsubmit a contribution, how to communicate effectively).',
'Introductions to other people.'])
OSS.AS.JOB: Do you contribute to open source as part of your professional work? In other words, are you paid for any of your time spent on open source contributions?
counts_df = pd.DataFrame(survey_df['OSS.AS.JOB'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Yes, indirectly- I contribute to open source in carrying out my work duties, but I am not required or expected to do so. | 896 |
No. | 687 |
Yes, directly- some or all of my work duties include contributing to open source projects. | 464 |
prop_df = pd.DataFrame((survey_df['OSS.AS.JOB'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Yes, indirectly- I contribute to open source in carrying out my work duties, but I am not required or expected to do so. | 43.77% |
No. | 33.56% |
Yes, directly- some or all of my work duties include contributing to open source projects. | 22.67% |
oss_as_job_df = pd.DataFrame(survey_df['OSS.AS.JOB'].value_counts())
oss_as_job_df.index = ["Yes, indirectly", "No", "Yes, directly"]
ax = oss_as_job_df.plot(kind='barh')
plt.title("Do you contribute to open source\nas part of your professional work?")
t = ax.set_xlabel("Count of responses")
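Assigning a list of short labels to the index works only as long as the responses keep the same frequency order. As a hedged sketch, the labels can instead be attached by name with a mapping. The counts below are copied from the table above, and `SHORT_LABELS` is a hypothetical name introduced for this example.

```python
import pandas as pd

INDIRECT = ("Yes, indirectly- I contribute to open source in carrying out my "
            "work duties, but I am not required or expected to do so.")
DIRECT = ("Yes, directly- some or all of my work duties include contributing "
          "to open source projects.")

# Map each full response string to the short label used on the plot.
SHORT_LABELS = {INDIRECT: "Yes, indirectly", "No.": "No", DIRECT: "Yes, directly"}

# Counts as reported in the table above, entered in arbitrary order.
counts = pd.Series({"No.": 687, DIRECT: 464, INDIRECT: 896}).sort_values(ascending=False)
short = counts.rename(index=SHORT_LABELS)
print(short)
```

Because each label is looked up by its response text, the plot stays correctly labeled even if the ordering of the counts changes.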
OSS.AT.WORK
counts_df = pd.DataFrame(survey_df['OSS.AT.WORK'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Frequently | 2191 |
Sometimes | 300 |
Rarely | 110 |
Never | 65 |
prop_df = pd.DataFrame((survey_df['OSS.AT.WORK'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Frequently | 82.18% |
Sometimes | 11.25% |
Rarely | 4.13% |
Never | 2.44% |
ax = pd.DataFrame(survey_df['OSS.AT.WORK'].value_counts()).plot(kind='barh')
plt.title("How often do you use open source\nsoftware in your professional work?")
t = ax.set_xlabel("Count of responses")
OSS.IP.POLICY
counts_df = pd.DataFrame(survey_df['OSS.IP.POLICY'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
I am free to contribute without asking for permission. | 1178 |
My employer doesn't have a clear policy on this. | 695 |
I am permitted to contribute to open source, but need to ask for permission. | 287 |
I'm not sure. | 238 |
Not applicable | 180 |
I am not permitted to contribute to open source at all. | 63 |
prop_df = pd.DataFrame((survey_df['OSS.IP.POLICY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
I am free to contribute without asking for permission. | 44.60% |
My employer doesn't have a clear policy on this. | 26.32% |
I am permitted to contribute to open source, but need to ask for permission. | 10.87% |
I'm not sure. | 9.01% |
Not applicable | 6.82% |
I am not permitted to contribute to open source at all. | 2.39% |
ax = pd.DataFrame(survey_df['OSS.IP.POLICY'].value_counts()).plot(kind='barh')
plt.title("How does your employer's intellectual property\nagreement/policy affect your free-time contributions\nto open source unrelated to your work?")
t = ax.set_xlabel("Count of responses")
EMPLOYER.POLICY.APPLICATIONS
counts_df = pd.DataFrame(survey_df['EMPLOYER.POLICY.APPLICATIONS'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Use of open source applications is encouraged. | 1174 |
Use of open source applications is acceptable if it is the most appropriate tool. | 916 |
My employer doesn't have a clear policy on this. | 338 |
Not applicable | 88 |
I'm not sure. | 83 |
Use of open source applications is rarely, if ever, permitted. | 42 |
prop_df = pd.DataFrame((survey_df['EMPLOYER.POLICY.APPLICATIONS'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Use of open source applications is encouraged. | 44.45% |
Use of open source applications is acceptable if it is the most appropriate tool. | 34.68% |
My employer doesn't have a clear policy on this. | 12.80% |
Not applicable | 3.33% |
I'm not sure. | 3.14% |
Use of open source applications is rarely, if ever, permitted. | 1.59% |
ax = pd.DataFrame(survey_df['EMPLOYER.POLICY.APPLICATIONS'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Which is closest to your employer’s policy\non using open source software applications?")
OSS.HIRING
counts_df = pd.DataFrame(survey_df['OSS.HIRING'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Very important | 618 |
Somewhat important | 448 |
Not at all important | 361 |
Not too important | 352 |
Not applicable-I hadn't made any contributions when I got this job. | 254 |
prop_df = pd.DataFrame((survey_df['OSS.HIRING'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Very important | 30.40% |
Somewhat important | 22.04% |
Not at all important | 17.76% |
Not too important | 17.31% |
Not applicable-I hadn't made any contributions when I got this job. | 12.49% |
ax = pd.DataFrame(survey_df['OSS.HIRING'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How important do you think your involvement\nin open source was to getting your current job?")
IMMIGRATION
counts_df = pd.DataFrame(survey_df['IMMIGRATION'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
No, I live in the country where I was born. | 2764 |
Yes, and I intend to stay permanently. | 513 |
Yes, and I am not sure about my future plans. | 292 |
Yes, and I intend to stay temporarily. | 165 |
prop_df = pd.DataFrame((survey_df['IMMIGRATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
No, I live in the country where I was born. | 74.02% |
Yes, and I intend to stay permanently. | 13.74% |
Yes, and I am not sure about my future plans. | 7.82% |
Yes, and I intend to stay temporarily. | 4.42% |
ax = pd.DataFrame(survey_df['IMMIGRATION'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Do you currently live in a country other\nthan the one in which you were born?")
MINORITY.HOMECOUNTRY
counts_df = pd.DataFrame(survey_df['MINORITY.HOMECOUNTRY'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
No | 754 |
Yes | 124 |
Not sure | 45 |
Prefer not to say | 34 |
prop_df = pd.DataFrame((survey_df['MINORITY.HOMECOUNTRY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
No | 78.79% |
Yes | 12.96% |
Not sure | 4.70% |
Prefer not to say | 3.55% |
ax = pd.DataFrame(survey_df['MINORITY.HOMECOUNTRY'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Thinking of where you were born, are\nyou a member of an ethnicity or nationality that\nis considered a minority in that country?")
MINORITY.CURRENT.COUNTRY
counts_df = pd.DataFrame(survey_df['MINORITY.CURRENT.COUNTRY'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
No | 2837 |
Yes | 546 |
Not sure | 193 |
Prefer not to say | 156 |
prop_df = pd.DataFrame((survey_df['MINORITY.CURRENT.COUNTRY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
No | 76.02% |
Yes | 14.63% |
Not sure | 5.17% |
Prefer not to say | 4.18% |
ax = pd.DataFrame(survey_df['MINORITY.CURRENT.COUNTRY'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Thinking of where you currently live, are you\na member of an ethnicity or nationality that is\nconsidered a minority in that country?")
GENDER
counts_df = pd.DataFrame(survey_df['GENDER'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Man | 3387 |
Prefer not to say | 173 |
Woman | 125 |
Non-binary or Other | 39 |
prop_df = pd.DataFrame((survey_df['GENDER'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Man | 90.95% |
Prefer not to say | 4.65% |
Woman | 3.36% |
Non-binary or Other | 1.05% |
ax = pd.DataFrame(survey_df['GENDER'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is your gender?")
TRANSGENDER.IDENTITY
counts_df = pd.DataFrame(survey_df['TRANSGENDER.IDENTITY'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
No | 3494 |
Prefer not to say | 158 |
Yes | 33 |
Not sure | 30 |
prop_df = pd.DataFrame((survey_df['TRANSGENDER.IDENTITY'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
No | 94.05% |
Prefer not to say | 4.25% |
Yes | 0.89% |
Not sure | 0.81% |
ax = pd.DataFrame(survey_df['TRANSGENDER.IDENTITY'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Do you identify as transgender?")
SEXUAL.ORIENTATION
counts_df = pd.DataFrame(survey_df['SEXUAL.ORIENTATION'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
No | 3187 |
Yes | 246 |
Prefer not to say | 201 |
Not sure | 85 |
prop_df = pd.DataFrame((survey_df['SEXUAL.ORIENTATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
No | 85.70% |
Yes | 6.61% |
Prefer not to say | 5.40% |
Not sure | 2.29% |
ax = pd.DataFrame(survey_df['SEXUAL.ORIENTATION'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("Do you identify as gay, lesbian, or bisexual,\nasexual, or any other minority sexual orientation?")
WRITTEN.ENGLISH
counts_df = pd.DataFrame(survey_df['WRITTEN.ENGLISH'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Very well | 2865 |
Moderately well | 742 |
Not very well | 108 |
Not at all | 6 |
prop_df = pd.DataFrame((survey_df['WRITTEN.ENGLISH'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Very well | 77.00% |
Moderately well | 19.94% |
Not very well | 2.90% |
Not at all | 0.16% |
ax = pd.DataFrame(survey_df['WRITTEN.ENGLISH'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How well can you read and write in English?")
AGE
counts_df = pd.DataFrame(survey_df['AGE'].value_counts().sort_index())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
17 or younger | 139 |
18 to 24 years | 871 |
25 to 34 years | 1400 |
35 to 44 years | 772 |
45 to 54 years | 267 |
55 to 64 years | 93 |
65 years or older | 36 |
prop_df = pd.DataFrame((survey_df['AGE'].value_counts(normalize=True).round(4)*100).sort_index())
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
17 or younger | 3.88% |
18 to 24 years | 24.34% |
25 to 34 years | 39.13% |
35 to 44 years | 21.58% |
45 to 54 years | 7.46% |
55 to 64 years | 2.60% |
65 years or older | 1.01% |
ax = pd.DataFrame(survey_df['AGE'].value_counts().sort_index()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is your age?")
FORMAL.EDUCATION
counts_df = pd.DataFrame(survey_df['FORMAL.EDUCATION'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Bachelor's degree | 1321 |
Master's degree | 852 |
Some college, no degree | 640 |
Secondary (high) school graduate or equivalent | 375 |
Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.) | 256 |
Vocational/trade program or apprenticeship | 127 |
Less than secondary (high) school | 126 |
prop_df = pd.DataFrame((survey_df['FORMAL.EDUCATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Bachelor's degree | 35.73% |
Master's degree | 23.05% |
Some college, no degree | 17.31% |
Secondary (high) school graduate or equivalent | 10.14% |
Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.) | 6.92% |
Vocational/trade program or apprenticeship | 3.44% |
Less than secondary (high) school | 3.41% |
order = ["Less than secondary (high) school",
"Secondary (high) school graduate or equivalent",
"Vocational/trade program or apprenticeship",
"Some college, no degree",
"Bachelor's degree",
"Master's degree",
"Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)"]
edu_counts_df = survey_df['FORMAL.EDUCATION'].value_counts()[order]
ax = edu_counts_df.plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is the highest level of formal education that you have completed?")
PARENTS.FORMAL.EDUCATION
counts_df = pd.DataFrame(survey_df['PARENTS.FORMAL.EDUCATION'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Bachelor's degree | 961 |
Master's degree | 871 |
Secondary (high) school graduate or equivalent | 566 |
Some college, no degree | 388 |
Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.) | 387 |
Vocational/trade program or apprenticeship | 257 |
Less than secondary (high) school | 243 |
prop_df = pd.DataFrame((survey_df['PARENTS.FORMAL.EDUCATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Bachelor's degree | 26.16% |
Master's degree | 23.71% |
Secondary (high) school graduate or equivalent | 15.41% |
Some college, no degree | 10.56% |
Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.) | 10.54% |
Vocational/trade program or apprenticeship | 7.00% |
Less than secondary (high) school | 6.62% |
order = ["Less than secondary (high) school",
"Secondary (high) school graduate or equivalent",
"Vocational/trade program or apprenticeship",
"Some college, no degree",
"Bachelor's degree",
"Master's degree",
"Doctorate (Ph.D.) or other advanced degree (e.g. M.D., J.D.)"]
edu_counts_df = survey_df['PARENTS.FORMAL.EDUCATION'].value_counts()[order]
ax = edu_counts_df.plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("What is the highest level of formal education that either of your parents completed?")
AGE.AT.FIRST.COMPUTER.INTERNET
counts_df = pd.DataFrame(survey_df['AGE.AT.FIRST.COMPUTER.INTERNET'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
Younger than 13 years old | 1478 |
13 - 17 years old | 1313 |
18 - 24 years old | 695 |
25 - 45 years old | 202 |
Older than 45 years old | 23 |
prop_df = pd.DataFrame((survey_df['AGE.AT.FIRST.COMPUTER.INTERNET'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
Younger than 13 years old | 39.83% |
13 - 17 years old | 35.38% |
18 - 24 years old | 18.73% |
25 - 45 years old | 5.44% |
Older than 45 years old | 0.62% |
ax = pd.DataFrame(survey_df['AGE.AT.FIRST.COMPUTER.INTERNET'].value_counts()).plot(kind='barh')
ax.set_xlabel("Count of responses")
t = plt.title("How old were you when you first had regular\naccess to a computer with an internet connection?")
LOCATION.OF.FIRST.COMPUTER.INTERNET
counts_df = pd.DataFrame(survey_df['LOCATION.OF.FIRST.COMPUTER.INTERNET'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
At home (belonging to me or a family member) | 2520 |
In a classroom, computer lab, or library at school | 746 |
At an internet cafe or similar space | 182 |
Other (please describe) | 106 |
At a public library or community center | 87 |
At work (recoded from open ended) | 70 |
prop_df = pd.DataFrame((survey_df['LOCATION.OF.FIRST.COMPUTER.INTERNET'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
At home (belonging to me or a family member) | 67.91% |
In a classroom, computer lab, or library at school | 20.10% |
At an internet cafe or similar space | 4.90% |
Other (please describe) | 2.86% |
At a public library or community center | 2.34% |
At work (recoded from open ended) | 1.89% |
ax = pd.DataFrame(survey_df['LOCATION.OF.FIRST.COMPUTER.INTERNET'].value_counts()).plot(kind='barh', figsize=[8.5,6])
ax.set_xlabel("Count of responses")
t = plt.title("Where did you first have regular access to a\ncomputer with internet connection?")
POPULATION
counts_df = pd.DataFrame(survey_df['POPULATION'].value_counts())
counts_df.columns = ["count"]
counts_df
count | |
---|---|
github | 5495 |
off site community | 534 |
prop_df = pd.DataFrame((survey_df['POPULATION'].value_counts(normalize=True).round(4)*100))
prop_df.columns=["percent"]
prop_df
percent | |
---|---|
github | 91.14% |
off site community | 8.86% |
ax = pd.DataFrame(survey_df['POPULATION'].value_counts()).plot(kind='barh', figsize=[8.5,6])
ax.set_xlabel("Count of responses")
t = plt.title("Where was the respondent surveyed from?")
DISCOURAGING.BEHAVIOR.*
discouraging_responses = survey_df[discouraging_vars].apply(pd.Series.value_counts).transpose()[["Yes", "No"]]
discouraging_responses
Yes | No | |
---|---|---|
DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE | 3017 | 792 |
DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION | 1210 | 2580 |
DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE | 2195 | 1598 |
DISCOURAGING.BEHAVIOR.BAD.DOCS | 3559 | 263 |
DISCOURAGING.BEHAVIOR.CONFLICT | 1830 | 1966 |
DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE | 649 | 3158 |
discouraging_percent = pd.DataFrame(discouraging_responses["Yes"] / discouraging_responses.sum(axis=1) * 100, columns=["percent_yes"]).sort_values(by="percent_yes")
discouraging_percent.round(2)
percent_yes | |
---|---|
DISCOURAGING.BEHAVIOR.UNWELCOMING.LANGUAGE | 17.05% |
DISCOURAGING.BEHAVIOR.REJECTION.WOUT.EXPLANATION | 31.93% |
DISCOURAGING.BEHAVIOR.CONFLICT | 48.21% |
DISCOURAGING.BEHAVIOR.DISMISSIVE.RESPONSE | 57.87% |
DISCOURAGING.BEHAVIOR.LACK.OF.RESPONSE | 79.21% |
DISCOURAGING.BEHAVIOR.BAD.DOCS | 93.12% |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
discouraging_responses.sort_values(by="No").plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[22:].replace(".", " ")  # cut off the "DISCOURAGING.BEHAVIOR." prefix
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("Have you ever observed any of the following in the context of an open source project?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
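The hard-coded slice in the label loop above depends on the exact length of the variable prefix. As a minimal sketch, the same trimming can be written against the prefix string itself; `short_label` is a hypothetical helper name, not part of the original notebook.

```python
def short_label(name, prefix):
    """Drop a known prefix from a column name and turn dots into spaces."""
    # Only strip when the prefix is actually present, so other names pass through
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name.replace(".", " ")

# One variable name from the DISCOURAGING.BEHAVIOR.* panel
label = short_label("DISCOURAGING.BEHAVIOR.BAD.DOCS", "DISCOURAGING.BEHAVIOR.")
print(label)
```

Passing the prefix explicitly avoids having to recount characters when the same plotting code is reused for a panel with a different prefix.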
NEGATIVE.WITNESS.*
neg_witness_responses = neg_df[neg_witness_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_witness_responses.columns = ["Yes", "Blank"]
neg_witness_responses
Yes | Blank | |
---|---|---|
NEGATIVE.WITNESS.RUDENESS | 1753 | 4276 |
NEGATIVE.WITNESS.NAME.CALLING | 789 | 5240 |
NEGATIVE.WITNESS.THREATS | 162 | 5867 |
NEGATIVE.WITNESS.IMPERSONATION | 177 | 5852 |
NEGATIVE.WITNESS.SUSTAINED.HARASSMENT | 237 | 5792 |
NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT | 175 | 5854 |
NEGATIVE.WITNESS.STALKING | 108 | 5921 |
NEGATIVE.WITNESS.SEXUAL.ADVANCES | 136 | 5893 |
NEGATIVE.WITNESS.STEREOTYPING | 423 | 5606 |
NEGATIVE.WITNESS.DOXXING | 151 | 5878 |
NEGATIVE.WITNESS.OTHER | 78 | 5951 |
NEGATIVE.WITNESS.NONE.OF.THE.ABOVE | 1721 | 4308 |
NEGATIVE.WITNESS.ANY.RESPONSE | 3664 | 2365 |
Only 3,664 of the 6,029 respondents checked any box in this question. The other 2,365 checked neither a listed behavior nor "none of the above", or never reached the question. We subtract those 2,365 from each "Blank" count so that the remaining blanks can be treated as "No" responses.
neg_witness_responses_adj = neg_witness_responses.copy()  # copy so the raw counts are not modified in place
neg_witness_responses_adj["Blank"] = neg_witness_responses_adj["Blank"] - 2365
neg_witness_responses_adj_df = pd.DataFrame(neg_witness_responses_adj["Yes"] / (neg_witness_responses_adj["Yes"] + neg_witness_responses_adj["Blank"]) * 100, columns=["percent_yes"])
neg_witness_responses_adj.columns = ["Yes", "No"]
neg_witness_responses_adj[:-1]
Yes | No | |
---|---|---|
NEGATIVE.WITNESS.RUDENESS | 1753 | 1911 |
NEGATIVE.WITNESS.NAME.CALLING | 789 | 2875 |
NEGATIVE.WITNESS.THREATS | 162 | 3502 |
NEGATIVE.WITNESS.IMPERSONATION | 177 | 3487 |
NEGATIVE.WITNESS.SUSTAINED.HARASSMENT | 237 | 3427 |
NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT | 175 | 3489 |
NEGATIVE.WITNESS.STALKING | 108 | 3556 |
NEGATIVE.WITNESS.SEXUAL.ADVANCES | 136 | 3528 |
NEGATIVE.WITNESS.STEREOTYPING | 423 | 3241 |
NEGATIVE.WITNESS.DOXXING | 151 | 3513 |
NEGATIVE.WITNESS.OTHER | 78 | 3586 |
NEGATIVE.WITNESS.NONE.OF.THE.ABOVE | 1721 | 1943 |
neg_witness_responses_adj_df.sort_values(by="percent_yes").round(2)
percent_yes | |
---|---|
NEGATIVE.WITNESS.OTHER | 2.13% |
NEGATIVE.WITNESS.STALKING | 2.95% |
NEGATIVE.WITNESS.SEXUAL.ADVANCES | 3.71% |
NEGATIVE.WITNESS.DOXXING | 4.12% |
NEGATIVE.WITNESS.THREATS | 4.42% |
NEGATIVE.WITNESS.CROSS.PLATFORM.HARASSMENT | 4.78% |
NEGATIVE.WITNESS.IMPERSONATION | 4.83% |
NEGATIVE.WITNESS.SUSTAINED.HARASSMENT | 6.47% |
NEGATIVE.WITNESS.STEREOTYPING | 11.54% |
NEGATIVE.WITNESS.NAME.CALLING | 21.53% |
NEGATIVE.WITNESS.NONE.OF.THE.ABOVE | 46.97% |
NEGATIVE.WITNESS.RUDENESS | 47.84% |
NEGATIVE.WITNESS.ANY.RESPONSE | 100.00% |
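The adjustment above can also be sketched as a small reusable helper that reads the number of non-responders from the `ANY.RESPONSE` row instead of typing it in by hand. The function name `adjust_blank_counts` and the three-row demo frame are illustrative assumptions; the counts are copied from the table above.

```python
import pandas as pd

def adjust_blank_counts(responses, any_row):
    """Convert 'Blank' counts into 'No' counts among people who answered.

    responses: DataFrame indexed by item, with 'Yes' and 'Blank' columns.
    any_row:   index label of the row counting respondents who ticked any box.
    """
    # Respondents who ticked nothing at all (skipped or never saw the question)
    skipped = responses.loc[any_row, "Blank"]
    adjusted = pd.DataFrame({
        "Yes": responses["Yes"],
        "No": responses["Blank"] - skipped,
    })
    total = adjusted["Yes"] + adjusted["No"]
    adjusted["percent_yes"] = (adjusted["Yes"] / total * 100).round(2)
    return adjusted

# Three rows copied from the NEGATIVE.WITNESS counts above
demo = pd.DataFrame(
    {"Yes": [1753, 789, 3664], "Blank": [4276, 5240, 2365]},
    index=[
        "NEGATIVE.WITNESS.RUDENESS",
        "NEGATIVE.WITNESS.NAME.CALLING",
        "NEGATIVE.WITNESS.ANY.RESPONSE",
    ],
)
demo_adj = adjust_blank_counts(demo, "NEGATIVE.WITNESS.ANY.RESPONSE")
print(demo_adj)
```

Because the helper builds a new frame, the original counts stay untouched and the same call could be reused for the other checkbox panels.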
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_witness_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[17:].replace(".", " ")  # cut off the "NEGATIVE.WITNESS." prefix
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("Have you ever witnessed any of the following behaviors\ndirected at another person in the context of an open source\nproject? (not including something directed at you)")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
NEGATIVE.EXPERIENCE.*
neg_exp_responses = neg_df[neg_exp_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_exp_responses.columns = ["Yes", "Blank"]
neg_exp_responses
Yes | Blank | |
---|---|---|
NEGATIVE.EXPERIENCE.RUDENESS | 646 | 5383 |
NEGATIVE.EXPERIENCE.NAME.CALLING | 192 | 5837 |
NEGATIVE.EXPERIENCE.THREATS | 43 | 5986 |
NEGATIVE.EXPERIENCE.IMPERSONATION | 45 | 5984 |
NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT | 55 | 5974 |
NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT | 42 | 5987 |
NEGATIVE.EXPERIENCE.STALKING | 35 | 5994 |
NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES | 25 | 6004 |
NEGATIVE.EXPERIENCE.STEREOTYPING | 114 | 5915 |
NEGATIVE.EXPERIENCE.DOXXING | 23 | 6006 |
NEGATIVE.EXPERIENCE.OTHER | 39 | 5990 |
NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE | 2900 | 3129 |
NEGATIVE.EXPERIENCE.ANY.RESPONSE | 3638 | 2391 |
Only 3,638 of the 6,029 respondents checked any box in this question. The other 2,391 checked neither a listed behavior nor "none of the above", or never reached the question. We subtract those 2,391 from each "Blank" count so that the remaining blanks can be treated as "No" responses.
neg_exp_responses_adj = neg_exp_responses.copy()  # copy so the raw counts are not modified in place
neg_exp_responses_adj["Blank"] = neg_exp_responses_adj["Blank"] - 2391
neg_exp_responses_adj_df = pd.DataFrame(neg_exp_responses_adj["Yes"] / (neg_exp_responses_adj["Yes"] + neg_exp_responses_adj["Blank"]) * 100, columns=["percent_yes"])
neg_exp_responses_adj.columns = ["Yes", "No"]
neg_exp_responses_adj[:-1]
Yes | No | |
---|---|---|
NEGATIVE.EXPERIENCE.RUDENESS | 646 | 2992 |
NEGATIVE.EXPERIENCE.NAME.CALLING | 192 | 3446 |
NEGATIVE.EXPERIENCE.THREATS | 43 | 3595 |
NEGATIVE.EXPERIENCE.IMPERSONATION | 45 | 3593 |
NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT | 55 | 3583 |
NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT | 42 | 3596 |
NEGATIVE.EXPERIENCE.STALKING | 35 | 3603 |
NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES | 25 | 3613 |
NEGATIVE.EXPERIENCE.STEREOTYPING | 114 | 3524 |
NEGATIVE.EXPERIENCE.DOXXING | 23 | 3615 |
NEGATIVE.EXPERIENCE.OTHER | 39 | 3599 |
NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE | 2900 | 738 |
neg_exp_responses_adj_df.sort_values(by="percent_yes").round(2)
percent_yes | |
---|---|
NEGATIVE.EXPERIENCE.DOXXING | 0.63% |
NEGATIVE.EXPERIENCE.SEXUAL.ADVANCES | 0.69% |
NEGATIVE.EXPERIENCE.STALKING | 0.96% |
NEGATIVE.EXPERIENCE.OTHER | 1.07% |
NEGATIVE.EXPERIENCE.CROSS.PLATFORM.HARASSMENT | 1.15% |
NEGATIVE.EXPERIENCE.THREATS | 1.18% |
NEGATIVE.EXPERIENCE.IMPERSONATION | 1.24% |
NEGATIVE.EXPERIENCE.SUSTAINED.HARASSMENT | 1.51% |
NEGATIVE.EXPERIENCE.STEREOTYPING | 3.13% |
NEGATIVE.EXPERIENCE.NAME.CALLING | 5.28% |
NEGATIVE.EXPERIENCE.RUDENESS | 17.76% |
NEGATIVE.EXPERIENCE.NONE.OF.THE.ABOVE | 79.71% |
NEGATIVE.EXPERIENCE.ANY.RESPONSE | 100.00% |
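The same denominator can be derived from row-level data rather than by subtracting a constant from the summary table. The sketch below uses a made-up five-respondent frame with hypothetical column names, and assumes that checkbox columns are coded 1 for ticked and 0 for blank, as the `[[1,0]]` selection above suggests. It is not the survey data.

```python
import pandas as pd

# Hypothetical row-level data: one row per respondent, 1 = box ticked, 0 = left blank
toy = pd.DataFrame({
    "ITEM.A": [1, 0, 0, 0, 1],
    "ITEM.B": [0, 0, 1, 0, 1],
    "NONE.OF.THE.ABOVE": [0, 1, 0, 0, 0],
})

# A respondent counts as having answered if any box,
# including "none of the above", was ticked
answered = toy.eq(1).any(axis=1)
n_answered = int(answered.sum())

# Count "Yes" per item among those who answered; everyone else who answered is a "No"
yes = toy.loc[answered].sum()
summary = pd.DataFrame({"Yes": yes, "No": n_answered - yes})
summary["percent_yes"] = (summary["Yes"] / n_answered * 100).round(2)
print(summary)
```

Here the fourth toy respondent ticked nothing and is excluded from every denominator, which is the role the subtraction plays in the notebook's adjustment.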
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_exp_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[20:].replace(".", " ")  # cut off the "NEGATIVE.EXPERIENCE." prefix
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("Have you ever experienced any of the following behaviors\ndirected at you in the context of an open source project?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
NEGATIVE.RESPONSE.*
neg_resp_responses = neg_df[neg_resp_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_resp_responses.columns = ["Yes", "Blank"]
neg_resp_responses
Yes | Blank | |
---|---|---|
NEGATIVE.RESPONSE.ASKED.USER.TO.STOP | 194 | 5835 |
NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT | 112 | 5917 |
NEGATIVE.RESPONSE.BLOCKED.USER | 170 | 5859 |
NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS | 95 | 5934 |
NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP | 20 | 6009 |
NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL | 8 | 6021 |
NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT | 9 | 6020 |
NEGATIVE.RESPONSE.OTHER | 71 | 5958 |
NEGATIVE.RESPONSE.IGNORED | 350 | 5679 |
NEGATIVE.RESPONSE.ANY.RESPONSE | 719 | 5310 |
Only 719 of the 6,029 respondents checked any box in this question. The other 5,310 checked neither a listed response nor "I did not react / ignored the incident", or never reached the question. We subtract those 5,310 from each "Blank" count so that the remaining blanks can be treated as "No" responses.
neg_resp_responses_adj = neg_resp_responses.copy()  # copy so the raw counts are not modified in place
neg_resp_responses_adj["Blank"] = neg_resp_responses_adj["Blank"] - 5310
neg_resp_responses_adj_df = pd.DataFrame(neg_resp_responses_adj["Yes"] / (neg_resp_responses_adj["Yes"] + neg_resp_responses_adj["Blank"]) * 100, columns=["percent_yes"])
neg_resp_responses_adj.columns = ["Yes", "No"]
neg_resp_responses_adj[:-1]
Yes | No | |
---|---|---|
NEGATIVE.RESPONSE.ASKED.USER.TO.STOP | 194 | 525 |
NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT | 112 | 607 |
NEGATIVE.RESPONSE.BLOCKED.USER | 170 | 549 |
NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS | 95 | 624 |
NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP | 20 | 699 |
NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL | 8 | 711 |
NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT | 9 | 710 |
NEGATIVE.RESPONSE.OTHER | 71 | 648 |
NEGATIVE.RESPONSE.IGNORED | 350 | 369 |
neg_resp_responses_adj_df.sort_values(by="percent_yes").round(2)
percent_yes | |
---|---|
NEGATIVE.RESPONSE.CONSULTED.LEGAL.COUNSEL | 1.11% |
NEGATIVE.RESPONSE.CONTACTED.LAW.ENFORCEMENT | 1.25% |
NEGATIVE.RESPONSE.REPORTED.TO.HOST.OR.ISP | 2.78% |
NEGATIVE.RESPONSE.OTHER | 9.87% |
NEGATIVE.RESPONSE.REPORTED.TO.MAINTAINERS | 13.21% |
NEGATIVE.RESPONSE.SOLICITED.COMMUNITY.SUPPORT | 15.58% |
NEGATIVE.RESPONSE.BLOCKED.USER | 23.64% |
NEGATIVE.RESPONSE.ASKED.USER.TO.STOP | 26.98% |
NEGATIVE.RESPONSE.IGNORED | 48.68% |
NEGATIVE.RESPONSE.ANY.RESPONSE | 100.00% |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_resp_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[18:].replace(".", " ")  # cut off the "NEGATIVE.RESPONSE." prefix
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("Thinking of the last time you experienced\nharassment, how did you respond?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
RESPONSE.EFFECTIVENESS.*
neg_effect_responses = neg_df[neg_effect_vars].apply(pd.Series.value_counts).transpose()
neg_effect_responses = neg_effect_responses.replace(np.nan, 0).sort_values(by="Mostly effective")
neg_effect_responses = neg_effect_responses[["Not at all effective", "A little effective", "Somewhat effective", "Mostly effective", "Completely effective"]]
idx = []
for i in neg_effect_responses.index:
    idx.append(i[23:].replace(".", " "))  # cut off the "RESPONSE.EFFECTIVENESS." prefix
neg_effect_responses.index = idx
neg_effect_responses.astype(int)
Not at all effective | A little effective | Somewhat effective | Mostly effective | Completely effective | |
---|---|---|---|---|---|
CONTACTED LAW ENFORCEMENT | 4 | 0 | 2 | 0 | 3 |
CONSULTED LEGAL COUNSEL | 1 | 1 | 3 | 2 | 1 |
REPORTED TO HOST OR ISP | 6 | 4 | 6 | 3 | 1 |
OTHER | 4 | 0 | 4 | 10 | 11 |
REPORTED TO MAINTAINERS | 10 | 11 | 31 | 30 | 13 |
SOLICITED COMMUNITY SUPPORT | 6 | 22 | 38 | 32 | 14 |
ASKED USER TO STOP | 48 | 51 | 50 | 33 | 11 |
BLOCKED USER | 6 | 20 | 28 | 56 | 58 |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues
neg_effect_responses.plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
plt.title("How effective were the following responses?\n(counting number of responses)")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
neg_effect_responses_prop = neg_df[neg_effect_vars].apply(pd.Series.value_counts, normalize=True).round(4).transpose()
neg_effect_responses_prop = neg_effect_responses_prop.replace(np.nan, 0).sort_values(by="Completely effective")
neg_effect_responses_prop = neg_effect_responses_prop[["Not at all effective", "A little effective", "Somewhat effective", "Mostly effective", "Completely effective"]]
neg_effect_responses_prop = neg_effect_responses_prop * 100
idx = []
for i in neg_effect_responses_prop.index:
    idx.append(i[23:].replace(".", " "))  # cut off the "RESPONSE.EFFECTIVENESS." prefix
neg_effect_responses_prop.index = idx
neg_effect_responses_prop
Not at all effective | A little effective | Somewhat effective | Mostly effective | Completely effective | |
---|---|---|---|---|---|
REPORTED TO HOST OR ISP | 30.00% | 20.00% | 30.00% | 15.00% | 5.00% |
ASKED USER TO STOP | 24.87% | 26.42% | 25.91% | 17.10% | 5.70% |
SOLICITED COMMUNITY SUPPORT | 5.36% | 19.64% | 33.93% | 28.57% | 12.50% |
CONSULTED LEGAL COUNSEL | 12.50% | 12.50% | 37.50% | 25.00% | 12.50% |
REPORTED TO MAINTAINERS | 10.53% | 11.58% | 32.63% | 31.58% | 13.68% |
CONTACTED LAW ENFORCEMENT | 44.44% | 0.00% | 22.22% | 0.00% | 33.33% |
BLOCKED USER | 3.57% | 11.90% | 16.67% | 33.33% | 34.52% |
OTHER | 13.79% | 0.00% | 13.79% | 34.48% | 37.93% |
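Each row of the proportion table is the matching row of the count table divided by its own total. As a quick check under that reading, the sketch below recomputes two rows from counts copied out of the table above; the frame name `counts` is only for illustration.

```python
import pandas as pd

cols = ["Not at all effective", "A little effective", "Somewhat effective",
        "Mostly effective", "Completely effective"]

# Two rows copied from the RESPONSE.EFFECTIVENESS count table above
counts = pd.DataFrame(
    [[6, 20, 28, 56, 58], [48, 51, 50, 33, 11]],
    index=["BLOCKED USER", "ASKED USER TO STOP"],
    columns=cols,
)

# Divide every row by its own total, then express as a percentage
row_percent = counts.div(counts.sum(axis=1), axis=0).mul(100).round(2)
print(row_percent)
```

Note that the row totals here are the number of people who rated each response, which can be slightly smaller than the number who reported using it.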
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues
neg_effect_responses_prop.plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
plt.title("How effective were the following responses?\n(proportion of responses)")
plt.xlabel("Proportion of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')
NEGATIVE.CONSEQUENCES.*
neg_conseq_responses = neg_df[neg_conseq_vars].apply(pd.Series.value_counts).transpose()[[1,0]]
neg_conseq_responses.columns = ["Yes", "Blank"]
neg_conseq_responses
Yes | Blank | |
---|---|---|
NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING | 390 | 5639 |
NEGATIVE.CONSEQUENCES.PSEUDONYM | 50 | 5979 |
NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE | 166 | 5863 |
NEGATIVE.CONSEQUENCES.CHANGE.USERNAME | 48 | 5981 |
NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE | 79 | 5950 |
NEGATIVE.CONSEQUENCES.SUGGEST.COC | 116 | 5913 |
NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION | 301 | 5728 |
NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION | 248 | 5781 |
NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES | 85 | 5944 |
NEGATIVE.CONSEQUENCES.OTHER | 90 | 5939 |
NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE | 1094 | 4935 |
NEGATIVE.CONSEQUENCES.ANY.RESPONSE | 1953 | 4076 |
Only 1,953 of the 6,029 respondents checked any box in this question. The other 4,076 checked neither a listed consequence nor "none of the above", or never reached the question. We subtract those 4,076 from each "Blank" count so that the remaining blanks can be treated as "No" responses.
neg_conseq_responses_adj = neg_conseq_responses.copy()  # copy so the raw counts are not modified in place
neg_conseq_responses_adj["Blank"] = neg_conseq_responses_adj["Blank"] - 4076
neg_conseq_responses_adj_df = pd.DataFrame(neg_conseq_responses_adj["Yes"] / (neg_conseq_responses_adj["Yes"] + neg_conseq_responses_adj["Blank"]) * 100, columns=["percent_yes"])
neg_conseq_responses_adj.columns = ['Yes', 'No']
neg_conseq_responses_adj[:-1]
Yes | No | |
---|---|---|
NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING | 390 | 1563 |
NEGATIVE.CONSEQUENCES.PSEUDONYM | 50 | 1903 |
NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE | 166 | 1787 |
NEGATIVE.CONSEQUENCES.CHANGE.USERNAME | 48 | 1905 |
NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE | 79 | 1874 |
NEGATIVE.CONSEQUENCES.SUGGEST.COC | 116 | 1837 |
NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION | 301 | 1652 |
NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION | 248 | 1705 |
NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES | 85 | 1868 |
NEGATIVE.CONSEQUENCES.OTHER | 90 | 1863 |
NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE | 1094 | 859 |
neg_conseq_responses_adj_df.sort_values(by="percent_yes").round(2)
percent_yes | |
---|---|
NEGATIVE.CONSEQUENCES.CHANGE.USERNAME | 2.46% |
NEGATIVE.CONSEQUENCES.PSEUDONYM | 2.56% |
NEGATIVE.CONSEQUENCES.CHANGE.ONLINE.PRESENCE | 4.05% |
NEGATIVE.CONSEQUENCES.OFFLINE.CHANGES | 4.35% |
NEGATIVE.CONSEQUENCES.OTHER | 4.61% |
NEGATIVE.CONSEQUENCES.SUGGEST.COC | 5.94% |
NEGATIVE.CONSEQUENCES.WORK.IN.PRIVATE | 8.50% |
NEGATIVE.CONSEQUENCES.PUBLIC.COMMUNITY.DISCUSSION | 12.70% |
NEGATIVE.CONSEQUENCES.PRIVATE.COMMUNITY.DISCUSSION | 15.41% |
NEGATIVE.CONSEQUENCES.STOPPED.CONTRIBUTING | 19.97% |
NEGATIVE.CONSEQUENCES.NONE.OF.THE.ABOVE | 56.02% |
NEGATIVE.CONSEQUENCES.ANY.RESPONSE | 100.00% |
sns.set(style="whitegrid", font_scale=1.75)
fig, ax = plt.subplots()
cmap=matplotlib.cm.Blues_r
neg_conseq_responses_adj[:-1].sort_values(by='No').plot.barh(stacked=True, ax=ax, figsize=[12,6], cmap=cmap, edgecolor='black', linewidth=1)
labels = []
for l in ax.get_yticklabels():
    title_text = l.get_text()[22:].replace(".", " ")  # cut off the "NEGATIVE.CONSEQUENCES." prefix
    labels.append(title_text)
ax.set_yticklabels(labels)
plt.title("As a result of experiencing or witnessing\nharassment, which, if any, of the following have you done?")
plt.xlabel("Number of responses")
legend = plt.legend(fancybox=True, loc='upper center', bbox_to_anchor=(.5, -.13), ncol=4, shadow=True)
legend.get_frame().set_edgecolor('b')
legend.get_frame().set_facecolor('white')