mlcourse.ai – Open Machine Learning Course

Author: Arina Lopukhova (@erynn). Edited by Yury Kashnitskiy (@yorko) and Vadim Shestopalov (@vchulski). This material is subject to the terms and conditions of the Creative Commons CC BY-NC-SA 4.0 license. Free use is permitted for any non-commercial purpose.

Assignment #1. Fall 2019

Exploratory data analysis of Olympic games with Pandas

Before working on the assignment, it's a good idea to review the corresponding course material.

Your task is to:

  1. write code and perform computations in the cells below
  2. choose answers in the webform. Solutions will be shared only with those who've filled in this form
  3. submit your answers under one email address and remember it! This will be your ID during the course. Give your real full name in the form as well (nicknames are not allowed in the final top-100 rating). If in doubt, you can resubmit the form until the A1 deadline, but stick to a single email address.

Deadline for A1: 2019 September 15, 20:59 GMT (London time)

You'll get up to 10 credits for this assignment.

How to get help

In ODS Slack (if you still don't have access, fill in the form mentioned on the mlcourse.ai main page), we have a channel #mlcourse_ai_news with announcements from the course team. You can discuss the course content freely in the #mlcourse_ai channel (we still have a huge Russian-speaking group, which has a separate channel, #mlcourse_ai_rus).

To reply in a thread, press the dialog icon on a message to drill down into that thread.

Please stick to the special threads a1_q1-5_fall2019 and a1_q6-10_fall2019 in #mlcourse_ai_news for your questions on A1. Help each other without sharing correct code and answers. Our TA Vadim @vchulski is there to help (only in the mentioned threads; do not write to him directly).

Lastly, you can save useful messages by pinning them. Pinned items appear at the top, just below the channel name.

Assignment

There are ten questions about 120 years of Olympic history in this task. Your task is to fill in the missing Python code and choose answers in this web-form.

Download the file athlete_events.csv from here (scraped by rgriffin from www.sports-reference.com). The dataset has the following features:

  • ID - Unique number for each athlete
  • Name - Athlete's name
  • Sex - M or F
  • Age - Integer
  • Height - In centimeters
  • Weight - In kilograms
  • Team - Team name
  • NOC - National Olympic Committee 3-letter code
  • Games - Year and season
  • Year - Integer
  • Season - Summer or Winter
  • City - Host city
  • Sport - Sport
  • Event - Event
  • Medal - Gold, Silver, Bronze, or NA
In [1]:
import pandas as pd
In [2]:
# Change the path to the dataset file if needed. 
PATH = '../../data/athlete_events.csv'
In [3]:
data = pd.read_csv(PATH)
data.head()
Out[3]:
  | ID | Name                     | Sex | Age  | Height | Weight | Team           | NOC | Games       | Year | Season | City      | Sport         | Event                            | Medal
0 | 1  | A Dijiang                | M   | 24.0 | 180.0  | 80.0   | China          | CHN | 1992 Summer | 1992 | Summer | Barcelona | Basketball    | Basketball Men's Basketball      | NaN
1 | 2  | A Lamusi                 | M   | 23.0 | 170.0  | 60.0   | China          | CHN | 2012 Summer | 2012 | Summer | London    | Judo          | Judo Men's Extra-Lightweight     | NaN
2 | 3  | Gunnar Nielsen Aaby      | M   | 24.0 | NaN    | NaN    | Denmark        | DEN | 1920 Summer | 1920 | Summer | Antwerpen | Football      | Football Men's Football          | NaN
3 | 4  | Edgar Lindenau Aabye     | M   | 34.0 | NaN    | NaN    | Denmark/Sweden | DEN | 1900 Summer | 1900 | Summer | Paris     | Tug-Of-War    | Tug-Of-War Men's Tug-Of-War      | Gold
4 | 5  | Christine Jacoba Aaftink | F   | 21.0 | 185.0  | 82.0   | Netherlands    | NED | 1988 Winter | 1988 | Winter | Calgary   | Speed Skating | Speed Skating Women's 500 metres | NaN

1. How old were the youngest male and female participants of the 1992 Olympics?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q1-5_fall2019

  • 16 and 15
  • 14 and 13
  • 13 and 11
  • 11 and 12
In [4]:
# Your code here

2. What was the percentage of male basketball players among all the male participants of the 2012 Olympics? Round the answer to the first decimal.

Hint: drop duplicate athletes where necessary to count each athlete just once. This applies to other questions too.
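The hint above can be illustrated on a tiny made-up table (the rows and names below are invented for this sketch and are not taken from athlete_events.csv). It assumes that the ID column identifies an athlete, as the feature list says, so dropping duplicate IDs leaves one row per athlete. On the real data you would typically filter by year and sex first and only then drop duplicates.

```python
import pandas as pd

# Toy data (invented for illustration): one row per athlete-event entry,
# so an athlete who entered two events appears twice.
toy = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 4],
    "Name": ["Ann", "Ann", "Bob", "Cid", "Cid", "Dan"],
    "Sport": ["Swimming", "Swimming", "Judo", "Swimming", "Swimming", "Judo"],
})

# Keep a single row per athlete so that multi-event athletes count once.
athletes = toy.drop_duplicates(subset="ID")

# Share of swimmers among unique athletes, in percent, rounded to one decimal.
share = round(100 * (athletes["Sport"] == "Swimming").mean(), 1)
print(len(toy), len(athletes), share)
```

Without the deduplication step, the two double-entered swimmers would be counted twice and the share would be inflated.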

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q1-5_fall2019

  • 0.2
  • 1.5
  • 2.5
  • 7.7
In [5]:
# Your code here

3. What are the mean and standard deviation of height for female tennis players who participated in the 2000 Olympics? Round the answer to the first decimal.
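A small sketch of computing a rounded mean and standard deviation with pandas, on invented heights. One detail worth knowing: Series.std uses ddof=1 (the sample standard deviation) by default, while ddof=0 gives the population version. The question does not say which one is meant, so the sketch only shows the difference and is not a statement about the expected answer.

```python
import pandas as pd

# Toy heights in centimeters (invented); None becomes NaN, as in the dataset.
heights = pd.Series([170.0, 175.0, None, 181.0])

mean_h = round(heights.mean(), 1)               # NaN values are skipped by default
std_sample = round(heights.std(), 1)            # pandas default: ddof=1
std_population = round(heights.std(ddof=0), 1)  # population standard deviation

print(mean_h, std_sample, std_population)
```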

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q1-5_fall2019

  • 171.8 and 6.5
  • 179.4 and 10
  • 180.7 and 6.7
  • 182.4 and 9.1
In [6]:
# Your code here

4. Find the heaviest athlete among 2006 Olympics participants. What sport did he or she do?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q1-5_fall2019

  • Judo
  • Bobsleigh
  • Skeleton
  • Boxing
In [7]:
# Your code here

5. How many times did John Aalberg participate in the Olympics held in different years?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q1-5_fall2019

  • 0
  • 1
  • 2
  • 3
In [8]:
# Your code here

6. How many gold medals in tennis did the Switzerland team win at the 2008 Olympics?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q6-10_fall2019

  • 0
  • 1
  • 2
  • 3
In [9]:
# Your code here

7. Is it true that Spain won fewer medals than Italy at the 2016 Olympics? Do not consider NaN values in Medal column.
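One way to leave out rows with a missing Medal value, sketched on an invented two-team table (team names "A" and "B" are placeholders). It counts medal rows per team; whether team events should be counted once per team or once per athlete is not specified in the question, so treat this as a sketch of the NaN handling only.

```python
import pandas as pd

# Toy data (invented): None stands for "no medal" and is stored as NaN.
toy = pd.DataFrame({
    "Team": ["A", "A", "A", "B", "B"],
    "Medal": ["Gold", None, "Bronze", None, "Silver"],
})

# Keep only the rows where a medal was actually awarded.
medalists = toy[toy["Medal"].notna()]

# Number of medal rows per team.
medal_counts = medalists.groupby("Team")["Medal"].count()
print(medal_counts.to_dict())
```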

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q6-10_fall2019

  • Yes
  • No
In [10]:
# Your code here

8. What are the most and least common age groups among the participants of the 2008 Olympics?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q6-10_fall2019

  • [45-55] and [25-35) respectively
  • [45-55] and [15-25) respectively
  • [35-45) and [25-35) respectively
  • [45-55] and [35-45) respectively
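The interval notation in the options ("[" includes the endpoint, ")" excludes it) maps naturally onto pd.cut with right=False. Below is a sketch on invented ages. It assumes ages are whole numbers, so a final bin edge of 56 makes the last left-closed bin cover the same values as [45-55]. The labels are simply copied from the options.

```python
import pandas as pd

# Toy ages (invented); the real Age column also contains NaN values.
ages = pd.Series([15, 24, 25, 30, 34, 45, 55])

# Left-closed bins. The last edge is 56 so that age 55 falls into the final
# group (for whole-number ages, [45, 56) covers the same values as [45-55]).
bins = [15, 25, 35, 45, 56]
labels = ["[15-25)", "[25-35)", "[35-45)", "[45-55]"]
groups = pd.cut(ages, bins=bins, right=False, labels=labels)

# Count how many ages fall into each group.
counts = groups.value_counts()
print(counts)
```

Ages outside the bin edges, as well as missing ages, come out of pd.cut as NaN and are not counted by value_counts.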
In [11]:
# Your code here

9. Is it true that there were Summer Olympics held in Atlanta? Is it true that there were Winter Olympics held in Squaw Valley?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q6-10_fall2019

  • Yes, Yes
  • Yes, No
  • No, Yes
  • No, No
In [12]:
# Your code here

10. What is the absolute difference between the number of unique sports at the 1986 Olympics and 2002 Olympics?

For discussions, please stick to ODS Slack, channel #mlcourse_ai_news, pinned thread #a1_q6-10_fall2019

  • 3
  • 10
  • 15
  • 27
In [13]:
# Your code here

That's it! Now go and do 30 push-ups! :)