from google.colab import drive
drive.mount('/data/')
data_dir = '/data/My Drive/Colab Notebooks/Experiment'
!ls '/data/My Drive/Colab Notebooks/Experiment'
!pip install matplotlib
Mounted at /data/
m_data.csv  w_data.csv
Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (3.2.2)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (1.2.0)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (1.18.5)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib) (0.10.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.1->matplotlib) (1.15.0)
import pandas as pd
man = pd.read_csv(data_dir+'/m_data.csv')
woman = pd.read_csv(data_dir+'/w_data.csv')
data = pd.concat([man, woman])
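`pd.concat([man, woman])` keeps both frames' original row labels, so `data` ends up with duplicate index values and no column recording which file a row came from. A minimal sketch, assuming a group label is wanted later: tag each frame with a hypothetical `sex` column before concatenating and renumber the rows. The toy frames below hold made-up values, not the notebook's data.

```python
import pandas as pd

# Hypothetical toy frames; in the notebook these are the frames read from m_data.csv and w_data.csv
man = pd.DataFrame({"bmi": [20.0, 25.0, 30.0], "steps": [200.0, 220.0, 240.0]})
woman = pd.DataFrame({"bmi": [21.0, 24.0], "steps": [210.0, 230.0]})

# assign() adds a constant label column ('sex' is a hypothetical name chosen for this sketch);
# ignore_index=True renumbers rows 0..n-1 instead of repeating each frame's own index
data = pd.concat(
    [man.assign(sex="man"), woman.assign(sex="woman")],
    ignore_index=True,
)

print(data)
print(data.groupby("sex")["steps"].mean())
```

With the label in place, later comparisons can use `groupby("sex")` on a single frame instead of repeating every call for `man` and `woman`.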
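import matplotlib.pyplot as plt
# scatter(x, y, s, c): passing both frames positionally turned woman's columns into marker size and colour, so plot each group separately
plt.scatter(man['bmi'], man['steps'], alpha=0.5, label='man')
plt.scatter(woman['bmi'], woman['steps'], alpha=0.5, label='woman')
plt.legend();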
z = data.sample(n=500)
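# Sample each group from its own frame; sampling `data` would write a mix of men and women to both files
man.sample(frac=0.4).to_csv('m_data.csv', index=False)
woman.sample(frac=0.4).to_csv('w_data.csv', index=False)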
plt.scatter(z['bmi'], z['steps'], marker=None)
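The `Unnamed: 0` column in the `head()` and `describe()` tables most likely is a row index that was saved to CSV: `DataFrame.to_csv` writes the index by default, and `read_csv` names an empty header cell `Unnamed: 0`. If so, its statistics describe row labels rather than measurements. The sketch below reproduces the effect in memory with a made-up frame; passing `index=False` when writing (or `index_col=0` when reading) avoids it.

```python
import io

import pandas as pd

# Hypothetical toy frame; any DataFrame with a default RangeIndex behaves the same way
df = pd.DataFrame({"bmi": [14.0, 171.0], "steps": [217.0, 176.0]})

# Default to_csv writes the index as an unlabelled first column...
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)  # rewind the buffer before reading it back
cols_with_index = pd.read_csv(with_index).columns.tolist()

# ...while index=False writes only the data columns
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'bmi', 'steps']
print(cols_without_index)  # ['bmi', 'steps']
```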
man.head()
| | Unnamed: 0 | bmi | steps |
|---|---|---|---|
| 0 | 22393 | 14.0 | 217.0 |
| 1 | 16685 | 171.0 | 176.0 |
| 2 | 15155 | 91.0 | 168.0 |
| 3 | 6162 | 86.0 | 101.0 |
| 4 | 22150 | 146.0 | 215.0 |
man.describe()
| | Unnamed: 0 | bmi | steps |
|---|---|---|---|
| count | 18548.000000 | 18548.000000 | 18548.000000 |
| mean | 23180.819388 | 254.411096 | 219.864568 |
| std | 13357.077654 | 174.845136 | 102.026507 |
| min | 2.000000 | 0.000000 | 0.000000 |
| 25% | 11642.750000 | 97.000000 | 145.000000 |
| 50% | 23307.000000 | 202.000000 | 222.000000 |
| 75% | 34706.250000 | 441.250000 | 292.000000 |
| max | 46368.000000 | 549.000000 | 411.000000 |
woman.describe()
| | Unnamed: 0 | bmi | steps |
|---|---|---|---|
| count | 18548.000000 | 18548.000000 | 18548.000000 |
| mean | 23282.573647 | 254.232909 | 220.864406 |
| std | 13439.323095 | 174.901153 | 102.534039 |
| min | 1.000000 | 0.000000 | 0.000000 |
| 25% | 11537.250000 | 97.000000 | 144.000000 |
| 50% | 23214.000000 | 200.000000 | 222.000000 |
| 75% | 35035.750000 | 442.000000 | 296.000000 |
| max | 46370.000000 | 549.000000 | 411.000000 |
As the two `describe()` tables above show, men and women take almost the same number of steps on average: 219.86 for men and 220.86 for women.

The spread is also very close: the standard deviation of steps is 102.03 for men and 102.53 for women, and the quartiles (145/222/292 vs. 144/222/296) nearly coincide. From these two tables, men and women appear to have a similar walking pattern, at least in the mean and spread of their step counts.
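The comparison above reads the mean and standard deviation off two separate `describe()` tables. A sketch, assuming the two frames are combined with a hypothetical `sex` label, puts both groups side by side in one `groupby` table; the toy numbers are made up and only illustrate the layout.

```python
import pandas as pd

# Hypothetical toy frames; in the notebook these would be the loaded `man` and `woman`
man = pd.DataFrame({"bmi": [20.0, 25.0, 30.0], "steps": [200.0, 220.0, 240.0]})
woman = pd.DataFrame({"bmi": [21.0, 24.0, 27.0], "steps": [215.0, 225.0, 235.0]})

# 'sex' is a hypothetical label column added for this sketch
combined = pd.concat(
    [man.assign(sex="man"), woman.assign(sex="woman")], ignore_index=True
)

# One row per group; mean and sample standard deviation (ddof=1) for each column
summary = combined.groupby("sex")[["bmi", "steps"]].agg(["mean", "std"])
print(summary)

# Difference in mean steps, woman minus man
diff = summary.loc["woman", ("steps", "mean")] - summary.loc["man", ("steps", "mean")]
print(diff)
```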
man[['bmi', 'steps']].corr()
| | bmi | steps |
|---|---|---|
| bmi | 1.000000 | -0.099439 |
| steps | -0.099439 | 1.000000 |
woman[['bmi', 'steps']].corr()
| | bmi | steps |
|---|---|---|
| bmi | 1.0000 | -0.0956 |
| steps | -0.0956 | 1.0000 |
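`DataFrame.corr()` computes Pearson's correlation coefficient by default, r = cov(x, y) / (σ_x σ_y). Both groups show a weak negative correlation between `bmi` and `steps`: squaring r gives about 0.0099 for men and 0.0091 for women, so `bmi` linearly accounts for roughly 1% of the variance in `steps`. The sketch below checks the formula against pandas on a small made-up frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; in the notebook this would be man[['bmi', 'steps']]
df = pd.DataFrame({"bmi": [1.0, 2.0, 3.0, 4.0], "steps": [2.0, 4.0, 5.0, 4.0]})

x = df["bmi"].to_numpy()
y = df["steps"].to_numpy()
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: sum of co-deviations over the root of the product of squared deviations
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# DataFrame.corr uses method="pearson" unless told otherwise
r_pandas = df.corr().loc["bmi", "steps"]

print(r_manual, r_pandas)
# r squared: share of the variance in steps explained by a linear fit on bmi
print(r_manual**2)
```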
man[['bmi', 'steps']].hist()
woman[['bmi', 'steps']].hist()
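Calling `.hist()` on each frame draws a separate figure per group, which makes men and women hard to compare by eye. A sketch of one alternative: overlay the `steps` distributions of both groups on shared bin edges. Random toy data stands in for the real frames here, and the 30-bin choice is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the loaded man/woman frames (clipped at 0 steps)
rng = np.random.default_rng(0)
man = pd.DataFrame({"steps": np.clip(rng.normal(220, 100, 1000), 0, None)})
woman = pd.DataFrame({"steps": np.clip(rng.normal(221, 100, 1000), 0, None)})

# Shared bin edges (31 edges -> 30 bins) so bar heights are directly comparable
upper = max(man["steps"].max(), woman["steps"].max())
bins = np.linspace(0, upper, 31)

fig, ax = plt.subplots()
ax.hist(man["steps"], bins=bins, alpha=0.5, label="man")
ax.hist(woman["steps"], bins=bins, alpha=0.5, label="woman")
ax.set_xlabel("steps")
ax.set_ylabel("count")
ax.legend()
```

Semi-transparent bars (`alpha=0.5`) keep the overlap visible; a density plot would be another option.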