In [ ]:

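# Mount Google Drive so the CSV files are reachable from the Colab runtime.
from google.colab import drive
drive.mount('/data/')
data_dir = '/data/My Drive/Colab Notebooks/Experiment'
# List the experiment folder to confirm the data files are present.
!ls '/data/My Drive/Colab Notebooks/Experiment'
!pip install matplotlib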

Mounted at /data/
m_data.csv  w_data.csv

In [ ]:

import pandas as pd

man = pd.read_csv(data_dir+'/m_data.csv')
woman = pd.read_csv(data_dir+'/w_data.csv')

In [ ]:

data = pd.concat([man, woman])

In [ ]:

import matplotlib.pyplot as plt

In [ ]:

plt.scatter(man['bmi'], man['steps'])      # first series: man
plt.scatter(woman['bmi'], woman['steps'])  # second series: woman, drawn on the same axes

Out[ ]:

<matplotlib.collections.PathCollection at 0x7fbdb855f0f0>

In [ ]:

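# Draw 500 random rows from the combined data for a lighter scatter plot.
z = data.sample(n=500)
# Write two independent 40% random samples of the combined data to the
# current working directory (these relative paths are not data_dir).
data.sample(frac=0.4).to_csv('m_data.csv')
data.sample(frac=0.4).to_csv('w_data.csv')

plt.scatter(z['bmi'], z['steps'], marker=None)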

Out[ ]:

<matplotlib.collections.PathCollection at 0x7fbdb5f32080>

In [ ]:

man.head()

Out[ ]:

	Unnamed: 0	bmi	steps
0	22393	14.0	217.0
1	16685	171.0	176.0
2	15155	91.0	168.0
3	6162	86.0	101.0
4	22150	146.0	215.0

In [ ]:

man.describe()

Out[ ]:

	Unnamed: 0	bmi	steps
count	18548.000000	18548.000000	18548.000000
mean	23180.819388	254.411096	219.864568
std	13357.077654	174.845136	102.026507
min	2.000000	0.000000	0.000000
25%	11642.750000	97.000000	145.000000
50%	23307.000000	202.000000	222.000000
75%	34706.250000	441.250000	292.000000
max	46368.000000	549.000000	411.000000

In [ ]:

woman.describe()

Out[ ]:

	Unnamed: 0	bmi	steps
count	18548.000000	18548.000000	18548.000000
mean	23282.573647	254.232909	220.864406
std	13439.323095	174.901153	102.534039
min	1.000000	0.000000	0.000000
25%	11537.250000	97.000000	144.000000
50%	23214.000000	200.000000	222.000000
75%	35035.750000	442.000000	296.000000
max	46370.000000	549.000000	411.000000

As the two describe() summaries above show, the average step counts for men and women are very similar: 219.86 for men and 220.86 for women.

The standard deviations are also close (102.03 for men and 102.53 for women), as are the quartiles and the maximum. From these two tables, the steps data for men and women appear to follow a similar distribution, which suggests a similar walking pattern.
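
The comparison above can be made explicit by placing the two summaries side by side. The following is a minimal sketch, assuming frames with the same `bmi` and `steps` columns as `man` and `woman`. The small frames built here are made-up stand-ins so the block runs on its own, and `compare_groups` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def compare_groups(a, b, column, names=("man", "woman")):
    """Return mean and std of `column` for two frames, plus their absolute difference."""
    summary = pd.DataFrame(
        {
            names[0]: a[column].agg(["mean", "std"]),
            names[1]: b[column].agg(["mean", "std"]),
        }
    )
    # One row per statistic; the extra column shows how far apart the groups are.
    summary["abs_diff"] = (summary[names[0]] - summary[names[1]]).abs()
    return summary


# Made-up stand-in data (NOT the real m_data.csv / w_data.csv).
man_demo = pd.DataFrame(
    {"bmi": [14.0, 171.0, 91.0, 86.0], "steps": [217.0, 176.0, 168.0, 101.0]}
)
woman_demo = pd.DataFrame(
    {"bmi": [20.0, 150.0, 95.0, 80.0], "steps": [210.0, 180.0, 170.0, 110.0]}
)

steps_summary = compare_groups(man_demo, woman_demo, "steps")
print(steps_summary)
```

In the notebook itself, the same helper could be called as `compare_groups(man, woman, 'steps')` to check the means and standard deviations quoted above.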

In [ ]:

man[['bmi', 'steps']].corr()

Out[ ]:

	bmi	steps
bmi	1.000000	-0.099439
steps	-0.099439	1.000000

In [ ]:

woman[['bmi', 'steps']].corr()

Out[ ]:

	bmi	steps
bmi	1.0000	-0.0956
steps	-0.0956	1.0000
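
Both matrices above report a bmi/steps coefficient close to -0.1, which indicates a weak negative linear relationship in each group. As a minimal sketch, the single off-diagonal value can be pulled out per group instead of reading it from two separate matrices. `bmi_steps_corr` is a hypothetical helper and the demo frames are made-up stand-ins, not the real data.

```python
import pandas as pd


def bmi_steps_corr(frames):
    """Map each group name to the Pearson correlation between its bmi and steps columns."""
    return pd.Series(
        {name: df["bmi"].corr(df["steps"]) for name, df in frames.items()}
    )


# Made-up stand-in data (NOT the real m_data.csv / w_data.csv).
man_demo = pd.DataFrame({"bmi": [1.0, 2.0, 3.0, 4.0], "steps": [8.0, 6.0, 4.0, 2.0]})
woman_demo = pd.DataFrame({"bmi": [1.0, 2.0, 3.0, 4.0], "steps": [1.0, 2.0, 3.0, 5.0]})

corr_by_group = bmi_steps_corr({"man": man_demo, "woman": woman_demo})
print(corr_by_group)
```

In the notebook this could be called as `bmi_steps_corr({'man': man, 'woman': woman})`. `Series.corr` uses the Pearson method by default, the same default as `DataFrame.corr`.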

In [ ]:

man[['bmi', 'steps']].hist()

Out[ ]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f356a91c748>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f356a49cc88>]],
      dtype=object)

In [ ]:

woman[['bmi', 'steps']].hist()

Out[ ]:

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f356a4cbf60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f356aa714a8>]],
      dtype=object)
