Faster First EDA with pandas-profiling

In [2]:
# importing required packages
import pandas as pd
import pandas_profiling
import numpy as np
In [3]:
# importing the data
df = pd.read_csv('/Users/lukas/Downloads/titanic/train.csv')
In [4]:
# checking the head
df.head()
Out[4]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
In [8]:
# descriptive statistics
df.describe()
Out[8]:
       PassengerId     Survived       Pclass          Age        SibSp        Parch         Fare
count   891.000000   891.000000   891.000000   714.000000   891.000000   891.000000   891.000000
mean    446.000000     0.383838     2.308642    29.699118     0.523008     0.381594    32.204208
std     257.353842     0.486592     0.836071    14.526497     1.102743     0.806057    49.693429
min       1.000000     0.000000     1.000000     0.420000     0.000000     0.000000     0.000000
25%     223.500000     0.000000     2.000000    20.125000     0.000000     0.000000     7.910400
50%     446.000000     0.000000     3.000000    28.000000     0.000000     0.000000    14.454200
75%     668.500000     1.000000     3.000000    38.000000     1.000000     0.000000    31.000000
max     891.000000     1.000000     3.000000    80.000000     8.000000     6.000000   512.329200
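describe() stops at the count, mean, standard deviation and quartiles. A few derived measures (range, interquartile range, coefficient of variation) follow directly from those rows. The sketch below is one way to compute them; the helper name derived_stats is made up for illustration, and the small desc table is the Age and Fare columns of the output above copied by hand so the block runs without the CSV file.

```python
import pandas as pd


def derived_stats(desc: pd.DataFrame) -> pd.DataFrame:
    """Derive range, IQR and coefficient of variation from a describe() table.

    `desc` is expected to have the usual describe() row labels
    ("mean", "std", "min", "25%", "75%", "max").
    """
    return pd.DataFrame({
        "range": desc.loc["max"] - desc.loc["min"],
        "iqr": desc.loc["75%"] - desc.loc["25%"],
        "coef_of_variation": desc.loc["std"] / desc.loc["mean"],
    })


# Age and Fare columns copied by hand from the describe() output above
# (illustrative stand-in for df.describe()).
desc = pd.DataFrame(
    {
        "Age": [714.0, 29.699118, 14.526497, 0.42, 20.125, 28.0, 38.0, 80.0],
        "Fare": [891.0, 32.204208, 49.693429, 0.0, 7.9104, 14.4542, 31.0, 512.3292],
    },
    index=["count", "mean", "std", "min", "25%", "50%", "75%", "max"],
)

extra = derived_stats(desc)
print(extra.round(4))
```

In the notebook itself the same helper could be applied as derived_stats(df.describe()).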

Create an inline report

In [5]:
pandas_profiling.ProfileReport(df)
Out[5]:

Overview

Dataset info

Number of variables 12
Number of observations 891
Total Missing (%) 8.1%
Total size in memory 83.6 KiB
Average record size in memory 96.1 B

Variables types

Numeric 6
Categorical 4
Boolean 1
Date 0
Text (Unique) 1
Rejected 0
Unsupported 0

Warnings

  • Age has 177 / 19.9% missing values Missing
  • Cabin has 687 / 77.1% missing values Missing
  • Cabin has a high cardinality: 148 distinct values Warning
  • Fare has 15 / 1.7% zeros Zeros
  • Parch has 678 / 76.1% zeros Zeros
  • SibSp has 608 / 68.2% zeros Zeros
  • Ticket has a high cardinality: 681 distinct values Warning

Variables

Age
Numeric

Distinct count 89
Unique (%) 10.0%
Missing (%) 19.9%
Missing (n) 177
Infinite (%) 0.0%
Infinite (n) 0
Mean 29.699
Minimum 0.42
Maximum 80
Zeros (%) 0.0%

Quantile statistics

Minimum 0.42
5-th percentile 4
Q1 20.125
Median 28
Q3 38
95-th percentile 56
Maximum 80
Range 79.58
Interquartile range 17.875

Descriptive statistics

Standard deviation 14.526
Coef of variation 0.48912
Kurtosis 0.17827
Mean 29.699
MAD 11.323
Skewness 0.38911
Sum 21205
Variance 211.02
Memory size 7.0 KiB

Value              Count  Frequency (%)
24.0               30     3.4%
22.0               27     3.0%
18.0               26     2.9%
28.0               25     2.8%
19.0               25     2.8%
30.0               25     2.8%
21.0               24     2.7%
25.0               23     2.6%
36.0               22     2.5%
29.0               20     2.2%
Other values (78)  467    52.4%
(Missing)          177    19.9%

Minimum 5 values

Value  Count  Frequency (%)
0.42   1      0.1%
0.67   1      0.1%
0.75   2      0.2%
0.83   2      0.2%
0.92   1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
70.0   2      0.2%
70.5   1      0.1%
71.0   2      0.2%
74.0   1      0.1%
80.0   1      0.1%

Cabin
Categorical

Distinct count 148
Unique (%) 16.6%
Missing (%) 77.1%
Missing (n) 687

Value               Count
G6                  4
C23 C25 C27         4
B96 B98             4
Other values (144)  192
(Missing)           687

Value               Count  Frequency (%)
G6                  4      0.4%
C23 C25 C27         4      0.4%
B96 B98             4      0.4%
D                   3      0.3%
F2                  3      0.3%
F33                 3      0.3%
C22 C26             3      0.3%
E101                3      0.3%
E121                2      0.2%
E8                  2      0.2%
Other values (137)  173    19.4%
(Missing)           687    77.1%

Embarked
Categorical

Distinct count 4
Unique (%) 0.4%
Missing (%) 0.2%
Missing (n) 2

Value      Count  Frequency (%)
S          644    72.3%
C          168    18.9%
Q          77     8.6%
(Missing)  2      0.2%

Fare
Numeric

Distinct count 248
Unique (%) 27.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 32.204
Minimum 0
Maximum 512.33
Zeros (%) 1.7%

Quantile statistics

Minimum 0
5-th percentile 7.225
Q1 7.9104
Median 14.454
Q3 31
95-th percentile 112.08
Maximum 512.33
Range 512.33
Interquartile range 23.09

Descriptive statistics

Standard deviation 49.693
Coef of variation 1.5431
Kurtosis 33.398
Mean 32.204
MAD 28.164
Skewness 4.7873
Sum 28694
Variance 2469.4
Memory size 7.0 KiB

Value               Count  Frequency (%)
8.05                43     4.8%
13.0                42     4.7%
7.8958              38     4.3%
7.75                34     3.8%
26.0                31     3.5%
10.5                24     2.7%
7.925               18     2.0%
7.775               16     1.8%
26.55               15     1.7%
0.0                 15     1.7%
Other values (238)  615    69.0%

Minimum 5 values

Value   Count  Frequency (%)
0.0     15     1.7%
4.0125  1      0.1%
5.0     1      0.1%
6.2375  1      0.1%
6.4375  1      0.1%

Maximum 5 values

Value     Count  Frequency (%)
227.525   4      0.4%
247.5208  2      0.2%
262.375   2      0.2%
263.0     4      0.4%
512.3292  3      0.3%

Name
Categorical, Unique

First 3 values
Hansen, Mr. Henrik Juul
Aubart, Mme. Leontine Pauline
Abbott, Mrs. Stanton (Rosa Hunt)
Last 3 values
Pickard, Mr. Berk (Berk Trembisky)
Goldenberg, Mrs. Samuel L (Edwiga Grabowska)
Karlsson, Mr. Nils August

First 10 values

Value                                  Count  Frequency (%)
Abbing, Mr. Anthony                    1      0.1%
Abbott, Mr. Rossmore Edward            1      0.1%
Abbott, Mrs. Stanton (Rosa Hunt)       1      0.1%
Abelson, Mr. Samuel                    1      0.1%
Abelson, Mrs. Samuel (Hannah Wizosky)  1      0.1%

Last 10 values

Value                            Count  Frequency (%)
de Mulder, Mr. Theodore          1      0.1%
de Pelsmaeker, Mr. Alfons        1      0.1%
del Carlo, Mr. Sebastiano        1      0.1%
van Billiard, Mr. Austin Blyler  1      0.1%
van Melkebeke, Mr. Philemon      1      0.1%

Parch
Numeric

Distinct count 7
Unique (%) 0.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.38159
Minimum 0
Maximum 6
Zeros (%) 76.1%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 2
Maximum 6
Range 6
Interquartile range 0

Descriptive statistics

Standard deviation 0.80606
Coef of variation 2.1123
Kurtosis 9.7781
Mean 0.38159
MAD 0.58074
Skewness 2.7491
Sum 340
Variance 0.64973
Memory size 7.0 KiB

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
5      5      0.6%
3      5      0.6%
4      4      0.4%
6      1      0.1%

Minimum 5 values

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
3      5      0.6%
4      4      0.4%

Maximum 5 values

Value  Count  Frequency (%)
2      80     9.0%
3      5      0.6%
4      4      0.4%
5      5      0.6%
6      1      0.1%

PassengerId
Numeric

Distinct count 891
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 446
Minimum 1
Maximum 891
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 45.5
Q1 223.5
Median 446
Q3 668.5
95-th percentile 846.5
Maximum 891
Range 890
Interquartile range 445

Descriptive statistics

Standard deviation 257.35
Coef of variation 0.57703
Kurtosis -1.2
Mean 446
MAD 222.75
Skewness 0
Sum 397386
Variance 66231
Memory size 7.0 KiB

Value               Count  Frequency (%)
891                 1      0.1%
293                 1      0.1%
304                 1      0.1%
303                 1      0.1%
302                 1      0.1%
301                 1      0.1%
300                 1      0.1%
299                 1      0.1%
298                 1      0.1%
297                 1      0.1%
Other values (881)  881    98.9%

Minimum 5 values

Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
887    1      0.1%
888    1      0.1%
889    1      0.1%
890    1      0.1%
891    1      0.1%

Pclass
Numeric

Distinct count 3
Unique (%) 0.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.3086
Minimum 1
Maximum 3
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 3
95-th percentile 3
Maximum 3
Range 2
Interquartile range 1

Descriptive statistics

Standard deviation 0.83607
Coef of variation 0.36215
Kurtosis -1.28
Mean 2.3086
MAD 0.76197
Skewness -0.63055
Sum 2057
Variance 0.69902
Memory size 7.0 KiB

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Minimum 5 values

Value  Count  Frequency (%)
1      216    24.2%
2      184    20.7%
3      491    55.1%

Maximum 5 values

Value  Count  Frequency (%)
1      216    24.2%
2      184    20.7%
3      491    55.1%

Sex
Categorical

Distinct count 2
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

SibSp
Numeric

Distinct count 7
Unique (%) 0.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.52301
Minimum 0
Maximum 8
Zeros (%) 68.2%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 1
95-th percentile 3
Maximum 8
Range 8
Interquartile range 1

Descriptive statistics

Standard deviation 1.1027
Coef of variation 2.1085
Kurtosis 17.88
Mean 0.52301
MAD 0.71378
Skewness 3.6954
Sum 466
Variance 1.216
Memory size 7.0 KiB

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
4      18     2.0%
3      16     1.8%
8      7      0.8%
5      5      0.6%

Minimum 5 values

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
3      16     1.8%
4      18     2.0%

Maximum 5 values

Value  Count  Frequency (%)
2      28     3.1%
3      16     1.8%
4      18     2.0%
5      5      0.6%
8      7      0.8%

Survived
Boolean

Distinct count 2
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Mean 0.38384

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Ticket
Categorical

Distinct count 681
Unique (%) 76.4%
Missing (%) 0.0%
Missing (n) 0

Value               Count
1601                7
CA. 2343            7
347082              7
Other values (678)  870

Value               Count  Frequency (%)
1601                7      0.8%
CA. 2343            7      0.8%
347082              7      0.8%
347088              6      0.7%
3101295             6      0.7%
CA 2144             6      0.7%
382652              5      0.6%
S.O.C. 14879        5      0.6%
2666                4      0.4%
17421               4      0.4%
Other values (671)  834    93.6%

Correlations

Sample

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
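The Warnings list in the report (missing values, zeros, high cardinality) can be cross-checked with plain pandas. The sketch below is a rough equivalent, not the library's own implementation: the helper name column_flags and the four-row toy frame are made up for illustration. Distinct values are counted with dropna=False because the report appears to count a missing value as its own category (Embarked lists S, C, Q and (Missing) with a distinct count of 4).

```python
import pandas as pd


def column_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column counts of missing values, zeros and distinct values."""
    n_rows = len(df)
    n_missing = df.isna().sum()
    # Zeros only make sense for numeric columns; other columns get 0.
    n_zeros = (
        df.select_dtypes(include="number")
        .eq(0)
        .sum()
        .reindex(df.columns, fill_value=0)
    )
    # dropna=False counts NaN as one extra distinct value.
    n_distinct = df.nunique(dropna=False)
    return pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": (100 * n_missing / n_rows).round(1),
        "n_zeros": n_zeros,
        "pct_zeros": (100 * n_zeros / n_rows).round(1),
        "n_distinct": n_distinct,
    })


# Toy data for illustration only (not the Titanic file).
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Fare": [7.25, 0.0, 7.925, 0.0],
    "Cabin": [None, "C85", None, "C123"],
})

flags = column_flags(toy)
print(flags)
```

On the notebook's DataFrame, column_flags(df) should give figures comparable to the Warnings list, for example the 177 missing Age values.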
In [11]:
# sample vs. head
df.sample(5)
Out[11]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
535 536 1 2 Hart, Miss. Eva Miriam female 7.0 0 2 F.C.C. 13529 26.2500 NaN S
621 622 1 1 Kimball, Mr. Edwin Nelson Jr male 42.0 1 0 11753 52.5542 D19 S
22 23 1 3 McGowan, Miss. Anna "Annie" female 15.0 0 0 330923 8.0292 NaN Q
655 656 0 2 Hickman, Mr. Leonard Mark male 24.0 2 0 S.O.C. 14879 73.5000 NaN S
136 137 1 1 Newsom, Miss. Helen Monypeny female 19.0 0 2 11752 26.2833 D47 S
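Unlike head(), sample() draws rows at random, so the rows shown above change on every run. If a repeatable draw is wanted, pandas accepts a random_state argument. A minimal sketch on a made-up ten-row frame (the seed value 0 is arbitrary):

```python
import pandas as pd

# Stand-in frame for illustration; in the notebook this would be df.
demo = pd.DataFrame({"PassengerId": range(1, 11)})

# The same seed gives the same rows on every call.
first_draw = demo.sample(5, random_state=0)
second_draw = demo.sample(5, random_state=0)

print(first_draw.equals(second_draw))
```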

Save Report to HTML

In [6]:
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("/tmp/example.html")
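The pandas-profiling package was later renamed ydata-profiling, so the import at the top of this notebook may fail on a fresh install. The sketch below is one way to stay compatible with either name; the helper names load_profile_report and save_report are hypothetical, and the default filename only mirrors the one used above. It assumes both packages expose a ProfileReport class whose to_file method takes the output path as its first argument, as in the cell above.

```python
import tempfile
from pathlib import Path


def load_profile_report():
    """Return the ProfileReport class from whichever package is installed."""
    try:
        from ydata_profiling import ProfileReport  # newer package name
    except ImportError:
        from pandas_profiling import ProfileReport  # name used in this notebook
    return ProfileReport


def save_report(df, filename="example.html"):
    """Build a report for `df`, write it to the temp directory, return the path."""
    out_path = Path(tempfile.gettempdir()) / filename
    report_cls = load_profile_report()
    report_cls(df).to_file(str(out_path))
    return out_path
```

Calling save_report(df) in the notebook would then write the HTML file and return its location.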
In [7]:
pfr