# importing required packages
# note: pandas_profiling has since been renamed ydata-profiling (import ydata_profiling)
import pandas as pd
import pandas_profiling
import numpy as np
# importing the data
df = pd.read_csv('/Users/lukas/Downloads/titanic/train.csv')
# checking the head
df.head()
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
# descriptive statistics
df.describe()
PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare | |
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
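The `count` row above is the number of non-null entries per column, so Age (714 of 891) has 177 missing values. A minimal sketch of making that explicit, using a small made-up frame (the toy values are assumptions, not Titanic data):

```python
import pandas as pd

# Toy frame standing in for the Titanic data; values are illustrative only.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

# describe() reports non-null counts; subtracting them from the row count
# gives the number of missing entries per column.
missing_from_count = len(toy) - toy.count()

# isna().sum() gives the same numbers directly.
missing_direct = toy.isna().sum()

print(missing_from_count.to_dict())
print(missing_direct.to_dict())
```

On the real frame the same two expressions would be applied to `df` instead of `toy`.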
Create an inline report
pandas_profiling.ProfileReport(df)
Dataset info
Metric | Value |
---|---|
Number of variables | 12 |
Number of observations | 891 |
Total Missing (%) | 8.1% |
Total size in memory | 83.6 KiB |
Average record size in memory | 96.1 B |
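The 8.1% figure can be checked by hand. The sketch below assumes "Total Missing (%)" is the share of missing cells over all cells (rows times columns); that reading is an assumption, though the arithmetic agrees with the per-variable missing counts the report lists further down (Age 177, Cabin 687, Embarked 2).

```python
# Figures taken from the report: 891 rows, 12 columns, and the
# per-variable missing counts from the Age, Cabin and Embarked sections.
n_rows = 891
n_cols = 12
missing_per_column = {"Age": 177, "Cabin": 687, "Embarked": 2}

total_cells = n_rows * n_cols                     # 10692 cells in the frame
total_missing = sum(missing_per_column.values())  # 866 missing cells
missing_pct = 100 * total_missing / total_cells

print(f"{missing_pct:.1f}%")
```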
Variables types
Type | Count |
---|---|
Numeric | 6 |
Categorical | 4 |
Boolean | 1 |
Date | 0 |
Text (Unique) | 1 |
Rejected | 0 |
Unsupported | 0 |
Warnings
Age
Numeric
Statistic | Value |
---|---|
Distinct count | 89 |
Unique (%) | 10.0% |
Missing (%) | 19.9% |
Missing (n) | 177 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 29.699 |
Minimum | 0.42 |
Maximum | 80 |
Zeros (%) | 0.0% |
Quantile statistics
Statistic | Value |
---|---|
Minimum | 0.42 |
5-th percentile | 4 |
Q1 | 20.125 |
Median | 28 |
Q3 | 38 |
95-th percentile | 56 |
Maximum | 80 |
Range | 79.58 |
Interquartile range | 17.875 |
Descriptive statistics
Statistic | Value |
---|---|
Standard deviation | 14.526 |
Coef of variation | 0.48912 |
Kurtosis | 0.17827 |
Mean | 29.699 |
MAD | 11.323 |
Skewness | 0.38911 |
Sum | 21205 |
Variance | 211.02 |
Memory size | 7.0 KiB |
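Several of the Age statistics are simple functions of the others, which makes for a quick sanity check on the report. The sketch below only re-derives them from numbers already printed by `df.describe()` and the report; the variable names are illustrative.

```python
# Values copied from the df.describe() output and the Age section of the report.
age_min, age_max = 0.42, 80.0
q1, q3 = 20.125, 38.0
mean, std = 29.699118, 14.526497

value_range = age_max - age_min   # report: 79.58
iqr = q3 - q1                     # report: 17.875
coef_of_variation = std / mean    # report: 0.48912
variance = std ** 2               # report: 211.02

print(round(value_range, 2), iqr, round(coef_of_variation, 5), round(variance, 2))
```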
Value | Count | Frequency (%) | |
24.0 | 30 | 3.4% | |
22.0 | 27 | 3.0% | |
18.0 | 26 | 2.9% | |
28.0 | 25 | 2.8% | |
19.0 | 25 | 2.8% | |
30.0 | 25 | 2.8% | |
21.0 | 24 | 2.7% | |
25.0 | 23 | 2.6% | |
36.0 | 22 | 2.5% | |
29.0 | 20 | 2.2% | |
Other values (78) | 467 | 52.4% | |
(Missing) | 177 | 19.9% |
Minimum 5 values
Value | Count | Frequency (%) | |
0.42 | 1 | 0.1% | |
0.67 | 1 | 0.1% | |
0.75 | 2 | 0.2% | |
0.83 | 2 | 0.2% | |
0.92 | 1 | 0.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
70.0 | 2 | 0.2% | |
70.5 | 1 | 0.1% | |
71.0 | 2 | 0.2% | |
74.0 | 1 | 0.1% | |
80.0 | 1 | 0.1% |
Cabin
Categorical
Distinct count | 148 |
---|---|
Unique (%) | 16.6% |
Missing (%) | 77.1% |
Missing (n) | 687 |
Value | Count | Frequency (%) | |
G6 | 4 | 0.4% | |
C23 C25 C27 | 4 | 0.4% | |
B96 B98 | 4 | 0.4% | |
D | 3 | 0.3% | |
F2 | 3 | 0.3% | |
F33 | 3 | 0.3% | |
C22 C26 | 3 | 0.3% | |
E101 | 3 | 0.3% | |
E121 | 2 | 0.2% | |
E8 | 2 | 0.2% | |
Other values (137) | 173 | 19.4% | |
(Missing) | 687 | 77.1% |
Embarked
Categorical
Distinct count | 4 |
---|---|
Unique (%) | 0.4% |
Missing (%) | 0.2% |
Missing (n) | 2 |
Value | Count | Frequency (%) | |
S | 644 | 72.3% | |
C | 168 | 18.9% | |
Q | 77 | 8.6% | |
(Missing) | 2 | 0.2% |
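Note that the report gives a distinct count of 4 for Embarked although only S, C and Q appear, which suggests the missing entries are counted as a value of their own (an inference from the numbers, not something the report states). A minimal sketch of the same distinction in plain pandas, on a made-up series:

```python
import pandas as pd

# Toy column with one missing entry; values are illustrative only.
embarked = pd.Series(["S", "S", "C", "Q", None], name="Embarked")

# By default pandas ignores missing values when counting distinct entries.
distinct_without_na = embarked.nunique()

# With dropna=False the missing entry counts as one more distinct value.
distinct_with_na = embarked.nunique(dropna=False)

# value_counts(dropna=False) keeps a row for the missing entries,
# similar to the "(Missing)" row in the report.
counts = embarked.value_counts(dropna=False)

print(distinct_without_na, distinct_with_na)
print(counts)
```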
Fare
Numeric
Distinct count | 248 |
---|---|
Unique (%) | 27.8% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 32.204 |
---|---|
Minimum | 0 |
Maximum | 512.33 |
Zeros (%) | 1.7% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 7.225 |
Q1 | 7.9104 |
Median | 14.454 |
Q3 | 31 |
95-th percentile | 112.08 |
Maximum | 512.33 |
Range | 512.33 |
Interquartile range | 23.09 |
Descriptive statistics
Standard deviation | 49.693 |
---|---|
Coef of variation | 1.5431 |
Kurtosis | 33.398 |
Mean | 32.204 |
MAD | 28.164 |
Skewness | 4.7873 |
Sum | 28694 |
Variance | 2469.4 |
Memory size | 7.0 KiB |
Value | Count | Frequency (%) | |
8.05 | 43 | 4.8% | |
13.0 | 42 | 4.7% | |
7.8958 | 38 | 4.3% | |
7.75 | 34 | 3.8% | |
26.0 | 31 | 3.5% | |
10.5 | 24 | 2.7% | |
7.925 | 18 | 2.0% | |
7.775 | 16 | 1.8% | |
26.55 | 15 | 1.7% | |
0.0 | 15 | 1.7% | |
Other values (238) | 615 | 69.0% |
Minimum 5 values
Value | Count | Frequency (%) | |
0.0 | 15 | 1.7% | |
4.0125 | 1 | 0.1% | |
5.0 | 1 | 0.1% | |
6.2375 | 1 | 0.1% | |
6.4375 | 1 | 0.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
227.525 | 4 | 0.4% | |
247.5208 | 2 | 0.2% | |
262.375 | 2 | 0.2% | |
263.0 | 4 | 0.4% | |
512.3292 | 3 | 0.3% |
Name
Categorical, Unique
First 3 values |
---|
Hansen, Mr. Henrik Juul |
Aubart, Mme. Leontine Pauline |
Abbott, Mrs. Stanton (Rosa Hunt) |
Last 3 values |
---|
Pickard, Mr. Berk (Berk Trembisky) |
Goldenberg, Mrs. Samuel L (Edwiga Grabowska) |
Karlsson, Mr. Nils August |
First 10 values
Value | Count | Frequency (%) | |
Abbing, Mr. Anthony | 1 | 0.1% | |
Abbott, Mr. Rossmore Edward | 1 | 0.1% | |
Abbott, Mrs. Stanton (Rosa Hunt) | 1 | 0.1% | |
Abelson, Mr. Samuel | 1 | 0.1% | |
Abelson, Mrs. Samuel (Hannah Wizosky) | 1 | 0.1% |
Last 10 values
Value | Count | Frequency (%) | |
de Mulder, Mr. Theodore | 1 | 0.1% | |
de Pelsmaeker, Mr. Alfons | 1 | 0.1% | |
del Carlo, Mr. Sebastiano | 1 | 0.1% | |
van Billiard, Mr. Austin Blyler | 1 | 0.1% | |
van Melkebeke, Mr. Philemon | 1 | 0.1% |
Parch
Numeric
Distinct count | 7 |
---|---|
Unique (%) | 0.8% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 0.38159 |
---|---|
Minimum | 0 |
Maximum | 6 |
Zeros (%) | 76.1% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th percentile | 2 |
Maximum | 6 |
Range | 6 |
Interquartile range | 0 |
Descriptive statistics
Standard deviation | 0.80606 |
---|---|
Coef of variation | 2.1123 |
Kurtosis | 9.7781 |
Mean | 0.38159 |
MAD | 0.58074 |
Skewness | 2.7491 |
Sum | 340 |
Variance | 0.64973 |
Memory size | 7.0 KiB |
Value | Count | Frequency (%) | |
0 | 678 | 76.1% | |
1 | 118 | 13.2% | |
2 | 80 | 9.0% | |
5 | 5 | 0.6% | |
3 | 5 | 0.6% | |
4 | 4 | 0.4% | |
6 | 1 | 0.1% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 678 | 76.1% | |
1 | 118 | 13.2% | |
2 | 80 | 9.0% | |
3 | 5 | 0.6% | |
4 | 4 | 0.4% |
Maximum 5 values
Value | Count | Frequency (%) | |
2 | 80 | 9.0% | |
3 | 5 | 0.6% | |
4 | 4 | 0.4% | |
5 | 5 | 0.6% | |
6 | 1 | 0.1% |
PassengerId
Numeric
Distinct count | 891 |
---|---|
Unique (%) | 100.0% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 446 |
---|---|
Minimum | 1 |
Maximum | 891 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 45.5 |
Q1 | 223.5 |
Median | 446 |
Q3 | 668.5 |
95-th percentile | 846.5 |
Maximum | 891 |
Range | 890 |
Interquartile range | 445 |
Descriptive statistics
Standard deviation | 257.35 |
---|---|
Coef of variation | 0.57703 |
Kurtosis | -1.2 |
Mean | 446 |
MAD | 222.75 |
Skewness | 0 |
Sum | 397386 |
Variance | 66231 |
Memory size | 7.0 KiB |
Value | Count | Frequency (%) | |
891 | 1 | 0.1% | |
293 | 1 | 0.1% | |
304 | 1 | 0.1% | |
303 | 1 | 0.1% | |
302 | 1 | 0.1% | |
301 | 1 | 0.1% | |
300 | 1 | 0.1% | |
299 | 1 | 0.1% | |
298 | 1 | 0.1% | |
297 | 1 | 0.1% | |
Other values (881) | 881 | 98.9% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 1 | 0.1% | |
2 | 1 | 0.1% | |
3 | 1 | 0.1% | |
4 | 1 | 0.1% | |
5 | 1 | 0.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
887 | 1 | 0.1% | |
888 | 1 | 0.1% | |
889 | 1 | 0.1% | |
890 | 1 | 0.1% | |
891 | 1 | 0.1% |
Pclass
Numeric
Distinct count | 3 |
---|---|
Unique (%) | 0.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 2.3086 |
---|---|
Minimum | 1 |
Maximum | 3 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 2 |
Median | 3 |
Q3 | 3 |
95-th percentile | 3 |
Maximum | 3 |
Range | 2 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 0.83607 |
---|---|
Coef of variation | 0.36215 |
Kurtosis | -1.28 |
Mean | 2.3086 |
MAD | 0.76197 |
Skewness | -0.63055 |
Sum | 2057 |
Variance | 0.69902 |
Memory size | 7.0 KiB |
Value | Count | Frequency (%) | |
3 | 491 | 55.1% | |
1 | 216 | 24.2% | |
2 | 184 | 20.7% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 216 | 24.2% | |
2 | 184 | 20.7% | |
3 | 491 | 55.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
1 | 216 | 24.2% | |
2 | 184 | 20.7% | |
3 | 491 | 55.1% |
Sex
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.2% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
male | 577 | 64.8% | |
female | 314 | 35.2% |
SibSp
Numeric
Distinct count | 7 |
---|---|
Unique (%) | 0.8% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 0.52301 |
---|---|
Minimum | 0 |
Maximum | 8 |
Zeros (%) | 68.2% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 1 |
95-th percentile | 3 |
Maximum | 8 |
Range | 8 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 1.1027 |
---|---|
Coef of variation | 2.1085 |
Kurtosis | 17.88 |
Mean | 0.52301 |
MAD | 0.71378 |
Skewness | 3.6954 |
Sum | 466 |
Variance | 1.216 |
Memory size | 7.0 KiB |
Value | Count | Frequency (%) | |
0 | 608 | 68.2% | |
1 | 209 | 23.5% | |
2 | 28 | 3.1% | |
4 | 18 | 2.0% | |
3 | 16 | 1.8% | |
8 | 7 | 0.8% | |
5 | 5 | 0.6% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 608 | 68.2% | |
1 | 209 | 23.5% | |
2 | 28 | 3.1% | |
3 | 16 | 1.8% | |
4 | 18 | 2.0% |
Maximum 5 values
Value | Count | Frequency (%) | |
2 | 28 | 3.1% | |
3 | 16 | 1.8% | |
4 | 18 | 2.0% | |
5 | 5 | 0.6% | |
8 | 7 | 0.8% |
Survived
Boolean
Distinct count | 2 |
---|---|
Unique (%) | 0.2% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Mean | 0.38384 |
---|
Value | Count | Frequency (%) | |
0 | 549 | 61.6% | |
1 | 342 | 38.4% |
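Because Survived is coded as 0/1, its mean is simply the share of passengers who survived. A short check using the counts from the table above:

```python
# Counts taken from the Survived value table in the report.
not_survived = 549
survived = 342
total = not_survived + survived   # 891 observations

# The mean of a 0/1 column is the sum of ones divided by the row count.
survival_rate = survived / total

print(round(survival_rate, 5))
```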
Ticket
Categorical
Distinct count | 681 |
---|---|
Unique (%) | 76.4% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
1601 | 7 | 0.8% | |
CA. 2343 | 7 | 0.8% | |
347082 | 7 | 0.8% | |
347088 | 6 | 0.7% | |
3101295 | 6 | 0.7% | |
CA 2144 | 6 | 0.7% | |
382652 | 5 | 0.6% | |
S.O.C. 14879 | 5 | 0.6% | |
2666 | 4 | 0.4% | |
17421 | 4 | 0.4% | |
Other values (671) | 834 | 93.6% |
# sample vs. head
df.sample(5)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.0 | 0 | 2 | F.C.C. 13529 | 26.2500 | NaN | S |
621 | 622 | 1 | 1 | Kimball, Mr. Edwin Nelson Jr | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S |
22 | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15.0 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
655 | 656 | 0 | 2 | Hickman, Mr. Leonard Mark | male | 24.0 | 2 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S |
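`head()` always returns the first rows, while `sample()` draws rows at random, so the rows shown above will differ on every run. A minimal sketch of pinning the draw with `random_state`, using a made-up frame (the column name and seed are arbitrary choices for illustration):

```python
import pandas as pd

# Toy frame; the real notebook would use df instead.
toy = pd.DataFrame({"PassengerId": range(1, 21)})

# head() is deterministic: always the first five rows.
first_rows = toy.head(5)

# sample() is random unless a seed is fixed via random_state.
draw_a = toy.sample(5, random_state=42)
draw_b = toy.sample(5, random_state=42)

print(list(first_rows.index))
print(draw_a.equals(draw_b))
```

Fixing the seed is mainly useful when the sampled rows are discussed in the text, since readers re-running the notebook then see the same rows.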
Save Report to HTML
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("/tmp/example.html")
pfr
df.head()
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |