Besides the features already mentioned, Python has:
It is already installed! Use virtualenv and pip to set up isolated environments and install more packages. Conda is an alternative.
A bit trickier! It is best to use the Anaconda distribution from Continuum Analytics to install everything you need to get going.
Strong and dynamically typed
x = 23
3*x
69
x = "Hello "
y = "World!"
print(x + y)
Hello World!
print(x + 1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-17ee3e2a0896> in <module>()
----> 1 print(x + 1)

TypeError: Can't convert 'int' object to str implicitly
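Strong typing means the conversion has to be spelled out. A minimal sketch of two explicit alternatives, and of rebinding a name to a different type (the names `greeting` and `formatted` are only illustrative):

```python
x = "Hello "

# Convert the number to a string explicitly before concatenating.
greeting = x + str(1)
print(greeting)

# Or let string formatting perform the conversion.
formatted = "{}{}".format(x, 1)
print(formatted)

# Dynamic typing: the same name can be rebound to a value of another type.
x = 23
print(type(x).__name__)
```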
x = 3
if x > 0:
    if x % 2 == 0:
        print("Positive, even number!")
    else:
        print("Positive, odd number!")
else:
    print("Non-positive number!")
Positive, odd number!
def bmi(height, weight):
    return weight / height**2
print("The BMI is: {:.3}".format(bmi(1.85, 79)))
The BMI is: 23.1
x = (1, 3, 5)
print(x)
(1, 3, 5)
x[2]
5
a, b, c = x
print(a + b + c)
9
a, *others = x
print(a, 'and', others)
1 and [3, 5]
x = [1, 3, 5]
print(x)
[1, 3, 5]
x.append(7)
print(x)
[1, 3, 5, 7]
del x[0]
print(x)
[3, 5, 7]
x = {'a': 1, 'b': 2, 'c': 3}
print(x['b'])
2
x['d'] = 4
print(x)
{'a': 1, 'd': 4, 'b': 2, 'c': 3}
x['dispatch'] = lambda x: x + 41
x['dispatch'](1)
42
Powerful, easy-to-use data structures like lists and dictionaries allow declarative programming.
x = []
for i in range(5):
    x.append(i**2)
print(x)
[0, 1, 4, 9, 16]
# Better: a list comprehension
x = [i**2 for i in range(5)]
print(x)
[0, 1, 4, 9, 16]
Many more high-level concepts are available to express an algorithm as naturally as possible.
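A few such concepts, sketched on made-up data (the lists `names` and `values` are hypothetical and chosen only for illustration):

```python
names = ["a", "b", "c"]
values = [1, 2, 3]

# Dictionary comprehension: build a mapping from two parallel lists.
lookup = {name: value for name, value in zip(names, values)}

# Generator expression: sum the squares without building an intermediate list.
total = sum(v**2 for v in values)

# enumerate yields (index, item) pairs, so no manual counter is needed.
labels = ["{}:{}".format(i, name) for i, name in enumerate(names)]

print(lookup)
print(total)
print(labels)
```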
Many libraries internally use efficient C/C++ and FORTRAN implementations!
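As a small illustration, assuming NumPy is installed: the same squares as before, computed once with a Python-level loop and once as a single array expression whose element loop runs inside NumPy's compiled code. No timings are claimed here; the sketch only shows that both forms give the same result.

```python
import numpy as np

values = np.arange(5)

# Python-level loop: every multiplication is executed by the interpreter.
squares_loop = [int(v) ** 2 for v in values]

# Vectorized expression: the loop over the elements happens inside NumPy.
squares_vec = values ** 2

print(squares_loop)
print(squares_vec.tolist())
```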
Based on the properties of a passenger like:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('./input/train.csv')
df.head()
| | Alive | Class | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Port |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, M | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. | male | 35 | 0 | 0 | 373450 | 8.0500 | NaN | S |
# Drop some hard-to-use columns and define 'Port', 'Sex' and 'Class' as categories
df = df.drop(['Name', 'Ticket', 'Cabin'], axis=1)
df['Port'] = df['Port'].astype('category')
df['Sex'] = df['Sex'].astype('category')
df['Class'] = df['Class'].astype('category')
df.head()
| | Alive | Class | Sex | Age | SibSp | Parch | Fare | Port |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22 | 1 | 0 | 7.2500 | S |
| 1 | 1 | 1 | female | 38 | 1 | 0 | 71.2833 | C |
| 2 | 1 | 3 | female | 26 | 0 | 0 | 7.9250 | S |
| 3 | 1 | 1 | female | 35 | 1 | 0 | 53.1000 | S |
| 4 | 0 | 3 | male | 35 | 0 | 0 | 8.0500 | S |
df.shape
(891, 8)
df.describe()
| | Alive | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|
| count | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 0.383838 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 0.486592 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 0.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 0.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 1.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 1.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
df[['Sex', 'Port']].describe()
| | Sex | Port |
|---|---|---|
| count | 891 | 889 |
| unique | 2 | 3 |
| top | male | S |
| freq | 577 | 644 |
# Fill in missing values: mean age for 'Age', the most frequent port ('S') for 'Port'
age_mean = df['Age'].mean()
df['Age'] = df['Age'].fillna(age_mean)
df['Port'] = df['Port'].fillna('S')
df[['Sex', 'Port']].describe()
| | Sex | Port |
|---|---|---|
| count | 891 | 891 |
| unique | 2 | 3 |
| top | male | S |
| freq | 577 | 646 |
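The same imputation can be tried on a tiny frame. This is only a sketch, assuming pandas and NumPy are installed; the frame `toy` and its values are made up and not taken from the Titanic file. Unlike the hard-coded 'S' above, the most frequent category is looked up with `mode()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0, 30.0],
    "Port": pd.Categorical(["S", None, "S", "C"]),
})

# Numeric column: replace the missing value with the column mean
# (mean() skips NaN, so this is the mean of 20, 40 and 30).
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())

# Categorical column: replace the missing value with the most frequent category.
most_frequent_port = toy["Port"].mode()[0]
toy["Port"] = toy["Port"].fillna(most_frequent_port)

print(toy)
```

Filling a categorical column only works with a value that is already one of its categories, which holds here because the fill value is taken from the column itself.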
fig, ax = plt.subplots(1, 1, figsize=(12, 5))
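ax.set_xlim(0, 80)
# distplot is deprecated in current seaborn releases; histplot with a KDE overlay is the closest equivalent
g = sns.histplot(df['Age'], kde=True, stat="density", color="b", ax=ax)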
# Draw a nested barplot to show survival for class and sex
# factorplot was renamed to catplot and its 'size' parameter to 'height' in newer seaborn releases
g = sns.catplot(x="Class", y="Alive", hue="Sex", data=df, height=7, kind="bar", palette="muted")
g.set_ylabels("survival probability")
g.set_xlabels("passenger class")
<seaborn.axisgrid.FacetGrid at 0x7faacdb509b0>
from sklearn.ensemble import RandomForestClassifier
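# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split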
# Define features and target variables
X = df.drop('Alive', axis=1)
y = df['Alive']
# Convert categories to integer values
X['Sex'] = X['Sex'].cat.codes
X['Port'] = X['Port'].cat.codes
# Convert to NumPy arrays
X = X.values
y = y.values
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Create and train the model
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
# Make some predictions on the test set and compare with the truth
preds = model.predict(X_test)
print("Accuracy: {:.1%}".format(np.mean(preds == y_test)))
Accuracy: 83.4%
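The integer values produced by `.cat.codes` for 'Sex' and 'Port' follow the order of the categories. A small sketch on made-up data (the series `sex` here is hypothetical, not the Titanic column) shows how to recover which code stands for which label:

```python
import pandas as pd

# Hypothetical example values.
sex = pd.Series(["male", "female", "female", "male"]).astype("category")

# Categories inferred from strings are sorted, so 'female' -> 0 and 'male' -> 1.
codes = sex.cat.codes

# Map each integer code back to its label.
mapping = dict(enumerate(sex.cat.categories))

print(codes.tolist())
print(mapping)
```

Missing values are encoded as -1 by `.cat.codes`, which is one reason to fill them before this conversion.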