Python, like any programming language, has many extensions and libraries at its disposal; there are libraries for almost everything.
Libraries are collections of functions and methods (small pieces of code that take something as input and return something else as output) that you can use to analyse your data, visualise your data, run models, and much more.
These functions usually take a variable as input.
In the following, we will work our way from variables to libraries.
Variables are one of the simplest types of objects in a programming language. An [object](https://en.wikipedia.org/wiki/Object_(computer_science)) is a value stored in your computer's memory and referenced by a specific identifier (the variable name). Variables can have different types, such as strings, numbers, and booleans. Unlike in some other programming languages (e.g. C or Java), you do not need to declare the type of a variable in Python: the type belongs to the object the name refers to and is determined when a value is assigned.
```python
x = 4.2  # floating point number
y = 'Hello World!'  # string
z = True  # boolean
```
```python
x = 4.24725723
print(type(x))
y = 'Hello World! Hello universe'
print(y)
z = True
print(type(z))
```
```
<class 'float'>
Hello World! Hello universe
<class 'bool'>
```
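Because the type belongs to the object rather than to the name, the same variable name can be rebound to a value of a different type. A small sketch using only the built-ins `type()` and `isinstance()` (the name `value` is just for illustration):

```python
value = 42                 # value refers to an int object
print(type(value))         # <class 'int'>

value = 'forty-two'        # the same name now refers to a str object
print(type(value))         # <class 'str'>

value = 4.2 > 4            # comparisons produce booleans
print(type(value), value)  # <class 'bool'> True

# isinstance() checks the type of the object a name currently refers to
print(isinstance(value, bool))  # True
```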
We can apply operations (for numbers, the usual arithmetic operations) to variables to compute the results we want. With numbers, you can add, subtract, multiply, and divide: Python takes the values stored in memory under the variable names and performs the calculation.
Let's have a look at operations with numbers and strings, leaving booleans aside for the moment. We will simply add the variables below. For strings, `+` joins (concatenates) them.
```python
n1 = 7
n2 = 42
s1 = 'Looking good, '
s2 = 'you are.'

first_sum = n1 + n2
print(first_sum)
first_conc = s1 + s2  # + concatenates strings
print(first_conc)
```
```
49
Looking good, you are.
```
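The other arithmetic operators work the same way, and some of them are also defined for strings. A short sketch with the same values; note that mixing a string and a number with `+` is not allowed and has to be made explicit with `str()`:

```python
n1 = 7
n2 = 42
s1 = 'Looking good, '

print(n2 - n1)   # 35   subtraction
print(n1 * n2)   # 294  multiplication
print(n2 / n1)   # 6.0  true division always returns a float
print(n2 // n1)  # 6    floor division
print(n2 % 5)    # 2    remainder
print(n1 ** 2)   # 49   power

print(s1 * 2)    # str * int repeats the string: Looking good, Looking good,

# str + int is not defined and raises a TypeError;
# convert the number explicitly with str() first
try:
    s1 + n1
except TypeError as err:
    print('TypeError:', err)
print(s1 + str(n1))  # Looking good, 7
```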
Variables can hold more than just a single number. If you think of an Excel spreadsheet, a variable can hold the content of a single cell, or multiple cells can be combined in one variable (e.g. one column of an Excel table).
So let's create a list (a collection of values) from `x`, `n1`, and `n2`. Lists in Python are created using square brackets `[ ]`.
Now, if you want to calculate the sum of this list, it is tedious to add up every item of the list manually.
```python
first_list = [x, n1, n2]
# a manual sum of a list looks like this (schematic, not runnable as written):
# second_sum = some_list[0] + some_list[1] + ... + some_list[n]
# where n is the index of the last item, e.g. 2 for first_list
```
Writing the sum out like this is no better than adding the variables one by one. It would be great if this calculation could be reused many times without writing it out each time, and this is what functions are for. For example, Python already has a built-in `sum` function: `sum(first_list)`.
```python
first_list = [x, n1, n2]
second_sum = first_list[0] + first_list[1] + first_list[2]
print('manual sum {}'.format(second_sum))
# This can also be done with a function
print('sum function {}'.format(sum(first_list)))
```
```
manual sum 53.2
sum function 53.2
```
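List items are accessed by their position, starting at 0, and negative indices count from the end. Besides `sum()`, other built-in functions such as `len()`, `min()` and `max()` also work on lists. A short sketch using the first values of `x`, `n1`, and `n2` from above:

```python
x, n1, n2 = 4.2, 7, 42
first_list = [x, n1, n2]

print(first_list[0])    # 4.2  first item (indexing starts at 0)
print(first_list[-1])   # 42   last item
print(len(first_list))  # 3    number of items
print(min(first_list), max(first_list))  # 4.2 42

first_list.append(1)    # lists can grow; append() is a method of the list object
print(sum(first_list))
```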
The `sum()` we used above is a function.
Functions are pieces of code that take an input, perform some kind of operation, and (optionally) return an output. When functions belong to a class, we call them methods (more on that later).
In Python, functions are written like this:
```python
def func(inp):  # the function header: name and parameters
    """
    Description of what the function does (called the docstring)
    """
    output = inp  # the function body: some kind of operation on inp
    return output
```
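To make the template concrete, here is a small hypothetical `mean` function (not part of the original notebook) that follows the same structure and reuses the built-ins `sum()` and `len()`. The docstring is what `help(mean)` displays and is stored in `mean.__doc__`:

```python
def mean(values):
    """
    input: values - non-empty list of int or float numbers
    return: the arithmetic mean of values
    """
    total = sum(values)         # body: add up all items
    return total / len(values)  # divide by the number of items

print(mean([7, 42]))  # 24.5
print(mean.__doc__)   # prints the docstring written above
```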
As an example, we write a `sumup` function which sums up a list.
```python
def sumup(inp):
    """
    input: inp - list/array with floating point or integer numbers
    return: val - scalar value of the summed up list
    """
    val = 0
    for i in inp:
        val = val + i  # add each item to the running total
    return val
```
```python
# let's compare the built-in sum function with the new sumup function
sum1 = sum(first_list)
sum2 = sumup(first_list)
print("The python sum function yields {},\nand our sumup function yields {}.".format(sum1, sum2))
```
```
The python sum function yields 53.2,
and our sumup function yields 53.2.
```
```python
# summing up the numbers from 1 to 100
import numpy as np
ar_2_sum = np.linspace(1, 100, 100, dtype='i')  # 100 integers: 1, 2, ..., 100
print("the sum of the array is: {}".format(sumup(ar_2_sum)))
```
```
the sum of the array is: 5050
```
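The result can be cross-checked against the closed-form formula for the sum of the first n integers, 1 + 2 + ... + n = n(n+1)/2, which gives 100 · 101 / 2 = 5050. A stdlib-only sketch (using `range` instead of numpy, and repeating `sumup` so the block runs on its own):

```python
def sumup(inp):
    """Add up all items of inp with an explicit loop."""
    val = 0
    for i in inp:
        val = val + i
    return val

n = 100
loop_result = sumup(range(1, n + 1))  # 1 + 2 + ... + 100
formula_result = n * (n + 1) // 2     # closed form n(n+1)/2
print(loop_result, formula_result)    # 5050 5050
```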
As we see above, functions are quite practical and save a lot of time. They also help to structure your code. Some functions, such as `sum()`, are directly available in Python without any libraries or other external software. In the example above, however, you might have noticed that we imported a library called `numpy`.
A library bundles many functions (and classes) into one package, so you can import them all at once instead of one function at a time.
Imagine you are moving house and have to pack all your belongings. You can think of a library as a box in which things with a similar purpose are packed together.
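The different ways of importing can be sketched with the standard-library modules `math` and `statistics`, which ship with Python; the same patterns apply to third-party libraries such as `numpy`:

```python
import math                  # import the whole library; use its functions as math.<name>
from statistics import mean  # import a single function directly by name
import statistics as st      # import a library under a shorter alias (like numpy as np)

print(math.sqrt(16))         # 4.0
print(mean([7, 42]))         # 24.5
print(st.median([1, 3, 2]))  # 2
```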
When we talk about functions that belong to a class, we usually call them methods. But what are classes?
Classes are a way to bundle data and functionality together, typically functionality with a similar purpose (or some other kind of similarity).
One example: think of apples.
`apple` is now a class. You can apply methods to this class, such as `eat()` or `cut()`, or more sophisticated methods, such as the various apple recipes collected in a cookbook.
The `eat()` method is straightforward. But the `cut()` method may be more interesting, since there are various ways to cut an apple.
Let's assume there are two apples to be cut differently. In Python, when you call a class (e.g. `apple()`) and assign the result to a variable, you have created an instance of that class. Methods are then applied to that instance using dot notation (`instance.method()`).
```python
# apple is a hypothetical class used for illustration
Golden_Delicious = apple()
Yoya = apple()
Golden_Delicious.cut(4)
Yoya.cut(8)
```
The two apples, Golden Delicious and Yoya, are instances of the class `apple`: concrete incarnations of the abstract concept "apple". The Golden Delicious is cut into 4 pieces, while the Yoya is cut into 8 pieces.
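The `apple` class itself is never defined in the notebook. A minimal hypothetical version could look like the sketch below; the attributes `variety`, `pieces` and `eaten` and the method bodies are assumptions made up for illustration. `__init__` is the special method Python runs when an instance is created, and `self` refers to the instance a method is called on:

```python
class apple:
    """A hypothetical apple class, for illustration only."""

    def __init__(self, variety='unknown'):
        self.variety = variety  # data stored on each instance
        self.pieces = 1         # a fresh apple is in one piece
        self.eaten = False

    def cut(self, n):
        """Cut this apple into n pieces."""
        self.pieces = n

    def eat(self):
        """Mark this apple as eaten."""
        self.eaten = True


Golden_Delicious = apple()
Yoya = apple()
Golden_Delicious.cut(4)
Yoya.cut(8)
print(Golden_Delicious.pieces, Yoya.pieces)  # 4 8
```

Each instance keeps its own data, so cutting one apple does not affect the other.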
This works the same way in more complex libraries, such as `scikit-learn`. In one exercise, you used the command:
```python
from sklearn.cluster import KMeans
```
which simply imports the class `KMeans` from the module `sklearn.cluster`. `KMeans` comprises several methods for clustering, which you can call just like in the apple example above.
For this, you need to create an instance of the `KMeans` class.
```python
...
kmeans_inst = KMeans(n_clusters=n_clusters)  # first we create the instance of the KMeans class called kmeans_inst
kmeans_inst.fit(data)  # then we apply a method to the instance kmeans_inst
...
```
An example:
```python
# here we just create the data for clustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
%matplotlib inline
# (%matplotlib inline is a Jupyter magic command; omit it outside notebooks)

X, y = make_blobs(n_samples=100, centers=3, cluster_std=0.5,
                  random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=70)
```
[Figure: scatter plot of the generated data points, showing three clusters]
```python
# now we create an instance of the KMeans class
from sklearn.cluster import KMeans
nr_of_clusters = 3  # because we see 3 clusters in the plot above
kmeans_inst = KMeans(n_clusters=nr_of_clusters)  # create the instance kmeans_inst
kmeans_inst.fit(X)  # apply a method to the instance
y_predict = kmeans_inst.predict(X)  # apply another method to the instance and save the result in another variable

# let's plot the data points, colored by their predicted cluster
plt.scatter(X[:, 0], X[:, 1], c=y_predict, s=50, cmap='Accent')
centers = kmeans_inst.cluster_centers_  # an attribute (not a method) set by fit(): the centers of the clusters found
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.6);  # plot the cluster centers
```
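To see how `fit()`, `predict()` and a fitted attribute such as `cluster_centers_` work together inside a class, here is a toy, stdlib-only sketch of the k-means idea (Lloyd's algorithm) that mimics this interface. `ToyKMeans` is a hypothetical name and this is not scikit-learn's implementation: scikit-learn uses smarter initialisation and several restarts, while this sketch simply starts from the first `n_clusters` points and runs a fixed number of iterations (`math.dist` requires Python 3.8+).

```python
import math

class ToyKMeans:
    """Hypothetical, minimal k-means clustering with a fit/predict interface."""

    def __init__(self, n_clusters, n_iter=10):
        self.n_clusters = n_clusters
        self.n_iter = n_iter

    def _nearest(self, point):
        # index of the center closest to point (Euclidean distance)
        distances = [math.dist(point, c) for c in self.cluster_centers_]
        return distances.index(min(distances))

    def fit(self, X):
        # naive initialisation: use the first n_clusters points as centers
        self.cluster_centers_ = [tuple(p) for p in X[:self.n_clusters]]
        for _ in range(self.n_iter):
            # assignment step: group each point with its nearest center
            groups = [[] for _ in range(self.n_clusters)]
            for p in X:
                groups[self._nearest(p)].append(p)
            # update step: move each center to the mean of its group
            for k, group in enumerate(groups):
                if group:  # keep the old center if a group is empty
                    self.cluster_centers_[k] = tuple(
                        sum(coord) / len(group) for coord in zip(*group))
        return self  # like scikit-learn, fit returns the instance

    def predict(self, X):
        # label each point with the index of its nearest center
        return [self._nearest(p) for p in X]


X = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 10.2), (0.1, 0.4), (10.3, 9.8)]
toy_inst = ToyKMeans(n_clusters=2)  # create an instance
toy_inst.fit(X)                     # apply a method to the instance
print(toy_inst.predict(X))          # [0, 1, 0, 1, 0, 1]
print(toy_inst.cluster_centers_)
```

The state computed by `fit()` is stored on the instance (`self.cluster_centers_`), which is why `predict()` can use it later without being passed the centers again.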
This short presentation is meant to make you familiar with the concepts of variables, functions, methods, and classes, all of which are objects in Python!