Getting started with plotting in Python

First, we need some boilerplate code to load the plotting library and to set up the Jupyter notebook so that plots appear inline:

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt

We also need a second library, pandas, which helps with loading and manipulating tabular data.

In [2]:

import pandas as pd

Finding documentation for Python functions is easy in a notebook: just put a question mark in front of the name.

In [3]:

?pd.read_csv
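
The ? prefix is a notebook (IPython) feature. In a plain Python script or console, the built-in help() function shows the same documentation, and inspect.getdoc() returns it as a string. A minimal sketch, using the built-in sorted so that it runs without extra packages; first_doc_line is a hypothetical helper name, not part of any library:

```python
import inspect


def first_doc_line(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ""


# Print a one-line summary instead of the full help text.
print(first_doc_line(sorted))

# help(sorted) would print the complete documentation;
# help(pd.read_csv) works the same way once pandas is imported.
```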

Now we read in the population frequency data:

In [11]:

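# Whitespace-separated file without a header row: first column is the count, second the population name.
# sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas versions.
dat = pd.read_csv("population_frequencies.txt", sep=r"\s+", names=["nr", "pop"])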
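
To try the call without the data file, we can feed read_csv an in-memory text buffer. This is a sketch under the assumption that each line of the file holds a count and a population name separated by whitespace, which is what the column names suggest; the three sample rows are copied from the output shown further down, and sample_text and sample are names made up for illustration. Here sep=r"\s+" tells pandas to split on runs of whitespace.

```python
import io

import pandas as pd

# Three lines in the assumed file layout: "<count> <population>".
sample_text = "9 Abkhasian\n16 Adygei\n6 Albanian\n"

# names=... supplies the column names, so the first line is treated as data.
sample = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", names=["nr", "pop"])

print(sample.dtypes)
# Use sample["pop"], not sample.pop: the latter is a DataFrame method.
print(sample["pop"].tolist())
```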

and verify that it worked:

In [56]:

dat

Out[56]:

	nr	pop
0	9	Abkhasian
1	16	Adygei
2	6	Albanian
3	7	Aleut
4	4	Aleut_Tlingit
5	7	Altaian
6	10	Ami
7	10	Armenian
8	9	Atayal
9	10	Balkar
10	29	Basque
11	25	BedouinA
12	19	BedouinB
13	10	Belarusian
14	6	BolshoyOleniOstrov
15	9	Borneo
16	10	Bulgarian
17	8	Cambodian
18	2	Canary_Islander
19	2	ChalmnyVarre
20	9	Chechen
21	20	Chukchi
22	3	Chukchi1
23	10	Chuvash
24	10	Croatian
25	8	Cypriot
26	10	Czech
27	10	Dai
28	9	Daur
29	4	Dolgan
...	...	...
86	27	Sardinian
87	8	Saudi
88	4	Scottish
89	10	Selkup
90	10	Semende
91	10	She
92	2	Sherpa.DG
93	11	Sicilian
94	53	Spanish
95	5	Spanish_North
96	8	Syrian
97	8	Tajik
98	10	Thai
99	2	Tibetan.DG
100	10	Tu
101	22	Tubalar
102	10	Tujia
103	50	Turkish
104	7	Turkmen
105	10	Tuvinian
106	9	Ukrainian
107	25	Ulchi
108	10	Uygur
109	10	Uzbek
110	3	WHG
111	7	Xibo
112	20	Yakut
113	9	Yamnaya_Samara
114	10	Yi
115	19	Yukagir

116 rows × 2 columns

OK, so let's proceed with simple plotting:

In [52]:

plt.plot(dat["nr"])

Out[52]:

[<matplotlib.lines.Line2D at 0x7f8e0c1af198>]

Not bad, but we'd like to sort the values. For that we use the data frame's sort_values method:

In [54]:

?dat.sort_values

In [57]:

dat_sorted = dat.sort_values(by="nr")

In [59]:

dat_sorted

Out[59]:

	nr	pop
44	1	Italian_South
56	1	JK2065
85	1	Saami_WGA
67	2	Levanluhta
84	2	Saami.DG
19	2	ChalmnyVarre
18	2	Canary_Islander
92	2	Sherpa.DG
99	2	Tibetan.DG
22	3	Chukchi1
110	3	WHG
88	4	Scottish
29	4	Dolgan
4	4	Aleut_Tlingit
95	5	Spanish_North
50	6	Jew_Iraqi
60	6	Korean
45	6	Itelmen
14	6	BolshoyOleniOstrov
2	6	Albanian
73	6	Mongola
52	6	Jew_Moroccan
47	7	Jew_Ashkenazi
104	7	Turkmen
48	7	Jew_Georgian
53	7	Jew_Tunisian
3	7	Aleut
5	7	Altaian
111	7	Xibo
42	8	Iranian
...	...	...
31	10	English
105	10	Tuvinian
13	10	Belarusian
78	11	Norwegian
76	11	Nganasan
93	11	Sicilian
41	12	Icelandic
79	13	Orcadian
65	14	LBK_EN
1	16	Adygei
115	19	Yukagir
12	19	BedouinB
21	20	Chukchi
112	20	Yakut
43	20	Italian_North
40	20	Hungarian
37	20	Greek
101	22	Tubalar
83	22	Russian
107	25	Ulchi
11	25	BedouinA
86	27	Sardinian
10	29	Basque
46	29	Japanese
35	32	French
82	38	Palestinian
30	39	Druze
38	43	Han
103	50	Turkish
94	53	Spanish

116 rows × 2 columns
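
The leftmost column of the output is the index, and after sorting it still carries the original row labels. If consecutive labels are preferred, reset_index(drop=True) renumbers the rows, and ascending=False reverses the sort order. A minimal sketch on a three-row toy frame (values copied from the table above; toy, by_count and largest_first are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({"nr": [9, 16, 6], "pop": ["Abkhasian", "Adygei", "Albanian"]})

# Sorting reorders the rows but keeps each row's original index label.
by_count = toy.sort_values(by="nr")
print(list(by_count.index))

# Largest count first, with the index renumbered from 0.
largest_first = toy.sort_values(by="nr", ascending=False).reset_index(drop=True)
print(largest_first)
```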

In [55]:

x = range(len(dat_sorted))
y = dat_sorted["nr"]
plt.plot(x, y)

Out[55]:

[<matplotlib.lines.Line2D at 0x7f8e0bff15c0>]
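
The explicit x = range(...) matters here. When given a single pandas Series, matplotlib takes the x-coordinates from the Series' index, and the sorted data still carry their original row labels, so plotting dat_sorted["nr"] on its own would not draw the values in sorted order. A small sketch on toy data that compares the x-coordinates of both variants (the variable names are made up for illustration):

```python
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({"nr": [9, 16, 6]}).sort_values(by="nr")

fig, ax = plt.subplots()

# One argument: x is taken from the Series' index (the original row labels).
(line_by_index,) = ax.plot(toy["nr"])

# Two arguments: x is the position in the sorted order.
(line_by_position,) = ax.plot(range(len(toy)), toy["nr"])

print(list(line_by_index.get_xdata()))
print(list(line_by_position.get_xdata()))
```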

Now we just need to add tick labels and change the size of the plot:

In [51]:

dat_sorted = dat.sort_values(by="nr")
y = dat_sorted["nr"]
x = range(len(y))
xticks = dat_sorted["pop"]
plt.figure(figsize=(20,8))
plt.plot(x, y)
plt.xticks(x, xticks, rotation="vertical");
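
Since every value belongs to one named population, a bar chart is an alternative that may read more naturally than a connected line. The following is only a sketch on a three-row toy frame, using matplotlib's object-oriented interface (plt.subplots) instead of the plt.* calls above; the data and variable names are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({"nr": [9, 16, 6], "pop": ["Abkhasian", "Adygei", "Albanian"]})
toy_sorted = toy.sort_values(by="nr")

# One bar per population, placed at positions 0, 1, 2, ...
positions = list(range(len(toy_sorted)))

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(positions, toy_sorted["nr"].tolist())

# Label each bar with its population name, rotated as in the line plot.
ax.set_xticks(positions)
ax.set_xticklabels(toy_sorted["pop"].tolist(), rotation="vertical")
ax.set_ylabel("nr")
fig.tight_layout()
```

For the full data set, a larger figsize such as the (20, 8) used above keeps the labels legible.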

OK, this was a very short introduction to Python and plotting. Clearly there is a lot more to learn, but hopefully this serves as a teaser for learning more about it. The matplotlib and pandas libraries are both well documented; check out their websites to find out more.