Loading libraries and setting up

In [33]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import scipy

PCA Plotting

Defining the column names:

In [34]:
#names = ["Individual"] + ["PC" + str(i) for i in range(1, 5)] + ["Population"]
names = ["Individual", "PC1", "PC2", "PC3", "PC4", "Population"]

Loading the eigenVec file from the pca run

West-Eurasian PCA

In [35]:
pcaDat = pd.read_csv("/home/training/share/pca_results/pca.WestEurasia.evec",
                     delim_whitespace=True, skiprows=1, names=names)

Looking at the data, we find that it is a matrix, with each individual on one row, and the columns denoting the first 10 principal components. The last column contains the population for each individual:

In [36]:
pcaDat
Out[36]:
Individual PC1 PC2 PC3 PC4 Population
0 Yuk_009 0.0123 0.1252 0.1147 0.0567 Yukagir
1 Yuk_025 0.0120 0.1258 0.1168 0.0576 Yukagir
2 Yuk_022 0.0136 0.1303 0.1186 0.0564 Yukagir
3 Yuk_020 0.0170 0.1278 0.1176 0.0584 Yukagir
4 MC_40 0.0183 0.1226 0.1123 0.0537 Chukchi
5 Yuk_024 0.0144 0.1271 0.1124 0.0584 Yukagir
6 Yuk_023 0.0124 0.1348 0.1238 0.0642 Yukagir
7 MC_16 0.0144 0.1266 0.1169 0.0541 Chukchi
8 MC_15 0.0146 0.1250 0.1119 0.0559 Chukchi
9 MC_18 0.0175 0.1238 0.1167 0.0523 Chukchi
10 Yuk_004 0.0110 0.1273 0.1117 0.0573 Yukagir
11 MC_08 0.0187 0.1253 0.1185 0.0564 Chukchi
12 Nov_005 0.0152 0.1349 0.1285 0.0618 Nganasan
13 MC_25 0.0182 0.1258 0.1196 0.0532 Chukchi
14 Yuk_019 0.0161 0.1327 0.1229 0.0617 Yukagir
15 Yuk_011 0.0152 0.1217 0.1148 0.0569 Yukagir
16 Sesk_47 0.0167 0.1241 0.1177 0.0549 Chukchi1
17 MC_17 0.0180 0.1268 0.1147 0.0544 Chukchi
18 Yuk_021 0.0141 0.1329 0.1210 0.0653 Yukagir
19 MC_06 0.0159 0.1264 0.1135 0.0557 Chukchi
20 MC_38 0.0178 0.1240 0.1143 0.0534 Chukchi
21 MC_14 0.0165 0.1238 0.1114 0.0524 Chukchi
22 Ul5 0.0070 0.1306 0.1144 0.0540 Ulchi
23 Ul31 0.0056 0.1289 0.1182 0.0550 Ulchi
24 Ul65 0.0051 0.1331 0.1117 0.0599 Ulchi
25 Tuba12 0.0172 0.0906 0.0790 0.0362 Tubalar
26 Tuba20 0.0129 0.0894 0.0767 0.0308 Tubalar
27 Nel19 0.0273 0.0605 0.0608 0.0333 Yukagir
28 Nlk16 0.0217 0.0744 0.0753 0.0360 Even
29 Kor66 0.0148 0.1259 0.1157 0.0531 Koryak
... ... ... ... ... ... ...
1259 I0429 0.0413 0.0447 0.0440 0.0098 Yamnaya_Samara
1260 I0438 0.0384 0.0497 0.0399 0.0020 Yamnaya_Samara
1261 I0585 0.0770 -0.0424 0.0372 0.0355 WHG
1262 I0797 -0.0101 -0.0452 -0.0342 -0.0124 LBK_EN
1263 I0795 -0.0057 -0.0495 -0.0429 0.0098 LBK_EN
1264 I0022 -0.0133 -0.0433 -0.0356 -0.0089 LBK_EN
1265 I0026 -0.0142 -0.0438 -0.0430 -0.0027 LBK_EN
1266 I1507 0.0866 -0.0455 0.0393 0.0311 WHG
1267 I0025 -0.0103 -0.0449 -0.0404 -0.0023 LBK_EN
1268 I0443 0.0350 0.0401 0.0412 0.0028 Yamnaya_Samara
1269 I0054 -0.0054 -0.0413 -0.0410 -0.0124 LBK_EN
1270 I0046 -0.0066 -0.0446 -0.0386 -0.0092 LBK_EN
1271 I0048 -0.0128 -0.0367 -0.0388 -0.0129 LBK_EN
1272 I0056 -0.0067 -0.0472 -0.0388 -0.0054 LBK_EN
1273 I0057 -0.0113 -0.0442 -0.0357 -0.0008 LBK_EN
1274 I0100 -0.0063 -0.0455 -0.0410 -0.0051 LBK_EN
1275 I0659 -0.0084 -0.0437 -0.0431 -0.0099 LBK_EN
1276 I0821 -0.0071 -0.0428 -0.0380 -0.0103 LBK_EN
1277 I1550 -0.0107 -0.0386 -0.0402 -0.0039 LBK_EN
1278 BOO001 0.0399 0.0760 0.0915 0.0453 BolshoyOleniOstrov
1279 BOO002 0.0445 0.0735 0.0925 0.0379 BolshoyOleniOstrov
1280 BOO003 0.0466 0.0765 0.0862 0.0415 BolshoyOleniOstrov
1281 BOO004 0.0411 0.0723 0.0938 0.0419 BolshoyOleniOstrov
1282 BOO005 0.0461 0.0731 0.0909 0.0401 BolshoyOleniOstrov
1283 BOO006 0.0394 0.0917 0.1002 0.0438 BolshoyOleniOstrov
1284 CHV001 0.0441 0.0331 0.0587 0.0325 ChalmnyVarre
1285 CHV002 0.0442 0.0351 0.0610 0.0373 ChalmnyVarre
1286 JK1968 0.0398 0.0385 0.0661 0.0299 Levanluhta
1287 JK1970 0.0408 0.0466 0.0600 0.0363 Levanluhta
1288 JK2065 0.0392 -0.0065 0.0195 0.0043 JK2065

1289 rows × 6 columns

We can quickly plot the first two PCs for all individuals:

In [37]:
plt.figure(figsize=(10, 10))
plt.scatter(x=pcaDat["PC1"], y=pcaDat["PC2"])
plt.xlabel("PC1");
plt.ylabel("PC2");

which is not very helpful, because we can't see where each population falls. We can highlight a few populations to get a bit more of a feeling:

In [38]:
plt.figure(figsize=(10, 10))
plt.scatter(x=pcaDat["PC1"], y=pcaDat["PC2"], label="")
for pop in ["Finnish", "Sardinian", "Armenian", "BedouinB"]:
    d = pcaDat[pcaDat["Population"] == pop]
    plt.scatter(x=d["PC1"], y=d["PC2"], label=pop)
plt.legend()
plt.xlabel("PC1");
plt.ylabel("PC2");

We can improve the plot further by plotting only the population that we had in the population list. So we need to load the population list and while we're at it, also add color- and symbol numbers for plotting.

In [41]:
popListDat = pd.read_csv("/home/training/share/WestEurasia.poplist.txt",
                         names=["Population"]).sort_values(by="Population")
nPops = len(popListDat)
nCols = 8
nSymbols = int(nPops / nCols)
colorIndices = [int(i / nSymbols) for i in range(nPops)]
symbolIndices = [i % nSymbols for i in range(nPops)]
popListDat = popListDat.assign(colorIndex=colorIndices, symbolIndex=symbolIndices)
popListDat
Out[41]:
Population colorIndex symbolIndex
1 Abkhasian 0 0
2 Adygei 0 1
3 Albanian 0 2
4 Armenian 0 3
5 Assyrian 0 4
6 Balkar 0 5
7 Basque 0 6
8 BedouinA 0 7
9 BedouinB 1 0
10 Belarusian 1 1
11 Bulgarian 1 2
12 Canary_Islander 1 3
13 Chechen 1 4
0 Chuvash 1 5
14 Croatian 1 6
15 Cypriot 1 7
16 Czech 2 0
17 Druze 2 1
18 English 2 2
19 Estonian 2 3
20 Finnish 2 4
21 French 2 5
22 Georgian 2 6
23 German 2 7
24 Greek 3 0
25 Hungarian 3 1
26 Icelandic 3 2
27 Iranian 3 3
28 Irish 3 4
29 Irish_Ulster 3 5
... ... ... ...
38 Jew_Tunisian 4 6
39 Jew_Turkish 4 7
40 Jew_Yemenite 5 0
41 Jordanian 5 1
42 Kumyk 5 2
44 Lebanese 5 3
43 Lebanese_Christian 5 4
45 Lebanese_Muslim 5 5
46 Lezgin 5 6
47 Lithuanian 5 7
48 Maltese 6 0
49 Mordovian 6 1
50 North_Ossetian 6 2
51 Norwegian 6 3
52 Orcadian 6 4
53 Palestinian 6 5
54 Polish 6 6
55 Romanian 6 7
56 Russian 7 0
57 Sardinian 7 1
58 Saudi 7 2
59 Scottish 7 3
60 Shetlandic 7 4
61 Sicilian 7 5
62 Sorb 7 6
64 Spanish 7 7
63 Spanish_North 8 0
65 Syrian 8 1
66 Turkish 8 2
67 Ukrainian 8 3

68 rows × 3 columns

and we end up with only 720 points. Note that I'm here flipping the x axis to make the correlation to Geography more obvious. We can now plot all points with colors and symbols:

In [42]:
plt.figure(figsize=(10,10))
symbolVec = ["8", "s", "p", "P", "*", "h", "H", "+", "x", "X", "D", "d"]
colorVec = [u'#1f77b4', u'#ff7f0e', u'#2ca02c', u'#d62728', u'#9467bd',
            u'#8c564b', u'#e377c2', u'#7f7f7f', u'#bcbd22', u'#17becf']
for i, row in popListDat.iterrows():
    d = pcaDat[pcaDat.Population == row["Population"]]
    plt.scatter(x=-d["PC1"], y=d["PC2"], c=colorVec[row["colorIndex"]],
                marker=symbolVec[row["symbolIndex"]], label=row["Population"])
plt.xlabel("PC1");
plt.ylabel("PC2");
plt.legend(loc=(1.1, 0), ncol=3)
Out[42]:
<matplotlib.legend.Legend at 0x7f70862abba8>

Adding ancient populations:

In [43]:
plt.figure(figsize=(10,10))
symbolVec = ["8", "s", "p", "P", "*", "h", "H", "+", "x", "X", "D", "d", "v", "<", ">", "^"]
colorVec = [u'#1f77b4', u'#ff7f0e', u'#2ca02c', u'#d62728', u'#9467bd',
            u'#8c564b', u'#e377c2', u'#7f7f7f', u'#bcbd22', u'#17becf']
for i, row in popListDat.iterrows():
    d = pcaDat[pcaDat.Population == row["Population"]]
    plt.scatter(x=-d["PC1"], y=d["PC2"], c=colorVec[row["colorIndex"]],
                marker=symbolVec[row["symbolIndex"]], label=row["Population"])

for i, pop in enumerate(["Levanluhta", "Saami.DG", "BolshoyOleniOstrov", "Yamnaya_Samara", "LBK_EN", "WHG"]):
    d = pcaDat[pcaDat.Population == pop]
    plt.scatter(x=-d["PC1"], y=d["PC2"], c="black", marker=symbolVec[i], label=pop)
plt.xlabel("PC1");
plt.ylabel("PC2");
plt.legend(loc=(1.1, 0), ncol=3)
Out[43]:
<matplotlib.legend.Legend at 0x7f704c7136a0>

East-Eurasian PCA

In [44]:
popListDat = pd.read_csv("/home/training/share/AllEurasia.poplist.txt",
                         names=["Population"]).sort_values(by="Population")
nPops = len(popListDat)
nCols = 9
nSymbols = int(nPops / nCols)
colorIndices = [int(i / nSymbols) for i in range(nPops)]
symbolIndices = [i % nSymbols for i in range(nPops)]
popListDat = popListDat.assign(colorIndex=colorIndices, symbolIndex=symbolIndices)
popListDat
Out[44]:
Population colorIndex symbolIndex
0 Abkhasian 0 0
1 Adygei 0 1
2 Albanian 0 2
3 Aleut 0 3
4 Aleut_Tlingit 0 4
5 Altaian 0 5
6 Ami 0 6
7 Armenian 0 7
8 Assyrian 0 8
9 Atayal 0 9
10 Avar 0 10
11 Azeri 0 11
12 Balkar 0 12
13 Basque 1 0
14 BedouinA 1 1
15 BedouinB 1 2
16 Belarusian 1 3
17 Borneo 1 4
18 Bulgarian 1 5
19 Buryat 1 6
20 Cambodian 1 7
21 Chechen 1 8
22 Chukchi 1 9
23 Chukchi1 1 10
24 Chuvash 1 11
25 Croatian 1 12
26 Cypriot 2 0
27 Czech 2 1
28 Dai 2 2
29 Daur 2 3
... ... ... ...
89 Saami.DG 6 11
90 Saami_WGA 6 12
91 Sardinian 7 0
92 Saudi 7 1
93 Scottish 7 2
94 Selkup 7 3
95 Semende 7 4
96 She 7 5
97 Sherpa.DG 7 6
98 Sicilian 7 7
99 Spanish 7 8
100 Spanish_North 7 9
101 Syrian 7 10
102 Tajik 7 11
103 Thai 7 12
104 Tibetan.DG 8 0
105 Tu 8 1
106 Tubalar 8 2
107 Tujia 8 3
108 Turkish 8 4
109 Turkmen 8 5
110 Tuvinian 8 6
111 Ukrainian 8 7
112 Ulchi 8 8
113 Uygur 8 9
114 Uzbek 8 10
115 Xibo 8 11
116 Yakut 8 12
117 Yi 9 0
118 Yukagir 9 1

119 rows × 3 columns

In [45]:
pcaDat = pd.read_csv("/home/training/share/pca_results/pca.AllEurasia.evec",
                     delim_whitespace=True, skiprows=1, names=names)
In [46]:
plt.figure(figsize=(10,10))
symbolVec = ["8", "s", "p", "P", "*", "h", "H", "+", "x", "X", "D", "d", "v", "<", ">", "^"]
colorVec = [u'#1f77b4', u'#ff7f0e', u'#2ca02c', u'#d62728', u'#9467bd',
            u'#8c564b', u'#e377c2', u'#7f7f7f', u'#bcbd22', u'#17becf']
for i, row in popListDat.iterrows():
    d = pcaDat[pcaDat.Population == row["Population"]]
    plt.scatter(x=-d["PC1"], y=d["PC2"], c=colorVec[row["colorIndex"]],
                marker=symbolVec[row["symbolIndex"]], label=row["Population"])

for i, pop in enumerate(["Levanluhta", "Saami.DG", "BolshoyOleniOstrov", "Yamnaya_Samara", "LBK_EN", "WHG"]):
    d = pcaDat[pcaDat.Population == pop]
    plt.scatter(x=-d["PC1"], y=d["PC2"], c="black", marker=symbolVec[i], label=pop)

plt.xlabel("PC1");
plt.ylabel("PC2");
plt.legend(loc=(1.1, 0), ncol=3)
Out[46]:
<matplotlib.legend.Legend at 0x7f7086207a58>

Outgroup F3 statistics

In [27]:
f3dat_han = pd.read_csv("/home/training/work/share/solutions/outgroupF3_results_Han.txt",
                        delim_whitespace=True,
                        names=["dummy", "A", "B", "C", "F3", "stderr", "Z", "nSNPs"])
f3dat_han
Out[27]:
dummy A B C F3 stderr Z nSNPs
0 result: Han Chuvash Mbuti 0.233652 0.002072 112.782 502678
1 result: Han Albanian Mbuti 0.215629 0.002029 106.291 501734
2 result: Han Armenian Mbuti 0.213724 0.001963 108.882 504370
3 result: Han Bulgarian Mbuti 0.216193 0.001979 109.266 504310
4 result: Han Czech Mbuti 0.218060 0.002002 108.939 504089
5 result: Han Druze Mbuti 0.209551 0.001919 109.205 510853
6 result: Han English Mbuti 0.216959 0.001973 109.954 504161
7 result: Han Estonian Mbuti 0.220730 0.002019 109.332 503503
8 result: Han Finnish Mbuti 0.223447 0.002044 109.345 502217
9 result: Han French Mbuti 0.216623 0.001969 110.012 509613
10 result: Han Georgian Mbuti 0.214295 0.001935 110.721 503598
11 result: Han Greek Mbuti 0.215203 0.001984 108.465 507475
12 result: Han Hungarian Mbuti 0.217894 0.001999 109.004 507409
13 result: Han Icelandic Mbuti 0.218683 0.002015 108.553 504655
14 result: Han Italian_North Mbuti 0.215332 0.001978 108.854 507589
15 result: Han Italian_South Mbuti 0.211787 0.002271 93.265 492400
16 result: Han Lithuanian Mbuti 0.219615 0.002032 108.098 503681
17 result: Han Maltese Mbuti 0.210359 0.001956 107.542 503985
18 result: Han Mordovian Mbuti 0.223469 0.002008 111.296 503441
19 result: Han Norwegian Mbuti 0.218873 0.002023 108.197 504621
20 result: Han Orcadian Mbuti 0.217773 0.002014 108.115 504993
21 result: Han Russian Mbuti 0.223993 0.001995 112.274 506525
22 result: Han Sardinian Mbuti 0.213230 0.001980 107.711 508413
23 result: Han Scottish Mbuti 0.218489 0.002039 107.145 499784
24 result: Han Sicilian Mbuti 0.212272 0.001975 107.486 505477
25 result: Han Spanish_North Mbuti 0.215885 0.002029 106.383 500853
26 result: Han Spanish Mbuti 0.213869 0.001975 108.297 513648
27 result: Han Ukrainian Mbuti 0.218716 0.002007 108.950 503981
28 result: Han Levanluhta Mbuti 0.236252 0.002383 99.123 263049
29 result: Han BolshoyOleniOstrov Mbuti 0.247814 0.002177 113.849 457102
30 result: Han ChalmnyVarre Mbuti 0.233499 0.002304 101.345 366220
31 result: Han Saami.DG Mbuti 0.236198 0.002274 103.852 489038
In [28]:
d=f3dat_han.sort_values(by="F3")
y = range(len(d))
plt.figure(figsize=(6, 8))
plt.errorbar(d["F3"], y, xerr=d["stderr"], fmt='o')
plt.yticks(y, d["B"]);
plt.xlabel("F3(Han, Test; Mbuti)");

F3 bi-plot

We're loading the two outgroupF3 datasets:

In [29]:
outgroupf3dat_Han = pd.read_csv("/home/training/work/share/solutions/outgroupF3_results_Han.txt",
                        delim_whitespace=True,
                        names=["dummy", "A", "B", "C", "F3", "stderr", "Z", "nSNPs"])
outgroupf3dat_MA1 = pd.read_csv("/home/training/work/share/solutions/outgroupF3_results_MA1.txt",
                        delim_whitespace=True,
                        names=["dummy", "A", "B", "C", "F3", "stderr", "Z", "nSNPs"])

outgroupf3dat_merged = outgroupf3dat_Han.merge(outgroupf3dat_MA1, on="B", suffixes=("_Han", "_MA1"))
outgroupf3dat_merged
Out[29]:
dummy_Han A_Han B C_Han F3_Han stderr_Han Z_Han nSNPs_Han dummy_MA1 A_MA1 C_MA1 F3_MA1 stderr_MA1 Z_MA1 nSNPs_MA1
0 result: Han Chuvash Mbuti 0.233652 0.002072 112.782 502678 result: MA1_HG.SG Mbuti 0.243818 0.002349 103.781 350484
1 result: Han Albanian Mbuti 0.215629 0.002029 106.291 501734 result: MA1_HG.SG Mbuti 0.236494 0.002296 103.008 344332
2 result: Han Armenian Mbuti 0.213724 0.001963 108.882 504370 result: MA1_HG.SG Mbuti 0.231399 0.002264 102.229 349612
3 result: Han Bulgarian Mbuti 0.216193 0.001979 109.266 504310 result: MA1_HG.SG Mbuti 0.237498 0.002281 104.103 349800
4 result: Han Czech Mbuti 0.218060 0.002002 108.939 504089 result: MA1_HG.SG Mbuti 0.243224 0.002328 104.457 349553
5 result: Han Druze Mbuti 0.209551 0.001919 109.205 510853 result: MA1_HG.SG Mbuti 0.226740 0.002197 103.193 359004
6 result: Han English Mbuti 0.216959 0.001973 109.954 504161 result: MA1_HG.SG Mbuti 0.243135 0.002317 104.941 349321
7 result: Han Estonian Mbuti 0.220730 0.002019 109.332 503503 result: MA1_HG.SG Mbuti 0.247065 0.002362 104.619 348861
8 result: Han Finnish Mbuti 0.223447 0.002044 109.345 502217 result: MA1_HG.SG Mbuti 0.245684 0.002379 103.266 347208
9 result: Han French Mbuti 0.216623 0.001969 110.012 509613 result: MA1_HG.SG Mbuti 0.240235 0.002269 105.886 357842
10 result: Han Georgian Mbuti 0.214295 0.001935 110.721 503598 result: MA1_HG.SG Mbuti 0.232645 0.002253 103.243 349082
11 result: Han Greek Mbuti 0.215203 0.001984 108.465 507475 result: MA1_HG.SG Mbuti 0.236566 0.002280 103.757 355261
12 result: Han Hungarian Mbuti 0.217894 0.001999 109.004 507409 result: MA1_HG.SG Mbuti 0.241720 0.002313 104.483 355340
13 result: Han Icelandic Mbuti 0.218683 0.002015 108.553 504655 result: MA1_HG.SG Mbuti 0.244488 0.002386 102.481 350287
14 result: Han Italian_North Mbuti 0.215332 0.001978 108.854 507589 result: MA1_HG.SG Mbuti 0.236407 0.002273 104.002 354999
15 result: Han Italian_South Mbuti 0.211787 0.002271 93.265 492400 result: MA1_HG.SG Mbuti 0.230839 0.002767 83.427 321217
16 result: Han Lithuanian Mbuti 0.219615 0.002032 108.098 503681 result: MA1_HG.SG Mbuti 0.246864 0.002403 102.718 348656
17 result: Han Maltese Mbuti 0.210359 0.001956 107.542 503985 result: MA1_HG.SG Mbuti 0.230200 0.002259 101.903 347725
18 result: Han Mordovian Mbuti 0.223469 0.002008 111.296 503441 result: MA1_HG.SG Mbuti 0.245284 0.002346 104.571 350058
19 result: Han Norwegian Mbuti 0.218873 0.002023 108.197 504621 result: MA1_HG.SG Mbuti 0.243930 0.002301 106.031 350182
20 result: Han Orcadian Mbuti 0.217773 0.002014 108.115 504993 result: MA1_HG.SG Mbuti 0.243614 0.002320 105.008 351053
21 result: Han Russian Mbuti 0.223993 0.001995 112.274 506525 result: MA1_HG.SG Mbuti 0.245212 0.002298 106.698 355953
22 result: Han Sardinian Mbuti 0.213230 0.001980 107.711 508413 result: MA1_HG.SG Mbuti 0.231967 0.002264 102.449 355548
23 result: Han Scottish Mbuti 0.218489 0.002039 107.145 499784 result: MA1_HG.SG Mbuti 0.244598 0.002434 100.512 339441
24 result: Han Sicilian Mbuti 0.212272 0.001975 107.486 505477 result: MA1_HG.SG Mbuti 0.231141 0.002260 102.297 351028
25 result: Han Spanish_North Mbuti 0.215885 0.002029 106.383 500853 result: MA1_HG.SG Mbuti 0.238479 0.002426 98.319 341661
26 result: Han Spanish Mbuti 0.213869 0.001975 108.297 513648 result: MA1_HG.SG Mbuti 0.235386 0.002257 104.293 361951
27 result: Han Ukrainian Mbuti 0.218716 0.002007 108.950 503981 result: MA1_HG.SG Mbuti 0.243551 0.002345 103.881 348948
28 result: Han Levanluhta Mbuti 0.236252 0.002383 99.123 263049 result: MA1_HG.SG Mbuti 0.247704 0.003055 81.090 172055
29 result: Han BolshoyOleniOstrov Mbuti 0.247814 0.002177 113.849 457102 result: MA1_HG.SG Mbuti 0.256041 0.002624 97.561 305851
30 result: Han ChalmnyVarre Mbuti 0.233499 0.002304 101.345 366220 result: MA1_HG.SG Mbuti 0.249619 0.002862 87.212 239594
31 result: Han Saami.DG Mbuti 0.236198 0.002274 103.852 489038 result: MA1_HG.SG Mbuti 0.251530 0.002622 95.922 326072

And we can now plot the two F3 statistics against each other:

In [30]:
plt.figure(figsize=(10, 10))
plt.scatter(x=outgroupf3dat_merged["F3_Han"], y=outgroupf3dat_merged["F3_MA1"])
plt.xlabel("F3(Test, Han; Mbuti)");
plt.ylabel("F3(Test, MA1; Mbuti)");

This isn't very useful yet, as we need to again label the points

In [31]:
plt.figure(figsize=(10, 10))
plt.scatter(x=outgroupf3dat_merged["F3_Han"], y=outgroupf3dat_merged["F3_MA1"])
for i, row in outgroupf3dat_merged.iterrows():
    plt.annotate(row["B"], (row["F3_Han"], row["F3_MA1"]))
plt.xlabel("F3(Test, Han; Mbuti)");
plt.ylabel("F3(Test, MA1; Mbuti)");
In [ ]: