Simple Commands

We first check which directory we are in, using the pwd (print working directory) command:

In [2]:

pwd

/home/stephan

OK, so that's indeed our home folder. We can list its contents with ls:

In [1]:

ls

bashnb_test.ipynb  rnb_test.ipynb	  test.txt
pynb_test.ipynb    setting_up_bash.ipynb
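ls does not show entries whose names start with a dot (configuration files such as .bashrc, for example). The sketch below shows the difference the -a flag makes. It works in a throwaway directory created with mktemp and uses made-up file names:

```shell
# Work in a temporary directory so the example does not touch real files
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Create one ordinary file and one hidden file (hypothetical names)
touch notes.txt .hidden_config

# Plain ls skips names that start with a dot
plain=$(ls)
echo "$plain"

# ls -a also lists hidden entries, plus . (this directory) and .. (its parent)
all=$(ls -a)
echo "$all"
```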

We can now create a new directory:

In [4]:

mkdir testDir

and change into that directory:

In [5]:

cd testDir

and confirm that we are now in the new dir:

In [6]:

pwd

/home/stephan/testDir
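Here is a related pair of tricks, sketched in a temporary directory with hypothetical folder names. mkdir -p creates a whole chain of nested directories in one call, and cd .. moves up to the parent directory. pwd confirms where we end up each time:

```shell
# Start from a temporary directory instead of the real home folder
base=$(mktemp -d)
cd "$base" || exit 1

# -p creates all missing parent directories in one go
# (and does not complain if they already exist)
mkdir -p project/data/raw

cd project/data/raw || exit 1
deep=$(pwd)
echo "$deep"

# '..' refers to the parent directory; '../..' goes up two levels
cd ../.. || exit 1
up=$(pwd)
echo "$up"
```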

Next, let's play with echo, which simply prints its arguments:

In [7]:

echo "Hello, how are you?"

Hello, how are you?
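echo becomes more useful when combined with redirection. In the minimal sketch below, > writes the output to a file and overwrites it, while >> appends to it. The file name greeting.txt is just an example, and the file is created in a temporary directory:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# '>' sends echo's output to a file, replacing any previous contents
echo "Hello, how are you?" > greeting.txt
# '>>' appends to the file instead of overwriting it
echo "Fine, thanks." >> greeting.txt

# cat prints the file: both lines are there
cat greeting.txt
lines=$(wc -l < greeting.txt)
```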

Now let's try something more useful with grep, which prints every line of a file that contains a given pattern:

In [8]:

grep French /data/pca/genotypes_small.ind

           HGDP00511 M     French
           HGDP00512 M     French
           HGDP00513 F     French
           HGDP00514 F     French
           HGDP00515 M     French
           HGDP00516 F     French
           HGDP00517 F     French
           HGDP00518 M     French
           HGDP00519 M     French
           HGDP00522 M     French
           HGDP00523 F     French
           HGDP00524 F     French
           HGDP00525 M     French
           HGDP00526 F     French
           HGDP00527 F     French
           HGDP00528 M     French
           HGDP00529 F     French
           HGDP00531 F     French
           HGDP00533 M     French
           HGDP00534 F     French
           HGDP00535 F     French
           HGDP00536 F     French
           HGDP00537 F     French
           HGDP00538 M     French
           HGDP00539 F     French
     SouthFrench3326 M     French
     SouthFrench3947 M     French
     SouthFrench1323 M     French
     SouthFrench3951 M     French
     SouthFrench3068 M     French
     SouthFrench1112 M     French
     SouthFrench4018 M     French

Alright, that lists all French individuals. Now let's count them with grep -c, which prints the number of matching lines instead of the lines themselves:

In [9]:

grep -c French /data/pca/genotypes_small.ind

32
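One caveat: grep matches the pattern anywhere on the line, so French also matches IDs such as SouthFrench3326. In this dataset those individuals belong to the French population anyway, so the count is correct. It would be off, however, if an ID contained the name of a population it does not belong to. The sketch below compares grep -c with an exact comparison on the third column in awk. It uses a small made-up file in the same layout (ID, sex, population), and the IDs are hypothetical:

```shell
tmp=$(mktemp -d)
# Hypothetical sample in the same layout as the .ind file: ID, sex, population
cat > "$tmp/sample.ind" <<'EOF'
        id001 M     French
SouthFrench01 F     French
 FrenchLike01 M     Basque
EOF

# grep -c counts lines containing "French" anywhere, including in the ID column
grep_count=$(grep -c French "$tmp/sample.ind")
echo "$grep_count"    # 3

# awk compares only the third whitespace-separated field, so it counts exact matches
awk_count=$(awk '$3 == "French" { n++ } END { print n + 0 }' "$tmp/sample.ind")
echo "$awk_count"     # 2
```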

Pipes

Let's look at the structure of our .ind file. head prints its first 10 lines:

In [10]:

head /data/pca/genotypes_small.ind

             Yuk_009 M    Yukagir
             Yuk_025 F    Yukagir
             Yuk_022 F    Yukagir
             Yuk_020 F    Yukagir
               MC_40 M    Chukchi
             Yuk_024 F    Yukagir
             Yuk_023 F    Yukagir
               MC_16 M    Chukchi
               MC_15 F    Chukchi
               MC_18 M    Chukchi
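By default, head prints the first 10 lines. The small sketch below runs on a generated file of numbers (not the real data). It shows how -n changes that count and how tail does the same for the end of a file:

```shell
tmp=$(mktemp -d)
# Generated file with the numbers 1 to 20, one per line
seq 1 20 > "$tmp/numbers.txt"

# -n sets how many lines head prints
first3=$(head -n 3 "$tmp/numbers.txt")
echo "$first3"

# tail is the counterpart that shows the last lines
last3=$(tail -n 3 "$tmp/numbers.txt")
echo "$last3"

# Without -n, head stops after 10 lines
default=$(head "$tmp/numbers.txt" | wc -l)
```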

Let's extract the population column (the third field) with awk:

In [13]:

head /data/pca/genotypes_small.ind | awk '{print $3}'

Yukagir
Yukagir
Yukagir
Yukagir
Chukchi
Yukagir
Yukagir
Chukchi
Chukchi
Chukchi
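awk can also read the file itself and limit the rows with its built-in line counter NR. This means the head-plus-awk pipeline can be collapsed into a single command. Here is a sketch on a made-up sample file, where the IDs and population names are placeholders:

```shell
tmp=$(mktemp -d)
# Hypothetical sample: ID, sex, population
cat > "$tmp/sample.ind" <<'EOF'
  id01 M  PopA
  id02 F  PopB
  id03 F  PopA
  id04 M  PopC
EOF

# Pipeline form, as above: first 2 lines, then column 3
piped=$(head -n 2 "$tmp/sample.ind" | awk '{print $3}')

# Single awk call: NR is the current line number, so NR <= 2 keeps the first two lines
direct=$(awk 'NR <= 2 {print $3}' "$tmp/sample.ind")
echo "$direct"

# Several columns can be printed at once, separated by a space
both=$(awk 'NR <= 2 {print $1, $3}' "$tmp/sample.ind")
echo "$both"
```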

Let's sort it (notice that we now use cat instead of head to read the whole file, and use head only at the end to show the first lines):

In [16]:

cat /data/pca/genotypes_small.ind | awk '{print $3}' | sort | head

Abkhasian
Abkhasian
Abkhasian
Abkhasian
Abkhasian
Abkhasian
Abkhasian
Abkhasian
Abkhasian
Adygei
sort: fflush failed: standard output: Broken pipe
sort: write error

There are some error messages at the end. head stops reading after ten lines, so sort's attempts to write the rest of its output fail with a broken pipe. They are harmless here.
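If the messages get in the way, you can discard them by redirecting sort's standard error to /dev/null. The sorted lines on standard output still reach head. The sketch below runs on a generated file. Whether the broken pipe actually occurs depends on how much output sort still has to write when head exits:

```shell
tmp=$(mktemp -d)
# Generated input with many lines, so sort usually has plenty left to write
seq 1 100000 > "$tmp/many.txt"

# 2>/dev/null discards only sort's error messages (stderr);
# the sorted lines on stdout still flow through the pipe to head
out=$(sort "$tmp/many.txt" 2>/dev/null | head -n 3)
echo "$out"    # text sort, so 1, 10, 100
```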

Now let's use uniq to remove duplicate population names. uniq only removes adjacent duplicates, which is why we sort first:

In [17]:

cat /data/pca/genotypes_small.ind | awk '{print $3}' | sort | uniq | head

Abkhasian
Adygei
Albanian
Aleut
Aleut_Tlingit
Altaian
Ami
Armenian
Atayal
Balkar
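Here are two related variants, sketched on a made-up list of population labels. sort -u combines the sort and uniq steps. uniq -c prefixes each distinct line with the number of times it occurs, which gives the sample size of each population. The first command also shows why sorting matters: without it, uniq only collapses repeats that happen to be adjacent.

```shell
tmp=$(mktemp -d)
# Hypothetical population column, deliberately unsorted
printf '%s\n' PopB PopA PopB PopC PopB PopA > "$tmp/pops.txt"

# No two equal labels are adjacent, so uniq removes nothing: 6 lines remain
unsorted=$(uniq "$tmp/pops.txt" | wc -l)

# sort -u sorts and removes duplicates in one step (same result as sort | uniq)
distinct=$(sort -u "$tmp/pops.txt")
echo "$distinct"

# uniq -c prefixes each distinct line with its count; sort -rn puts the largest first
counts=$(sort "$tmp/pops.txt" | uniq -c | sort -rn)
echo "$counts"
```

On the real file, the same idea would be `awk '{print $3}' /data/pca/genotypes_small.ind | sort | uniq -c | sort -rn | head`, which lists the largest populations first. Its output is not shown here.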

And now let's count the distinct populations with wc -l, which counts lines:

In [18]:

cat /data/pca/genotypes_small.ind | awk '{print $3}' | sort | uniq | wc -l

116

OK, so there are 116 populations in the dataset. And how many individuals?

In [19]:

wc -l /data/pca/genotypes_small.ind

1340 /data/pca/genotypes_small.ind

So there are 1340 individuals (one per line) in 116 populations, about 11.6 per population on average. Good to know!
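The average itself can also be computed in the shell. The first command uses the two counts from above. The second part is a sketch that gets both counts from a file in a single awk pass. It runs on a made-up sample rather than the real data:

```shell
# Average group size from the two counts reported above
overall=$(awk 'BEGIN { printf "%.2f\n", 1340 / 116 }')
echo "$overall"    # 11.55

tmp=$(mktemp -d)
# Hypothetical sample: ID, sex, population
cat > "$tmp/sample.ind" <<'EOF'
  id01 M  PopA
  id02 F  PopB
  id03 F  PopA
  id04 M  PopC
  id05 F  PopA
EOF

# n counts lines (individuals); seen[] records each population once, k counts them
avg=$(awk '{ n++; if (!($3 in seen)) { seen[$3] = 1; k++ } }
           END { printf "%d individuals, %d populations, %.2f per population\n", n, k, n / k }' "$tmp/sample.ind")
echo "$avg"
```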
