Display system info

In [6]:
!system_profiler SPSoftwareDataType
Software:

    System Software Overview:

      System Version: OS X 10.9.5 (13F34)
      Kernel Version: Darwin 13.4.0
      Boot Volume: Hummingbird
      Boot Mode: Normal
      Computer Name: hummingbird
      User Name: Sam (Sam)
      Secure Virtual Memory: Enabled
      Time since boot: 121 days 1:26

In [7]:
cd /Volumes/Data/Sam/scratch/
/Volumes/Data/Sam/scratch

Quality trim the 2212_lane2 fastq.gz files using Trimmomatic (v0.30)

Explanation of the for loop below:

  1. %%bash tells Jupyter to run this cell with bash.
  2. for file in /Volumes/nightingales/C_gigas/2212_lane2_[^N]* starts a for loop over every file whose name begins with 2212_lane2_ followed by any character other than "N" ([^N] matches exactly one character that is not N).
  3. do begins the block of commands that is run for each file.
  4. newname=${file##*/} uses bash parameter expansion on $file, which on each pass of the loop holds one matching path (e.g. /Volumes/nightingales/C_gigas/2212_lane2_CTTGTA_L002_R1_001.fastq.gz). The ## operator removes the longest leading match of the pattern */, i.e. everything up to and including the last slash. The result (just the file name, without the path) is stored in the newname variable.
  5. This line runs Trimmomatic with the following arguments; the trimming steps (items 6-11) are applied in the order they appear on the command line:
    1. single-end reads (SE)
    2. number of threads (-threads 16)
    3. quality score encoding (-phred33)
    4. input file location ("$file")
    5. output file name/location (/Volumes/Data/Sam/scratch/20150521_trimmed_$newname)
    6. Illumina TruSeq single-end adaptor trimming (ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10), using the adaptor FASTA file that ships with Trimmomatic; 2:30:10 are the seed mismatches, palindrome clip threshold, and simple clip threshold
    7. crop reads to a maximum length of 90 bases by removing bases from the end of the read (CROP:90); for 100 bp reads this removes the last 10 bases
    8. remove the first 39 bases of each read (HEADCROP:39); combined with CROP:90, at most 51 bases remain
    9. remove bases from the start of the read while their quality is below 3 (LEADING:3)
    10. remove bases from the end of the read while their quality is below 3 (TRAILING:3)
    11. scan the read with a 4-base sliding window and cut it where the average quality within the window falls below 15 (SLIDINGWINDOW:4:15)
  6. done closes the for loop.
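The ${file##*/} expansion in step 4 can be checked on its own. A minimal sketch, using one hard-coded example path (the first input file reported in the Trimmomatic output below) in place of the glob; the single-# form and basename are shown only for comparison:

```shell
#!/usr/bin/env bash
# Example path; inside the real loop, $file holds one glob match per pass.
file=/Volumes/nightingales/C_gigas/2212_lane2_CTTGTA_L002_R1_001.fastq.gz

# ## removes the LONGEST leading match of */ (everything up to the last slash).
newname=${file##*/}
echo "$newname"

# A single # removes the SHORTEST leading match (here only the first "/").
echo "${file#*/}"

# basename returns the same file name for a path like this one.
basename "$file"
```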
In [33]:
%%bash
for file in /Volumes/nightingales/C_gigas/2212_lane2_[^N]*
do
newname=${file##*/}
java -jar /usr/local/bioinformatics/Trimmomatic-0.30/trimmomatic-0.30.jar \
SE \
-threads 16 \
-phred33 "$file" \
"/Volumes/Data/Sam/scratch/20150521_trimmed_${newname}" \
ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 \
CROP:90 \
HEADCROP:39 \
LEADING:3 \
TRAILING:3 \
SLIDINGWINDOW:4:15
done
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_CTTGTA_L002_R1_001.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_CTTGTA_L002_R1_001.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15796545 (98.73%) Dropped: 203455 (1.27%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_CTTGTA_L002_R1_002.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_CTTGTA_L002_R1_002.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15793607 (98.71%) Dropped: 206393 (1.29%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_CTTGTA_L002_R1_003.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_CTTGTA_L002_R1_003.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15784607 (98.65%) Dropped: 215393 (1.35%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_CTTGTA_L002_R1_004.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_CTTGTA_L002_R1_004.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 10634369 Surviving: 10493068 (98.67%) Dropped: 141301 (1.33%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_GCCAAT_L002_R1_001.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_GCCAAT_L002_R1_001.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15797775 (98.74%) Dropped: 202225 (1.26%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_GCCAAT_L002_R1_002.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_GCCAAT_L002_R1_002.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15794884 (98.72%) Dropped: 205116 (1.28%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_GCCAAT_L002_R1_003.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_GCCAAT_L002_R1_003.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15797804 (98.74%) Dropped: 202196 (1.26%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_GCCAAT_L002_R1_004.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_GCCAAT_L002_R1_004.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15787414 (98.67%) Dropped: 212586 (1.33%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_GCCAAT_L002_R1_005.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_GCCAAT_L002_R1_005.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 16000000 Surviving: 15789953 (98.69%) Dropped: 210047 (1.31%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -threads 16 -phred33 /Volumes/nightingales/C_gigas/2212_lane2_GCCAAT_L002_R1_006.fastq.gz /Volumes/Data/Sam/scratch/20150521_trimmed_2212_lane2_GCCAAT_L002_R1_006.fastq.gz ILLUMINACLIP:/usr/local/bioinformatics/Trimmomatic-0.30/adapters/TruSeq3-SE.fa:2:30:10 CROP:90 HEADCROP:39 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 255678 Surviving: 250209 (97.86%) Dropped: 5469 (2.14%)
TrimmomaticSE: Completed successfully
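The per-file "Input Reads"/"Surviving" lines above can be totaled per index to see how many reads each concatenated file should contain. A minimal sketch, assuming the messages shown above were saved to a log file (the loop above only printed them to the notebook; redirecting the loop with &> trim_log.txt would capture both output streams). The function name summarize_trim_log and the file name trim_log.txt are hypothetical, and the awk patterns rely on the exact message layout shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the Trimmomatic read counts per index.
# Expects a log with the "Started with arguments" and
# "Input Reads: N Surviving: M ..." lines exactly as printed above.
summarize_trim_log() {
  awk '
    # Take the index (e.g. CTTGTA) from the input path on each start line;
    # "2212_lane2_" is 11 characters, followed by the index and "_".
    /^TrimmomaticSE: Started/ {
      if (match($0, /2212_lane2_[ACGT]+_/)) {
        idx = substr($0, RSTART + 11, RLENGTH - 12)
      }
    }
    # Fields 3 and 5 hold the input and surviving read counts.
    /^Input Reads:/ { input[idx] += $3; kept[idx] += $5 }
    END {
      # One line per index: index, input reads, surviving reads, % surviving.
      for (i in input) {
        printf "%s %d %d %.2f%%\n", i, input[i], kept[i], 100 * kept[i] / input[i]
      }
    }
  ' "$1"
}
```

For example, summarize_trim_log trim_log.txt prints one line per index; the order of the lines is not guaranteed.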

Concatenate each of the two groups of trimmed sequences into a single file

400ppm (control) sequences - Index GCCAAT

In [34]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq
for file in 20150521_trimmed_2212_lane2_G*
do
gunzip -c "$file"  >> 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq
done
In [35]:
%%bash
#Gzip file
gzip 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq
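Decompressing and then recompressing is not strictly required: the gzip format allows compressed members to be concatenated, and gunzip decompresses such a file to the members' contents back to back. A minimal sketch of that alternative, shown on two tiny hypothetical FASTQ files in a temporary directory rather than on the real trimmed files; some downstream tools may not handle multi-member gzip files, so the decompress-and-recompress approach above is the more conservative choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory with two tiny hypothetical FASTQ files.
demo=$(mktemp -d)
cd "$demo"
printf '@r1\nACGT\n+\nIIII\n' > part1.fastq
printf '@r2\nTTGA\n+\nIIII\n' > part2.fastq
gzip part1.fastq part2.fastq

# Concatenating .gz files produces a valid multi-member gzip file...
cat part1.fastq.gz part2.fastq.gz > combined.fastq.gz

# ...which decompresses to the two FASTQ files one after the other.
gunzip -c combined.fastq.gz
```

Applied to this notebook, the untested equivalent would be cat 20150521_trimmed_2212_lane2_G* > 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz, replacing both the loop and the gzip step.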

1000ppm (acidification) sequences - Index CTTGTA

In [36]:
%%bash
#gunzips all matching files in folder and appends the data to a single file:
#20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq
for file in 20150521_trimmed_2212_lane2_C*
do
gunzip -c "$file" >> 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq
done
In [37]:
%%bash
#Gzip file
gzip 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq
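A quick sanity check after concatenation is to count the reads in each combined file and compare the result with the sum of the "Surviving" counts reported by Trimmomatic. A minimal sketch; the function name count_reads is hypothetical, and it assumes four lines per FASTQ record (no line-wrapped sequence or quality lines), which is typical for Illumina output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of reads in a gzipped FASTQ file.
# Assumes exactly four lines per record.
count_reads() {
  local lines
  lines=$(gunzip -c "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

For example, count_reads 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz should match the total of the Surviving counts for the CTTGTA files.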

Run FastQC (v0.11.2) on the concatenated files

In [38]:
%%bash
for file in /Volumes/Data/Sam/scratch/20150521_*[e2]_[14]*.gz; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done
Analysis complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Analysis complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Started analysis of 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 5% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 10% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 15% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 20% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 25% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 30% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 35% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 40% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 45% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 50% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 55% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 60% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 65% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 70% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 75% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 80% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 85% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 90% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Approx 95% complete for 20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz
Started analysis of 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 5% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 10% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 15% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 20% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 25% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 30% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 35% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 40% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 45% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 50% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 55% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 60% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 65% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 70% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 75% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 80% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 85% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 90% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
Approx 95% complete for 20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
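The glob 20150521_*[e2]_[14]*.gz selects only the two concatenated files: [e2]_[14] requires an "e" or "2" immediately before an underscore that is followed by "1" or "4", which in these file names occurs only at lane2_1000ppm and lane2_400ppm. A minimal sketch that checks this against empty placeholder files named like this notebook's outputs, created in a temporary directory:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Empty placeholder files named like the trimmed and concatenated outputs.
demo=$(mktemp -d)
cd "$demo"
touch 20150521_trimmed_2212_lane2_CTTGTA_L002_R1_001.fastq.gz \
      20150521_trimmed_2212_lane2_GCCAAT_L002_R1_006.fastq.gz \
      20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz \
      20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz

# Collect the names the FastQC glob expands to and print them.
matches=( 20150521_*[e2]_[14]*.gz )
printf '%s\n' "${matches[@]}"
```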

Copy files to Eagle for web-based access

In [39]:
%%bash
for file in 2015*e2_[14]*; do cp "$file" /Volumes/Eagle/Arabidopsis/; done
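Because the copies on Eagle are what get shared, it can help to confirm each copy is byte-identical to its source. A minimal sketch; the function name copy_and_verify is hypothetical, and it reuses the ${f##*/} expansion from the trimming loop to build the destination path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy each file into a destination directory and
# confirm with cmp that the copy matches the original byte for byte.
copy_and_verify() {
  local dest=$1
  shift
  local f
  for f in "$@"; do
    cp "$f" "$dest"/ && cmp -s "$f" "$dest/${f##*/}" || {
      echo "copy failed or differs: $f" >&2
      return 1
    }
  done
}
```

For example, copy_and_verify /Volumes/Eagle/Arabidopsis 2015*e2_[14]* would copy the same files as the loop above and stop at the first mismatch.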