Repetitive DNA elements ("repeats") are DNA sequences prevalent in genomes, especially of higher eukaryotes. Repeats make up about 50% of the human genome and over 80% of the maize genome. Repeats can be categorized as interspersed, where similar DNA sequences are spread throughout the genome, or tandem, where similar sequences are adjacent (see Treangen and Salzberg). Some interspersed repeats are long segmental duplications, but most are relatively short transposons and retrotransposons. Though repeats are sometimes referred to as “junk,” they are involved in processes of current scientific interest, including genome expansion, speciation, and epigenetic regulation (see Fedoroff). Some are still actively expressed and duplicated, including in the human genome (see Witherspoon et al, Tyekucheva et al).
RepeatMasker is both a tool for identifying repeats in a genome sequence and a database of repeats that have been found. The database covers some well-known model species, like human, chimpanzee, gorilla, rhesus, rat, mouse, horse, cow, cat, dog, chicken, zebrafish, bee, fruit fly and roundworm. People often use RepeatMasker to remove ("mask out") repetitive sequences from the genome so that they can be ignored (or otherwise treated specially) in later analyses, though that's not our goal here.
It's instructive to click on some of the species listed in the database and examine the associated bar and pie charts describing their repeat content. For example, note the differences between the bar charts for human and mouse, especially for SINE/Alu and LINE/L1.
import urllib.request
rm_site = 'http://www.repeatmasker.org'
fn = 'ce10.fa.out.gz'
url = '%s/genomes/ce10/RepeatMasker-rm405-db20140131/%s' % (rm_site, fn)
urllib.request.urlretrieve(url, fn)
('ce10.fa.out.gz', <http.client.HTTPMessage at 0x7ff3accac278>)
import gzip
import itertools
fh = gzip.open(fn, 'rt')
for ln in itertools.islice(fh, 10):
    print(ln, end='')
   SW  perc perc perc  query     position in query          matching    repeat         position in  repeat
score  div. del. ins.  sequence  begin  end   (left)        repeat      class/family   begin  end   (left)  ID

  508   0.0  0.0  0.0  chrI          1   432  (15071991)  + (GCCTAA)n   Simple_repeat      1   432     (0)   1
 1226  10.0  0.0  0.0  chrI        566   595  (15071828)  + (GCCTAA)n   Simple_repeat      1    41   (240)   2
  344  22.2  0.0  0.0  chrI        596   676  (15071747)  C RCS5        Satellite       (41)  1387    1307   3
 1226  10.0  0.0  0.0  chrI        677   846  (15071577)  + (GCCTAA)n   Simple_repeat     42   281     (0)   2
  432  21.9  2.4  0.0  chrI       1622  1744  (15070679)  + LONGPAL1    DNA/MULE-MuDR    136   261  (2330)   4
 8509   0.6  0.0  0.1  chrI       2052  3026  (15069397)  + PALTTTAAA3  DNA                1   974   (529)   5
 4521   1.1  0.2  0.2  chrI       3124  3652  (15068771)  + PALTTTAAA3  DNA              974  1502     (1)   6
Above are the first several lines of the .out.gz file for the roundworm (C. elegans). The columns have headers, which are somewhat helpful. More detail is available in the RepeatMasker documentation under "How to read the results". (Note that in addition to the 14 fields described in the documentation, there's also a 15th field, ID.)
Here's an extremely simple class that parses a line from these files and stores the individual values in its fields:
class Repeat(object):
    def __init__(self, ln):
        # parse fields
        (self.swsc, self.pctdiv, self.pctdel, self.pctins, self.refid,
         self.ref_i, self.ref_f, self.ref_remain, self.orient, self.rep_nm,
         self.rep_cl, self.rep_prior, self.rep_i, self.rep_f, self.unk) = ln.split()
        # int-ize the reference coordinates
        self.ref_i, self.ref_f = int(self.ref_i), int(self.ref_f)
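As a quick check of the field order, here is a minimal sketch that splits one of the rows printed above into a dictionary. The names in FIELD_NAMES and the helper parse_out_line are illustrative choices, not part of RepeatMasker, and the sketch assumes every row has exactly 15 whitespace-separated fields. The three repeat-position columns are given neutral names because their order differs between "+" and "C" rows.

```python
# Illustrative field names (an assumption), in the order the columns appear.
FIELD_NAMES = ['sw_score', 'pct_div', 'pct_del', 'pct_ins', 'query_seq',
               'query_begin', 'query_end', 'query_left', 'orient',
               'repeat_name', 'repeat_class', 'rep_col1', 'rep_col2',
               'rep_col3', 'id']

def parse_out_line(ln):
    """Split one .out row into a dict keyed by FIELD_NAMES (hypothetical helper)."""
    fields = ln.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError('expected %d fields, got %d'
                         % (len(FIELD_NAMES), len(fields)))
    rec = dict(zip(FIELD_NAMES, fields))
    # query coordinates are 1-based and inclusive
    rec['query_begin'] = int(rec['query_begin'])
    rec['query_end'] = int(rec['query_end'])
    # the "(left)" value is wrapped in parentheses
    rec['query_left'] = int(rec['query_left'].strip('()'))
    return rec

# First data row from the output shown earlier
row = '508 0.0 0.0 0.0 chrI 1 432 (15071991) + (GCCTAA)n Simple_repeat 1 432 (0) 1'
rec = parse_out_line(row)
print(rec['query_seq'], rec['query_begin'], rec['query_end'], rec['repeat_class'])
```

Raising an error on an unexpected field count is a deliberate choice here: it surfaces rows that do not fit the assumed layout instead of silently misassigning columns.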
We can parse a file into a list of Repeat objects:
def parse_repeat_masker_db(fn):
    reps = []
    # open in text mode whether or not the file is gzipped
    with gzip.open(fn, 'rt') if fn.endswith('.gz') else open(fn, 'rt') as fh:
        fh.readline()  # skip header
        fh.readline()  # skip header
        fh.readline()  # skip header
        while True:
            ln = fh.readline()
            if len(ln) == 0:
                break
            reps.append(Repeat(ln))
    return reps
reps = parse_repeat_masker_db('ce10.fa.out.gz')
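Once rows are parsed, a natural first question is how many annotations fall into each class/family. The following is a minimal sketch of that tally; count_by_class is a hypothetical helper, and it works on raw rows (copied from the output above) so that it runs on its own.

```python
from collections import Counter

def count_by_class(rows):
    """Tally .out rows by their class/family column (the 11th field)."""
    counts = Counter()
    for ln in rows:
        fields = ln.split()
        counts[fields[10]] += 1
    return counts

# A few data rows from the output shown earlier, used as sample input.
sample_rows = [
    '508 0.0 0.0 0.0 chrI 1 432 (15071991) + (GCCTAA)n Simple_repeat 1 432 (0) 1',
    '1226 10.0 0.0 0.0 chrI 566 595 (15071828) + (GCCTAA)n Simple_repeat 1 41 (240) 2',
    '344 22.2 0.0 0.0 chrI 596 676 (15071747) C RCS5 Satellite (41) 1387 1307 3',
    '432 21.9 2.4 0.0 chrI 1622 1744 (15070679) + LONGPAL1 DNA/MULE-MuDR 136 261 (2330) 4',
]
counts = count_by_class(sample_rows)
for cls, n in counts.most_common():
    print(cls, n)
```

With the parsed list in hand, Counter(r.rep_cl for r in reps) should give the same kind of tally for the whole file.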
Now let's obtain the genome for the roundworm in FASTA format. For more information on FASTA, see the FASTA notebook. As seen above, the name of the genome assembly used by RepeatMasker is ce10. We can get it from the UCSC server. It's around 30 MB.
ucsc_site = 'http://hgdownload.cse.ucsc.edu/goldenPath'
fn = 'chromFa.tar.gz'
urllib.request.urlretrieve("%s/ce10/bigZips/%s" % (ucsc_site, fn), fn)
('chromFa.tar.gz', <http.client.HTTPMessage at 0x7ff38f4ac518>)
!tar zxvf chromFa.tar.gz
chrI.fa chrII.fa chrIII.fa chrIV.fa chrM.fa chrV.fa chrX.fa
Let's load the chromosomes, including chromosome I, into a dictionary of strings so that we can see the sequences of the repeats.
from collections import defaultdict
def parse_fasta(fns):
    ret = defaultdict(list)
    for fn in fns:
        with open(fn, 'rt') as fh:
            for ln in fh:
                if ln[0] == '>':
                    name = ln[1:].rstrip()
                else:
                    ret[name].append(ln.rstrip())
    for k, v in ret.items():
        ret[k] = ''.join(v)
    return ret
genome = parse_fasta(['chrI.fa', 'chrII.fa', 'chrIII.fa', 'chrIV.fa', 'chrM.fa', 'chrV.fa', 'chrX.fa'])
genome['chrI'][:1000] # printing just the first 1K nucleotides
'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaaAAAATTGAGATAAGAAAACATTTTACTTTTTCAAAATTGTTTTCATGCTAAATTCAAAACGTTTTTTTTTTAGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTtaggcctaatactaagcctaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaaAAGAATATGGTAGCTACAGAAACGGTAGTACACTCTTCTGAAAATACAAAAAATTTGCAATTTTTATAGCTAGGGCACTTTTTGTCTGCCCAAATATAGGCAACCAAAAATAATTGCCAAGTTTTTAATGATTTGTTGCATATTGAAAAAAACA'
Note the combination of lowercase and uppercase. Actually, that relates to our discussion here. The lowercase stretches are repeats! The UCSC genome sequences use the lowercase/uppercase distinction to make it clear where the repeats are -- and they know this because they ran RepeatMasker on the genome beforehand. In this case, the two repeats you can see are both simple hexamer repeats. Also, note that their position in the genome corresponds to the first two rows of the RepeatMasker database that we printed above.
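The lowercase/uppercase convention can be checked programmatically. Below is a minimal sketch, with a hypothetical helper soft_masked_intervals, that reports each lowercase run as a 1-based, inclusive interval, the same coordinate style the RepeatMasker rows use. It is demonstrated on a toy string rather than the real chromosome.

```python
import re

def soft_masked_intervals(seq):
    """Return 1-based, inclusive (begin, end) pairs for each lowercase run."""
    # m.start() is 0-based, so add 1; m.end() is exclusive, so it already
    # equals the 1-based inclusive end.
    return [(m.start() + 1, m.end()) for m in re.finditer(r'[a-z]+', seq)]

toy = 'acgtacACGTTTggccaAA'
print(soft_masked_intervals(toy))  # → [(1, 6), (13, 17)]
```

Calling soft_masked_intervals(genome['chrI'][:1000]) gives intervals that can be compared against the begin and end columns of the RepeatMasker rows. Note that adjacent annotations form a single lowercase run, so runs need not map one-to-one onto rows.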
We write a function that, given a Repeat and a dictionary containing the sequences of all the chromosomes in the genome, returns the sequence of that repeat.
def extract_repeat(rep, genome):
    assert rep.refid in genome
    # ref_i and ref_f are 1-based and inclusive; Python slices are 0-based and half-open
    return genome[rep.refid][rep.ref_i-1:rep.ref_f]
extract_repeat(reps[0], genome)
'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa'
extract_repeat(reps[1], genome)
'CCTGTGTTtaggcctaatactaagcctaag'
extract_repeat(reps[2], genome)
'cctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcct'
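The extracted strings are always read from the forward strand of the reference, even for rows whose orientation column is "C" (a match to the complement of the repeat consensus), such as the third row above. If the goal were to compare an instance against the consensus, one option is to reverse-complement those instances. The sketch below is an assumption about how that might be done; reverse_complement and extract_oriented are hypothetical helpers, shown on a toy sequence.

```python
# Translation table covering both cases, so soft-masking is preserved.
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_oriented(seq, begin, end, orient):
    """Slice a 1-based inclusive interval; reverse-complement it if orient is 'C'."""
    sub = seq[begin - 1:end]
    return reverse_complement(sub) if orient == 'C' else sub

toy = 'AAACCGTtt'
print(extract_oriented(toy, 4, 7, '+'))  # → CCGT
print(extract_oriented(toy, 4, 7, 'C'))  # → ACGG
```

Characters outside the translation table (other IUPAC codes, for instance) pass through unchanged in this sketch, which may or may not be the desired behavior.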
Let's specifically try to extract a repeat from the DNA/CMC-Chapaev family.
chapaevs = filter(lambda x: 'DNA/CMC-Chapaev' == x.rep_cl, reps)
[extract_repeat(chapaev, genome) for chapaev in chapaevs]
['cacggcccggcggggggtacatggatgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacataagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaagcgatttttagttcttcacagaggaatcctctcgcatttcacttgctcatgatgttttttgctccactttaggacgataaaaatgcgaattgttgataaaatgaatgaataatataaaaa', 'ggggctgctgaaaccaatgtcggcatgatgagagttccggtcttctgaatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt', 'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccatgctacaagcctgaatctttcaaattaagaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgttctgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctgtacacctaaatcattaaaattcagaaccgccatgtattttttcttaccaaaggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgtggctcacccggttgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgcaacctcatgaacaaaaaaaaagcattgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg', 
'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtttaaagtttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgaaatttcgcgaaaattaacagaagatttttttcggaattatagagctgaaattgaaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttcgcttttcacagacgaatgatgtctcattttactcgatgaaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggatcaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactttttcttcggaatttcacgacttttttggacgaattttattctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttactcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactcacttcagctaaactattactgcatttcggaagttgataggatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaagcagtgaacctaccaccgggttcggacgagaaagagcattactcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctgaaaattttcaaaatttgaaacttttcgagaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcctggagcgaaaattgattttttttttcgctaaattttttcttttttgggcagccgtgacgtcccgaataactgcttttgggtcccgaagatcattttgcgaagaaattggcagaactgttgcatcttttggtacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcagaaatgtttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgtcccattttttcaagtgttccttcgggagtaccattcacaattgtatcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctccttgatacttttcttccgcacttttaattaggttaacagcgttttttagagttgcttttcgtgttttcaggataggaaaagaagtagtgtta
tccaaagtatcagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcaaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgaataattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatcttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattgacaaatctcttgtgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtgaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgcttcttagattcgaaatattaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgcagcgaagaattagatgctcttatagtgatattttcatggcggactatttgcatttcttccgaaaacaccgcaaactcatcaatccgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtaaaggtgtcggtggaaatacgggattggagaatctcagcaaaatcatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatttcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgagctgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagttgacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttggg
cgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttaggcgcatgatattgagctgaatgttttgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccagcatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaataatcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagatatgtcaaaaaaaacaactcactttttgacgtttttcgccttttcgcggatgatgcggtcgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg', 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc', 'aattcctaaattttttattaaaatcgaaaaaaaaaaatgaaatacgtgagattgagtttcgagacttttttattcagaatcagcatatatttctccatatttgagtaggttttcagaaatattgtaccataatttttggaaaaatgtaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttccatatttgagtagattttcagaaatattgtaccataatttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattccacgtagtaacaagaaaaaacaagaaaaaataagaaaaaacgaagaaaa', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata', 
'cacggtatcacaaaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagtgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaataaaaaatagattttattattcaatttagtagctaacaattgagaattgaattattgaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggaagattgaaaattgaaaaatacatacaaatcgattcaaaaaacgaatttaatatgtggaatcagcctccttcttcgctttacggatgcaagttgaaatttttcttggtaaaactttcgaaaataaggaaacccggtgaaatttcttaattgcctcgactcctggtgaactttttgagataaatccatcgaaattttgctgagtagctcattgacaaggtttcaatctgaaatttcataaaatcaattattttcgaataattttaatcccaacaaaccgaaggaaatcctgaattttagcttttcgatagaatcctaaggtgtcacatcgacaattttccaagttgaaagaaaccatgtggagcatctccgcttataatatcctcgacaaaatattgatgagaaatccatcaaatttgcaccgagtatctgcttgtcaatgtttttacctgaaatttaatgaaattaataattttttaataattttaagcacaactaaccatagaaaatcccgaattttatgatattgaggaaatcccgaaattttaattgcgaaaaggttcagaattgtgtgagggagagcgctgggtggtttattgggaagacgaggcgtcctgcaaagaaaatgttacgatttgtttttttgaaaggttccacttacttcctccatatcagaaattagattgctacatgacagctccttagcaatgtacaggtatcttttccctgtgtttctaaccgagtggaagcgtctttcgaggcgattgaaaacggcatgaagcgattcgattccttgctccgaaagccttcccaaattcctttccgtttttgcgacttctactacgtgagcaaccaggacatgcaatttttgagtgattgattcttctgggtgggcagcttgtaggaattcaacgaattccttcatcgactcatccaattcgctgatgtcatcgtcagagagcaatcgattggccgacaatgacatgattttggacagcctgctcatcgcgttcttgacattcagaagcatcggagtcatgtggtttttcagatttttgaatgcagctgtgacacctttctcagacaaaatcagcttcgtgtgatttccagtgtacatttgaaaccacgctcggcgtgttgcaccaacttcttccagatcattctcaaattgtttaagatatccaccagcaagtccatccagcgtctcgtccaataacactttttcttctttcagagcagtaaactgtgctttcatctctctttttctcttaagtggtgaagcttcgaattttttgttcgcgtcttgaatcttcaaatcggcaactctttttgtctcttctttatttcttcgaatttcaaatgatgttttattgtctaaactcacaacggccatccaaatcggctcaaagatgtatttcgtgaacagtccgacaatcaagtgtagcattgcaggcaagtagtgttccaacttgacattttgaagaattggtccacttccgcatctgacgccaaagctaccgtatttagaattgagcttgtaagaattcattgttttcaagaaatatatcttttgcaagttcagatctttaagcttttgcatcagtcctcctcttggattggtttcgaaacaaaacgggcaaaaataggtggcacattgttttttatgtgacagtaaatcacatgtaaactt
aaaatcaccgactactttttggacaacgccacgtgtcaccacccttccatcttccatataggtgatgctggtgaagttgttgatcttcacgataagatcagacaggtaggccatgataagttcccgcgaatcagagtcatcaaaaacagcgagaaggacgattcggtgcggcgagttcgcatgatcacaatttccgatcagaagacaaagtttcgtcgttcctcctcccgaatctccaccaattccgataacaattttgccatcggtataggaatcatggcgtaattgttttgatacagacaaacgctccagcttcggaatgacacttttctcgacatcgataattttgacaacggtcaccttttctccttttgatatatatgttgtggccttataattgtcgatagttgacattctcgttttccgttgcatcgtcaaattaatagttggcatgatttcaagatcggtgaacatcttatagttttgcttcacccgccgcaattgattgttagagaatctacatttttcttgaaaaataatcgtttgccaatgcgtcaattggacttggaaggactgcgaatcattttgttggagatatttctggaaatcaatcatgaaattacgaatatcctcttctcgactgattcgttgaagcaaatccagtgccaaatctatcctagattttccagagaacgattccttcttcgcaactctgatatccatattattctgtactttttcgcccttgacgtaatttgagcgatttgttttcttaatatcagattttcgttggttttccgttttgatgtttttcttattgaaattatctctttcctttgcatgaaaaattcatcagaaaaaacggcaaactcgtcgtctctctccatcaaccgatttagaagtgtctcttcgtcaatgtctgttcttggattgagctggaccgaatggtttgagctggacaggatggcggttctgaaatggaaaattgaataaacaatcaacaacaaagaaataaatctacctctgatatgctatccaaaatggaatcaacttctagaaatctcattcgttctctactcaagtcttctctaattcccttcacatatggtgtctcattcgaaaacacgccgacgctgtcactcgatgaatagagtggacttgacgatctttcaaaagagtgtggtgttcttggaggtgatgaagaagattccgcaatagaagaagttgatggttgtgtgtcgaacggctggaaaaattatttttaaaatattataatcgtcttaggatccgagttgtagcatggattaattcacttacaatatcagtttcaggtgttattgtcagattcaaatcgatattgcgtttcgactccacatccagattatcatcaaaatccacaaatggagacactcgtcttctctcaggagtgatctgcgcatctccgtttctgtcaagagcttctgtcaattctacattcgacaggtttaggtccagggaagcatttctttcagtatttgaattttcgtcttcttcatcttttttccatcgtcgtttccgagcatttagaagatttgaagaccatgctgattgatttgatggaggaggcatctgaaaaaaaaattttttactcaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgatttgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaataaaacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaattttcttgaagaatgtacgatttctaaaaaattctctctgtatcttgcttaaaattaacagttctttctctttgaagcaaaaaactaacacattttgtgatttttttggcaaaaaaaattattttataaattcttattacaaaaaaaaattttttcgaaaaaattttaatgaaaa
attaagatttctctatgaattgagttccatcttatacacaattaaattttgatcataaaacaactataaatcgtaaagagtttttgattttcttgaaaaagggaacgtttttaaaaactattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagattttttagctgattctaattcggcaataaaaatataacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaaaaattaatgaagaaggtacgatttctaaaaaattcggcccgtatcttgttcagaatttttagctctctctttttcaatcaaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttcctttctgcccacgccgacgctcctttgttttcctcgaattttttcgcttactacctgaaacaagacacactttttcgattttacaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagtgaaaggcttgaaatttaaacaattgttcgcaatagagcgtgtttgcctccatctagagattgaaccaccgtg', 'tgctgaaaattgctgaaaatcgaaatttcgtcagctgatgtcgattattctgcgcgggggtacggtacgcaagtccgcaaacactgtcacgccaaattgcgga', 'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtctcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattactagtttaaagtgaataaaattaaaaaaaaaagattttattattaaatttatcaactaacaattcaaaattaatgtattaaaacacagaaaagttgattttgaacagaaaaacggagtaatcatttaaaaagacaatattaaagtgaaaaaacacgcaaatcgattgaaaaaacgaatttaatatgtggaatcggccttctttttcgcttttcggctgcaagttggaatttttcttgggaaaaactttcgaaaatgaggaaatcagtgggaacttcttaatttcctcgactcctggtgaactttttgtgttaaatccatcgaaattgtgcggaatagctcattgacaaggtttcaatctgcaatttgatgaaattctatatttttaaattattttaatcacaacaaacctgaggaaatcccgaattcgatcctttcgataaaatccagagatgtcactttgccacttttccaaaatgaaggaaaccatgtggagcttctcagctttttgagttcctggccgaaatcatgatgaaaaactatcgaaattgacttgagcagcttcttggcaaggtttttgtctgaaattttaagattttaatgatttttgaacgtttttaacacaacgaaccaaaggaaatcctgaatttcacatttctgactgtttcctgggatgttacatcggcagttttccaaaatgaaggacatcatgtagagcatctccacttattgaaattctggtgaagttcttgccgacaaatccatcgacattacgttgaacgtcttctaggcaaggtgtttatctgaaaattcatga', 
'taagcagtttttgaaaagttttcgaaaaaaaAAAGAATTTCCGTTTTTTGAGATttaattttcagtgaaaaaaatttacttttggaaaatttcaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaatataacattttgtgatcgttttcgaaaaaaaaatctttttctttatttttagtgcaaaattttagtttcgataatttttctatgaagaatgtacgatttctagaaaattctgcctgtatcttgctcaaaattaacagttctttctttttaaagcaaaaaattaacacattttgtgattttttggcaaaaaaaattattttataatttcttatttcaaaaaattttttttcgaaaaaatcttaatgaaaaattaagatttctctatgaatttagttccatcttatacaaaatttaatgctgatcataaaacaactataaaatgtgaagactttttgattttcttgaaaaatggaacgtttttaaaaactgttttcagtgaaaaaaatttacttttgaaaaattgtgaaaattgaaaaattgtgaataagtgaaatatgtacgatctctcaataattttgtcttcatcttgtagagaattgttagctgtttctgattcggcaagaaaaatacaacattttgtgatcgttttcgaaaaaaaaaatttttttcttaatttttagtgcaaaattttagtttcgaaaaaaaatttatgaagaaggtacgatttctagaaaattctgctcgtatcttgttcagaatttttagctctttctttttcataccaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttccttgccgcccacttcgacgttcctttgtttttctcgaattttttcgcttactacttgaaacaagacacactcttttgattttacaaaaaaaaattacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttaccgccatctagcgactgaaccaccgtg', 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa', 'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttatctttgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcacaaaaccgcggcaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagagtttttt', 'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattcatttatttttcacaacttctgcccgaaaattaccgaaataaccagcgtttctataactaagaaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg', 'ttgttcggtggagtgcgtttgcacccatctagcaactgaaccaccgtg', 'tttttcgtgtttttttatgtttttttatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggacttaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttt', 
'aaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaactcaacacattttcgatcatttttgaacaaaaaaattgttttctgaaaaatttgacgcttaatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacactttttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatataaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcgacacatttttgggaaaatttttttttttcgaaatattcgctcctaatttttaatgatttccagatgaca', 'tctctatattattcattcattttatcaacaaacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagt', 'tctcgtcgagtgaaatgcgatagaatttgtctgtgaaaaaccaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattcttcacaaattcgatgaaaatctgatagttttttcaattttagctctataattccgaaaaaaatcttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaagtttacactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaatcctcgtccatgtaccccccgccgggccgtg', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg', 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa', 'gttttttcttattttttcttgtttttttttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaagaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaagaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatatattcagtgcaattttcgattttttttcaaaa', 
'gctgatattgacgaattgtcgtcaaataaaatacgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt', 'tgtgggctacggtagtcaagtacgcaaacaccacgagcattttcacaattgcgtacaaaatttttttcaagcttt', 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg', 'ttttgaaaaaaagttactttttgttcgaaaatgtattgattttcacttattttcactagattcaaaaaaa', 'ttggacccatgctatactcaacatttttttggaattctgaatcagcattctcttcataaattagacaatttctaaaaaatctggaccaa', 'ttttccggtgatttgctaaaactataatttctatttcaattattaaccgagaaaaccagaaaaaa', 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata', 'tattaattgctaaaatttatgtggactacgatagccaagtccgcaaacaccacg', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaatacttcccggtttcgttttacatgataatatcgttgaatttaagcaaaaaatgcagcattagtttcatgaaaaaaaaaataagaga', 'atttttcctcaaaaactagaatttttcattgaaaaatggcttaaaaatcgatttttgttcgaaaaaa', 'aaggttttcatcgataaagtcacgaatttgtcgaaatgctttggtgagttttatctttttcagaaaaaaaattcgaaaattttcag', 'cacggtggttcagttgctagatgggtgcaaacgcgct', 'attgttcggtagagcgcgtttgcactcatctagctgctgtaccaccgtg', 'agaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcg', 'ttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgaaaaaaaatatatgaaatacgcgagattgaggttcaaggcttttcaattcggaatcagcctatatttttccatacttcaatgggttttcagaaatattgtatgtgaatttttggaaaaatgtgatttttaatttgaaattgcactgaatttcttgaatttttcaataaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttctatatttgagtgggttttcagaaatattgtatgtgaattttttgagaaaa', 'tttttcgaaaagtgttggaccagatttttttgatcttagaatatgaataaaagcatgctg', 'caaggcccggcaaaccgtacgtggctgcaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagtgaaaagattcgtac', 
'cacggtggttcagtcgctagatggcggtaaacacgctctattgcgaacaatcgtttaaatttcaagcctttcgctcggcatcgattctctccgttttttgccgattttaaccgtttttccgtattttttttgttatttttttttgcaaaattgaaaaagtgtgtcttgtttcaggtagtaatcgaaaaatttcgaggaaaacgaaggaacgtcaccgtgagcagcaaggaaaacgaattcagtgagttttttgactatttctctctgaaaaaattaaaaaatgtattactttttgattgaaaaagaaagagcaaaaaattctgaagaagatgtaacttaatttattcagaacaaaagttttttttgtctcgaaattacaatttttcgaccaaaaatagaaaatgatcattttctcgaaaatgaatcccaaatttatgtcagttttttatgaaaaaataatcagcgtaaacttctgtactagaaatactaccttttttttggattgaaagtttgtatcgtgctcaaaattttaaaagtaaaattattttacgctgaaaaatgcaaaatcataactttttttgaggagaaacccccaaatgttatcgattttgttatcaaattgaactcagcttaaaattatctacaagatgaagctaaattcattgaga', 'atattaattaagttttttgtatcaaattgtgtgttttctcaattttatattgcctttttatttcataactttccttttctgttcaaaatcaacttttttttgtgttttaacacttcaattatcaattgttagtttataaatttcataataaactctgattttttattttttttcatcttgaaactattaattctgctgtttttctgctaaaatttgcttcaaaaatctatttgccgcttcattgttttgcggactacggtacgcaagtacgcaaacaccgcaacgacacattgcggaccatttcgctgcgtacgctgcgagatctttctcaaattttacgagagatctagtttctgtgctactgtg', 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttagctgactgaatattcaacgtttgatactcagcgaaaagtttcatacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgaaaaataaatgaatttttatcgaatttacaaaaaataataattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca', 'aaaaaaaactcacttttttcgaattttcctccttccagttggcggttgagtgctgttttgccgcggttttgtgattttctgcaacaaaaaaaatatttttatcgataaaaatgagaaaaaacgagaaaaaacgaagataaatgagcttagcagtcaaattaaaaatgtttattcgttcaaaaatcttaaccataggaggcggtggcctagccggcgcagctctcgcggccacgtctctttcgccgggccgtg', 
'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaaactacgcaaatcgcttcgaaattcgatggattaaaccggacaaaggcaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattctggtacaatatttctgaaaacccactgaaatatggatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaatttgagaaattcagtacaatttcgaattaaaaattaaatttttccaaaaattctggtacaatatttctgaaaacccactgaaatatggaaaaatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaattcaagaaattcagtgcaatttcaaattaaaaatcacatttttccacaa', 'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaatgtacacaatagtgtaaaaaactttatcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt', 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc', 'aaaatgttgaagagaaaccagagaaattgatcgagtagattcttggcaagttttgaaattatatggttttaataagttttgaacatttttaaatacaactaaccatgga', 'ttgtttggtggagcgcgtttgcaccaatctagcaactgaaccaccgtg', 'tttttttaaatttttttcttggctgctttactgatgtttttttctcaattttttcttgttttctttgttactaatttaaattaaaaaaactattttcagcttatcacagcaaatcggagcgaaactcgaccgcgataacaggaaaaagtcgaaaagtgagttttttgccaaaatatctcgaaaaactcatattttgttttgaaaacagatgcaaataaaaagaaatacat', 'atgtattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg', 'tttcgtaatgtttttttcgagtttttgattgttttttctcatttttttttgtttttctttattattagttaaaatataaaaactattttaagctaatcaacgcaaatcgaggcgaaaaccgatcgcagaaagaggaaaagtcgaaaagtgagtttttttgcaaaaatatttca', 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactctgcgtacgcgcattttatttgacgacaattcgttaatatcagc', 
'ttcattaaaatcgaaaaaaaaattatgaaatacgtgagattgagtttcaaatgttactaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaattaaattttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgaaattgaggttcaagacttttaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgagattgagattcaagacttttaaattcggaatcagcacatatttttccatatttgagtaggttttcagaaatattgtaccatattttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattcca', 'gcaaacatactcttttgcgaataagcgatttttttgttttttttttttggtgttttccgttttttgcttgttttcaccgtttcccctctttttttttgtttttttttgtcaaatcgagaaagagtgtgtttttttttcaggtgttaaacaagatttgcgagcaaaacgagggcacaccatcgtaagaagcgaagaaaacgagaaaagtgagttttttgaagattcctctttaaaaaatagggaaatgttttagttttgagccaaaaaagaaagagctgaatttttcaaacaagatacatgc', 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtgaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaatttcgggcagaaattgtgaaaaataaatggatttttatcgaatttacagaaaataattatttgaaagtattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca', 'aaaaaactcacttttttcggattttccgcctcccagttggcggttgagtgctgttttgtcgcggttttttaattttctgcaacaaaatgtatatttttatcgataaaaatgagaaaaaacgagaaaaaac', 'ttgttcggtggagcgcgtttgcacccatctagcagctgaaccaccgtg', 'cacggtggttcacttgctagatgggtgcaaacgcgctccactgaacaa', 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg', 'gcgaaattctgcattttgtcgtgagatccgcggtgtttgcgtacttctggggctaccgtaacccggaaaa', 'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag', 'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag', 'ccttaaaaggaagaaatttggtggaaaaatacaattttcgctctaaaaaattccgtaaattcgagaatttatgaaaaatactttggttttttat', 'gcaaaattctgcaatatgtcgtcaaattcggtgtttgcgtattttcgacgctaccgtaccccgcggaa', 
'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatctaagaattcagtgcaattttcgattttttttcaaaa', 'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt', 'actacggtgcgcaagtactcaaacactgcgacgtcagagcgcagac', 'tttagacgtatttttcttttctctgctcttatgatcgattttcgcagaggtttttgattatccggtaaatattactagttattctaatttttcattaaaaaattacatcgaaaataacgaaaaaacatcgaaaaacgcgaaagatcaacgaaaccaattcatgaattaattcgaatttataattcagtacaaaagcgattcggtcgcgggactagattttgcaacttcctaggccatttccaatttgcagtgc', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'ggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa', 'ttgttcggtggagctcgtttgcacccatctagcaactgaaccaccgtg', 'cacggtgcttcagttgctagatgggtgcgaacgcgctccaccgaacaa', 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'gcatggggcgtggccgaaaattctctactaccgtttaccaatttggctaatttgccaatcaacgttgaaaagttttgtacatcg', 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttgactgactgcgtgctcaacgttgagtactcagtttaaagtttcgtacaccgttgcgtactacacagcgcgcattttaattgacgacatttcgcgaaaattaacagaagattttttcggaattatagagctgaaattgaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttagcttttcacagacgaatgatgtctcattttact', 
'gcctcattttactcgatggaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggataaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactatttcttcggaatttcacgacttttttggacgaattttagtctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttgctcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactctcttcagctaaactattactgcatttcggaagttgataagatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaag', 'gatttaaagcagtgaacctaccatcgggttcggacgagaaagagcattgctcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctggaaattttcaaaatttgaaacttttcgtgaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcttggagctaaaattgattt', 'tacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcataaatttttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgttccattttttcaagtgttcctccgggagtaccattcacaattgtgtcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctgcttgatacttttcttccgcacttttgattaggttaacagcgttttttagagttgcttttcgtgttttcaggatagggaaacaagtagtgttatccaaagtgacagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcgaattttcgttagc
ttgccaagctgctcaagaacgtccggaatgaattttttcagagacgagtaattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatgttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattggcaaatctcttgcgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtaaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgtttcttagattcgaaaatttaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgcagcgaagaattagatgctcttatagtgatattttcatggcggactatttgtatttcttccgaaaacaccgcaaacgcatcattctgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtgaaggtgtcggtggaaatacgggattggagaatctctgcgaaatcatataatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatatcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgaactgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagtagacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgagcattcaaaacttatcaatttgaaatcagcatatttcagtgtataatt
aaaaaagatttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccaacatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaatagtcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagagatgtcaaaaaaacaactcacttttcgacgtttttcgtgtttccccggatgatgcggtcgatttttgctgcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgtcgtg', 'aaaaaaactcacttttcgactttttcctgtttctgcgatcgggttttgcgtcgatttgtggtaattagctgaaaatataaactatagtttttatattttaactattaataaagaaaacaagagaaaagtgagaaaaaacaatcaaaaactcgaaaaa', 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg', 'TGCGAAAAACTGTTTAaagtatcgattttcttcaatatcagcaacatacaattctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaatgaccagcgtttctagaactaaaacaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa', 
'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttc
acaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggta
aaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg', 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg', 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttcc
ctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttcacaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagaga
gatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg', 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacagaagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaatcgatttttagttcttcacagagtaatcctatcgcatttcacttgctcatgatgtttttgctcgactttaggacgataaaaatgcgaattgttgataaaatgaatgaacaatataaagaa', 
'ggggctgctggaaccaatgtcggcatgacgagagttccggtcttctggatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt', 'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcaagcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggcccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcacgaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccgtgctacaagcctgaatctttcaaattaaaaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgtactgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctatacacctaaattattaaaattcagaaccgccatgtattttttcatactataggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgcggctcacccggtcgatttttgcggcgatttgtgttctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgaaacctcatgaaaaaaataaagcattgcagccgcgggattagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg', 
'aactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaaaaatagattttattattcaatttattagctaacaattgagaattgaattatcaaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggtagattgaaaattgaaaaatacatacaaaacgattcaaaaaacgaattaatatgtggaatcggcctccttcttcgctttacggatgcaagttgagatttttcttggaaaaactttcgaaaataaggaaatcagtgggaacttcttatttcctcgactcctgcaggatcctggtgaactttttctgttaaatccatcgaaattgtgcggagtagctcattgacaaggtttcaatctgaaattttgtgaaattttatatttttgaataattttaatcacagcaaacctagggaaatcccgaattcgagcctttcgataaaatccagagatgtcacatcgccacttttccaaaatgaaggaaaccaggtggagcttctcagctttttggcttcctggtcgaaatcttgatgaaaaaaccatcgaaatttacttgagcagcttcttggcaaggtttttgtctgaaattttaggattttaatgatttttaacatttttaaacacaactaaccataaacaatccggattttttcggttttgactgaatccttggatttatgtagaaaacatgcccagaaatcaaggaacgaggtggaacatctcatttttttgaaattctggtgtaattcttgatgaaaaatccatcgacattacgttgaacgtcttcttggcaaggtgttttcttctgaaaattcatga', 'ctgtaacatctaagcagtttttgaaaagttttcgaaaaaaaaataaatttcagtttttgagatttaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtaccgtttctgaaaatgtttgcttttatcgtgtagagaatttttagctggttctaatccggcaagaaaaacagaacattttgtgatcgttttcgaaaaaaaaatttttttctttaattttaagtgcaaaattttagtttcgataatttttctgtgaagaatgtacgatttctagaaaattctgcctgtatcttgcttaaaatgaacagttctttctttttaaagcaaaaaactaacacattttgtgattttttttggcaaaaaaaattattttataatttcttatttcaaaaaatgttttttcgaaaaaattttaatgaaaaattaatatttctttatgaacttagttccgtcttatacaaaatttaatgctgattataaaataactataaaacgtgaaga', 'cacggcccggcgaaagagacttggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttattttcgttttttctcgtttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccgcgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt', 
'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaaaacaagtgtcgtcaattaaaatgccgcatccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg', 'ttgttcagtggagcgtatttgcataaatctagcaactgaaccaccatg', 'attgaccaaaatcgagaaacattgcgaaaaactgtttaaagtgtcgattttcttcaatatcagcaacatacaatactttcaaatgattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaagaaagcgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagaatttacagccacgtacggttcgccgggccgtg', 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacacttgttttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgacaaataaatgaatttttatcgaatttacaaaaaataatcattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca', 'aaattgtagtcagtatcactgcagatgctggagcaggaatcacaaagttttgtctgattatcgagaattgt', 'ttttcgagatattttggcaaaaacctcacttttcgtcgttttcctcctactgcgatcgattttcgccccgatgattagctgaaaataattttatatgttagttagtaacaaagaaaataagaaaaattgagaaaaaacaatcaaaaactcgagaaaa', 'cgaattgtcgtcaaattaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttcttcgcgttgaatattcagtcgcgactagtcagccaaatgggactactgtagaaatttcctcggccacgttccaaacgccgctccgtg', 'ttgttcgatggagcgcgtttgaacccatctagcaactgaaccaccgtg', 'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa', 'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa', 'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcagcgtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacaatgaagcggcaaatagatttttgaagcaaattttaacagaaaaacagcagaattaatagtttcaagatgaaaaaaaaaataaaaaatcagagtttattatgaaatttataaactaacaattgataattgaagtgttaaaacacaaaaaaagttgattttgaacagaaaaggaaagttatgaaataaaaaggcaatataaaattgagaaaacacacaatttgatacaaaaaacttaattaatat', 'tgcataaattcaatttcatcttgcccagaattttaagctgattcgaattcatatcaaaatctgaagattcttcaattttttttatcgaaaaatttttttttctaaattttgtgaaaaatttt', 
'tttagcttcatcttgtagataactttaagctgagttcaatttgataacaaaatcgataacatttggggatttctccacataaaaaattacgatttttaattttttagtgaaaaatatttttactttcgaaattttgagcatgacacaaactttcaatcgaaaaaaaagcgtacatatctggtacagaattttatgtagattattttttcataaaaaactgatccaattttgggattcgttttcgagatagtgatcattttctatttttggtctaaaatttttgatttcgaggcaaaaaaaaattattctgaataaattaagttacatcttattcagaatttttagcttaatttctagctcgaattaaataaaaaatctgaagattcttcaatttttttttaccaaaaacagttttttttctaaactttgtgaaaaaatttttttaattcaaatggatctgacaaatttttttcaatctcaattattttagctttctcttgtacagaattttaagctgttttcaattcaataacaaaatcgataacatttgggggtttctcctcaaaaaaagttatgattttgcatttttcagcgtaaaataattttacttttaaaattttgagcacgatacaaactttcaatccaaaaaaaggtagtatttctagtacagaattttacgctgattattttttcataaaaaacttacataaatttgggattcattttcgagaaaatgatgattttctatttttggccgaaaaattgtaatttcgagacaaaaaaaaaccttttgttctgaataaattaagttatatcttcttcagaattttttgctctttctttttcaatcgaaaagtaatacattttttaattttttcaaagagaaatagtcaaaaaactcactgaattcgttttccttgttgctcacggtgacgttccttcgttttcctcgaaatgtttcgattactacctaaaacaagacacactttttcaattttgcaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatccatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttgccgccatctagcgactgaaccaccgtg', 'tcagttttcacaaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcattcggtgtataaattgtaaac', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'tttgctggttacggtaaaaaagtacgcaaacaccaaacgtgaagtgcagacattgcgttttaccgtacttccgtgttcttttt', 
'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaa', 'tagaaagatttgagactcaagctcgcgtatttcagcttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatt
tttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacagttgagacccaagctcgcgtatttcatattttttttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttttttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagactcaagctcgcgtatttcattttttttttcgattttagcacaaaatcaacactcttttgaaaaaaatattcgaaatattcgctcctaatttttaatgatttccagatgacaagctggttcacaccacattctactgaggaaatcgcttcaaataaccgaaatcatcccgacattggctccagcagctccgcaactacctacagctcgtcaaattgtttgaatatggatgaaggaattctgacgagagacgttgaaatggctttcagtgatgaaattccgactttttctaaaaccgtaatttttttaaaaatttcaaaaataacattatatgattttgcagaggacctccaattccgtgtttcctcctgcaccttatcgaaattcgttcagt', 'ttctttacattattcattcattttatcaacaattcgcacttctatcgctctaaagtcgatcaaaaaagcttatcagcaactgccgtcgagtgaaatgcgatacaatttgtctgtgaaaaaacaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattattcacaaattcgatgaaaatctgatagttttttccaatttcagctctataattccaaaaaaaaaccttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg', 'gctgatattgacgaattgtcgtcgaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcggccacgt', 
'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaatttttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaattattttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgtttattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacacaaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaattttttaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctcttgacagaaacggagatgcgccgatcactcctgagagaagacgagtgtctccatttgtggattttggtgataatctggatgtggagtcgaaacgcaatattgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaataattttcccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcatcacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgtcaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgtt
ctcaaagccgaaacgttgtcacgcttcccaagtatatcatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttgtctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaattgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaacttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggagctgattttgtctgagaaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggaatttgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaa
tttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttaccaagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg', 'atgtcgtcaactaaaatgccgcggtacacgtccgcagtcggtgaacaaaacttttcgttgaatactcagtcagccaaatttaactactgtagaaatttctccccacacgtcgcgattgccgctccgtg', 'ttgtttaaaataagtttccttttttttgaatacgcaaagaacttgatttttccaaaaaaaaaaattgttttcgaatttttatgataaaaaaaaatttttt', 'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttattttttcatttggttgcggtgggtttttcgatgttttttcgtgtttttttaatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagcaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttctgaattagaaagatttgagactcaagctcgcgtatttcaaatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatatgaagaaaattttgattaaccggaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcattttcgaacaaaaaattcattttctcaaaaattcaacgcttattttttttgaaactgcactcaaatatgaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcaacacatttttgggaaaaattttttttcgaaa', 'ttctaatttttaatgattttcagatgacacgccggttcacaccacattctactgaggagatcgcttcaaataaccgaaaccatcccgacattggctcca', 
'ctctatattattcattcattttatcaacaattcgcacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagtgaaatgcgatagaatttgtctgtgaataaccaaatatcgattttccttgtatatcgtgaagaacaaatcttcatatttacgattcttcacaaattcgatgaaaatctgatagttttttcaatttcagctctataattccaaaaaaaatcttttgtcaatttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacaaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtctatgtaccccccgccgggccgtg', 'gctgctgcagtgttttggcgactacggtacgctgttacgcaaaccgcgcaatgacaacatt', 'attgaacagggcatgaaaggattcaatgccctgctccgatgatcttccc', 'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacggataaaaatttttaatttggctgctaagctcatttatcttcgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccacgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt', 'tgcgaaaaactgtttaaagtatcgattttcttagtaaatatcagcatcataaaattatttaaaattattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattaataaccagcgtttctataactaagaaggtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg', 
'gttcgttcagctcaattttgtctactttaattaagttggagtagtttgttctaacttcaaaaacgtgattttcagacgtgcagctcgtcgccacaggctaatcaggaaatcagttcagaagatcgaaactcccaccatcccgacattgcttccagtagctccgcaactacatgcaggtcgtcaaattgtttcaatatggatgaaggaattctgacgagagacgtggaaatggattttagtgacgaaattccgactttttctcacaatgtaattttttttcaatatttcaaaactaatattatatgattttgcagagattctccaatcccgtgtttccaccgacaccttcacgaaactcgttcagtacaccgagggttcgaataatctccaatgttgagcaaattccgcagatcaaaatgctccgactaacgatcgaacgactgaaaaatgaaaaaatgaaatatcagaagaaatacaaaagcggaatggtgagtttgcggtgttttcggaagaaatactaatagtccgccaagaaaatatcactgtaagagcatcgaattcttcgctgcaaatggaagtatgcaatatgcgaacgatcgtcagtggtatagaaggtcgattggtgagaagctgggcagatgtgaagaaaagtggcaaagctatggttctgagtctacaaagacacgccgaagaaattgaaaacctcaaaaaatcagcttttaaagttaaattttcgaatctaagaaacaaaagttgtatcaaaaatcgatttcgcaaagctcttcacttgcttcaacacgaaatctgcctcgaggatgacgtcggacaatttattcgaaaattcacaaaattcctgaattcagaagagaatcaaagctacaaaaacaagctttcaaactttgaggccgttgactttttatcaaaatgcggattatctctaagtcagatggaaaaaatcaaaaactatttgactaattccattggctatgatttgcttccattggtaaagaacacaagagatttgtcaatccagttatcaatgatctccagtttcaaagtttccacttcgttcgacaataaaggaaaagtgattaccattgtgcagtgcatgaaaattgcagaagttttagcctatcgaatagagcttctatgcaattcaaatcaatttgtggacgatggctacacgaaaggagtaatcaagattggagtcgttggtgacgctggaggtggaagcacaaagttggcattggtaatcggaatgtatctcgaccgaattcttcgagacacgttatggtaattgcagtatatgacggatccgacaattactcgtgtctgaaaaaattcattcccgacgttcttgagcagcttggcaagctgacaaaaattcgatatttggacaaaggagttcaaaaaaccgcaaaaattgtgcaaattgcaactggagattgcaagtttcaatttgatattcttggacaccaaggacattcatcccacaatttctgcttcaagtgttttgcccagaatccacggggagctgaagagaggatgaaaatcaaggatatgaatgttgatgaaacattccacccacgaacaatcgatctttacaaagcttgttgtactccacttctgccacacgttgcgatcatgttctatttgattcctcttctgcacattatcatgggtatttttgacaaatatatcttcaatcccctctggaaatattctgtcactttggataacactacttgtttccctatcctgaaaacgcgaaaagcaactctaaaaaacgctgttaacctaatcaaaagtgcggaagaaaagtatcaagcagccaccggaaagatgaaattggagtcgcatgctgaactgaaagcacttcaatcggaaaaattgttgctcgacacaattgtgaatggtactcccggaggaacacttgaaaaaatggaacagtgttgggctaagtttggagcggacaagcaggcttggttccaatcatttt
gcggaaaccatttgaaattgctgctaacgccggctattgtagaggagactttcaacatttttggcccaaatttatgcccaatgttgctcggattgaaatcagcaatgggaaagctttcaactattatgtcactttcgggaaacaagtttttaaatgattccgacgtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagcggttcccgaagaaacaattattttaaagctgcacttggtggtctaccatgcaccacaaatggcaaaagatgtgagaaacattggaaggatcacagaaccaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaaggcgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatgacatgactatggtaatttttgtacagaaaaaaatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaagatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagtagttattcgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaatcaattttagctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagtttcaaattttgaacattttcagcccgacttctcacaaaacgtacaaacttctattgcatttggaggcaattcaacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtaggttcactgctttaaatcaactcaaaattgaaatcaaaaattatttcaggcaatcacacaaacttgctcgacaagccagccaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaaggtgaaatgacaatttcaataagttaaaaattttagaaaattcgagcaagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactgaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgtccaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgttttttttattatgatttctaatatttttattttaatgcttaatttaaaatgaccgaaaataaagaattgattctcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatcgctcaaaaatcgataaaaacagctcatcagaaactttcatcgagtaaaatgagacatcattcgtctgtgaaaagctaaatatcgattttccttgaatatagtgaagaaaagatcttcatacttacgatttttcacaaattcgatgaaaatttgatagttttttttcaatttcagctctataattccgaaaaaaatcttctgataattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg', 'ttgttcggtggatcgcgtttgcacccatctagcaactgaaccacagtg', 'GAAGCCAATGTCGGGATGGTTTCGGTTATTTGAAGCGATTTCCTTAGTAGAATGTGGTGTGAACCGGCGtgccatctggaaatc', 
'tgccatctggaaatccttaaaaatttggtgcgaatatttcgaaaaaaagttttccaaaaatgtgttgattttccactaaaatcgaaaaaataaatatgaaatacgcgagcttgagtctcaattcttactaattcagaacaagcatttttttctccatattcgagtgcagtttcaaaaaaattaaacgttgaatttttgagaaaataatttttttgttcgaaaatgtgttgattttccactaacataaaatcgaaaaagt', 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaataa', 'gtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagctgttcccgaagaaacaattattttaaagctgcacttgttgatctaccatgcaccacaaatgacaaaagatctgagaaacattggaaagatcacagaacaaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaagacgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatggtaattttggtacagaaaaacatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaatatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagcagttatacgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaaatcaattttcgctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagttttaaattttgaaaattttcagcccgacttctcacaaaacgtacaaacttctatttcatttggaggcaattcgacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtacgttcactgctttaaatccactcaaaattgaaatcaaaaattatttcagccaatcacacaaacttgctcaacaagccagcaaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaacgcgaaatgacaatttcaataagttaaaaattttagaaaattcgagcgagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactcaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgttcaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgtttaaatatttatgatttctaatatttttattttaatgcttcatttaaaatgaccgaaaaataaagaatagattttcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatccctcaaaaatcgataaaaacagctca', 'ttgttcggtggagcgcgtttgcacccatttagcaactgaaccaccgtg', 'actacggtgtgcaagtacgcaaacaccgcggcggcaatttgc', 'ttcttcaaaaaaacttcttcgaaattcaaattttgcaccaaaaa', 'ttgttcggtggagcgcgtttgcacctatttaacaactgaaccaccgtg', 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg', 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata', 'aaattatacacgtttgttcggtggagcgagtttgcttccatctagcaactgaaccaccgtg', 'gtgtgcaacttgccgccgcggtgtttgcgtacttgcacaccgtagt']
Look at the repeat family/class names for the first several repeats in the roundworm database:
from operator import attrgetter
' '.join(map(attrgetter('rep_cl'), reps[:60]))
'Simple_repeat Simple_repeat Satellite Simple_repeat DNA/MULE-MuDR DNA DNA DNA Simple_repeat Unknown Unknown DNA/PiggyBac? DNA DNA DNA DNA DNA DNA DNA DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA DNA Simple_repeat DNA/hAT DNA/hAT DNA/MULE-MuDR DNA/hAT DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/hAT DNA/MULE-MuDR Simple_repeat DNA/hAT Unknown Unknown Unknown DNA DNA DNA/CMC-Chapaev DNA/CMC-Chapaev DNA/CMC-Chapaev RC/Helitron Simple_repeat DNA DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Pogo Simple_repeat DNA Simple_repeat DNA DNA DNA/TcMar-Tc1 DNA/TcMar-Tc1 DNA/hAT Simple_repeat'
You'll notice a few things: (1) the names seem to be hierarchical, e.g. DNA/TcMar-Tc1 seems to be more specific than DNA; (2) some of them end in a question mark; and (3) some of them are Unknown. I don't really know what these mean or what to do as a result; you'll have to navigate that issue yourself. You can often look up a family name on RepeatMasker's site to find more detailed information (e.g., here are the details for DNA/TcMar-Tc1).
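One way to start navigating these names is to split each one into its parts and tally them. The sketch below is only illustrative: the helper split_rep_class is hypothetical (it is not part of RepeatMasker), and it assumes that the text before the first slash is the broader class, the text after it is the more specific family, and a trailing question mark is a flag worth tracking separately. What the flag actually means is not established here. The names are copied from the output above; with the real data you could use [r.rep_cl for r in reps] instead.

```python
from collections import Counter

def split_rep_class(name):
    """Split a class/family string into (class, family, has_qmark).

    Hypothetical helper: assumes 'class/family' with an optional trailing '?'.
    """
    has_qmark = name.endswith('?')      # record the flag without interpreting it
    name = name.rstrip('?')
    cls, _, family = name.partition('/')
    return cls, family or None, has_qmark

# A handful of names copied from the output above
names = ['Simple_repeat', 'Satellite', 'DNA/MULE-MuDR', 'DNA', 'Unknown',
         'DNA/PiggyBac?', 'DNA/TcMar-Tc1?', 'DNA/TcMar-Tc1', 'DNA/hAT',
         'RC/Helitron']

# Tally by the broader class, and count how many names carry a '?'
by_class = Counter(split_rep_class(n)[0] for n in names)
n_qmark = sum(split_rep_class(n)[2] for n in names)

print(by_class.most_common())
print('names ending in "?":', n_qmark)
```

Keeping the question mark as a separate flag, rather than dropping it, lets you decide later whether to include or exclude those records.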