PyRNA lets you easily construct DNA and RNA molecules. An RNA molecule automatically converts T residues into U.
from pyrna.features import DNA, RNA
rna = RNA(name='my_rna', sequence='AGGGGATTAACCCC')
print("%s: %s" % (rna.name, rna.sequence))
dna = DNA(name='my_dna', sequence='GGTTGGATTAACCCC')
print("%s: %s" % (dna.name, dna.sequence))
my_rna: AGGGGAUUAACCCC
my_dna: GGTTGGATTAACCCC
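To make the T-to-U behavior concrete, here is a minimal sketch of how such a pair of classes could be written. The class names mirror PyRNA's, but this is a hypothetical stand-in, not PyRNA's implementation: it assumes the only difference between the two is a conversion done once, when the object is created.

```python
class Molecule:
    """Hypothetical base class: stores a name and a sequence string."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence


class DNA(Molecule):
    """Keeps the sequence exactly as given."""


class RNA(Molecule):
    """Converts every T residue into U when the object is created."""

    def __init__(self, name, sequence):
        super().__init__(name, sequence.replace('T', 'U'))


rna = RNA(name='my_rna', sequence='AGGGGATTAACCCC')
dna = DNA(name='my_dna', sequence='GGTTGGATTAACCCC')
print("%s: %s" % (rna.name, rna.sequence))  # my_rna: AGGGGAUUAACCCC
print("%s: %s" % (dna.name, dna.sequence))  # my_dna: GGTTGGATTAACCCC
```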
RNA and DNA molecules support len(), slicing and iteration:
print("slice: %s" % rna[0:2])
print("length: %i" % len(rna))
slice: AG
length: 14
You can easily get a single residue:
print(rna[3])
G
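Length, slicing and single-residue access all come from Python's data model. A minimal sketch, assuming the molecule simply delegates to its sequence string (the Sequence class is hypothetical, not PyRNA's code):

```python
class Sequence:
    """Hypothetical wrapper exposing len(), indexing, slicing and iteration."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __len__(self):
        # len(obj) delegates to the length of the underlying string
        return len(self.sequence)

    def __getitem__(self, key):
        # key is an int for rna[3] and a slice object for rna[0:2];
        # str handles both, so both return plain strings here
        return self.sequence[key]

    def __iter__(self):
        # iterate residue by residue
        return iter(self.sequence)


rna = Sequence('AGGGGAUUAACCCC')
print("slice: %s" % rna[0:2])   # slice: AG
print("length: %i" % len(rna))  # length: 14
print(rna[3])                   # G
```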
You can extend the sequence by adding a string with the + operator. Note that the molecule itself is modified, even though the result is not assigned:
rna + 'AAA'
print(rna.sequence)
AGGGGAUUAACCCCAAA
Subtracting an integer n removes the last n residues, again in place:
rna - 3
print(rna.sequence)
AGGGGAUUAACCCC
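Because rna + 'AAA' and rna - 3 are never assigned back to rna, yet the sequence changes, the operators must mutate the object rather than return a new one as str does. A minimal sketch of such operators, assuming this in-place behavior (the Molecule class here is hypothetical, not PyRNA's source):

```python
class Molecule:
    """Hypothetical molecule whose + and - operators modify it in place."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __add__(self, residues):
        # Append the residues to this object and return the same object,
        # so `rna + 'AAA'` changes rna even though the result is discarded
        self.sequence += residues
        return self

    def __sub__(self, length):
        # Remove `length` residues from the end, in place
        if 0 <= length <= len(self.sequence):
            self.sequence = self.sequence[:len(self.sequence) - length]
        return self


rna = Molecule('AGGGGAUUAACCCC')
rna + 'AAA'
print(rna.sequence)  # AGGGGAUUAACCCCAAA
rna - 3
print(rna.sequence)  # AGGGGAUUAACCCC
```

Since both operators return self, writing rna = rna + 'AAA' gives the same result in this sketch; the in-place style simply makes the assignment optional.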
An RNA molecule is iterable over its primary sequence:
for index, residue in enumerate(rna):
    print("residue n%i: %s" % (index + 1, residue))
residue n1: A
residue n2: G
residue n3: G
residue n4: G
residue n5: G
residue n6: A
residue n7: U
residue n8: U
residue n9: A
residue n10: A
residue n11: C
residue n12: C
residue n13: C
residue n14: C
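Since iterating over the molecule yields one residue at a time, the residue numbering can also be delegated to the start parameter of enumerate instead of adding 1 by hand. The sketch below uses a plain string as a stand-in for the RNA object, assuming both iterate the same way:

```python
sequence = 'AGGGGAUUAACCCC'  # stands in for the rna object above

# start=1 numbers residues from 1, so there is no need for index + 1
lines = ["residue n%i: %s" % (number, residue)
         for number, residue in enumerate(sequence, start=1)]
for line in lines:
    print(line)
```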
In PyRNA, a pyrna.features.TertiaryStructure object holds a single molecular chain. Since a PDB file can contain several molecules, the function parse_pdb() returns a list of such objects.
from pyrna.parsers import parse_pdb
with open('../data/1ehz.pdb') as h:
    pdb_content = h.read()
tertiary_structures = parse_pdb(pdb_content)
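To see why one PDB file can yield several TertiaryStructure objects, note that every ATOM record carries a chain identifier in column 22. The following sketch is independent of PyRNA and runs on a tiny made-up PDB fragment; it groups residues by that column, assuming standard fixed-column records, and it ignores HETATM records (where modified residues often appear), alternate locations and multiple models. The helpers atom_line and residues_by_chain are hypothetical.

```python
def atom_line(serial, atom_name, res_name, chain_id, res_seq):
    """Build a minimal ATOM record using the PDB fixed columns
    (coordinates and later fields are omitted for brevity)."""
    return "ATOM  %5d %-4s %3s %1s%4d" % (serial, atom_name, res_name, chain_id, res_seq)


def residues_by_chain(pdb_text):
    """Map each chain identifier to its ordered (res_seq, res_name) list."""
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()  # columns 18-20
        chain_id = line[21]             # column 22
        res_seq = int(line[22:26])      # columns 23-26
        residues = chains.setdefault(chain_id, [])
        # Several atoms belong to the same residue: keep each residue once
        if not residues or residues[-1][0] != res_seq:
            residues.append((res_seq, res_name))
    return chains


# Toy fragment: two atoms of G1 and one of C2 in chain A, one atom of G1 in chain B
pdb_text = "\n".join([
    atom_line(1, "P", "G", "A", 1),
    atom_line(2, "C1'", "G", "A", 1),
    atom_line(3, "P", "C", "A", 2),
    atom_line(4, "P", "G", "B", 1),
])

chains = residues_by_chain(pdb_text)
for chain_id, residues in chains.items():
    sequence = "".join(name for _, name in residues)
    print("molecular chain %s: %s" % (chain_id, sequence))
```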
RNA molecules extracted from PDB files can contain modified residues. PyRNA automatically replaces them with the corresponding unmodified residues in the sequence and records each modification as a (residue name, position) pair in the modified_residues attribute:
for ts in tertiary_structures:
    print(ts.rna.name)
    print(ts.rna.sequence)
    print(ts.rna.modified_residues)
A
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
[('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26), ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39), ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54), ('PSU', 55), ('1MA', 58)]
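Because modified_residues is a list of (name, position) pairs, turning it into a dictionary keyed by position makes lookups direct. The sketch below copies the values printed above and also spells out the kind of modified-to-parent mapping the conversion relies on. The PARENT table is a hand-written assumption covering only the residues in this tRNA, not PyRNA's internal mapping, and reading positions as 1-based indices into the sequence is an assumption that happens to hold for this output.

```python
# Hand-written parent table (an assumption) for the modified residues in 1EHZ
PARENT = {
    '2MG': 'G', 'M2G': 'G', '7MG': 'G', 'OMG': 'G', 'YYG': 'G',
    'H2U': 'U', 'PSU': 'U', '5MU': 'U',
    'OMC': 'C', '5MC': 'C',
    '1MA': 'A',
}

# Values copied from the output above
sequence = "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA"
modified_residues = [('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26),
                     ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39),
                     ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54),
                     ('PSU', 55), ('1MA', 58)]

# Index the modifications by position for direct lookup
by_position = {position: name for name, position in modified_residues}
print(by_position[37])  # YYG

# Each position (read as 1-based) holds the parent nucleotide in the sequence
for name, position in modified_residues:
    assert sequence[position - 1] == PARENT[name], (name, position)
```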
If you want to parse a FASTA file, you have to specify the type of molecules it contains. DNA molecules are faster to create since PyRNA does not try to identify modified residues.
with open('../data/telomerases.fasta') as h:
    fasta_content = h.read()
from pyrna.parsers import parse_fasta
# the default type is RNA
for rna in parse_fasta(fasta_content):
    print("sequence of %s:" % rna.name)
    print("%s\n" % rna.sequence)
sequence of telomerase 1:
AGUUUCUCGAUAAUUGAUCUGUAGAAUCUGUCAAGCAAAACCCCAAAACCUUACACUGAGAGCAUUUAGCCUGAUUACUCUUUAAAUCAAAUCAGGCAAUAGAGAGAAACUCGAGAGGUGAAAACCCCACAGCAUUCUGAAAUGUAUUUGGGAGUAAUCUCAUAUUAGUUUGCUGUCCUCUCAUCUUUU

sequence of telomerase 2:
AUCCCCGCAAAUUCAUUCUGUUUGCAUUCAAACAGUCAUUCAACCCCAAAAAUCUAGACCAAAUAUUGUCUUCCCUUCUUGGCACAAACAAAGAAGAGACGCGGGAUAAAGAUACUCCGACGAUUGAUACAAUAUUUAUCAACGGGAGGUCUUACUUUU

sequence of telomerase 3:
UACCUCCUGUGGAUCCAUUCAGGAUUAAUGAAAUCCUGUCAUUCAACCCCAAAAAUCUUGUCAAAUUAUUGCCUCGUCUUUUGGGCACAAACAAAAGUCACGCAGGAGGUUCAGACAUUCGACAUAAGAUACACUAUUUAUCUUAUGGAAGGUCUAGUUUUU
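A FASTA file is a series of records, each made of a '>' header line followed by one or more sequence lines. The sketch below shows that structure with a hypothetical parse_fasta_records function returning plain (name, sequence) tuples. It is not PyRNA's parse_fasta, which returns molecule objects; it assumes the whole header text after '>' is the name and reproduces the T-to-U conversion only for the RNA type.

```python
def parse_fasta_records(text, molecule_type='RNA'):
    """Hypothetical minimal FASTA reader returning (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if any
            if name is not None:
                records.append((name, ''.join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            # Sequence lines of one record are concatenated
            chunks.append(line)
    if name is not None:
        records.append((name, ''.join(chunks)))
    if molecule_type == 'RNA':
        # Mimic the RNA behavior described above: T becomes U
        records = [(n, s.replace('T', 'U')) for n, s in records]
    return records


example = """>seq 1
ACGT
TTAA
>seq 2
GGTC
"""
for name, sequence in parse_fasta_records(example):
    print("sequence of %s:" % name)
    print(sequence)
```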
An RNA object automatically converts T residues into U, whereas a DNA object keeps them as they are:
with open('../data/ft3100_from_FANTOM3_project.fasta') as h:
    fasta_content = h.read()
for dna in parse_fasta(fasta_content, 'DNA'):
    print("sequence as a DNA:")
    print("%s\n" % dna.sequence)
for rna in parse_fasta(fasta_content):
    print("sequence as an RNA:")
    print(rna.sequence)
sequence as a DNA:
TAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCAGAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATGCAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT

sequence as an RNA:
UAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGGUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCGCAAACUGAAAAAAGGCGAUGAAUUUGCGGU
DNA and RNA objects have a rich textual representation in Jupyter notebooks.
parse_fasta(fasta_content)[0]
1 UAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGG
61 UUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCG
121 CAAACUGAAAAAAGGCGAUGAAUUUGCGGU
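The layout above prints 60 residues per row, each row prefixed by the 1-based position of its first residue. A minimal sketch of such a representation, assuming a plain __repr__ is enough (IPython shows the repr() of a cell's last expression unless the class defines a richer hook such as _repr_html_); format_sequence and DisplayableRNA are hypothetical names, and this is not necessarily how PyRNA implements its display:

```python
def format_sequence(sequence, width=60):
    """Return the sequence as rows of `width` residues, each prefixed by the
    1-based position of its first residue."""
    rows = []
    for start in range(0, len(sequence), width):
        rows.append("%i %s" % (start + 1, sequence[start:start + width]))
    return "\n".join(rows)


class DisplayableRNA:
    """Hypothetical class whose repr() uses the numbered-row layout."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def __repr__(self):
        return format_sequence(self.sequence)


rna = DisplayableRNA('demo', 'ACGU' * 40)  # 160 toy residues
print(repr(rna))
```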
You can load 3D structures directly from the Protein Data Bank (PDB):
from pyrna.db import PDB
pdb = PDB()
pdb_content = pdb.get_entry('1GID')
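If you only need the raw file, here is a minimal alternative sketch that bypasses pyrna.db.PDB and fetches the entry straight from the RCSB file server with the standard library. The URL pattern is an assumption about RCSB's public download service, not something PyRNA documents, and some very large entries are distributed only in mmCIF format. The text it returns is the same kind of string that parse_pdb() receives above.

```python
from urllib.request import urlopen

# Assumed URL pattern of the RCSB PDB file download service
RCSB_PDB_URL = "https://files.rcsb.org/download/%s.pdb"


def pdb_url(pdb_id):
    """Build the download URL for a 4-character PDB ID (normalized to upper case)."""
    return RCSB_PDB_URL % pdb_id.strip().upper()


def fetch_pdb(pdb_id, timeout=30):
    """Download an entry in PDB format and return it as text.
    Needs network access."""
    with urlopen(pdb_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(pdb_url("1gid"))  # https://files.rcsb.org/download/1GID.pdb
# pdb_content = fetch_pdb("1GID")  # then: parse_pdb(pdb_content)
```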
As noted above, each pyrna.features.TertiaryStructure object holds a single molecular chain. Since a PDB file can contain several molecules, parse_pdb() returns a list of pyrna.features.TertiaryStructure objects.
from pyrna.parsers import parse_pdb
for tertiary_structure in parse_pdb(pdb_content):
    print("molecular chain %s: %s" % (tertiary_structure.rna.name, tertiary_structure.rna.sequence))
molecular chain A: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC
molecular chain B: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC
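Both chains of 1GID carry the same sequence. If you want one representative per unique sequence, a small grouping step is enough. The sketch below works on plain (name, sequence) pairs with toy sequences; chains_by_sequence is a hypothetical helper, and building its input from the parsed objects, as in the comment, assumes the rna.name and rna.sequence attributes used above.

```python
def chains_by_sequence(chains):
    """Group chain names by identical sequence.
    `chains` is a list of (name, sequence) pairs, e.g.
    [(ts.rna.name, ts.rna.sequence) for ts in parse_pdb(pdb_content)]."""
    groups = {}
    for name, sequence in chains:
        # Chains with the same sequence end up under the same key
        groups.setdefault(sequence, []).append(name)
    return groups


chains = [('A', 'GAAUUG'), ('B', 'GAAUUG'), ('C', 'GGCC')]  # toy sequences
for sequence, names in chains_by_sequence(chains).items():
    print("%s -> %s" % (",".join(names), sequence))
```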