#!/usr/bin/env python
# coding: utf-8

# # Pretty-printing DNA and protein sequences with `monoseq`

# [`monoseq`](https://github.com/martijnvermaat/monoseq/) is a Python library for pretty-printing DNA and protein sequences using a monospace font. It also provides a simple command line interface.
#
# Sequences are pretty-printed in the traditional way, as blocks of letters with each line prefixed by its sequence position. User-specified regions are highlighted, and the output format can be HTML or plaintext, with optional styling using ANSI escape codes for use in a terminal.
#
# Here we show how `monoseq` can be used in the IPython Notebook environment. See the `monoseq` [documentation](https://monoseq.readthedocs.org/) for more.
#
# **Note:** Some applications (e.g., GitHub) will not show the annotation styling in this notebook. [View this notebook on nbviewer](http://nbviewer.ipython.org/github/martijnvermaat/monoseq/blob/master/doc/monoseq.ipynb) to see all styling.

# ## Use in the IPython Notebook

# If you haven't already done so, install `monoseq` using `pip`.
#
#     pip install monoseq
#
# The `monoseq.ipynb` module provides `Seq`, a convenience wrapper around `monoseq.pprint_sequence` that makes it easy to print sequence strings in an IPython Notebook.

# In[1]:


from monoseq.ipynb import Seq

s = ('cgcactcaaaacaaaggaagaccgtcctcgactgcagaggaagcaggaagctgtc'
     'ggcccagctctgagcccagctgctggagccccgagcagcggcatggagtccgtgg'
     'ccctgtacagctttcaggctacagagagcgacgagctggccttcaacaagggaga'
     'cacactcaagatcctgaacatggaggatgaccagaactggtacaaggccgagctc'
     'cggggtgtcgagggatttattcccaagaactacatccgcgtcaag')

Seq(s)


# ### Block and line lengths

# We can change the number of characters per block and the number of blocks per line.

# In[2]:


Seq(s, block_length=8, blocks_per_line=8)


# ### Annotations

# Let's say we want to highlight two subsequences because they are conserved between species. We define each region as a tuple *(start, stop)* (zero-based, stop not included), collect the regions in a list, and pass that list as one entry of the *annotations* argument.

# In[3]:


conserved = [(11, 37), (222, 247)]

Seq(s, annotations=[conserved])


# As a contrived example to show several levels of annotation, let's also annotate every 12th character and the middle third of the sequence.

# In[4]:


# Single-character regions at positions 11, 23, 35, ...
twelves = [(p, p + 1) for p in range(11, len(s), 12)]

# Integer division keeps the positions usable as indices on Python 3.
middle = [(len(s) // 3, len(s) // 3 * 2)]

Seq(s, annotations=[conserved, twelves, middle])


# ### Custom styling

# The default CSS that is applied can be overridden with the *style* argument.

# In[5]:


# Doubled braces are literal CSS braces; {selector} is filled in by monoseq.
style = """
  {selector} {{ background: beige; color: gray }}
  {selector} .monoseq-margin {{ font-style: italic; color: green }}
  {selector} .monoseq-annotation-0 {{ color: blue; font-weight: bold }}
"""

Seq(s, style=style, annotations=[conserved])


# See the string in `monoseq.ipynb.DEFAULT_STYLE` for a longer example.
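
# ### Building a region list from a motif

# The regions in the annotation examples were typed in by hand. As a minimal sketch, the helper below builds the same kind of list of zero-based *(start, stop)* tuples (stop not included) by searching for a motif. Note that `find_regions` is a hypothetical helper written for this notebook, not part of `monoseq`, and the short example sequence and motif are made up for illustration.

```python
def find_regions(sequence, motif):
    """Return (start, stop) tuples for every occurrence of `motif`.

    Positions are zero-based and `stop` is not included, matching the
    region format used in the annotation examples. Overlapping
    occurrences are all reported. This is an illustrative helper, not
    part of monoseq.
    """
    if not motif:
        raise ValueError('motif must not be empty')

    regions = []
    start = sequence.find(motif)
    while start != -1:
        regions.append((start, start + len(motif)))
        # Continue one position further so overlapping hits are found.
        start = sequence.find(motif, start + 1)
    return regions


example = 'cgcactcaaaacaaaggaagaccg'
motif_regions = find_regions(example, 'caaa')
print(motif_regions)  # [(6, 10), (11, 15)]
```

# In a notebook, the resulting list can be passed as one annotation level, for example `Seq(example, annotations=[motif_regions])`.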