1.1 Reading An Introduction to Applied Bioinformatics

Bioinformatics, as I see it, is the application of the tools of computer science (things like programming languages, algorithms, and databases) to address biological problems (for example, inferring the evolutionary relationship between a group of organisms based on fragments of their genomes, or understanding if or how the community of microorganisms that live in my gut changes if I modify my diet). Bioinformatics is a rapidly growing field, largely in response to the vast increase in the quantity of data that biologists now grapple with. Students from varied disciplines (e.g., biology, computer science, statistics, and biochemistry) and stages of their educational careers (undergraduate, graduate, or postdoctoral) are becoming interested in bioinformatics.

An Introduction to Applied Bioinformatics, or IAB, is a free, open access bioinformatics text available at http://readIAB.org. It introduces readers to the core concepts of bioinformatics in the context of their implementation and application to real-world problems and data. IAB makes extensive use of the scikit-bio Python package, which provides production-ready implementations of core bioinformatics algorithms and data structures. As readers learn a concept, for example pairwise sequence alignment, they are presented with its scikit-bio implementation directly in the text. scikit-bio code is well annotated (adhering to the PEP 8 and numpydoc conventions), so readers can use it to deepen their understanding of the concept. Readers of IAB therefore also learn the concepts in the context of tools they can use to develop their own bioinformatics software and pipelines, enabling them to get started quickly on their own projects. While some theory is discussed, the focus of IAB is on what readers need to know to be effective, practicing bioinformaticians.
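The PEP 8 and numpydoc conventions mentioned above are easiest to recognize in an example. The hypothetical function below is not taken from scikit-bio; it is a small sketch written in that style, with a one-line summary followed by Parameters, Returns, Raises, and Examples sections. Code documented this way is what makes it practical to learn a concept by reading its implementation.

```python
def hamming_distance(seq1, seq2):
    """Compute the Hamming distance between two equal-length sequences.

    Parameters
    ----------
    seq1 : str
        The first sequence.
    seq2 : str
        The second sequence. Must be the same length as `seq1`.

    Returns
    -------
    int
        The number of positions at which `seq1` and `seq2` differ.

    Raises
    ------
    ValueError
        If `seq1` and `seq2` are not the same length.

    Examples
    --------
    >>> hamming_distance("ACCGGT", "ACCGTT")
    1
    """
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be the same length.")
    # Count the positions where the two sequences disagree.
    return sum(c1 != c2 for c1, c2 in zip(seq1, seq2))
```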

IAB is completely open access, with all software being BSD-licensed and all text being licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0). All development and publication is coordinated under public revision control.

My goal for IAB is for it to make bioinformatics as accessible as possible to students from varied backgrounds, and to get more and diverse people into this hugely exciting field. I'm very interested in hearing from readers and instructors who are using IAB, so get in touch if you have corrections, suggestions for how to improve the content, or any other thoughts or comments on the text. In the spirit of openness, I'd prefer to be contacted via the IAB issue tracker. I'll respond to direct e-mail as well, but I'm often backlogged on e-mail (just ask my students), so e-mail responses are likely to be slower.

I hope you find IAB useful, and that you enjoy reading it!

1.1.1 Who should read IAB?

IAB is written for scientists, software developers, and students interested in understanding and applying bioinformatics methods, and ultimately in developing their own bioinformatics analysis pipelines or software.

IAB was initially developed for an undergraduate course cross-listed in computer science and biology with no prerequisites. It therefore assumes little background in biology or computer science; however, some basic background is very helpful. For example, an understanding of the roles of and relationship between DNA and protein in a cell, and the ability to read and follow well-annotated Python code, are both helpful (but not necessary) to get started.

In the Getting started with Biology and Computer Science sections below I provide some suggestions for other texts that will help you to get started.

1.1.2 How to read IAB

IAB can be read interactively as a series of Jupyter Notebooks or read statically. Due to popular demand, a print version may ultimately be available for a fee, but the full and most recent version of IAB will always be available for free on the project website. The recommended way to read IAB is interactively as this allows readers to execute code directly in the text. For example, when learning pairwise alignment, users can align sequences provided in IAB (or their own sequences) and modify parameters (or even the algorithm itself) to see how changes affect the resulting alignments.
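The kind of experiment described above can be sketched in plain Python. The function below is a minimal, hypothetical Needleman-Wunsch global aligner with a linear gap penalty; it is not scikit-bio's implementation, and the names (`global_align`, `match`, `mismatch`, `gap`) and default scores are assumptions chosen for illustration. Changing the `gap` argument and re-running shows how the gap penalty affects the resulting alignment.

```python
def global_align(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings (gaps shown as '-') and the alignment score.
    """
    def substitution(a, b):
        # Score for aligning one character against another.
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    # F[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + substitution(seq1[i - 1], seq2[j - 1]),  # align two characters
                F[i - 1][j] + gap,  # gap in seq2
                F[i][j - 1] + gap,  # gap in seq1
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned1, aligned2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + substitution(seq1[i - 1], seq2[j - 1]):
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(aligned1)), "".join(reversed(aligned2)), F[n][m]


# Re-run with different gap penalties to see how the alignment changes.
for gap_penalty in (-1, -5):
    a1, a2, score = global_align("GATTACA", "GCATGCA", gap=gap_penalty)
    print(f"gap={gap_penalty} score={score}")
    print(a1)
    print(a2)
```

A cheap gap penalty tends to produce alignments with more gaps, while an expensive one pushes the aligner toward mismatches instead; the loop above makes that trade-off easy to observe on your own sequences.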

IAB is constantly being updated. As I teach with it, I will often update text or add new chapters in an effort to keep up with advances in the field. The project website contains the most up-to-date recommendations on how to read IAB or teach with IAB, including strategies for dealing with changing content. (For example, if you're teaching with IAB, you can fork the IAB repository and only pull updates into your fork when you're ready for them. If forking repositories and pulling updates are terms that don't mean anything to you right now, you can safely ignore this!)

IAB is split into four different sections: Getting started, Fundamentals, Applications, and Wrapping up. You should start reading IAB by working through the Getting started and Fundamentals chapters in order. You should then read the Applications chapters and Wrapping up in any order, based on your own interest.

1.1.3 Using Jupyter Notebooks to read IAB interactively

IAB can be read interactively as a series of Jupyter Notebooks. The main source for information about Jupyter Notebooks is the Jupyter website. You can find information there on how to use Jupyter Notebooks as well as setting up and running a Jupyter Notebook server (for example, if you'd like to make one available to your students).

Most of the code that is used in IAB comes from the scikit-bio package or other Python scientific computing tools. You can access these in the same way that you would in a Python script. For example:

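```python
import skbio

# Send pager output (e.g., from %psource) to standard output so that it
# appears inline with the rest of the text instead of in a separate pager.
from IPython.core import page
page.page = print
```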

We can then access functions, variables, and classes from these modules.

```python
# Module-level variables are accessed with dot notation.
print(skbio.title)
print(skbio.art)
```

We'll inspect a lot of source code in IAB as we explore bioinformatics algorithms. If you're ever interested in seeing the source code for some functionality that we're using, you can do that using Jupyter's psource magic.

```python
from skbio.alignment import TabularMSA

# IPython magic: display the source code of the TabularMSA.conservation method.
%psource TabularMSA.conservation
```
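Outside of IPython and Jupyter, where the %psource magic isn't available, the standard library's `inspect.getsource` function retrieves an object's source code in a similar way. This sketch uses `json.dumps` from the standard library as a stand-in so that it runs without scikit-bio installed; with scikit-bio available, `inspect.getsource(TabularMSA.conservation)` should behave similarly.

```python
import inspect
import json

# Retrieve the source code of a function defined in a .py file.
# (inspect.getsource raises OSError if the source file can't be found.)
source = inspect.getsource(json.dumps)

# Print just the first few lines of the function definition.
print("\n".join(source.splitlines()[:3]))
```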

The documentation for scikit-bio is also very extensive. You can view the documentation for the TabularMSA object, for example, on the scikit-bio website. These documents will be invaluable for learning how to use scikit-bio's objects.
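Documentation can also be read from within Python. scikit-bio objects carry numpydoc-formatted docstrings, which the standard library's `inspect.getdoc` function (or the built-in `help`) can display. As in the previous sketch, `json.dumps` stands in for a scikit-bio object so the example runs on its own; `inspect.getdoc(TabularMSA)` should work the same way when scikit-bio is installed.

```python
import inspect
import json

# inspect.getdoc returns the docstring with its indentation cleaned up,
# or None if the object has no docstring.
doc = inspect.getdoc(json.dumps)

# Print the one-line summary at the top of the docstring.
print(doc.splitlines()[0])

# help(json.dumps) would display the same docstring along with the signature.
```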

1.1.4 Reading list

1.1.4.1 Getting started with Biology

If you're new to biology, these are some books and resources that will help you get started.

  • The NIH Bookshelf: lots of free biology texts, some obviously better than others.

1.1.4.2 Getting started with Computer Science and programming

If you're new to Computer Science and programming, these are some books and resources that will help you get started.

  • Software Carpentry: online resources for learning scientific computing skills, plus regular in-person workshops all over the world. Taking a Software Carpentry workshop will pay off for biology students interested in a career in research.
  • Practical Computing for Biologists by Steven Haddock and Casey Dunn: a great introduction to many of the computational skills required of modern biologists. I highly recommend this book to all biology undergraduate and graduate students.
  • The Pragmatic Programmer by Andrew Hunt and David Thomas: a more advanced book on becoming a better programmer. This book is excellent, and I highly recommend it to anyone developing bioinformatics software. You should know how to program and have done some software development before jumping into this.

These are some books that I've enjoyed and that have also helped me think about biological systems. They are generally written for a popular audience, so they should be accessible to any reader of An Introduction to Applied Bioinformatics.

  • Ever Since Darwin by Stephen Jay Gould: the first book in a series of collections of short essays.

1.1.5 Need help?

If you're having issues getting An Introduction to Applied Bioinformatics running on your computer, or you have corrections or suggestions on the content, you should get in touch through the IAB issue tracker. This will generally be much faster than e-mailing the author directly, as there are multiple people who monitor the issue tracker. It also helps us manage our technical support load if we can consolidate all requests and responses in one place.

1.1.6 Contributing and Code of Conduct

If you're interested in contributing content or features to IAB, you should start by reviewing the project's Code of Conduct and Contributing Guide.

1.1.7 Acknowledgements

An Introduction to Applied Bioinformatics was funded in part by the Alfred P. Sloan Foundation. Initial prototyping was funded by Arizona's Technology and Research Initiative Fund. The style of the project was inspired by Bayesian Methods for Hackers.

See the repository's contributors page for information on who has contributed to the project.