Table of Contents

Getting started
  Reading An Introduction to Applied Bioinformatics
    Who should read IAB?
    How to read IAB
    Using Jupyter Notebooks to read IAB interactively
    Reading list
    Getting started with Biology
    Getting started with Computer Science and programming
    Philosophy of biology and popular science books
    Need help?
    Contributing and Code of Conduct
    Acknowledgements
  Biological Information
    Central Dogma of Molecular Biology
    Binary and decimal numerical systems
    Encoding messages in bits
    Protein sequences are encoded in a base 4 system
    Quantifying information
    The genetic code
    Summary
Fundamentals
  Pairwise sequence alignment
    What is a sequence alignment?
    A simple procedure for aligning a pair of sequences
    Step 1: Create a blank matrix where the rows and columns represent the positions in the sequences.
    Step 2: Add values to the cells in the matrix.
    Step 3: Identify the longest diagonals.
    Step 4: Transcribe some of the possible alignments that arise from this process.
    Why this simple procedure is too simplistic
    Differential scoring of matches and mismatches
    A better approach for global pairwise alignment using the Needleman-Wunsch algorithm
    Stepwise Needleman-Wunsch alignment
    Step 1: Create blank matrices.
    Step 2: Compute $F$ and $T$.
    Step 3: Transcribe the alignment.
    Automating Needleman-Wunsch alignment with Python
    A note on computing $F$ and $T$
    Global versus local alignment
    Smith-Waterman local sequence alignment
    Step 1: Create blank matrices.
    Step 2: Compute $F$ and $T$.
    Step 3: Transcribe the alignment.
    Automating Smith-Waterman alignment with Python
    Differential scoring of gaps
    How long does pairwise sequence alignment take?
    Comparing implementations of Smith-Waterman
    Analyzing Smith-Waterman run time as a function of sequence length
    Conclusions on the scalability of pairwise sequence alignment with Smith-Waterman
  Sequence homology searching
    Defining the problem
    Loading annotated sequences
    Defining the problem
    A complete homology search function
    Reducing the runtime for database searches
    Heuristic algorithms
    Random reference sequence selection
    Composition-based reference sequence collection
    GC content
    kmer content
    Further optimizing composition-based approaches by pre-computing reference database information
    Determining the statistical significance of a pairwise alignment
    Metrics of alignment quality
    False positives, false negatives, p-values, and alpha
    Interpreting alignment scores in context
    Exploring the limit of detection of sequence homology searches
  Generalized dynamic programming for multiple sequence alignment
    Progressive alignment
    Building the guide tree
    Generalization of Needleman-Wunsch (with affine gap scoring) for progressive multiple sequence alignment
    Putting it all together: progressive multiple sequence alignment
    Progressive alignment versus iterative alignment
  Phylogenetic reconstruction
    Why build phylogenies?
    How phylogenies are reconstructed
    Some terminology
    Simulating evolution
    A cautionary word about simulations
    Visualizing trees with ete3
    Distance-based approaches to phylogenetic reconstruction
    Distances and distance matrices
    Alignment-free distances between sequences
    Alignment-based distances between sequences
    Jukes-Cantor correction of observed distances between sequences
    Phylogenetic reconstruction with UPGMA
    Applying UPGMA from SciPy
    Understanding the name
    Phylogenetic reconstruction with neighbor-joining
    Limitations of distance-based approaches
    Bootstrap analysis
    Parsimony-based approaches to phylogenetic reconstruction
    How many possible phylogenies are there for a given collection of sequences?
    Statistical approaches to phylogenetic reconstruction
    Bayesian methods
    Maximum likelihood methods
    Rooted versus unrooted trees
    Acknowledgements
  Sequence mapping and clustering
    De novo clustering of sequences by similarity
    Furthest neighbor clustering
    Nearest neighbor clustering
    Centroid clustering
    Three different definitions of OTUs
    Comparing properties of our clustering algorithms
    Reference-based clustering to assist with parallelization
  Machine learning in bioinformatics (work-in-progress)
    Defining a classification problem
    Naive Bayes classifiers
    Random Forest classifiers
    Defining a dimensionality reduction problem
Applications
  Studying Microbial Diversity
    Getting started: the feature table
    Terminology
    Measuring alpha diversity
    Observed species (or Observed OTUs)
    A limitation of OTU counting
    Phylogenetic Diversity (PD)
    Even sampling
    Measuring beta diversity
    Distance metrics
    Bray-Curtis
    Unweighted UniFrac
    Even sampling
    Interpreting distance matrices
    Distribution plots and comparisons
    Hierarchical clustering
    Ordination
    Polar ordination
    Determining the most important axes in polar ordination
    Interpreting ordination plots
    Tools for using ordination in practice: scikit-bio, pandas, and matplotlib
    PCoA versus PCA: what's the difference?
    Are two different analysis approaches giving me the same result?
    Procrustes analysis
    Where to go from here
    Acknowledgements
Exercises
  Local sequence alignment exercises
    Purpose
    Background
    Goals
    Hints
    Getting started
    Question 1
    Question 2
    Question 3
    Question 4
    More hints
  Multiple sequence alignment exercises
    Purpose
    Goals
    Hints
    Functions that you will need to complete the exercise.
    Question 1
    Question 2
    Question 3
    Question 4
    Question 5
    Question 6
    Question 7
Back Matter
  About the author
  Glossary
    Pairwise alignment (noun)
    kmer (noun)

An Introduction To Applied Bioinformatics