Notebook
Here is the raw output of the previous steps, which can be pasted into the list base_region_boundaries in the main notebook:

    ('v2', 136, 1868), ('v2.v3', 136, 2232), ('v2.v4', 136, 4051), ('v2.v6', 136, 4932), ('v2.v8', 136, 6426), ('v2.v9', 136, 6791),
    ('v3', 1916, 2232), ('v3.v4', 1916, 4051), ('v3.v6', 1916, 4932), ('v3.v8', 1916, 6426), ('v3.v9', 1916, 6791),
    ('v4', 2263, 4051), ('v4.v6', 2263, 4932), ('v4.v8', 2263, 6426), ('v4.v9', 2263, 6791),
    ('v6', 4653, 4932), ('v6.v8', 4653, 6426), ('v6.v9', 4653, 6791),
    ('v9', 6450, 6791),
    ('full.length', 0, 7682),
    ('v2.150', 136, 702), ('v2.250', 136, 1752),
    ('v2.v3.150', 136, 702), ('v2.v3.250', 136, 1752), ('v2.v3.400', 136, 2036),
    ('v2.v4.150', 136, 702), ('v2.v4.250', 136, 1752), ('v2.v4.400', 136, 2036),
    ('v2.v6.150', 136, 702), ('v2.v6.250', 136, 1752), ('v2.v6.400', 136, 2036),
    ('v2.v8.150', 136, 702), ('v2.v8.250', 136, 1752), ('v2.v8.400', 136, 2036),
    ('v2.v9.150', 136, 702), ('v2.v9.250', 136, 1752), ('v2.v9.400', 136, 2036),
    ('v3.v4.150', 1916, 2235), ('v3.v4.250', 1916, 2493), ('v3.v4.400', 1916, 4014),
    ('v3.v6.150', 1916, 2235), ('v3.v6.250', 1916, 2493), ('v3.v6.400', 1916, 4014),
    ('v3.v8.150', 1916, 2235), ('v3.v8.250', 1916, 2493), ('v3.v8.400', 1916, 4014),
    ('v3.v9.150', 1916, 2235), ('v3.v9.250', 1916, 2493), ('v3.v9.400', 1916, 4014),
    ('v4.150', 2263, 3794), ('v4.250', 2263, 4046),
    ('v4.v6.150', 2263, 3794), ('v4.v6.250', 2263, 4046), ('v4.v6.400', 2263, 4574),
    ('v4.v8.150', 2263, 3794), ('v4.v8.250', 2263, 4046), ('v4.v8.400', 2263, 4574),
    ('v4.v9.150', 2263, 3794), ('v4.v9.250', 2263, 4046), ('v4.v9.400', 2263, 4574),
    ('v6.v8.150', 4653, 5085), ('v6.v8.250', 4653, 5903), ('v6.v8.400', 4653, 6419),
    ('v6.v9.150', 4653, 5085), ('v6.v9.250', 4653, 5903), ('v6.v9.400', 4653, 6419),

However, this output contains duplicate regions: for example, v2.150 and v2.v3.150 have identical boundaries (136, 702).
These duplicates can be removed manually to give:

    ('v2', 136, 1868), ('v2.v3', 136, 2232), ('v2.v4', 136, 4051), ('v2.v6', 136, 4932), ('v2.v8', 136, 6426), ('v2.v9', 136, 6791),
    ('v3', 1916, 2232), ('v3.v4', 1916, 4051), ('v3.v6', 1916, 4932), ('v3.v8', 1916, 6426), ('v3.v9', 1916, 6791),
    ('v4', 2263, 4051), ('v4.v6', 2263, 4932), ('v4.v8', 2263, 6426), ('v4.v9', 2263, 6791),
    ('v6', 4653, 4932), ('v6.v8', 4653, 6426), ('v6.v9', 4653, 6791),
    ('v9', 6450, 6791),
    ('full.length', 0, 7682),
    ('v2.150', 136, 702), ('v2.250', 136, 1752), ('v2.v3.400', 136, 2036),
    ('v3.v4.150', 1916, 2235), ('v3.v4.250', 1916, 2493), ('v3.v4.400', 1916, 4014),
    ('v4.150', 2263, 3794), ('v4.250', 2263, 4046), ('v4.v6.400', 2263, 4574),
    ('v6.v8.150', 4653, 5085), ('v6.v8.250', 4653, 5903), ('v6.v8.400', 4653, 6419)

This is the same subset of regions used to define base_region_boundaries (copied from the main demonstration notebook):

    ('v2', 136, 1868), #27f-338r
    ('v2.v3', 136, 2232),
    ('v2.v4', 136, 4051),
    ('v2.v6', 136, 4932),
    ('v2.v8', 136, 6426),
    ('v2.v9', 136, 6791),
    ('v3', 1916, 2232), #349f-534r
    ('v3.v4', 1916, 4051),
    ('v3.v6', 1916, 4932),
    ('v3.v8', 1916, 6426),
    ('v3.v9', 1916, 6791),
    ('v4', 2263, 4051), #515f-806r
    ('v4.v6', 2263, 4932),
    ('v4.v8', 2263, 6426),
    ('v4.v9', 2263, 6791),
    ('v6', 4653, 4932), #967f-1048r
    ('v6.v8', 4653, 6426),
    ('v6.v9', 4653, 6791),
    ('v9', 6450, 6791), #1391f-1492r
    ('full.length', 0, 7682),
    # Start 150, 250, 400 base pair reads
    ('v2.150', 136, 702),
    ('v2.250', 136, 1752),
    ('v2.v3.400', 136, 2036), # Skips reads that are larger than amplicon size
    ('v3.v4.150', 1916, 2235),
    ('v3.v4.250', 1916, 2493),
    ('v3.v4.400', 1916, 4014),
    ('v4.150', 2263, 3794),
    ('v4.250', 2263, 4046),
    ('v4.v6.400', 2263, 4574),
    ('v6.v8.150', 4653, 5085),
    ('v6.v8.250', 4653, 5903),
    ('v6.v8.400', 4653, 6419)
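The manual de-duplication described above could also be scripted. The following is only a minimal sketch. It assumes that two regions count as duplicates when they share the same start and end coordinates, and that the first name encountered is the one to keep. That rule appears consistent with the trimmed list, where v2.150 is kept and v2.v3.150 is dropped. The names `dedupe_regions` and `raw_regions` are hypothetical and do not come from the notebook, and `raw_regions` holds only a short excerpt of the raw output.

```python
def dedupe_regions(regions):
    """Keep the first region seen for each (start, end) pair, preserving order."""
    seen = set()      # (start, end) pairs already emitted
    unique = []
    for name, start, end in regions:
        if (start, end) not in seen:
            seen.add((start, end))
            unique.append((name, start, end))
    return unique


# Short excerpt of the raw output, for illustration only.
raw_regions = [
    ('v2', 136, 1868),
    ('v2.v3', 136, 2232),
    ('v2.150', 136, 702),
    ('v2.250', 136, 1752),
    ('v2.v3.150', 136, 702),    # same boundaries as v2.150
    ('v2.v3.250', 136, 1752),   # same boundaries as v2.250
    ('v2.v3.400', 136, 2036),
    ('v2.v4.150', 136, 702),
    ('v2.v4.250', 136, 1752),
    ('v2.v4.400', 136, 2036),   # same boundaries as v2.v3.400
]

for region in dedupe_regions(raw_regions):
    print(region)
```

Keying on the coordinates rather than the name is a deliberate choice here: the duplicated entries have different names but identical boundaries, so a name-based comparison would not catch them.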