In this example, we use data from the Red Sea metagenome dataset. This particular data was obtained from the data/redsea folder of Songbird's GitHub repository, and is associated with the following paper:
Thompson, L. R., Williams, G. J., Haroon, M. F., Shibl, A., Larsen, P., Shorenstein, J., ... & Stingl, U. (2017). Metagenomic covariation along densely sampled environmental gradients in the Red Sea. The ISME Journal, 11(1), 138.
A lot has changed since we published these tools in 2019 and 2020! Notably, as of writing, the pandas versions (and, as a result, the QIIME 2 versions) required by Qurro (version 0.8.0 and higher) and Songbird are incompatible:
Tool | Required pandas version | Required QIIME 2 version |
---|---|---|
Qurro | >= 1 | >= 2020.11 |
Songbird | < 1 | >= 2019.7, <= 2020.6 |
This implies that installing Qurro and Songbird into the same conda environment is not feasible. However, it's possible to install them into separate conda environments; the differentials output by Songbird are still completely compatible with Qurro.
To get around this issue for the purposes of this tutorial, we will run Songbird from within one QIIME 2 conda environment (version 2020.6) and run Qurro from within another QIIME 2 conda environment (version 2022.2). (Getting Jupyter and conda to play nicely can be a bit of a pain, but the nb_conda_kernels package should help make it easier to switch between conda environments within a notebook. That being said, it'll probably be easier to replicate these analyses outside of a Jupyter notebook.)
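To make the constraints in the version table concrete, here is a small Python sketch that reports which of the two tools an environment's pandas and QIIME 2 versions satisfy. The helper names are hypothetical (not part of Qurro or Songbird), version strings are parsed loosely (only the pandas major version and the QIIME 2 year.month are used), and the version strings in the example calls are illustrative rather than the exact versions shipped with any particular QIIME 2 release.

```python
def _major(version: str) -> int:
    """Return the leading integer of a version string such as '1.4.2' or '0.25.3'."""
    return int(version.split(".")[0])


def _year_month(version: str) -> tuple:
    """Return (year, month) from a QIIME 2 release string such as '2020.6' or '2020.6.0'."""
    year, month = version.split(".")[:2]
    return int(year), int(month)


def compatible_tools(pandas_version: str, qiime2_version: str) -> list:
    """List the tools whose requirements (per the version table) this environment meets."""
    pd_major = _major(pandas_version)
    q2 = _year_month(qiime2_version)
    tools = []
    # Songbird: pandas < 1 and 2019.7 <= QIIME 2 <= 2020.6
    if pd_major < 1 and (2019, 7) <= q2 <= (2020, 6):
        tools.append("Songbird")
    # Qurro: pandas >= 1 and QIIME 2 >= 2020.11
    if pd_major >= 1 and q2 >= (2020, 11):
        tools.append("Qurro")
    return tools


print(compatible_tools("0.25.3", "2020.6"))  # ['Songbird']
print(compatible_tools("1.4.2", "2022.2"))   # ['Qurro']
```

In a real environment, you could pass pandas.__version__ and the QIIME 2 release reported by qiime info. An empty list means neither tool's requirements are met.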
For the most up-to-date details about how to install and run Songbird, please see its documentation.
This notebook requires the two QIIME 2 conda environments discussed above: one containing Songbird, and one containing Qurro. See the table above for the exact versions required.
In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files.
# Clear the output directory so we can write these files there
!rm -rf output
# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist
# (if it does already exist, -p ensures that an error won't be thrown)
!mkdir -p output
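If you would rather not rely on shell commands (for example, on Windows, where rm -rf and mkdir -p are not available by default), the cell above can be approximated in plain Python. This is a minimal sketch; reset_output_dir is a hypothetical helper name, and it only roughly mirrors the shell commands (for instance, ignore_errors=True also silences permission errors).

```python
import os
import shutil


def reset_output_dir(path: str = "output") -> None:
    """Delete `path` if present, then recreate it as an empty directory.

    Roughly mirrors `rm -rf output` followed by `mkdir -p output`.
    """
    # ignore_errors=True suppresses errors, including the directory not existing
    shutil.rmtree(path, ignore_errors=True)
    # exist_ok=True avoids an error if the directory already exists (like mkdir -p)
    os.makedirs(path, exist_ok=True)
```

Calling reset_output_dir() from the notebook's working directory would leave an empty output/ folder, just like the shell cell.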
Running Songbird through QIIME 2 (QIIME 2 >= 2019.7, <= 2020.6)
This should be run from a QIIME 2 conda environment in which Songbird (but not Qurro) is installed.
If you just installed Songbird, you should run qiime dev refresh-cache afterwards so that QIIME 2 can find Songbird's QIIME 2 plugin.
In order to use this dataset's BIOM table in QIIME 2, we need to import it as a FeatureTable[Frequency] QIIME 2 artifact.
!qiime tools import \
--input-path input/redsea.biom \
--output-path output/redsea.biom.qza \
    --type 'FeatureTable[Frequency]'
Imported input/redsea.biom as BIOMV210DirFmt to output/redsea.biom.qza
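The supported way to check an artifact's type is qiime tools peek. For illustration only, the sketch below reads the type directly, assuming the QIIME 2 archive layout in which a .qza/.qzv file is a zip archive with a top-level <uuid>/metadata.yaml containing a line such as "type: FeatureTable[Frequency]". The helper name artifact_type is hypothetical, and the YAML is scanned line by line rather than parsed.

```python
import zipfile


def artifact_type(path: str) -> str:
    """Return the semantic type recorded in a .qza/.qzv file's metadata.yaml.

    Assumes the layout <uuid>/metadata.yaml with a line like "type: ...".
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only the top-level metadata.yaml (one directory deep), not provenance copies
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        return line[len("type:"):].strip()
    raise ValueError(f"No top-level metadata.yaml with a type found in {path}")
```

Under that assumption, artifact_type("output/redsea.biom.qza") should return FeatureTable[Frequency].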
Now, we can run Songbird through QIIME 2 on our imported BIOM table. This produces three output files, but the main one we care about for Qurro is the FeatureData[Differential] artifact (which will be stored in output/differentials.qza). This artifact contains feature differentials: as Songbird's documentation puts it, these correspond to "...the ordering of the coefficients within a covariate."
Please see Songbird's documentation for more information about how it works and how its output files are formatted.
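Because what matters is the ordering of the coefficients within a covariate, "ranking" features amounts to sorting them by their differential for that covariate. The toy Python sketch below shows only this sorting step; it is not Songbird's code, the helper name rank_features is hypothetical, and the coefficients are made up.

```python
def rank_features(differentials: dict, covariate: str) -> list:
    """Return feature IDs sorted from lowest to highest differential for `covariate`.

    `differentials` maps feature ID -> {covariate name -> coefficient}.
    """
    return sorted(differentials, key=lambda feature: differentials[feature][covariate])


# Made-up coefficients, for illustration only
example = {
    "featureA": {"Intercept": 0.1, "Depth": 1.7},
    "featureB": {"Intercept": -0.4, "Depth": -2.3},
    "featureC": {"Intercept": 0.9, "Depth": 0.2},
}
ranking = rank_features(example, "Depth")
print(ranking)  # ['featureB', 'featureC', 'featureA']
```

Features at opposite ends of such a ranking are the natural candidates for a log-ratio's numerator and denominator.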
These hyperparameters (in particular, epochs and differential-prior) were selected based on experimentation with TensorBoard. See Songbird's FAQs for details on how to use TensorBoard to select these sorts of hyperparameters for your own datasets (this is important, but beyond the scope of this tutorial).
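Before running Songbird, it can help to confirm that every covariate named in the formula is a column in the sample metadata. The sketch below is a hypothetical helper, not part of Songbird: it only handles simple additive formulas like the one used here (patsy syntax such as C(...) or interaction terms is not parsed), and it assumes the metadata header is the first line of the TSV with sample IDs in the first column.

```python
import csv


def missing_covariates(formula: str, metadata_path: str) -> list:
    """Return covariates in a simple "A+B+C" formula that are not metadata columns."""
    covariates = [term.strip() for term in formula.split("+") if term.strip()]
    with open(metadata_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    # The first column holds sample IDs; the remaining columns are metadata fields
    columns = set(header[1:])
    return [c for c in covariates if c not in columns]
```

For example, missing_covariates("Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate", "input/redsea_metadata.txt") returning an empty list would mean every covariate was found.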
!qiime songbird multinomial \
--i-table output/redsea.biom.qza \
--m-metadata-file input/redsea_metadata.txt \
--p-formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" \
--p-epochs 10000 \
--p-differential-prior 0.5 \
--o-differentials output/differentials.qza \
--o-regression-stats output/regression-stats.qza \
--o-regression-biplot output/regression-biplot.qza
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Saved FeatureData[Differential] to: output/differentials.qza
Saved SampleData[SongbirdStats] to: output/regression-stats.qza
Saved PCoAResults % Properties('biplot') to: output/regression-biplot.qza
Running Qurro through QIIME 2 (QIIME 2 >= 2020.11)
At this point, you should switch to a newer QIIME 2 environment with which Qurro is compatible.
Since our "feature rankings" are the (sorted) feature differentials that Songbird just produced, we'll use the qiime qurro differential-plot command.
!qiime qurro differential-plot --help
Usage: qiime qurro differential-plot [OPTIONS]
  Generates an interactive visualization of feature differentials in tandem with a visualization of the log-ratios of selected features' sample abundances.
Inputs:
  --i-ranks ARTIFACT FeatureData[Differential]
      Feature differentials. [required]
  --i-table ARTIFACT FeatureTable[Frequency]
      A BIOM table describing the abundances of the ranked features in samples. Note that empty samples and features will be removed from the Qurro visualization. [required]
Parameters:
  --m-sample-metadata-file METADATA... (multiple arguments will be merged)
      Sample metadata. In Qurro visualizations, you can use sample metadata fields to change the x-axis and colors in the sample plot. [required]
  --m-feature-metadata-file METADATA... (multiple arguments will be merged)
      Feature metadata (for example, if your features are ASVs or OTUs, this could be taxonomy). You can use feature metadata fields to filter features in the rank plot when selecting log-ratios. [optional]
  --p-extreme-feature-count INTEGER
      If specified, Qurro will only use this many "extreme" features from both ends of all of the rankings. This is useful when dealing with huge datasets (e.g. with BIOM tables exceeding 1 million entries), for which running Qurro normally might take a long amount of time or crash due to memory limits. Note that the automatic removal of empty samples and features from the table will be done *after* this filtering step. [optional]
  --p-debug / --p-no-debug
      If this flag is used, Qurro will output debug messages. Note that you'll also need to use the --verbose option to see these messages. [default: False]
Outputs:
  --o-visualization VISUALIZATION
      [required]
Miscellaneous:
  --output-dir PATH
      Output unspecified results to a directory
  --verbose / --quiet
      Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden).
  --example-data PATH
      Write example data and exit.
  --citations
      Show citations and exit.
  --help
      Show this message and exit.
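The sample plot shows, for each sample, the log-ratio of the summed abundances of the "selected" numerator features over the selected denominator features. The sketch below computes that quantity for one sample. It assumes a natural logarithm and returns None when either sum is zero, since the log-ratio is then undefined; how Qurro itself reports such samples is not reproduced here. The function name log_ratio is hypothetical.

```python
import math


def log_ratio(abundances: dict, numerator: list, denominator: list):
    """Return log(sum(numerator) / sum(denominator)) for one sample, or None.

    `abundances` maps feature ID -> count in this sample; missing features count as 0.
    """
    top = sum(abundances.get(f, 0) for f in numerator)
    bottom = sum(abundances.get(f, 0) for f in denominator)
    if top == 0 or bottom == 0:
        return None  # log-ratio undefined for this sample
    return math.log(top / bottom)


sample = {"featureA": 30, "featureB": 10, "featureC": 5}
print(log_ratio(sample, ["featureA"], ["featureB", "featureC"]))  # log(30/15), about 0.693
```

Summing counts before taking the ratio means that selecting several features on either side pools their abundances rather than averaging their individual log-ratios.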
!qiime qurro differential-plot \
--i-ranks output/differentials.qza \
--i-table output/redsea.biom.qza \
--m-sample-metadata-file input/redsea_metadata.txt \
--m-feature-metadata-file input/feature_metadata.txt \
--verbose \
--o-visualization output/qurro_plot_q2.qzv
Saved Visualization to: output/qurro_plot_q2.qzv
That's it! We've now created a QZV file (describing a Qurro visualization) at output/qurro_plot_q2.qzv. You can view this visualization by running qiime tools view output/qurro_plot_q2.qzv, or by uploading the QZV file to QIIME 2 View (view.qiime2.org).
We don't need to use Songbird and Qurro through QIIME 2; if you want, you can run these tools outside of QIIME 2. Although this means you lose access to some of QIIME 2's functionality (e.g. provenance tracking or artifact semantic types), the results you get should be roughly the same. (We say "roughly" because some of the machine learning methods used by Songbird involve randomness.)
As in the QIIME 2 examples above, Songbird and Qurro are incompatible because they have conflicting dependencies. We recommend using conda, so that you can install Songbird and Qurro into two separate environments (and switch between them as needed).
This should be run from a conda environment in which Songbird (but not Qurro) is installed.
!songbird multinomial \
--input-biom input/redsea.biom \
--metadata-file input/redsea_metadata.txt \
--formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" \
--epochs 10000 \
--differential-prior 0.5 \
--summary-dir output/
WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/bin/songbird:191: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead. 2022-07-05 21:04:30.548298: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2022-07-05 21:04:30.572787: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1999965000 Hz 2022-07-05 21:04:30.573710: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55feb226f620 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 2022-07-05 21:04:30.573769: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/bin/songbird:194: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:70: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.random.categorical` instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:81: The name tf.random_normal is deprecated. Please use tf.random.normal instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:86: Normal.__init__ (from tensorflow.python.ops.distributions.normal) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`. 
WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/tensorflow_core/python/ops/distributions/normal.py:160: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:95: Multinomial.__init__ (from tensorflow.python.ops.distributions.multinomial) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:110: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:116: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:124: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead. 
WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:125: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:127: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:131: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:163: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead. 0%| | 0/80000 [00:00<?, ?it/s]WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:177: The name tf.RunOptions is deprecated. Please use tf.compat.v1.RunOptions instead. WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:179: The name tf.RunMetadata is deprecated. Please use tf.compat.v1.RunMetadata instead. 2022-07-05 21:04:30.895686: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 10%|███▋ | 7964/80000 [00:09<01:17, 924.46it/s]2022-07-05 21:04:40.805710: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 21%|███████▌ | 16805/80000 [00:19<01:18, 809.24it/s]2022-07-05 21:04:50.806916: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 29%|██████████▎ | 23031/80000 [00:30<02:30, 379.62it/s]2022-07-05 21:05:00.810620: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 
35%|████████████▋ | 28123/80000 [00:39<01:11, 720.68it/s]2022-07-05 21:05:10.810196: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 44%|███████████████▋ | 34870/80000 [00:49<01:02, 717.54it/s]2022-07-05 21:05:20.811089: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 52%|██████████████████▌ | 41363/80000 [00:59<00:59, 651.93it/s]2022-07-05 21:05:30.817903: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 60%|█████████████████████▌ | 47928/80000 [01:09<00:52, 608.16it/s]2022-07-05 21:05:40.821613: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 68%|████████████████████████▋ | 54763/80000 [01:20<00:32, 765.67it/s]2022-07-05 21:05:50.822179: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 77%|███████████████████████████▋ | 61429/80000 [01:29<00:37, 501.66it/s]2022-07-05 21:06:00.823158: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 85%|██████████████████████████████▋ | 68137/80000 [01:39<00:17, 665.21it/s]2022-07-05 21:06:10.825128: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 92%|█████████████████████████████████▎ | 73906/80000 [01:49<00:10, 593.96it/s]2022-07-05 21:06:20.826017: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 99%|███████████████████████████████████▋| 79186/80000 [01:59<00:01, 678.98it/s]2022-07-05 21:06:30.828385: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started. 100%|████████████████████████████████████| 80000/80000 [02:01<00:00, 660.34it/s] WARNING:tensorflow: The TensorFlow contrib module will not be included in TensorFlow 2.0. 
For more information, please see: * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md * https://github.com/tensorflow/addons * https://github.com/tensorflow/io (for I/O related ops) If you depend on functionality not listed there, please file an issue.
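Outside of QIIME 2, Songbird writes its differentials as a TSV file (output/differentials.tsv, which we pass to Qurro below). If you want to inspect that file in Python without pandas, a minimal sketch with the standard csv module follows. It assumes the first column holds feature IDs and every other column holds a numeric differential (one per covariate, plus the intercept); the helper name load_differentials is hypothetical.

```python
import csv


def load_differentials(path: str) -> dict:
    """Load a differentials TSV into {feature ID: {column name: float}}."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        columns = header[1:]  # everything after the feature ID column
        return {
            row[0]: {col: float(value) for col, value in zip(columns, row[1:])}
            for row in reader
            if row  # skip blank lines
        }
```

For example, diffs = load_differentials("output/differentials.tsv") followed by len(diffs) would report how many features Songbird produced differentials for.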
This should be run from a conda environment in which Qurro (but not Songbird) is installed.
When we used Qurro through QIIME 2, we had to specify the differential-plot command in order to let the Qurro QIIME 2 plugin know we were working with feature differentials.
Now that we're running Qurro outside of QIIME 2, we don't need to specify this; Qurro can accept either feature differentials or feature loadings as input.
!qurro --help
Usage: qurro [OPTIONS]
  Generates a visualization of feature rankings and log-ratios.
  The resulting visualization contains two plots. The first plot shows how features are ranked, and the second plot shows the log-ratio of "selected" features' abundances within samples.
  The visualization is interactive, so which features are "selected" to construct log-ratios -- as well as various other properties of the visualization -- can be changed by the user.
Options:
  -r, --ranks TEXT
      Either feature differentials (contained in a TSV file, where each row describes a feature and each column describes a differential field) or a scikit-bio OrdinationResults file for a biplot (containing feature loadings). When sorted numerically, differentials and feature loadings alike provide 'rankings.' [required]
  -t, --table TEXT
      A BIOM table describing the abundances of the ranked features in samples. Note that empty samples and features will be removed from the Qurro visualization. [required]
  -sm, --sample-metadata TEXT
      Sample metadata, formatted as a TSV file (where each row describes a sample and each column describes a 'metadata' field, and the first column contains sample IDs). In Qurro visualizations, you can use sample metadata fields to change the x-axis and colors in the sample plot. [required]
  -fm, --feature-metadata TEXT
      Feature metadata, formatted as a TSV file (where each row describes a feature and each column describes a 'metadata' field, and the first column contains feature IDs). In Qurro visualizations, you can use feature metadata fields to filter features in the rank plot when selecting log-ratios.
  -o, --output-dir TEXT
      Directory to write the HTML/JS/... files defining a Qurro visualization to. If this directory already exists, files/directories already within it will be overwritten if necessary. Note that you need to keep the files in this directory together -- moving the index.html file in this directory to another location, without also moving the JS/etc. files, will break the visualization. [required]
  -x, --extreme-feature-count INTEGER
      If specified, Qurro will only use this many "extreme" features from both ends of all of the rankings. This is useful when dealing with huge datasets (e.g. with BIOM tables exceeding 1 million entries), for which running Qurro normally might take a long amount of time or crash due to memory limits. Note that the automatic removal of empty samples and features from the table will be done *after* this filtering step.
  --debug
      If this flag is used, Qurro will output debug messages.
  --version
      Show the version and exit.
  --help
      Show this message and exit.
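The -x/--extreme-feature-count option keeps only "extreme" features from both ends of all of the rankings. The sketch below is one reading of that description: for each ranking, take the n lowest- and n highest-valued features, then take the union across rankings. It is not Qurro's implementation (tie handling, for instance, may differ), and the function name extreme_features is hypothetical.

```python
def extreme_features(rankings: dict, n: int) -> set:
    """Union, over all rankings, of the n lowest- and n highest-valued features.

    `rankings` maps a ranking name (e.g. a covariate) -> {feature ID: value}.
    """
    if n < 1:
        # ordered[-0:] would select every feature, so require a positive count
        raise ValueError("n must be a positive integer")
    selected = set()
    for values in rankings.values():
        ordered = sorted(values, key=values.get)  # lowest to highest value
        selected.update(ordered[:n])   # n features at the low end
        selected.update(ordered[-n:])  # n features at the high end
    return selected


rankings = {
    "Depth": {"a": 1.0, "b": 5.0, "c": 3.0, "d": -2.0, "e": 0.0},
    "Temperature": {"a": 0.0, "b": -1.0, "c": 9.0, "d": 2.0, "e": 4.0},
}
print(sorted(extreme_features(rankings, 1)))  # ['b', 'c', 'd']
```

Because the union is taken across rankings, the number of retained features can be anywhere up to 2 × n × (number of rankings).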
!qurro \
--ranks output/differentials.tsv \
--table input/redsea.biom \
--sample-metadata input/redsea_metadata.txt \
--feature-metadata input/feature_metadata.txt \
--output-dir output/qurro_plot_standalone/
Successfully generated a visualization in the folder output/qurro_plot_standalone/.
We just generated a Qurro visualization in the folder output/qurro_plot_standalone/. This visualization is analogous to the QZV file we generated above using QIIME 2. You can view this visualization by just opening up output/qurro_plot_standalone/index.html in a modern web browser.
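To open the visualization from a notebook or script, the standard library's webbrowser module can be pointed at a file:// URI. The sketch below (helper names hypothetical) opens index.html in place rather than copying it, since, per Qurro's help text, it depends on the JS and other files stored alongside it.

```python
import webbrowser
from pathlib import Path


def index_uri(folder: str) -> str:
    """Return a file:// URI for the index.html inside a standalone Qurro output folder."""
    index = (Path(folder) / "index.html").resolve()
    if not index.is_file():
        raise FileNotFoundError(f"No index.html found in {folder}")
    return index.as_uri()


def open_visualization(folder: str) -> bool:
    """Open the visualization in the default web browser, in place."""
    return webbrowser.open(index_uri(folder))
```

For example, open_visualization("output/qurro_plot_standalone") would open the visualization generated above; webbrowser.open returns False if no browser could be launched.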
That's it! If you have any more questions about using Qurro, feel free to contact us (see the Qurro README for contact information).