Occasionally we might be only interested in the log-ratios between two features and not the ranks. In this case, it is useful to have a way to skip the step of running DEICODE/Songbird. This also has the advantage of allowing programmatic generation (through CLI or Python) of log-ratios for further visualization/analysis. We can perform this action using Qarcoal.
We will use the same dataset featured in the Qurro DEICODE tutorial (deicode_example.ipynb
).
This notebook relies on QIIME 2, DEICODE, and Qurro all being installed. You should be in a QIIME 2 conda environment.
In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files.
# Clear the output directory so we can write these files there
!rm -rf output
# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist
# (if it does already exist, -p ensures that an error won't be thrown)
!mkdir -p output
!qiime tools import \
--input-path ../DEICODE_sleep_apnea/input/qiita_10422_table.biom \
--output-path output/qiita_10422_table.biom.qza \
--type FeatureTable[Frequency]
Imported ../DEICODE_sleep_apnea/input/qiita_10422_table.biom as BIOMV210DirFmt to output/qiita_10422_table.biom.qza
Now, we can run Qarcoal through Qiime2 on our imported BIOM table. This produces one output: a table of samples with their associated log-ratios of selected features. We will use g__Allobaculum
as our numerator string and g__Coprococcus
as our denominator string for demonstration.
Note that Qarcoal requires the taxonomy file to be a .qza
artifact rather than a .tsv
file.
!qiime qurro qarcoal --help
Usage: qiime qurro qarcoal [OPTIONS] Compute the log-ratio of two specified feature strings by searching taxonomy for incidence of each string, summing all relevant feature counts for each sample, and taking the natural log of the numerator sum divided by denominator sum. Inputs: --i-table ARTIFACT FeatureTable[Frequency] Feature table describing the abundances of the features in samples. [required] --i-taxonomy ARTIFACT FeatureData[Taxonomy] Taxonomy information to be used for selecting features in log-ratio. [required] Parameters: --p-num-string TEXT Numerator string to search for in taxonomy. [required] --p-denom-string TEXT Denominator string to search for in taxonomy. [required] --m-samples-to-use-file METADATA... (multiple Sample metadata file. If provided, log-ratios will arguments will be only be calculated from sample labels present in this merged) file. [optional] --p-allow-shared-features / --p-no-allow-shared-features Boolean value denoting handling of features shared between numerator and denominator. If False, an error is raised if features are shared. If True, shared features are retained in log-ratio computation. [default: False] Outputs: --o-qarcoal-log-ratios ARTIFACT SampleData[LogRatios] [required] Miscellaneous: --output-dir PATH Output unspecified results to a directory --verbose / --quiet Display verbose output to stdout and/or stderr during execution of this action. Or silence output if execution is successful (silence is golden). --example-data PATH Write example data and exit. --citations Show citations and exit. --help Show this message and exit.
!qiime qurro qarcoal \
--i-table output/qiita_10422_table.biom.qza \
--i-taxonomy ../DEICODE_sleep_apnea/input/taxonomy.qza \
--p-num-string g__Allobaculum \
--p-denom-string g__Coprococcus \
--o-qarcoal-log-ratios output/allobaculum_coprococcus_log_ratios.qza
Saved SampleData[LogRatios] to: output/allobaculum_coprococcus_log_ratios.qza
We just ran Qarcoal! The output log-ratios for each sample are contained in the output/allobaculum_coprococcus_log_ratios.qza
artifact that was just produced. You can do lots of things with this artifact, including loading it programmatically into Python using QIIME 2's Artifact API, running qiime metadata tabulate
on it to produce a sortable table showing the log-ratios, etc.
We're going to load the log-ratios we just produced in Qarcoal into Python using the Artifact API. This will let us demonstrate that Qarcoal's log-ratios match up with the same log-ratios Qurro would produce for the same data.
import pandas as pd
from qiime2 import Artifact, Metadata
qarcoal_log_ratios = Artifact.load("output/allobaculum_coprococcus_log_ratios.qza")
qarcoal_log_ratios_df = qarcoal_log_ratios.view(pd.DataFrame)
qarcoal_log_ratios_df.head()
Num_Sum | Denom_Sum | log_ratio | |
---|---|---|---|
Sample-ID | |||
10422.21.F.4 | 9.0 | 659.0 | -4.293499 |
10422.22.F.8 | 5.0 | 3266.0 | -6.481883 |
10422.24.F.5 | 36.0 | 2028.0 | -4.031286 |
10422.17.F.9 | 8.0 | 50.0 | -1.832581 |
10422.18.F.12 | 9.0 | 141.0 | -2.751535 |
In order to run Qurro, we need some sort of feature rankings (at least right now). So we're going to run DEICODE.
We're going to do this through the Artifact API, but you could just as easily run the following through the command line. (Please see the DEICODE example notebook in this repository, and the DEICODE documentation, for more details on using DEICODE.)
NOTE: By default, DEICODE filters your input feature table. We will override this by setting both min-feature-count
and min-sample-count
to 0. If you want to match the DEICODE default filtering settings, filter your feature table to match DEICODE and pass a QIIME2 Metadata file containing the sample IDs to Qarcoal with the --m-samples-to-use-file
flag.
from qiime2.plugins import deicode
table = Artifact.load("output/qiita_10422_table.biom.qza")
ordination, dist_matrix = deicode.actions.rpca(
table = table,
min_sample_count = 0,
min_feature_count = 0)
We can then input the ordination into Qurro, save the visualization, and compare our results.
from qiime2.plugins import qurro
metadata = Metadata.load("../DEICODE_sleep_apnea/input/qiita_10422_metadata.tsv")
taxonomy = Metadata.load("../DEICODE_sleep_apnea/input/taxonomy.tsv")
qurro_viz = qurro.actions.loading_plot(
ranks = ordination,
table = table,
sample_metadata = metadata,
feature_metadata = taxonomy)
qurro_viz.visualization.save("output/qurro_viz.qzv")
'output/qurro_viz.qzv'
g__Allobaculum
to g__Coprococcus
) in Qurro¶Open the visualization here and type in g__Allobaculum
in the numerator search bar and g__Coprococcus
in the denominator search bar. Make sure you select the option to filter features from "Taxon" rather than "Feature ID."
(This screenshot is from Qurro v0.4.0, so future versions of Qurro might look a bit different.)
Click the Export sample data
button and save the resulting sample_plot_data.tsv
file to the input/
directory. (NOTE: We've provided a pre-downloaded version of this file in the input/
directory so that you can run this entire notebook automatically, but feel free to overwrite the input/sample_plot_data.tsv
file with your own.)
We can now load the Qurro results and compare them with the Qarcoal results to make sure they match.
qurro_log_ratios_df = pd.read_csv("input/sample_plot_data.tsv", sep="\t", index_col=0)
qurro_log_ratios_df.head()
Current_Natural_Log_Ratio | age | age.1 | |
---|---|---|---|
Sample ID | |||
10422.21.F.3 | -6.082980 | 11.0 | 11.0 |
10422.28.F.9 | 4.656135 | 14.0 | 14.0 |
10422.24.F.10 | -4.964397 | 14.5 | 14.5 |
10422.21.F.10 | NaN | 14.5 | 14.5 |
10422.27.F.8 | 2.937259 | 13.5 | 13.5 |
We see that the Qurro results have at least one NaN
log-ratio (it's actually a null
in the TSV file, but Pandas treats it as a NaN
). This just means that for this sample, the log-ratio could not be calculated due to 0s.
We can drop samples with these invalid log-ratios from the DataFrame using pd.isna()
and some mildly fancy indexing.
qurro_log_ratios_df = qurro_log_ratios_df[~qurro_log_ratios_df["Current_Natural_Log_Ratio"].isna()]
qurro_log_ratios_df.head()
Current_Natural_Log_Ratio | age | age.1 | |
---|---|---|---|
Sample ID | |||
10422.21.F.3 | -6.082980 | 11.0 | 11.0 |
10422.28.F.9 | 4.656135 | 14.0 | 14.0 |
10422.24.F.10 | -4.964397 | 14.5 | 14.5 |
10422.27.F.8 | 2.937259 | 13.5 | 13.5 |
10422.19.F.4 | -6.411818 | 11.5 | 11.5 |
First, we can get a preliminary sense of how well the two methods coincide by looking at the number of samples present.
qurro_log_ratios_df.shape[0] == qarcoal_log_ratios_df.shape[0]
True
That's a good sign, but let's be more rigorous and make sure the samples are the same.
set(qurro_log_ratios_df.index) == set(qarcoal_log_ratios_df.index)
True
Finally, let's make sure the log-ratios themselves are the same. Note that Qurro calculates log-ratios using Javascript, while Qarcoal uses Python. As a result, the individual values may differ very slightly due to implementation of the logarithm function. We will use np.allclose
to check that the two are equal within a tolerance.
from numpy import allclose
qurro_values = qurro_log_ratios_df.sort_index()['Current_Natural_Log_Ratio'].to_numpy()
qarcoal_values = qarcoal_log_ratios_df.sort_index()['log_ratio'].to_numpy()
allclose(qurro_values, qarcoal_values)
True
Success! Our Qarcoal-generated log-ratios are approximately equal to our Qurro-generated ones.
We hope you find Qarcoal useful, and please contact us or open an issue if you having questions or suggestions about using Qarcoal.