Calculating Feature Log-Ratios Directly using Qarcoal

Occasionally we might be only interested in the log-ratios between two features and not the ranks. In this case, it is useful to have a way to skip the step of running DEICODE/Songbird. This also has the advantage of allowing programmatic generation (through CLI or Python) of log-ratios for further visualization/analysis. We can perform this action using Qarcoal.

We will use the same dataset featured in the Qurro DEICODE tutorial (deicode_example.ipynb).

Requirements

This notebook relies on QIIME 2, DEICODE, and Qurro all being installed. You should be in a QIIME 2 conda environment.

0. Setting Up

In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files.

In [1]:
# Clear the output directory so we can write these files there
!rm -rf output/*
# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist
# (if it does already exist, -p ensures that an error won't be thrown)
!mkdir -p output

1. Using Qarcoal Through QIIME 2

Currently, Qarcoal can only be used through QIIME 2. However, we are working on a standalone version -- so stay tuned.

1.A. Import the feature table into a QIIME 2 artifact

In [2]:
!qiime tools import \
    --input-path ../DEICODE_sleep_apnea/input/qiita_10422_table.biom \
    --output-path output/qiita_10422_table.biom.qza \
    --type FeatureTable[Frequency]
Imported ../DEICODE_sleep_apnea/input/qiita_10422_table.biom as BIOMV210DirFmt to output/qiita_10422_table.biom.qza

1.B. Run Qarcoal!

Now, we can run Qarcoal through Qiime2 on our imported BIOM table. This produces one output: a table of samples with their associated log-ratios of selected features. We will use g__Allobaculum as our numerator string and g__Coprococcus as our denominator string for demonstration.

Note that Qarcoal requires the taxonomy file to be a .qza artifact rather than a .tsv file.

In [3]:
!qiime qurro qarcoal --help
Usage: qiime qurro qarcoal [OPTIONS]

  Compute the log-ratio of two specified feature strings by searching
  taxonomy for incidence of each string, summing all relevant feature counts
  for each sample, and taking the natural log of the numerator sum divided
  by denominator sum.

Inputs:
  --i-table ARTIFACT FeatureTable[Frequency]
                         Feature table describing the abundances of the
                         features in samples.                       [required]
  --i-taxonomy ARTIFACT FeatureData[Taxonomy]
                         Taxonomy information to be used for selecting
                         features in log-ratio.                     [required]
Parameters:
  --p-num-string TEXT    Numerator string to search for in taxonomy.
                                                                    [required]
  --p-denom-string TEXT  Denominator string to search for in taxonomy.
                                                                    [required]
  --m-samples-to-use-file METADATA...
    (multiple            Sample metadata file. If provided, log-ratios will
     arguments will be   only be calculated from sample labels present in this
     merged)             file.                                      [optional]
  --p-allow-shared-features / --p-no-allow-shared-features
                         Boolean value denoting handling of features shared
                         between numerator and denominator. If False, an error
                         is raised if features are shared. If True, shared
                         features are retained in log-ratio computation.
                                                              [default: False]
Outputs:
  --o-qarcoal-log-ratios ARTIFACT SampleData[LogRatios]
                                                                    [required]
Miscellaneous:
  --output-dir PATH      Output unspecified results to a directory
  --verbose / --quiet    Display verbose output to stdout and/or stderr
                         during execution of this action. Or silence output if
                         execution is successful (silence is golden).
  --citations            Show citations and exit.
  --help                 Show this message and exit.
In [4]:
!qiime qurro qarcoal \
    --i-table output/qiita_10422_table.biom.qza \
    --i-taxonomy ../DEICODE_sleep_apnea/input/taxonomy.qza \
    --p-num-string g__Allobaculum \
    --p-denom-string g__Coprococcus \
    --o-qarcoal-log-ratios output/allobaculum_coprococcus_log_ratios.qza
Saved SampleData[LogRatios] to: output/allobaculum_coprococcus_log_ratios.qza

We just ran Qarcoal! The output log-ratios for each sample are contained in the output/allobaculum_coprococcus_log_ratios.qza artifact that was just produced. You can do lots of things with this artifact, including loading it programmatically into Python using QIIME 2's Artifact API, running qiime metadata tabulate on it to produce a sortable table showing the log-ratios, etc.

2. Verifying Qarcoal's output against Qurro's output in Python

We're going to load the log-ratios we just produced in Qarcoal into Python using the Artifact API. This will let us demonstrate that Qarcoal's log-ratios match up with the same log-ratios Qurro would produce for the same data.

In [5]:
import pandas as pd
from qiime2 import Artifact, Metadata

2.A. Load Qarcoal Log-Ratios in Python using QIIME 2's Artifact API

In [6]:
qarcoal_log_ratios = Artifact.load("output/allobaculum_coprococcus_log_ratios.qza")
qarcoal_log_ratios_df = qarcoal_log_ratios.view(pd.DataFrame)
qarcoal_log_ratios_df.head()
Out[6]:
Num_Sum Denom_Sum log_ratio
Sample-ID
10422.20.F.8 2.0 113.0 -4.034241
10422.29.F.11 4.0 27.0 -1.909543
10422.30.F.11 2.0 279.0 -4.938065
10422.29.F.3 14.0 950.0 -4.217405
10422.26.F.4 2.0 991.0 -6.205567

2.B. Run DEICODE

In order to run Qurro, we need some sort of feature rankings (at least right now). So we're going to run DEICODE.

We're going to do this through the Artifact API, but you could just as easily run the following through the command line. (Please see the DEICODE example notebook in this repository, and the DEICODE documentation, for more details on using DEICODE.)

NOTE: By default, DEICODE filters your input feature table. We will override this by setting both min-feature-count and min-sample-count to 0. If you want to match the DEICODE default filtering settings, filter your feature table to match DEICODE and pass a QIIME2 Metadata file containing the sample IDs to Qarcoal with the --m-samples-to-use-file flag.

In [7]:
from qiime2.plugins import deicode

table = Artifact.load("output/qiita_10422_table.biom.qza")

ordination, dist_matrix = deicode.actions.rpca(
    table = table,
    min_sample_count = 0,
    min_feature_count = 0)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/biom/table.py:4068: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return constructor(mat, index=index, columns=columns)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:257: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  sparse_index=BlockIndex(N, blocs, blens),
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:4583: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:854: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_kind=self._default_kind,

2.C. Run Qurro

We can then input the ordination into Qurro, save the visualization, and compare our results.

In [8]:
from qiime2.plugins import qurro

metadata = Metadata.load("../DEICODE_sleep_apnea/input/qiita_10422_metadata.tsv")
taxonomy = Metadata.load("../DEICODE_sleep_apnea/input/taxonomy.tsv")

qurro_viz = qurro.actions.loading_plot(
    ranks = ordination,
    table = table,
    sample_metadata = metadata,
    feature_metadata = taxonomy)

qurro_viz.visualization.save("output/qurro_viz.qzv")
/Users/mfedarko/Dropbox/Work/KnightLab/qurro/qurro/_df_utils.py:126: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  table_sdf = pd.SparseDataFrame(table.matrix_data, default_fill_value=0.0)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/frame.py:3471: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return klass(values, index=self.index, name=items, fastpath=True)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/ops/__init__.py:1641: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_values, index=self.index, name=self.name)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:339: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_fill_value=self.default_fill_value,
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:6289: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:5884: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:785: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_arrays, index=index, columns=columns).__finalize__(
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:3606: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  result = self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:1999: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(result, **d).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:745: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_fill_value=self._default_fill_value,
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:9126: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
Out[8]:
'output/qurro_viz.qzv'

2.D. Compute the same log-ratio as before (g__Allobaculum to g__Coprococcus) in Qurro

Open the visualization here and type in g__Allobaculum in the numerator search bar and g__Coprococcus in the denominator search bar. Make sure you select the option to filter features from "Taxon" rather than "Feature ID."

(This screenshot is from Qurro v0.4.0, so future versions of Qurro might look a bit different.)

img

Click the Export sample data button and save the resulting sample_plot_data.tsv file to the input/ directory. (NOTE: We've provided a pre-downloaded version of this file in the input/ directory so that you can run this entire notebook automatically, but feel free to overwrite the input/sample_plot_data.tsv file with your own.)

2.E. Comparing the output of Qurro and Qarcoal!

We can now load the Qurro results and compare them with the Qarcoal results to make sure they match.

In [9]:
qurro_log_ratios_df = pd.read_csv("input/sample_plot_data.tsv", sep="\t", index_col=0)
qurro_log_ratios_df.head()
Out[9]:
Current_Natural_Log_Ratio age age.1
Sample ID
10422.21.F.3 -6.082980 11.0 11.0
10422.28.F.9 4.656135 14.0 14.0
10422.24.F.10 -4.964397 14.5 14.5
10422.21.F.10 NaN 14.5 14.5
10422.27.F.8 2.937259 13.5 13.5

We see that the Qurro results have at least one NaN log-ratio (it's actually a null in the TSV file, but Pandas treats it as a NaN). This just means that for this sample, the log-ratio could not be calculated due to 0s.

We can drop samples with these invalid log-ratios from the DataFrame using pd.isna() and some mildly fancy indexing.

In [10]:
qurro_log_ratios_df = qurro_log_ratios_df[~qurro_log_ratios_df["Current_Natural_Log_Ratio"].isna()]
qurro_log_ratios_df.head()
Out[10]:
Current_Natural_Log_Ratio age age.1
Sample ID
10422.21.F.3 -6.082980 11.0 11.0
10422.28.F.9 4.656135 14.0 14.0
10422.24.F.10 -4.964397 14.5 14.5
10422.27.F.8 2.937259 13.5 13.5
10422.19.F.4 -6.411818 11.5 11.5

First, we can get a preliminary sense of how well the two methods coincide by looking at the number of samples present.

In [11]:
qurro_log_ratios_df.shape[0] == qarcoal_log_ratios_df.shape[0]
Out[11]:
True

That's a good sign, but let's be more rigorous and make sure the samples are the same.

In [12]:
set(qurro_log_ratios_df.index) == set(qarcoal_log_ratios_df.index)
Out[12]:
True

Finally, let's make sure the log-ratios themselves are the same. Note that Qurro calculates log-ratios using Javascript, while Qarcoal uses Python. As a result, the individual values may differ very slightly due to implementation of the logarithm function. We will use np.allclose to check that the two are equal within a tolerance.

In [13]:
from numpy import allclose

qurro_values = qurro_log_ratios_df.sort_index()['Current_Natural_Log_Ratio'].to_numpy()
qarcoal_values = qarcoal_log_ratios_df.sort_index()['log_ratio'].to_numpy()

allclose(qurro_values, qarcoal_values)
Out[13]:
True

Success! Our Qarcoal-generated log-ratios are approximately equal to our Qurro-generated ones.

We hope you find Qarcoal useful, and please contact us or open an issue if you having questions or suggestions about using Qarcoal.