Visualizing DEICODE feature loadings with Qurro

In this example, we use data from this Qiita study. It's associated with the following paper:

Tripathi, A., Melnik, A. V., Xue, J., Poulsen, O., Meehan, M. J., Humphrey, G., ... & Haddad, G. (2018). Intermittent hypoxia and hypercapnia, a hallmark of obstructive sleep apnea, alters the gut microbiome and metabolome. mSystems, 3(3), e00020-18.

Requirements

This notebook relies on QIIME 2, DEICODE, q2-emperor, and Qurro all being installed. You should be in a QIIME 2 conda environment.

0. Setting up

In this section, we empty the output directory, creating it first if it doesn't exist yet. This lets us rerun the notebook without any tool complaining about overwriting existing files.

In [1]:
# Clear the output directory so we can write these files there
!rm -rf output/*
# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist
# (if it does already exist, -p ensures that an error won't be thrown)
!mkdir -p output
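
For readers who would rather not shell out, the same reset can be sketched in plain Python. This is only an illustrative equivalent: the helper name reset_output_dir is made up for this example and is not part of QIIME 2, DEICODE, or Qurro.

```python
import shutil
from pathlib import Path


def reset_output_dir(path):
    """Delete `path` (if present) and recreate it as an empty directory.

    Hypothetical helper for illustration; not part of any tool used here.
    """
    out = Path(path)
    # ignore_errors=True turns a missing directory into a no-op,
    # much like the -f flag of rm.
    shutil.rmtree(out, ignore_errors=True)
    # parents/exist_ok mirror `mkdir -p`: no error if it already exists.
    out.mkdir(parents=True, exist_ok=True)
    return out
```

In this notebook the call would be reset_output_dir("output"). One difference from the shell version: rm -rf output/* leaves hidden files in place, while this sketch removes the whole directory before recreating it.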

1. Using DEICODE and Qurro through QIIME 2

You can use DEICODE and Qurro inside or outside of QIIME 2. In this section, we'll use DEICODE and Qurro from within QIIME 2; in the next section, we'll use these tools outside of QIIME 2.

If you just installed DEICODE or Qurro, run qiime dev refresh-cache afterwards so that QIIME 2 can "find" these tools' plugins.

1. A. Using DEICODE through QIIME 2

In order to use this dataset's BIOM table in QIIME 2, we need to import it as a FeatureTable[Frequency] QIIME 2 artifact.

In [2]:
!qiime tools import \
    --input-path input/qiita_10422_table.biom \
    --output-path output/qiita_10422_table.biom.qza \
    --type FeatureTable[Frequency]
Imported input/qiita_10422_table.biom as BIOMV210DirFmt to output/qiita_10422_table.biom.qza

Now, we can run DEICODE through QIIME 2 on our imported BIOM table. This produces two output files: a biplot and a distance matrix. (We're going to use Qurro to visualize the feature loadings contained in the biplot output file.)

Please see DEICODE's official documentation for more information about how it works and how its output files are formatted.

In [3]:
!qiime deicode auto-rpca \
    --i-table output/qiita_10422_table.biom.qza \
    --o-biplot output/ordination.qza \
    --o-distance-matrix output/dist_matrix.qza
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Saved PCoAResults % Properties('biplot') to: output/ordination.qza
Saved DistanceMatrix to: output/dist_matrix.qza

1. A. I. Optional: Visualizing the DEICODE biplot in Emperor

This step isn't required if you just want to use DEICODE with Qurro. However, it provides some interesting context about the biplot that DEICODE just generated.

To quote the DEICODE documentation linked above:

Biplots are exploratory visualization tools that allow us to represent the features (i.e. taxonomy or OTUs) that strongly influence the principal component axis as arrows. The interpretation of the compositional biplot differs slightly from classical biplot interpretation [...] The important features with regard to sample clusters are not a single arrow but [...] the log ratio between features represented by arrows pointing in different directions.

In [4]:
!qiime emperor biplot \
    --i-biplot output/ordination.qza \
    --m-sample-metadata-file input/qiita_10422_metadata.tsv \
    --m-feature-metadata-file input/taxonomy.tsv \
    --o-visualization output/biplot.qzv \
    --p-number-of-features 3
Saved Visualization to: output/biplot.qzv

The biplot.qzv file we just generated can be visualized in Emperor (either using qiime tools view or by uploading it to view.qiime2.org). As mentioned above, arrows in the biplot represent features; you can try changing the --p-number-of-features parameter to adjust how many arrows are shown in the biplot.
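
The quoted point about log-ratios can be made concrete with a toy calculation. The sketch below uses made-up counts for two hypothetical features (feature_A and feature_B) in three hypothetical samples; none of these numbers come from the Qiita dataset, and the helper log_ratio is illustrative only.

```python
import math

# Hypothetical counts of two features in three samples (made-up numbers).
counts = {
    "sample_1": {"feature_A": 40, "feature_B": 10},
    "sample_2": {"feature_A": 10, "feature_B": 10},
    "sample_3": {"feature_A": 5, "feature_B": 20},
}


def log_ratio(sample_counts, numerator, denominator):
    """Natural log of the numerator count over the denominator count.

    Returns None when either count is zero, because the log-ratio is
    undefined in that case.
    """
    num = sample_counts[numerator]
    den = sample_counts[denominator]
    if num <= 0 or den <= 0:
        return None
    return math.log(num / den)


log_ratios = {
    sample: log_ratio(sample_counts, "feature_A", "feature_B")
    for sample, sample_counts in counts.items()
}
for sample, value in log_ratios.items():
    print(sample, round(value, 3))
# sample_1 1.386
# sample_2 0.0
# sample_3 -1.386
```

A positive value means feature_A is more abundant than feature_B in that sample, and a negative value means the opposite. It is the change in this quantity across samples, rather than either feature's abundance alone, that the biplot arrows hint at.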

1. B. Using Qurro through QIIME 2

Since our "feature rankings" are the (sorted) feature loadings within the biplot DEICODE just produced, we'll use the qiime qurro loading-plot command.
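
To illustrate what "sorted feature loadings" means here, the following sketch ranks four hypothetical features by their loading along a single biplot axis. The feature names and loading values are invented for the example, and rank_features is not a Qurro function.

```python
# Hypothetical loadings of four features along one biplot axis (made-up).
loadings = {
    "feature_A": 0.42,
    "feature_B": -0.31,
    "feature_C": 0.05,
    "feature_D": -0.77,
}


def rank_features(axis_loadings):
    """Return feature IDs sorted from the lowest to the highest loading."""
    return sorted(axis_loadings, key=axis_loadings.get)


ranking = rank_features(loadings)
print(ranking)
# ['feature_D', 'feature_B', 'feature_C', 'feature_A']
```

A biplot has one such ranking per axis, so the same feature can sit at different positions depending on which axis is used.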

In [5]:
!qiime qurro loading-plot --help
Usage: qiime qurro loading-plot [OPTIONS]

  Generates an interactive visualization of feature loadings in tandem with
  a visualization of the log-ratios of selected features' sample abundances.

Inputs:
  --i-ranks ARTIFACT PCoAResults % Properties('biplot')
                       A biplot containing feature loadings.        [required]
  --i-table ARTIFACT FeatureTable[Frequency]
                       A BIOM table describing the abundances of the ranked
                       features in samples. Note that empty samples and
                       features will be removed from the Qurro visualization.
                                                                    [required]
Parameters:
  --m-sample-metadata-file METADATA...
    (multiple          Sample metadata. In Qurro visualizations, you can use
     arguments will    sample metadata fields to change the x-axis and colors
     be merged)        in the sample plot.                          [required]
  --m-feature-metadata-file METADATA...
    (multiple          Feature metadata (for example, if your features are
     arguments will    ASVs or OTUs, this could be taxonomy). You can use
     be merged)        feature metadata fields to filter features in the rank
                       plot when selecting log-ratios.              [optional]
  --p-extreme-feature-count INTEGER
                       If specified, Qurro will only use this many "extreme"
                       features from both ends of all of the rankings. This is
                       useful when dealing with huge datasets (e.g. with BIOM
                       tables exceeding 1 million entries), for which running
                       Qurro normally might take a long amount of time or
                       crash due to memory limits. Note that the automatic
                       removal of empty samples and features from the table
                       will be done *after* this filtering step.    [optional]
  --p-debug / --p-no-debug
                       If this flag is used, Qurro will output debug
                       messages. Note that you'll also need to use the
                       --verbose option to see these messages.
                                                              [default: False]
Outputs:
  --o-visualization VISUALIZATION
                                                                    [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is successful (silence is golden).
  --citations          Show citations and exit.
  --help               Show this message and exit.

In [6]:
!qiime qurro loading-plot \
    --i-ranks output/ordination.qza \
    --i-table output/qiita_10422_table.biom.qza \
    --m-sample-metadata-file input/qiita_10422_metadata.tsv \
    --m-feature-metadata-file input/taxonomy.tsv \
    --verbose \
    --o-visualization output/qurro_plot_q2.qzv
/Users/mfedarko/Dropbox/Work/KnightLab/qurro/qurro/_df_utils.py:126: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  table_sdf = pd.SparseDataFrame(table.matrix_data, default_fill_value=0.0)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:257: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  sparse_index=BlockIndex(N, blocs, blens),
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/frame.py:3471: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return klass(values, index=self.index, name=items, fastpath=True)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/ops/__init__.py:1641: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_values, index=self.index, name=self.name)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:339: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_fill_value=self.default_fill_value,
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:6289: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:5884: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:785: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_arrays, index=index, columns=columns).__finalize__(
689 feature(s) in the BIOM table were not present in the feature rankings.
These feature(s) have been removed from the visualization.
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:3606: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  result = self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:1999: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(result, **d).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:745: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_fill_value=self._default_fill_value,
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:9126: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:854: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_kind=self._default_kind,
Saved Visualization to: output/qurro_plot_q2.qzv

That's it! We've now created a QZV file (describing a Qurro visualization) at output/qurro_plot_q2.qzv. As with the biplot.qzv file created in step 1.A.I. above, you can view this visualization in one of the following ways:

  1. Upload the QZV file to view.qiime2.org.
  2. View the QZV file using qiime tools view.
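
For very large datasets, the --p-extreme-feature-count option described in the help text above keeps only the features at both ends of the rankings. The sketch below illustrates that idea for a single ranking, using invented loadings. It is an assumption-laden illustration of the concept, not Qurro's actual implementation, and it does not cover the multi-ranking case mentioned in the help text.

```python
def extreme_features(axis_loadings, count):
    """Keep `count` features from each end of one ranking.

    Hypothetical helper: `axis_loadings` maps feature IDs to loadings
    along a single axis.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    ranked = sorted(axis_loadings, key=axis_loadings.get)
    # If both ends would overlap or touch, there is nothing to filter out.
    if 2 * count >= len(ranked):
        return ranked
    return ranked[:count] + ranked[-count:]


# Made-up loadings for six hypothetical features.
example_loadings = {
    "f1": -0.9, "f2": -0.4, "f3": -0.1,
    "f4": 0.2, "f5": 0.6, "f6": 0.8,
}
print(extreme_features(example_loadings, 2))
# ['f1', 'f2', 'f5', 'f6']
```

Features near the middle of a ranking (f3 and f4 here) are the ones discarded, which is why this kind of filtering shrinks the visualization while keeping the most strongly ranked features.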

2. Using DEICODE and Qurro as standalone tools

We don't need to use DEICODE and Qurro through QIIME 2; if you want, you can run these tools outside of QIIME 2. Although this means you don't have access to some of QIIME 2's functionality (e.g. provenance tracking, or artifact semantic types), the results you get should be the same.

2. A. Using DEICODE as a standalone tool

In [7]:
!deicode auto-rpca \
    --in-biom input/qiita_10422_table.biom \
    --output-dir output/
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/biom/table.py:4068: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return constructor(mat, index=index, columns=columns)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:257: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  sparse_index=BlockIndex(N, blocs, blens),
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:4583: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:854: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_kind=self._default_kind,

DEICODE has now generated ordination.txt and distance-matrix.tsv files in the output directory. We can use the ordination.txt file with Qurro (when run outside of QIIME 2).

2. B. Using Qurro as a standalone tool

When we used Qurro through QIIME 2, we had to specify the loading-plot command in order to let the Qurro QIIME 2 plugin know we were working with feature loadings.

Now that we're running Qurro outside of QIIME 2, we don't need to specify this; Qurro can accept either feature differentials or feature loadings as input.

In [8]:
!qurro --help
Usage: qurro [OPTIONS]

  Generates a visualization of feature rankings and log-ratios.

  The resulting visualization contains two plots. The first plot shows how
  features are ranked, and the second plot shows the log-ratio of "selected"
  features' abundances within samples.

  The visualization is interactive, so which features are "selected" to
  construct log-ratios -- as well as various other properties of the
  visualization -- can be changed by the user.

Options:
  -r, --ranks TEXT                Either feature differentials (contained in a
                                  TSV file, where each row describes a feature
                                  and each column describes a differential
                                  field) or a scikit-bio OrdinationResults
                                  file for a biplot (containing feature
                                  loadings). When sorted numerically,
                                  differentials and feature loadings alike
                                  provide 'rankings.'  [required]
  -t, --table TEXT                A BIOM table describing the abundances of
                                  the ranked features in samples. Note that
                                  empty samples and features will be removed
                                  from the Qurro visualization.  [required]
  -sm, --sample-metadata TEXT     Sample metadata, formatted as a TSV file
                                  (where each row describes a sample and each
                                  column describes a 'metadata' field, and the
                                  first column contains sample IDs). In Qurro
                                  visualizations, you can use sample metadata
                                  fields to change the x-axis and colors in
                                  the sample plot.  [required]
  -fm, --feature-metadata TEXT    Feature metadata, formatted as a TSV file
                                  (where each row describes a feature and each
                                  column describes a 'metadata' field, and the
                                  first column contains feature IDs). In Qurro
                                  visualizations, you can use feature metadata
                                  fields to filter features in the rank plot
                                  when selecting log-ratios.
  -o, --output-dir TEXT           Directory to write the HTML/JS/... files
                                  defining a Qurro visualization to. If this
                                  directory already exists, files/directories
                                  already within it will be overwritten if
                                  necessary. Note that you need to keep the
                                  files in this directory together -- moving
                                  the index.html file in this directory to
                                  another location, without also moving the
                                  JS/etc. files, will break the visualization.
                                  [required]
  -x, --extreme-feature-count INTEGER
                                  If specified, Qurro will only use this many
                                  "extreme" features from both ends of all of
                                  the rankings. This is useful when dealing
                                  with huge datasets (e.g. with BIOM tables
                                  exceeding 1 million entries), for which
                                  running Qurro normally might take a long
                                  amount of time or crash due to memory
                                  limits. Note that the automatic removal of
                                  empty samples and features from the table
                                  will be done *after* this filtering step.
  --debug                         If this flag is used, Qurro will output
                                  debug messages.
  --version                       Show the version and exit.
  --help                          Show this message and exit.

In [9]:
!qurro \
    --ranks output/ordination.txt \
    --table input/qiita_10422_table.biom \
    --sample-metadata input/qiita_10422_metadata.tsv \
    --feature-metadata input/taxonomy.tsv \
    --output-dir output/qurro_plot_standalone/
/Users/mfedarko/Dropbox/Work/KnightLab/qurro/qurro/_df_utils.py:126: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  table_sdf = pd.SparseDataFrame(table.matrix_data, default_fill_value=0.0)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:257: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  sparse_index=BlockIndex(N, blocs, blens),
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/frame.py:3471: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return klass(values, index=self.index, name=items, fastpath=True)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/ops/__init__.py:1641: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_values, index=self.index, name=self.name)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:339: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_fill_value=self.default_fill_value,
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:6289: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:5884: FutureWarning: SparseSeries is deprecated and will be removed in a future version.
Use a Series with sparse values instead.

    >>> series = pd.Series(pd.SparseArray(...))

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:785: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_arrays, index=index, columns=columns).__finalize__(
689 feature(s) in the BIOM table were not present in the feature rankings.
These feature(s) have been removed from the visualization.
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:3606: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  result = self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:1999: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(result, **d).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:745: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_fill_value=self._default_fill_value,
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/generic.py:9126: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  return self._constructor(new_data).__finalize__(self)
/Users/mfedarko/anaconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/sparse/frame.py:854: FutureWarning: SparseDataFrame is deprecated and will be removed in a future version.
Use a regular DataFrame whose columns are SparseArrays instead.

See http://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating for more.

  default_kind=self._default_kind,
Successfully generated a visualization in the folder output/qurro_plot_standalone/.

We just generated a Qurro visualization in the folder output/qurro_plot_standalone/. This visualization is analogous to the QZV file we generated above using QIIME 2. You can view it by opening output/qurro_plot_standalone/index.html in a modern web browser.
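
If you'd like to open the visualization from within the notebook, one option is Python's standard webbrowser module. The sketch below is a minimal example under that assumption; visualization_uri is a hypothetical helper, not something Qurro provides.

```python
import webbrowser
from pathlib import Path


def visualization_uri(output_dir):
    """Return a file:// URI for index.html inside a Qurro output directory."""
    index = Path(output_dir).resolve() / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"No index.html found in {output_dir}")
    return index.as_uri()


# Example (run after the qurro command above has finished):
# webbrowser.open(visualization_uri("output/qurro_plot_standalone"))
```

The existence check guards against pointing the browser at a directory where the visualization was never generated. As the --output-dir help text notes, index.html needs to stay alongside the other generated files.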

That's it! If you have any more questions about using Qurro, feel free to contact us (see the Qurro README for contact information).
