
Access and Analyze `scanpy anndata` Objects from a Manuscript

This guide shows how to access and analyze the scanpy anndata objects associated with a recent manuscript. The objects are intended for computational biologists and data scientists working in genomics and related fields. Three replicates are available for download; their URLs appear as the `backup_url` arguments in the Python code further down.

Each anndata object contains the following elements:

- `.X`: Filtered, normalized, and log-transformed count matrix.
- `.raw`: Raw count matrix after filtering, before normalization and log transformation.
- `.obsm['MAGIC_imputed_data']`: Count matrix imputed with the MAGIC algorithm.
- `.obsm['tsne']`: t-SNE maps (as presented in the manuscript), generated using scaled diffusion components.
- `.obs['clusters']`: Cell clustering information.
- `.obs['palantir_pseudotime']`: Pseudo-time ordering of cells, as determined by Palantir.
- `.obs['palantir_diff_potential']`: Differentiation potential of cells, as determined by Palantir.
- `.obsm['palantir_branch_probs']`: Probabilities of cells branching into different lineages, according to Palantir.
- `.uns['palantir_branch_probs_cell_types']`: Labels for the Palantir branch probabilities.
- `.uns['ct_colors']`: Color codes for cell types, as used in the manuscript.
- `.uns['cluster_colors']`: Color codes for cell clusters, as used in the manuscript.
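
As a minimal sketch of how two of these fields fit together, the snippet below pairs the columns of `.obsm['palantir_branch_probs']` with the labels in `.uns['palantir_branch_probs_cell_types']`. The helper name `label_branch_probs` and the toy stand-in object are hypothetical, and the code assumes the branch probabilities are stored as a cells × lineages array whose column order matches the label list.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def label_branch_probs(adata):
    """Return branch probabilities as a DataFrame (cells x lineages).

    Assumption: column i of .obsm['palantir_branch_probs'] corresponds to
    entry i of .uns['palantir_branch_probs_cell_types'].
    """
    probs = np.asarray(adata.obsm["palantir_branch_probs"])
    labels = list(adata.uns["palantir_branch_probs_cell_types"])
    if probs.shape[1] != len(labels):
        raise ValueError("number of probability columns and labels differ")
    return pd.DataFrame(probs, index=list(adata.obs_names), columns=labels)


# Toy stand-in with made-up values and placeholder lineage names.
# With real data, pass a loaded AnnData object instead.
toy = SimpleNamespace(
    obs_names=["cell_a", "cell_b"],
    obsm={"palantir_branch_probs": np.array([[0.7, 0.3], [0.2, 0.8]])},
    uns={"palantir_branch_probs_cell_types": ["lineage_1", "lineage_2"]},
)

branch_probs = label_branch_probs(toy)

# Most probable lineage for each cell
top_lineage = branch_probs.idxmax(axis=1)
print(top_lineage)
```

Once a replicate has been loaded, the same helper should accept it directly, for example `label_branch_probs(adata_Rep1)`.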

Python Code for Data Access:

In [1]:

import scanpy as sc

# Read each replicate from its local path; if the file is missing,
# scanpy downloads it from backup_url first
adata_Rep1 = sc.read(
    "../data/human_cd34_bm_rep1.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad",
)
adata_Rep2 = sc.read(
    "../data/human_cd34_bm_rep2.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad",
)
adata_Rep3 = sc.read(
    "../data/human_cd34_bm_rep3.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad",
)

In [2]:

adata_Rep1

Out[2]:

AnnData object with n_obs × n_vars = 5780 × 14651
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'

In [3]:

adata_Rep2

Out[3]:

AnnData object with n_obs × n_vars = 6501 × 14913
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'

In [4]:

adata_Rep3

Out[4]:

AnnData object with n_obs × n_vars = 12046 × 14044
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
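
All three replicates list the same `obs`, `uns`, and `obsm` keys. A quick programmatic check of that layout could look like the sketch below. The names `EXPECTED_KEYS` and `missing_keys` are hypothetical, and the toy object only mimics the attribute layout of an AnnData object with plain dictionaries.

```python
from types import SimpleNamespace

# Keys described in this guide, grouped by AnnData attribute.
EXPECTED_KEYS = {
    "obs": ["clusters", "palantir_pseudotime", "palantir_diff_potential"],
    "obsm": ["tsne", "MAGIC_imputed_data", "palantir_branch_probs"],
    "uns": ["cluster_colors", "ct_colors", "palantir_branch_probs_cell_types"],
}


def missing_keys(adata, expected=EXPECTED_KEYS):
    """List 'attribute[key]' strings for expected keys absent from adata."""
    missing = []
    for attr, keys in expected.items():
        container = getattr(adata, attr)
        for key in keys:
            # `in` checks column names for a DataFrame and keys for a mapping
            if key not in container:
                missing.append(f"{attr}[{key!r}]")
    return missing


# Toy stand-in that deliberately lacks the 't-SNE' entry in obsm.
toy = SimpleNamespace(
    obs={"clusters": [], "palantir_pseudotime": [], "palantir_diff_potential": []},
    obsm={"MAGIC_imputed_data": [], "palantir_branch_probs": []},
    uns={"cluster_colors": [], "ct_colors": [], "palantir_branch_probs_cell_types": []},
)

print(missing_keys(toy))  # ["obsm['tsne']"]
```

With the objects loaded above, `missing_keys(adata_Rep1)` would be expected to return an empty list.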

Converting `anndata` Objects to `Seurat` Objects Using R

For researchers working with R and Seurat, the process to convert anndata objects to Seurat objects involves the following steps:

1. Set Up R Environment and Libraries:
   - Load the necessary libraries: Seurat and anndata.
2. Download and Read the Data:
   - Use curl::curl_download to download the anndata files from the provided URLs.
   - Read the data using the read_h5ad function from the anndata library.
3. Create Seurat Objects:
   - Use the CreateSeuratObject function to convert the data into Seurat objects, passing the transposed `.X` matrix as counts and `.obs` as metadata.
   - Transfer additional data (t-SNE embeddings, imputed gene expression, and cell fate probabilities) into the appropriate slots of the Seurat object.
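
The conversion relies on a change of orientation: AnnData stores cells in rows and genes in columns, whereas Seurat assays store genes (features) in rows and cells in columns. The toy sketch below, with made-up values and placeholder names, illustrates that transposition in Python; it is not part of the conversion itself.

```python
import numpy as np

# Toy matrix in AnnData orientation: 3 cells (rows) x 2 genes (columns).
cells = ["cell_a", "cell_b", "cell_c"]
genes = ["GENE1", "GENE2"]
X = np.array(
    [
        [1.0, 0.0],
        [0.0, 2.0],
        [3.0, 4.0],
    ]
)

# Seurat-style orientation: genes (rows) x cells (columns).
X_genes_by_cells = X.T

# The same measurement is addressed with swapped indices.
cell_idx = cells.index("cell_c")
gene_idx = genes.index("GENE2")
print(X[cell_idx, gene_idx], X_genes_by_cells[gene_idx, cell_idx])
```

Embeddings such as the t-SNE coordinates keep cells in rows in both representations, so only the expression matrices need to be transposed.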

R Code Snippet:

In [ ]:

# this cell only exists to allow running R code inside this python notebook using a conda kernel
import sys
import os

# Get the path to the python executable
python_executable_path = sys.executable

# Extract the path to the environment from the path to the python executable
env_path = os.path.dirname(os.path.dirname(python_executable_path))

print(
    f"Conda env path: {env_path}\n"
    "Please make sure you have R installed in the conda environment."
)

os.environ['R_HOME'] = os.path.join(env_path, 'lib', 'R')

%load_ext rpy2.ipython

In [6]:

%%R
library(Seurat)
library(anndata)

create_seurat <- function(url) {
  # Map the download URL to a local path under ../data/
  file_path <- sub("https://s3.amazonaws.com/dp-lab-data-public/palantir/", "../data/", url, fixed = TRUE)
  # Download only if the file is not already present locally
  if (!file.exists(file_path)) {
    curl::curl_download(url, file_path)
  }
  data <- read_h5ad(file_path)

  # AnnData stores cells x genes; Seurat expects genes x cells, hence t().
  # Note: .X holds the normalized, log-transformed matrix, passed here as counts.
  seurat_obj <- CreateSeuratObject(
    counts = t(data$X),
    meta.data = data$obs,
    project = "CD34+ Bone Marrow Cells"
  )
  # t-SNE embedding as a dimensional reduction
  tsne_data <- data$obsm[["tsne"]]
  rownames(tsne_data) <- rownames(data$obs)
  colnames(tsne_data) <- c("tSNE_1", "tSNE_2")
  seurat_obj[["tsne"]] <- CreateDimReducObject(
    embeddings = tsne_data,
    key = "tSNE_"
  )
  # MAGIC-imputed expression as a separate assay (genes x cells)
  imputed_data <- t(data$obsm[["MAGIC_imputed_data"]])
  colnames(imputed_data) <- rownames(data$obs)
  rownames(imputed_data) <- rownames(data$var)
  seurat_obj[["MAGIC_imputed"]] <- CreateAssayObject(counts = imputed_data)
  # Palantir branch probabilities as per-cell metadata columns
  fate_probs <- as.data.frame(data$obsm[["palantir_branch_probs"]])
  colnames(fate_probs) <- data$uns[["palantir_branch_probs_cell_types"]]
  rownames(fate_probs) <- rownames(data$obs)
  seurat_obj <- AddMetaData(seurat_obj, metadata = fate_probs)

  return(seurat_obj)
}

human_cd34_bm_Rep1 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad")
human_cd34_bm_Rep2 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad")
human_cd34_bm_Rep3 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad")

R[write to console]: Loading required package: SeuratObject

R[write to console]: Loading required package: sp

R[write to console]: 
Attaching package: ‘SeuratObject’


R[write to console]: The following object is masked from ‘package:base’:

    intersect

    WARNING: The R package "reticulate" only fixed recently
    an issue that caused a segfault when used with rpy2:
    https://github.com/rstudio/reticulate/pull/1188
    Make sure that you use a version of that package that includes
    the fix.

R[write to console]: 
Attaching package: ‘anndata’


R[write to console]: The following object is masked from ‘package:SeuratObject’:

    Layers


R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Data is of class matrix. Coercing to dgCMatrix.

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Data is of class matrix. Coercing to dgCMatrix.

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Data is of class matrix. Coercing to dgCMatrix.

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

In [7]:

%%R

human_cd34_bm_Rep1

An object of class Seurat 
29302 features across 5780 samples within 2 assays 
Active assay: RNA (14651 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne

In [8]:

%%R

human_cd34_bm_Rep2

An object of class Seurat 
29826 features across 6501 samples within 2 assays 
Active assay: RNA (14913 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne

In [9]:

%%R

human_cd34_bm_Rep3

An object of class Seurat 
28088 features across 12046 samples within 2 assays 
Active assay: RNA (14044 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne
