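scanpy anndata Objects from a Manuscript
This guide describes how to access and analyze the scanpy anndata objects associated with a recent manuscript on human CD34+ bone marrow cells. The objects are intended for computational biologists and data scientists working in genomics and related fields. Three replicates are available for download:
- Replicate 1: https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad
- Replicate 2: https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad
- Replicate 3: https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad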
Each anndata object contains the following elements:
- `.X`: Filtered, normalized, and log-transformed count matrix.
- `.raw`: Original filtered raw count matrix.
- `.obsm['MAGIC_imputed_data']`: Count matrix imputed with the MAGIC algorithm.
- `.obsm['tsne']`: t-SNE maps (as presented in the manuscript), generated from scaled diffusion components.
- `.obs['clusters']`: Cell cluster assignments.
- `.obs['palantir_pseudotime']`: Pseudotime ordering of cells, as determined by Palantir.
- `.obs['palantir_diff_potential']`: Differentiation potential of each cell, as determined by Palantir.
- `.obsm['palantir_branch_probs']`: Probability of each cell committing to each lineage (branch), according to Palantir.
- `.uns['palantir_branch_probs_cell_types']`: Lineage labels for the columns of the Palantir branch probabilities.
- `.uns['ct_colors']`: Color codes for cell types, as used in the manuscript.
- `.uns['cluster_colors']`: Color codes for cell clusters, as used in the manuscript.

import scanpy as sc
# Read in the data, with backup URLs provided
adata_Rep1 = sc.read(
"../data/human_cd34_bm_rep1.h5ad",
backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad",
)
adata_Rep2 = sc.read(
"../data/human_cd34_bm_rep2.h5ad",
backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad",
)
adata_Rep3 = sc.read(
"../data/human_cd34_bm_rep3.h5ad",
backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad",
)
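The three `sc.read` calls above differ only in the replicate number. A minimal sketch, assuming the same `../data/` directory and S3 layout used above, that derives both the local path and the backup URL from that number; the helpers `replicate_source` and `load_replicates` are hypothetical names introduced here, not part of scanpy:

```python
# Base location taken from the backup URLs used above.
BASE_URL = "https://s3.amazonaws.com/dp-lab-data-public/palantir"
# Local cache directory used above (assumes the same working directory).
DATA_DIR = "../data"


def replicate_source(rep: int):
    """Return (local_path, backup_url) for replicate `rep` (hypothetical helper)."""
    if rep not in (1, 2, 3):
        raise ValueError(f"only replicates 1-3 are available, got {rep}")
    filename = f"human_cd34_bm_rep{rep}.h5ad"
    return f"{DATA_DIR}/{filename}", f"{BASE_URL}/{filename}"


def load_replicates(reps=(1, 2, 3)):
    """Read each replicate with scanpy, downloading it if the local file is missing."""
    import scanpy as sc  # imported lazily so the path helper works without scanpy

    loaded = {}
    for rep in reps:
        path, url = replicate_source(rep)
        loaded[f"Rep{rep}"] = sc.read(path, backup_url=url)
    return loaded
```

With scanpy installed, `load_replicates()["Rep1"]` should be equivalent to `adata_Rep1` above; keeping the path logic in one place avoids the three copies drifting apart.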
adata_Rep1
AnnData object with n_obs × n_vars = 5780 × 14651
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
adata_Rep2
AnnData object with n_obs × n_vars = 6501 × 14913
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
adata_Rep3
AnnData object with n_obs × n_vars = 12046 × 14044
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
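`.obsm['palantir_branch_probs']` is an unlabeled cells × lineages matrix whose column names are stored separately in `.uns['palantir_branch_probs_cell_types']`; the R conversion below pairs them the same way. A minimal, dependency-free sketch of that pairing, plus selecting each cell's most probable lineage. The helpers `label_branch_probs` and `most_likely_fate`, and the toy cell names, lineage names, and probabilities, are hypothetical and not taken from the manuscript data:

```python
def label_branch_probs(probs, cell_types, cell_names):
    """Attach lineage labels to each row of a cells x lineages probability matrix.

    probs: list of rows, one per cell (like .obsm['palantir_branch_probs'])
    cell_types: column labels (like .uns['palantir_branch_probs_cell_types'])
    cell_names: row labels (like .obs_names)
    """
    if len(probs) != len(cell_names):
        raise ValueError("one row of probabilities is needed per cell")
    labeled = {}
    for name, row in zip(cell_names, probs):
        if len(row) != len(cell_types):
            raise ValueError("each row needs one probability per lineage")
        labeled[name] = dict(zip(cell_types, row))
    return labeled


def most_likely_fate(labeled):
    """Return the lineage with the highest branch probability for each cell."""
    return {cell: max(p, key=p.get) for cell, p in labeled.items()}


# Toy example with made-up values (not from the manuscript data).
toy = label_branch_probs(
    probs=[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    cell_types=["lineage_A", "lineage_B", "lineage_C"],
    cell_names=["cell_1", "cell_2"],
)
print(most_likely_fate(toy))  # {'cell_1': 'lineage_A', 'cell_2': 'lineage_C'}
```

On the real objects, and assuming the matrix is stored as a plain array (as the R code below also assumes), the pandas equivalent would be `pd.DataFrame(adata_Rep1.obsm['palantir_branch_probs'], index=adata_Rep1.obs_names, columns=adata_Rep1.uns['palantir_branch_probs_cell_types'])`, followed by `.idxmax(axis=1)` for the most likely lineage.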
anndata Objects to Seurat Objects Using R
For researchers working with R and Seurat, converting the anndata objects to Seurat objects involves the following steps:
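1. Set Up R Environment and Libraries: load the `Seurat` and `anndata` R packages.
2. Download and Read the Data: use `curl::curl_download` to download the anndata files from the provided URLs, then read each file with the `read_h5ad` function from the `anndata` library.
3. Create Seurat Objects: use the `CreateSeuratObject` function to convert the data into Seurat objects, incorporating counts and metadata from the anndata object. The `create_seurat` function below also attaches the t-SNE map as a dimensional reduction, the MAGIC-imputed matrix as a separate assay, and the Palantir branch probabilities as metadata columns.

# This cell only exists to allow running R code inside this Python notebook using a conda kernel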
import sys
import os
# Get the path to the python executable
python_executable_path = sys.executable
# Extract the path to the environment from the path to the python executable
env_path = os.path.dirname(os.path.dirname(python_executable_path))
print(
f"Conda env path: {env_path}\n"
"Please make sure you have R installed in the conda environment."
)
os.environ['R_HOME'] = os.path.join(env_path, 'lib', 'R')
%load_ext rpy2.ipython
%%R
library(Seurat)
library(anndata)
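# Convert one replicate (given by its S3 URL) into a Seurat object
create_seurat <- function(url) {
  # Cache the file under ../data/, mirroring the paths used in the Python cell above
  file_path <- sub("https://s3.amazonaws.com/dp-lab-data-public/palantir/", "../data/", url)
  if (!file.exists(file_path)) {
    dir.create(dirname(file_path), showWarnings = FALSE, recursive = TRUE)
    curl::curl_download(url, file_path)
  }
  data <- read_h5ad(file_path)
  # AnnData stores cells x genes; Seurat expects genes x cells, hence t().
  # Note: .X holds filtered, normalized, log-transformed values, so the RNA "counts" layer is not raw counts.
  seurat_obj <- CreateSeuratObject(
    counts = t(data$X),
    meta.data = data$obs,
    project = "CD34+ Bone Marrow Cells"
  )
  # Add the manuscript t-SNE map as a dimensional reduction
  tsne_data <- data$obsm[["tsne"]]
  rownames(tsne_data) <- rownames(data$obs)
  colnames(tsne_data) <- c("tSNE_1", "tSNE_2")
  seurat_obj[["tsne"]] <- CreateDimReducObject(
    embeddings = tsne_data,
    key = "tSNE_"
  )
  # Store the MAGIC-imputed matrix as a separate assay (genes x cells)
  imputed_data <- t(data$obsm[["MAGIC_imputed_data"]])
  colnames(imputed_data) <- rownames(data$obs)
  rownames(imputed_data) <- rownames(data$var)
  seurat_obj[["MAGIC_imputed"]] <- CreateAssayObject(counts = imputed_data)
  # Add the Palantir branch probabilities as metadata columns, one per lineage
  fate_probs <- as.data.frame(data$obsm[["palantir_branch_probs"]])
  colnames(fate_probs) <- data$uns[["palantir_branch_probs_cell_types"]]
  rownames(fate_probs) <- rownames(data$obs)
  seurat_obj <- AddMetaData(seurat_obj, metadata = fate_probs)
  return(seurat_obj)
}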
human_cd34_bm_Rep1 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad")
human_cd34_bm_Rep2 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad")
human_cd34_bm_Rep3 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad")
R[write to console]: Loading required package: SeuratObject
R[write to console]: Loading required package: sp
R[write to console]: Attaching package: ‘SeuratObject’
R[write to console]: The following object is masked from ‘package:base’: intersect
WARNING: The R package "reticulate" only fixed recently an issue that caused a segfault when used with rpy2: https://github.com/rstudio/reticulate/pull/1188 Make sure that you use a version of that package that includes the fix.
R[write to console]: Attaching package: ‘anndata’
R[write to console]: The following object is masked from ‘package:SeuratObject’: Layers
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Data is of class matrix. Coercing to dgCMatrix.
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Data is of class matrix. Coercing to dgCMatrix.
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Data is of class matrix. Coercing to dgCMatrix.
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning: R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
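The repeated warning above means that gene names containing `_` appear in the Seurat objects with `-` instead, so names need translating when moving between the Python and R objects. A minimal sketch of that mapping, assuming the underscore-to-dash replacement reported in the warning is the only change Seurat makes; the helper `seurat_feature_names` and the example gene names are hypothetical, and the helper also flags names that would collide after renaming:

```python
from collections import Counter


def seurat_feature_names(var_names):
    """Map each AnnData feature name to the name Seurat is expected to use.

    Assumes the only change is '_' -> '-', as stated in the warning above.
    Returns (mapping, collisions), where collisions lists new names that
    more than one original feature would end up sharing.
    """
    mapping = {name: name.replace("_", "-") for name in var_names}
    counts = Counter(mapping.values())
    collisions = sorted(new for new, n in counts.items() if n > 1)
    return mapping, collisions


# Hypothetical feature names for illustration only.
mapping, collisions = seurat_feature_names(["CD34", "GENE_A", "GENE-A", "MT_CO1"])
print(mapping["MT_CO1"])  # MT-CO1
print(collisions)  # ['GENE-A']
```

Checking for collisions before conversion is cheap and guards against two distinct genes silently sharing one name in the Seurat object.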
%%R
human_cd34_bm_Rep1
An object of class Seurat
29302 features across 5780 samples within 2 assays
Active assay: RNA (14651 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne
%%R
human_cd34_bm_Rep2
An object of class Seurat
29826 features across 6501 samples within 2 assays
Active assay: RNA (14913 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne
%%R
human_cd34_bm_Rep3
An object of class Seurat
28088 features across 12046 samples within 2 assays
Active assay: RNA (14044 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne
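The feature totals reported by Seurat follow from the conversion: the RNA assay holds every gene in `.X`, and the `MAGIC_imputed` assay reuses the same gene list, so each total should be twice the AnnData `n_vars`, while the sample count should equal `n_obs`. A quick consistency check using only the numbers printed in the outputs above, assuming no features or cells were dropped during conversion:

```python
# (n_obs, n_vars) from the AnnData summaries printed above.
anndata_shapes = {"Rep1": (5780, 14651), "Rep2": (6501, 14913), "Rep3": (12046, 14044)}
# (total features, samples) from the Seurat summaries printed above.
seurat_summary = {"Rep1": (29302, 5780), "Rep2": (29826, 6501), "Rep3": (28088, 12046)}

for rep, (n_obs, n_vars) in anndata_shapes.items():
    total_features, n_samples = seurat_summary[rep]
    # Two assays (RNA + MAGIC_imputed) with identical gene lists.
    assert total_features == 2 * n_vars, rep
    # One Seurat "sample" per AnnData observation (cell).
    assert n_samples == n_obs, rep
    print(f"{rep}: {n_vars} genes x 2 assays = {total_features} features, {n_samples} cells")
```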