library(splatter)
library(Seurat)
library(conos)
library(pagoda2)
Benchmark Seurat 3 and Conos integration in the same way as the other R methods in the previous benchmarking notebook.
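The benchmark below measures wall-clock time by taking `Sys.time()` before and after each integration call and converting the difference to seconds with `difftime`. As a minimal sketch of that pattern in isolation, the helper `time_secs` and the toy workload here are hypothetical and are not part of the original notebook:

```r
# Hypothetical helper (not in the original notebook): run a zero-argument
# function and return the elapsed wall-clock time in seconds.
time_secs <- function(f) {
  t1 <- Sys.time()
  f()
  t2 <- Sys.time()
  as.numeric(difftime(t2, t1, units = "secs"))
}

# Toy workload standing in for an integration call; one timing per input size,
# appended to a vector in the same way as seurat3_time and conos_time.
elapsed <- c()
for (n in c(1e4, 1e5)) {
  elapsed <- c(elapsed, time_secs(function() sum(sqrt(seq_len(n)))))
}
print(elapsed)
```

Passing `units = "secs"` explicitly matters because `difftime` otherwise picks a unit automatically, so long runs could be reported in minutes and short ones in seconds.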
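#run times in seconds, one entry per dataset size
seurat3_time = c()
conos_time = c()
#raise the limit on globals exported by the future package to 4096 MiB (4 GiB)
options(future.globals.maxSize = 4096 * 1024^2)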
for (i in 10:14)
{
    print(i)
    #prepare splatter data: two batches of 2^i cells each, two equally sized groups
    params = newSplatParams()
    params = setParam(params, "nGenes", 5000)
    params = setParam(params, "de.prob", 1)
    params = setParam(params, "batchCells", c(2^i,2^i))
    params = setParam(params, "group.prob", c(0.5,0.5))
    sim = splatSimulate(params, method="groups", verbose=FALSE)
    #Seurat/pagoda prep: split by batch, normalise, and treat all genes as variable
    srat = CreateSeuratObject(counts(sim))
    srat@meta.data[,'Batch'] = colData(sim)[,'Batch']
    srat.list = SplitObject(srat, split.by='Batch')
    pagoda.list = list()
    for (j in seq_along(srat.list))
    {
        srat.list[[j]] = NormalizeData(srat.list[[j]], verbose=FALSE)
        VariableFeatures(srat.list[[j]]) = rownames(srat@assays$RNA)
        #pagoda2 works from the raw counts of each batch
        pagoda.list[[j]] = srat.list[[j]]@assays$RNA@counts
    }
    panel.preprocessed <- lapply(pagoda.list, basicP2proc, n.cores=4,
                                 min.cells.per.gene=0, n.odgenes=5e3,
                                 get.largevis=FALSE, make.geneknn=FALSE)
    reference.list = srat.list[c('Batch1','Batch2')]
    names(panel.preprocessed) = c('Batch1','Batch2')
    con <- Conos$new(panel.preprocessed, n.cores=8)
    #actually running Seurat 3 integration (only these two calls are timed)
    t1 = Sys.time()
    srat.anchors = FindIntegrationAnchors(object.list = reference.list, dims = 1:20, verbose=FALSE)
    srat.integrated <- IntegrateData(anchorset = srat.anchors, dims = 1:20, verbose=FALSE)
    t2 = Sys.time()
    seurat3_time = c(seurat3_time, as.numeric(difftime(t2,t1,units='secs')))
    print(tail(seurat3_time,n=1))
    #actually running Conos (graph construction and embedding are timed)
    t1 = Sys.time()
    con$buildGraph()
    con$embedGraph()
    t2 = Sys.time()
    conos_time = c(conos_time, as.numeric(difftime(t2,t1,units='secs')))
    print(tail(conos_time,n=1))
}
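#write the run times out, one value per line (create the output folder if it is missing)
dir.create('benchmark-times', showWarnings=FALSE)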
fid = file('benchmark-times/seurat3.txt')
writeLines(as.character(seurat3_time),fid)
close(fid)
fid = file('benchmark-times/conos.txt')
writeLines(as.character(conos_time),fid)
close(fid)
[1] 10
1024 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 600 overdispersed genes ... 600 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
1024 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 614 overdispersed genes ... 614 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
[1] 25.53425
found 0 out of 1 cached CPCA space pairs ... running 1 additional CPCA space pairs . done
inter-sample links using mNN . done
local pairs local pairs done
building graph ..done
Estimating embeddings.
[1] 22.90113
[1] 11
2048 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 829 overdispersed genes ... 829 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
2048 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 845 overdispersed genes ... 845 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
[1] 33.35196
found 0 out of 1 cached CPCA space pairs ... running 1 additional CPCA space pairs . done
inter-sample links using mNN . done
local pairs local pairs done
building graph ..done
Estimating embeddings.
[1] 31.27305
[1] 12
4096 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1076 overdispersed genes ... 1076 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
4096 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1079 overdispersed genes ... 1079 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
[1] 78.77169
found 0 out of 1 cached CPCA space pairs ... running 1 additional CPCA space pairs . done
inter-sample links using mNN . done
local pairs local pairs done
building graph ..done
Estimating embeddings.
[1] 55.41871
[1] 13
8192 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1279 overdispersed genes ... 1279 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
8192 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1280 overdispersed genes ... 1280 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
[1] 155.4581
found 0 out of 1 cached CPCA space pairs ... running 1 additional CPCA space pairs . done
inter-sample links using mNN . done
local pairs local pairs done
building graph ..done
Estimating embeddings.
[1] 69.75872
[1] 14
16384 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1489 overdispersed genes ... 1489 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
16384 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 1503 overdispersed genes ... 1503 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
[1] 623.4906
found 0 out of 1 cached CPCA space pairs ... running 1 additional CPCA space pairs . done
inter-sample links using mNN . done
local pairs local pairs done
building graph ..done
Estimating embeddings.
[1] 144.6998
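One way to summarise the timings printed above is to estimate how run time grows with dataset size. The sketch below is not part of the original notebook: it copies the ten printed timings by hand, and the names `cells_per_batch`, `seurat3_secs`, `conos_secs` and `scaling_exponent` are illustrative. It fits a line to log2(time) against log2(cells per batch); a slope near 1 would suggest roughly linear scaling over this range. With only five points per method, the estimate is indicative at best.

```r
# Timings in seconds, copied from the printed output above (i = 10..14).
cells_per_batch <- 2^(10:14)
seurat3_secs <- c(25.53425, 33.35196, 78.77169, 155.4581, 623.4906)
conos_secs <- c(22.90113, 31.27305, 55.41871, 69.75872, 144.6998)

# Hypothetical helper: slope of log2(time) versus log2(cells), i.e. the
# empirical scaling exponent over the benchmarked range.
scaling_exponent <- function(secs, cells) {
  fit <- lm(log2(secs) ~ log2(cells))
  unname(coef(fit)[2])
}

seurat3_slope <- scaling_exponent(seurat3_secs, cells_per_batch)
conos_slope <- scaling_exponent(conos_secs, cells_per_batch)
print(c(seurat3 = seurat3_slope, conos = conos_slope))

# Ratio of Seurat 3 to Conos run time at each dataset size.
print(round(seurat3_secs / conos_secs, 2))
```

Note that the two timed sections are not strictly equivalent: the Conos timing excludes the `basicP2proc` preprocessing, while the Seurat 3 timing covers anchor finding and data integration, so the ratio compares the integration steps only.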