This tutorial shows how to build methylation count matrices for a small number of cells.
import episcanpy.api as epi
import pandas as pd
import anndata as ad
We first define two sets of features for the count matrices.
This requires both loading a set of annotations (here, promoters) and generating genomic windows (100 kb long).
promoters = epi.ct.load_features('../methylation_play_data/mouse_epd_promoters.bed') # load the promoter annotations as features
p_names = epi.ct.name_features(promoters) # extract unique names for the promoter features
windows = epi.ct.make_windows(100000) # generate 100 kb windows as features
w_names = epi.ct.name_features(windows) # extract the names of the window features
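To make the idea of fixed-size windows concrete, here is a minimal pure-Python sketch that tiles chromosomes into 100 kb bins. It is not how epi.ct.make_windows is implemented; the function make_fixed_windows, the half-open coordinates and the toy chromosome lengths are assumptions made only for illustration.

```python
def make_fixed_windows(chrom_sizes, window_size):
    """Tile each chromosome with consecutive windows of `window_size` bp.

    Hypothetical helper: returns {chrom: [(start, end), ...]} with
    half-open [start, end) coordinates. The last window of a chromosome
    is truncated at the chromosome end.
    """
    windows = {}
    for chrom, length in chrom_sizes.items():
        windows[chrom] = [
            (start, min(start + window_size, length))
            for start in range(0, length, window_size)
        ]
    return windows


# Toy chromosome lengths (made up, not real mouse chromosome sizes).
toy_sizes = {"chr1": 250_000, "chr2": 100_000}
toy_windows = make_fixed_windows(toy_sizes, 100_000)
print(toy_windows["chr1"])
# [(0, 100000), (100000, 200000), (200000, 250000)]
```

Each window then plays the same role as a promoter: a genomic interval over which the methylation calls of a cell are summarised.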
Now that we have obtained the candidate feature spaces, we need to specify the methylation summaries of all the cells used to build the different count matrices.
The standard input corresponds to methylpy methylation summaries.
path_cells = '../methylation_play_data/'
cells = ['cell1.tsv', 'cell2.tsv', 'cell3.tsv', 'cell4.tsv', 'cell5.tsv']
epi.ct.build_count_mtx(cells,
                       annotation=[windows, promoters], # all annotations you want to build a matrix for
                       path=path_cells,
                       output_file=['test_windows_CG.txt', 'test_promoters_CG.txt'], # output file names, one per annotation
                       meth_context='CG', # cytosine context to consider
                       feature_names=[w_names, p_names], # feature names, if you want to write them to the output
                       threshold=[1, 5]) # minimum number of cytosines/reads a feature must have so that
                                         # its methylation level is not treated as missing
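The role of the threshold can be illustrated with a small sketch. It assumes that the threshold counts covered cytosines inside the feature and that the methylation level is the pooled fraction of methylated reads; both are assumptions for illustration and may differ from what build_count_mtx does internally. The function name and the toy calls are hypothetical.

```python
import math


def feature_methylation_level(calls, start, end, threshold=1):
    """Summarise methylation over one feature [start, end).

    calls: iterable of (position, methylated_reads, total_reads).
    Returns NaN (missing) when fewer than `threshold` covered cytosines
    fall inside the feature. Assumed behaviour, for illustration only.
    """
    n_cytosines = 0
    methylated = 0
    total = 0
    for position, mc, coverage in calls:
        if start <= position < end and coverage > 0:
            n_cytosines += 1
            methylated += mc
            total += coverage
    if n_cytosines < threshold:
        return math.nan
    return methylated / total


# Toy calls for one cell: (position, methylated reads, total reads).
toy_calls = [(10, 3, 4), (50, 0, 2), (120, 1, 1)]

level_loose = feature_methylation_level(toy_calls, 0, 100, threshold=1)
level_strict = feature_methylation_level(toy_calls, 0, 100, threshold=5)
print(level_loose)   # 0.5
print(level_strict)  # nan
```

With a list such as [1, 5], one threshold applies to each annotation in turn: here the windows would need at least 1 and the promoters at least 5.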
To build a count matrix based on cytosines that are not in a CG context, simply change the meth_context argument to 'CH'.
epi.ct.build_count_mtx(cells,
                       annotation=[windows, promoters],
                       path=path_cells,
                       output_file=['test_windows_CH.txt', 'test_promoters_CH.txt'],
                       meth_context='CH',
                       feature_names=[w_names, p_names],
                       threshold=[1, 5])
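In IUPAC notation H stands for A, C or T, so a CH cytosine is one followed by any base other than G. The sketch below classifies a sequence context string accordingly. The function cytosine_context and the three-letter context strings are assumptions for illustration; they are not part of the episcanpy API.

```python
def cytosine_context(sequence_context):
    """Classify a cytosine as 'CG' or 'CH' from its sequence context.

    sequence_context: string starting at the cytosine, e.g. 'CGA' or 'CAT'.
    Returns 'unknown' when the next base is ambiguous (e.g. 'N').
    Hypothetical helper, for illustration only.
    """
    seq = sequence_context.upper()
    if len(seq) < 2 or seq[0] != "C":
        raise ValueError(f"not a cytosine context: {sequence_context!r}")
    if seq[1] == "G":
        return "CG"
    if seq[1] in "ACT":  # H = A, C or T
        return "CH"
    return "unknown"


toy_contexts = ["CGA", "CAT", "CCG", "CTT", "CNN"]
labels = [cytosine_context(c) for c in toy_contexts]
print(labels)
# ['CG', 'CH', 'CH', 'CH', 'unknown']
```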
If you don't want to write the count matrices to disk but prefer to keep them in memory, set output_file to None:
w_mtx, p_mtx = epi.ct.build_count_mtx(cells,
                                      annotation=[windows, promoters],
                                      path=path_cells,
                                      output_file=None,
                                      meth_context='CG',
                                      threshold=[1, 5])
p_mtx.shape
To load the resulting count matrix as an AnnData object:
adata_p = ad.AnnData(p_mtx, obs=pd.DataFrame(index=cells), var=pd.DataFrame(index=p_names))
adata_p
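AnnData expects the matrix to have one row per observation (cell) and one column per variable (feature), so the number of rows must match the number of cell names and the number of columns must match the number of feature names. A minimal sketch of that consistency check, using plain lists and a hypothetical helper check_anndata_dims with made-up toy values:

```python
def check_anndata_dims(matrix, obs_names, var_names):
    """Check that a cells x features matrix matches its row and column names.

    Hypothetical helper: `matrix` is a list of rows. Returns the shape
    (n_obs, n_vars) or raises ValueError on a mismatch.
    """
    if len(matrix) != len(obs_names):
        raise ValueError(
            f"{len(matrix)} rows but {len(obs_names)} cell names"
        )
    for row in matrix:
        if len(row) != len(var_names):
            raise ValueError(
                f"row of length {len(row)} but {len(var_names)} feature names"
            )
    return (len(matrix), len(var_names))


# Toy example: 2 cells, 3 features (values are invented).
toy_matrix = [[0.5, 0.1, 0.9],
              [0.4, 0.0, 1.0]]
toy_shape = check_anndata_dims(
    toy_matrix,
    obs_names=["cell1.tsv", "cell2.tsv"],
    var_names=["promoter_a", "promoter_b", "promoter_c"],
)
print(toy_shape)
# (2, 3)
```

Running a check like this before constructing the object can make a mismatch between p_mtx, cells and p_names easier to diagnose.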
Finally, load the metadata you have on your cells and save the AnnData object to an .h5ad file.
epi.pp.load_metadata(adata_p, '../methylation_play_data/mouse_annot_5cells_Luo17.csv')
adata_p.write('promoter_matrix_test_CG.h5ad')
For further operations on the matrices, see the second part of the methylation tutorials, which covers quality control, processing, clustering and cell type identification.