Datasets2Tools API Manual

Denis Torre

September 20, 2017

1. Overview

This notebook explains how to extract data from the Datasets2Tools API using Python. The notebook can be downloaded from the following GitHub page: https://github.com/denis-torre/datasets2tools/tree/master/api.

Basics
  • The Datasets2Tools search API can be accessed at the following URL: http://amp.pharm.mssm.edu/datasets2tools/api/search.
  • Searches are refined by adding several parameters, which are explained in more detail below.
  • The API returns a list of JSON objects containing information about the search results.
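
As a rough illustration of how search parameters end up in a request, the sketch below encodes a dictionary of options into a query string using only the Python standard library. The parameter names object_type, page_size and q are the ones used in the examples in this manual. The helper name build_search_url is made up for this illustration, and whether the API also answers a plain GET request on such a URL is an assumption not confirmed here.

```python
from urllib.parse import urlencode

# Search URL as given in this manual
API_URL = 'http://amp.pharm.mssm.edu/datasets2tools/api/search'

def build_search_url(search_options):
    """Return the search URL with the options encoded as a query string.

    Hypothetical helper for illustration; it only builds a string and
    does not contact the API.
    """
    return API_URL + '?' + urlencode(search_options)

# Ask for five canned analyses
print(build_search_url({'object_type': 'canned_analysis', 'page_size': 5}))

# Spaces in a keyword are encoded as '+'
print(build_search_url({'object_type': 'canned_analysis', 'q': 'prostate cancer'}))
```
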
Object Types

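The Datasets2Tools API can be used to search three types of objects:
  • Canned analyses (object_type: canned_analysis)
  • Datasets (object_type: dataset)
  • Tools (object_type: tool)

A more detailed explanation of how to search these objects is available below.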

Demo

Here is an example search that retrieves five canned analyses.

In [1]:
# Import modules
import json
import requests
import pandas as pd

# Get API URL
url = 'http://amp.pharm.mssm.edu/datasets2tools/api/search'

# Search 5 analyses
data = {
    'object_type': 'canned_analysis',
    'page_size': 5
}

# Get response
response = requests.post(url, params=data)

# Read response
results = json.loads(response.text)

# Convert to dataframe
results_dataframe = pd.DataFrame(results)
results_dataframe
Out[1]:
canned_analysis_accession canned_analysis_description canned_analysis_title canned_analysis_url datasets date metadata tools
0 DCA00000024 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE16256] September 20, 2017 {} [ARCHS4]
1 DCA00000025 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE17312] September 20, 2017 {} [ARCHS4]
2 DCA00000026 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE18927] September 20, 2017 {} [ARCHS4]
3 DCA00000027 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE22959] September 20, 2017 {} [ARCHS4]
4 DCA00000028 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE24565] September 20, 2017 {} [ARCHS4]

2. Search Examples

For convenience, we define a function to search the API and return a pandas DataFrame.

In [2]:
# Import modules
import json
import requests
import pandas as pd

def search_datasets2tools(search_options):
    
    # Get API URL
    url = 'http://amp.pharm.mssm.edu/datasets2tools/api/search'

    # Get response
    response = requests.post(url, params=search_options)

    try:
        # Read response
        results_dict = json.loads(response.text)

        # Convert to dataframe
        results_dataframe = pd.DataFrame(results_dict)
        
        # Set index
        results_dataframe.set_index(search_options['object_type']+'_accession', inplace=True)
        
        return results_dataframe
        
    except (ValueError, KeyError):
        # Invalid JSON, unexpected response structure, or missing accession column
        return 'Sorry, there has been an error.'
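
The function above returns a fixed string when something goes wrong, so callers have to check the type of the return value before using it. One possible alternative, sketched below with the standard library only, is to parse the response body separately and raise an exception with a descriptive message. The name parse_search_response is hypothetical, and the check for a list follows the description of the API output given in the Overview.

```python
import json

def parse_search_response(response_text):
    """Parse an API response body into a list of result dictionaries.

    Hypothetical helper: raises ValueError instead of returning a message.
    """
    try:
        results = json.loads(response_text)
    except ValueError as error:
        # json.JSONDecodeError is a subclass of ValueError
        raise ValueError('Response is not valid JSON: {}'.format(error))

    # The manual states that the API returns a list of JSON objects
    if not isinstance(results, list):
        raise ValueError('Expected a list of results, got {}'.format(type(results).__name__))

    return results

# Usage with a hand-written response body (not real API output)
records = parse_search_response('[{"canned_analysis_accession": "DCA00000024"}]')
print(records[0]['canned_analysis_accession'])
```

The returned list can be passed to pd.DataFrame in the same way as in the function above.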

2.1 Canned Analyses

We can search canned analyses by text, dataset, tool, or metadata tags.

2.1.1 By Text

Search all canned analyses that contain the keyword prostate cancer.

In [3]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'q': 'prostate cancer'})
results.head()
Out[3]:
canned_analysis_description canned_analysis_title canned_analysis_url datasets date metadata tools
canned_analysis_accession
DCA00000060 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE35126] September 20, 2017 {} [ARCHS4]
DCA00000123 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE39509] September 20, 2017 {} [ARCHS4]
DCA00000139 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE40050] September 20, 2017 {} [ARCHS4]
DCA00000262 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE43986] September 20, 2017 {} [ARCHS4]
DCA00000448 Highly interactive web-based heatmap visualiza... Interactive heatmap visualization of RNA-seq d... http://amp.pharm.mssm.edu/datasets2tools/analy... [GSE48403] September 20, 2017 {} [ARCHS4]
2.1.2 By Dataset

Search all canned analyses associated with GEO dataset GSE775.

In [4]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'dataset_accession': 'GSE775'})
results.head()
Out[4]:
canned_analysis_description canned_analysis_title canned_analysis_url datasets date metadata tools
canned_analysis_accession
DCA00000002 An enrichment analysis was performed on the to... Enrichment analysis of genes downregulated in ... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000003 An enrichment analysis was performed on the to... Enrichment analysis of genes upregulated in ac... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000004 An enrichment analysis was performed on the to... Enrichment analysis of genes downregulated in ... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000005 An enrichment analysis was performed on the to... Enrichment analysis of genes upregulated in ac... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000006 The L1000 database was queried in order to ide... Small molecules which mimic acute myocardial i... http://amp.pharm.mssm.edu/L1000CDS2/#/result/5... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'direction': u'mimic... [L1000CDS2]
2.1.3 By Tool

Search all canned analyses generated by Enrichr.

In [5]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'tool_name': 'Enrichr'})
results.head()
Out[5]:
canned_analysis_description canned_analysis_title canned_analysis_url datasets date metadata tools
canned_analysis_accession
DCA00000002 An enrichment analysis was performed on the to... Enrichment analysis of genes downregulated in ... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000003 An enrichment analysis was performed on the to... Enrichment analysis of genes upregulated in ac... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000004 An enrichment analysis was performed on the to... Enrichment analysis of genes downregulated in ... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00000005 An enrichment analysis was performed on the to... Enrichment analysis of genes upregulated in ac... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 19, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
DCA00059407 An enrichment analysis was performed on the to... Enrichment analysis of genes downregulated in ... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE775] September 20, 2017 {u'do_id': u'DOID:9408', u'cell_type': u'Heart... [Enrichr]
2.1.4 By Metadata

Search all canned analyses whose disease_name metadata tag is colon cancer.

In [6]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'disease_name': 'colon cancer'})
results.head()
Out[6]:
canned_analysis_description canned_analysis_title canned_analysis_url datasets date metadata tools
canned_analysis_accession
DCA00032919 The analysis explores the gene interaction net... Interaction network and enrichment analysis of... http://genemania.org/#/search/mouse/Lgals6|Guc... [GSE2178] September 20, 2017 {u'do_id': u'DOID:219', u'cell_type': u'Intest... [GeneMANIA]
DCA00032920 The analysis explores the gene interaction net... Interaction network and enrichment analysis of... http://genemania.org/#/search/mouse/Slpi|Gcnt2... [GSE2178] September 20, 2017 {u'do_id': u'DOID:219', u'cell_type': u'Intest... [GeneMANIA]
DCA00033223 The analysis explores the gene interaction net... Interaction network and enrichment analysis of... http://genemania.org/#/search/human/RPS4Y1|NDR... [GSE4107] September 20, 2017 {u'do_id': u'DOID:219', u'cell_type': u'Intest... [GeneMANIA]
DCA00033224 The analysis explores the gene interaction net... Interaction network and enrichment analysis of... http://genemania.org/#/search/human/FOS|SH3KBP... [GSE4107] September 20, 2017 {u'do_id': u'DOID:219', u'cell_type': u'Intest... [GeneMANIA]
DCA00033763 The analysis explores the gene interaction net... Interaction network and enrichment analysis of... http://genemania.org/#/search/human/RPS26|RPL1... [GSE34299] September 20, 2017 {u'do_id': u'DOID:219', u'cell_type': u'HT29 C... [GeneMANIA]
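
The metadata column holds one dictionary of tags per analysis, which is awkward to filter on directly. The sketch below shows one way to lift those tags into top-level keys of a result record. The function name flatten_metadata and the metadata_ prefix are choices made for this illustration. The example record is hand-written: the accession, tool, dataset and do_id are copied from the output above, while the disease_name value is assumed from the query.

```python
def flatten_metadata(record):
    """Return a copy of a result record with metadata tags as top-level keys."""
    # Copy every field except the nested metadata dictionary
    flat = {key: value for key, value in record.items() if key != 'metadata'}

    # Add each tag with a prefix so it cannot clash with the regular fields
    for tag, value in (record.get('metadata') or {}).items():
        flat['metadata_' + tag] = value

    return flat

# Hand-written record modeled on the results above (not real API output)
example_record = {
    'canned_analysis_accession': 'DCA00032919',
    'tools': ['GeneMANIA'],
    'datasets': ['GSE2178'],
    'metadata': {'do_id': 'DOID:219', 'disease_name': 'colon cancer'},
}

flat_record = flatten_metadata(example_record)
print(sorted(flat_record))
```

Applying the function to every record before building a DataFrame gives one column per metadata tag.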

Search parameters can be combined. Search all canned analyses generated by Enrichr on dataset GSE31106 whose geneset metadata tag is upregulated.

In [7]:
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'tool_name': 'Enrichr',
                                 'dataset_accession': 'GSE31106',
                                 'geneset': 'upregulated'})
results.head()
Out[7]:
canned_analysis_description canned_analysis_title canned_analysis_url datasets date metadata tools
canned_analysis_accession
DCA00059528 An enrichment analysis was performed on the to... Enrichment analysis of genes upregulated in co... http://amp.pharm.mssm.edu/Enrichr/enrich?datas... [GSE31106] September 20, 2017 {u'do_id': u'DOID:0050861', u'cell_type': u'Co... [Enrichr]
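
When several filters are combined, it can help to assemble the options dictionary in one place. The following is a minimal sketch of such a helper. The function name build_canned_analysis_query is hypothetical; the keys q, tool_name and dataset_accession are the ones used in the examples above, and any extra keyword argument is passed through as a metadata tag.

```python
def build_canned_analysis_query(q=None, tool_name=None, dataset_accession=None, **metadata_tags):
    """Assemble search options for canned analyses, skipping unset filters."""
    options = {'object_type': 'canned_analysis'}

    # Keep only the named filters that were actually provided
    named_filters = {'q': q, 'tool_name': tool_name, 'dataset_accession': dataset_accession}
    options.update({key: value for key, value in named_filters.items() if value is not None})

    # Remaining keyword arguments are treated as metadata tags
    options.update(metadata_tags)

    return options

# Reproduces the options used in the combined search above
search_options = build_canned_analysis_query(tool_name='Enrichr',
                                             dataset_accession='GSE31106',
                                             geneset='upregulated')
print(search_options)
```

The resulting dictionary can be passed to search_datasets2tools unchanged.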

2.2 Datasets

We can search datasets by accession, by text, by the names of tools that have analyzed them, or by the accessions of canned analyses generated from them.

2.2.1 By Accession

Search for dataset GSE775.

In [8]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'dataset_accession': 'GSE775'})
results.head()
Out[8]:
analyses dataset_description dataset_landing_url dataset_title repository_name
dataset_accession
GSE775 42 Temporal analysis of acute myocardial infarcti... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Myocardial infarction time course Gene Expression Omnibus
2.2.2 By Text

Search all datasets which contain the keyword asthma.

In [9]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'q': 'asthma'})
results.head()
Out[9]:
analyses dataset_description dataset_landing_url dataset_title repository_name
dataset_accession
GSE43696 49 Analysis of bronchial epithelial cells from pa... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Severe asthma: bronchial epithelial cell Gene Expression Omnibus
GSE31773 33 Analysis of circulating CD4+ and CD8+ T-cells ... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Severe asthma: circulating CD4+ and CD8+ T-cells Gene Expression Omnibus
GSE27011 28 Analysis of white blood cells from children wi... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Asthma: white blood cells Gene Expression Omnibus
GSE6858 7 Comparison of whole lungs of wild-type and rec... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Asthma model: lungs Gene Expression Omnibus
GSE18965 7 Analysis of airway epithelial cells (AEC) from... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Asthmatic atopic epithelium Gene Expression Omnibus
2.2.3 By Tool

Search all datasets which have been analyzed by Enrichr.

In [10]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'tool_name': 'Enrichr'})
results.head()
Out[10]:
analyses dataset_description dataset_landing_url dataset_title repository_name
dataset_accession
GSE50588 294 One goal of human genetics is to understand ho... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... The Functional Consequences of Variation in Tr... Gene Expression Omnibus
GSE47856 119 Chemo-resistance to platinum such as cisplatin... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Expression data from cultured human ovarian ca... Gene Expression Omnibus
GSE6930 119 Analysis of Ewings sarcoma A673 cells for up t... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Cytosine arabinoside effect on Ewing's sarcoma... Gene Expression Omnibus
GSE7002 119 Analysis of nasal epithelia exposed to various... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Formaldehyde effect on nasal epithelium: dose ... Gene Expression Omnibus
GSE35366 112 Analysis of brains from 4 models of human neur... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Models of Neuronal Migration Defects: time course Gene Expression Omnibus
2.2.4 By Canned Analysis

Search all datasets which have been used to generate canned analysis DCA00000002.

In [11]:
results = search_datasets2tools({'object_type': 'dataset',
                                 'canned_analysis_accession': 'DCA00000002'})
results.head()
Out[11]:
analyses dataset_description dataset_landing_url dataset_title repository_name
dataset_accession
GSE775 42 Temporal analysis of acute myocardial infarcti... https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... Myocardial infarction time course Gene Expression Omnibus

2.3 Tools

We can search tools by name, by text, by the accessions of datasets they have analyzed, or by the accessions of canned analyses generated using them.

2.3.1 By Name

Search for the tool ARCHS4.

In [12]:
results = search_datasets2tools({'object_type': 'tool',
                                 'tool_name': 'ARCHS4'})
results.head()
Out[12]:
analyses articles tool_description tool_name
tool_accession
DCT00010052 4645 [https://doi.org/10.1101/189092] ARCHS4 provides access to gene counts from HiS... ARCHS4
2.3.2 By Text

Search all tools which contain the keyword enrichment.

In [13]:
results = search_datasets2tools({'object_type': 'tool',
                                 'q': 'enrichment'})
results.head()
Out[13]:
analyses articles tool_description tool_name
tool_accession
DCT00004702 7759 [https://doi.org/10.1093/nar/gkw377] A comprehensive gene set enrichment analysis w... Enrichr
DCT00010044 3879 [] Enrichment analysis tool implementing the prin... PAEA
DCT00000149 0 [https://doi.org/10.1093/bioinformatics/btq503] An R/C++ package to identify patterns and biol... CoGAPS
DCT00004852 0 [https://doi.org/10.1093/nar/gkx295] A web-based tool for comprehensive statistical... MicrobiomeAnalyst
DCT00002174 0 [https://doi.org/10.1093/bioinformatics/btw511] Translating PubMed and PMC texts to networks f... HiPub
2.3.3 By Dataset

Search all tools which have analyzed GEO dataset GSE775.

In [14]:
results = search_datasets2tools({'object_type': 'tool',
                                 'dataset_accession': 'GSE775'})
results.head()
Out[14]:
analyses articles tool_description tool_name
tool_accession
DCT00004702 7759 [https://doi.org/10.1093/nar/gkw377] A comprehensive gene set enrichment analysis w... Enrichr
DCT00010043 7756 [] An ultra-fast LINCS L1000 Characteristic Direc... L1000CDS2
DCT00003348 7435 [https://doi.org/10.1093/nar/gkq537] Biological network integration for gene priori... GeneMANIA
DCT00010044 3879 [] Enrichment analysis tool implementing the prin... PAEA
2.3.4 By Canned Analysis

Search all tools which have been used to generate canned analysis DCA00000002.

In [15]:
results = search_datasets2tools({'object_type': 'tool',
                                 'canned_analysis_accession': 'DCA00000002'})
results.head()
Out[15]:
analyses articles tool_description tool_name
tool_accession
DCT00004702 7759 [https://doi.org/10.1093/nar/gkw377] A comprehensive gene set enrichment analysis w... Enrichr
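
The examples in this manual show that the same dataset_accession filter works for all three object types. As a small sketch of how that could be used, the hypothetical helper below builds the three option dictionaries needed to look up a dataset together with the tools and canned analyses linked to it. It only prepares the queries and sends nothing to the API.

```python
# Object types used in the search examples of this manual
OBJECT_TYPES = ('dataset', 'tool', 'canned_analysis')

def queries_for_dataset(dataset_accession):
    """Return one search-options dictionary per object type for a dataset."""
    return [{'object_type': object_type, 'dataset_accession': dataset_accession}
            for object_type in OBJECT_TYPES]

# Prepare the three queries for GEO dataset GSE775
for search_options in queries_for_dataset('GSE775'):
    print(search_options)
```

Each dictionary can then be passed to search_datasets2tools to retrieve the corresponding results.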