
ISB-CGC Community Notebooks

Check out more notebooks at our Community Notebooks Repository!

Title:   How to Use ISB-CGC APIs
Author:  Lauren Hagen
Created: 2019-09-16
Purpose: Introduction to using ISB-CGC APIs with Python
URL:     https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_use_ISB-CGC_APIs.ipynb
Notes:

How to Use ISB-CGC APIs

Overview of Notebook

This notebook is designed as a quick introduction to the ISB-CGC APIs and how to access them with Python.

Topics Covered:

  • Overviews of APIs, Swagger, JSON, endpoints
  • Use Cases for ISB-CGC APIs
  • Examples of ISB-CGC API endpoints

About ISB-CGC APIs

ISB-CGC has created several APIs to interact with ISB-CGC and user data available on the Google Cloud Platform. They were created with Google’s OpenAPI Endpoints and can be accessed through a SwaggerUI interface. For more information on ISB-CGC APIs, please visit our documentation.

Overview of APIs

An API, or application programming interface, is a software intermediary that allows two applications to talk to each other. In other words, an API is the messenger that delivers your request to the provider that you’re requesting it from and then delivers the response back to you (Wikipedia). Each action that an API can take is called an "endpoint".

Some useful tutorials and quick start guides on APIs are:

What is SwaggerUI?

SwaggerUI is a user interface that allows users to try out the APIs and view their documentation easily. A tutorial on how to use the ISB-CGC APIs on the SwaggerUI can be found here.

What is JSON?

JSON or JavaScript Object Notation is a lightweight data-interchange format that is easy for humans and machines to work with. More information can be found at json.org.
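
As a minimal sketch of how JSON maps onto Python objects, the example below parses a small JSON string with the standard-library json module and then converts it back to text. The string is made up for illustration and is not a real ISB-CGC response.

```python
import json

# An illustrative JSON document (hypothetical values, not an API response)
raw_text = '{"code": 200, "message": "Welcome", "items": [1, 2, 3]}'

# json.loads turns JSON text into Python dictionaries and lists
parsed = json.loads(raw_text)

# JSON objects become dictionaries, and JSON arrays become lists
message = parsed["message"]
second_item = parsed["items"][1]
print(message, second_item)

# json.dumps turns Python objects back into JSON text
round_trip = json.dumps(parsed, sort_keys=True)
print(round_trip)
```

Objects become dictionaries and arrays become lists, so the usual Python indexing works on a parsed response.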

What is an endpoint?

An endpoint is the call for a specific functionality of an API. For example, /data/available at the end of the API request URL https://api-dot-isb-cgc.appspot.com/v4/data/available is an endpoint that returns (or GETs) information about the available programs and data sets.
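
To make the relationship between the base URL and an endpoint concrete, here is a small sketch that joins the two. The helper name endpoint_url is hypothetical and only for illustration; the base URL is the one used throughout this notebook.

```python
# Base URL of the ISB-CGC API, as used in the examples in this notebook
BASE_URL = "https://api-dot-isb-cgc.appspot.com/v4"


def endpoint_url(path):
    """Join the base URL and an endpoint path with exactly one slash."""
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


# The same helper works with or without a leading slash on the path
print(endpoint_url("/data/available"))
print(endpoint_url("about"))
```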

Python library requests

In order to use the ISB-CGC APIs with Python, the requests library needs to be installed and then imported.

In [0]:
# Install requests if needed
# pip install requests

# Import the requests library
import requests

Use cases for ISB-CGC APIs

The ISB-CGC APIs can be used for a number of different tasks that interact with the Google Cloud Platform and BigQuery. They can be used to subset data into cohorts or to access cohorts that have been created using the WebApp. They can also be used to interact with the user's GCP to retrieve available user projects and to register projects with ISB-CGC.

Example: about Endpoint

We will first explore the about endpoint by sending a 'get' request to the API. This endpoint gives you information about the ISB-CGC API, such as a link to the SwaggerUI interface and the documentation.

In [0]:
# First submit the 'get' request to the API
about_req = requests.get('https://api-dot-isb-cgc.appspot.com/v4/about')

Now that we have the response, we will check whether the request was successful. If it was, the status code will come back as 200. If something went wrong, the status code may be something like 404 or 503. If you have received any error codes, you can check out Google's Troubleshooting response errors guide.

In [0]:
# Check that there wasn't an error with the request
if about_req.status_code != 200:
  # Print the error code if something went wrong
  print(about_req.status_code)
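
If a bare status code is hard to interpret, a small helper can attach a readable label to it. The sketch below uses only the standard-library http module. The function name describe_status and the wording of the labels are assumptions made for this illustration and are not part of the ISB-CGC API.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        # Look up the standard reason phrase, e.g. 404 -> "Not Found"
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        phrase = "Unknown status"

    # Classify the code by its range
    if 200 <= status_code < 300:
        kind = "success"
    elif 400 <= status_code < 500:
        kind = "client error"
    elif 500 <= status_code < 600:
        kind = "server error"
    else:
        kind = "other"
    return "{} {} ({})".format(status_code, phrase, kind)


for code in (200, 404, 503):
    print(describe_status(code))
```

With a real response, you could pass about_req.status_code to this helper.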

Finally, we will print out the information that we have received from the API. This response is returned as a dictionary, though responses can also be a combination of dictionaries and lists depending on which endpoint is called. This means that you can access different data in the response the same way that you would access dictionaries and lists, as demonstrated below.

In [4]:
# Print the full response
print("Full response:\n")
print(about_req.json(), end='\n\n')

# Print the message portion of the response
print("Message:\n")
print(about_req.json()['message'], end='\n\n')

# Print the documentation portion of the response
print("Documentation:\n")
print(about_req.json()['documentation'])
Full response:

{'code': 200, 'documentation': 'SwaggerUI interface available at <https://api-dot-isb-cgc.appspot.com/v4/swagger/>.Documentation available at <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/progapi/progAPI-v4/Programmatic-Demo.html>', 'message': 'Welcome to the ISB-CGC API, Version 4.'}

Message:

Welcome to the ISB-CGC API, Version 4.

Documentation:

SwaggerUI interface available at <https://api-dot-isb-cgc.appspot.com/v4/swagger/>.Documentation available at <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/progapi/progAPI-v4/Programmatic-Demo.html>

That wasn't difficult at all! Next we will cover a few of the other information APIs.

Example: /data/available Endpoint

The /data/available Endpoint is designed to return the data sets and programs available on the WebApp along with the projects or studies that are within those data sets and programs. This endpoint returns a more complicated JSON object which has a combination of lists and dictionaries. We will first retrieve the response and then check whether it contains an error code.

In [0]:
# Retrieve the response from the API endpoint
programs_req = requests.get('https://api-dot-isb-cgc.appspot.com/v4/data/available')

# Check that there wasn't an error with the request
if programs_req.status_code != 200:
  # Print the error code if something went wrong
  print(programs_req.status_code)

We are going to use the library json in order to view the response more easily.

In [0]:
# The json library is part of the Python standard library,
# so it does not need to be installed

# Import the json library
import json
In [16]:
# Create a variable with the JSON output
program_json = json.dumps(programs_req.json(), sort_keys=True, indent=4)

# Print the program JSON text
print(program_json)
{
    "code": 200,
    "datasets_for_registration": "None found",
    "programs_for_cohorts": [
        {
            "description": null,
            "name": "TCGA",
            "program_privacy": "Public",
            "projects": [
                {
                    "description": null,
                    "name": "ACC"
                },
                {
                    "description": null,
                    "name": "DLBC"
                },
                {
                    "description": null,
                    "name": "READ"
                },
                {
                    "description": null,
                    "name": "GBM"
                },
                {
                    "description": null,
                    "name": "LGG"
                },
                {
                    "description": null,
                    "name": "THCA"
                },
                {
                    "description": null,
                    "name": "STAD"
                },
                {
                    "description": null,
                    "name": "UCEC"
                },
                {
                    "description": null,
                    "name": "PCPG"
                },
                {
                    "description": null,
                    "name": "CESC"
                },
                {
                    "description": null,
                    "name": "UCS"
                },
                {
                    "description": null,
                    "name": "TGCT"
                },
                {
                    "description": null,
                    "name": "LIHC"
                },
                {
                    "description": null,
                    "name": "CHOL"
                },
                {
                    "description": null,
                    "name": "HNSC"
                },
                {
                    "description": null,
                    "name": "UVM"
                },
                {
                    "description": null,
                    "name": "SKCM"
                },
                {
                    "description": null,
                    "name": "COAD"
                },
                {
                    "description": null,
                    "name": "PAAD"
                },
                {
                    "description": null,
                    "name": "THYM"
                },
                {
                    "description": null,
                    "name": "LUSC"
                },
                {
                    "description": null,
                    "name": "MESO"
                },
                {
                    "description": null,
                    "name": "OV"
                },
                {
                    "description": null,
                    "name": "ESCA"
                },
                {
                    "description": null,
                    "name": "SARC"
                },
                {
                    "description": null,
                    "name": "KIRP"
                },
                {
                    "description": null,
                    "name": "BLCA"
                },
                {
                    "description": null,
                    "name": "LAML"
                },
                {
                    "description": null,
                    "name": "PRAD"
                },
                {
                    "description": null,
                    "name": "LUAD"
                },
                {
                    "description": null,
                    "name": "BRCA"
                },
                {
                    "description": null,
                    "name": "KIRC"
                },
                {
                    "description": null,
                    "name": "KICH"
                }
            ]
        },
        {
            "description": null,
            "name": "CCLE",
            "program_privacy": "Public",
            "projects": [
                {
                    "description": "Controls",
                    "name": "CNTL"
                },
                {
                    "description": "FFPE Pilot Phase II",
                    "name": "FPPP"
                },
                {
                    "description": "Sarcoma",
                    "name": "SARC"
                },
                {
                    "description": "Skin Cutaneous Melanoma",
                    "name": "SKCM"
                },
                {
                    "description": "Mesothelioma",
                    "name": "MESO"
                },
                {
                    "description": "Acute Myeloid Leukemia",
                    "name": "LAML"
                },
                {
                    "description": "Pheochromocytoma and Paraganglioma",
                    "name": "PCPG"
                },
                {
                    "description": "Prostate Adenocarcinoma",
                    "name": "PRAD"
                },
                {
                    "description": "Kidney Renal Clear Cell Carcinoma",
                    "name": "KIRC"
                },
                {
                    "description": "Esophageal Carcinoma",
                    "name": "ESCA"
                },
                {
                    "description": "Brain Lower Grade Glioma",
                    "name": "LGG"
                },
                {
                    "description": "Lung Adenocarcinoma",
                    "name": "LUAD"
                },
                {
                    "description": "Pancreatic Adenocarcinoma",
                    "name": "PAAD"
                },
                {
                    "description": "Kidney Chromophobe",
                    "name": "KICH"
                },
                {
                    "description": "Chronic Lymphocytic Leukemia",
                    "name": "LCLL"
                },
                {
                    "description": "Kidney Renal Papillary Cell Carcinoma",
                    "name": "KIRP"
                },
                {
                    "description": "Glioblastoma Multiforme",
                    "name": "GBM"
                },
                {
                    "description": "Miscellaneous",
                    "name": "MISC"
                },
                {
                    "description": "Lung Squamous Cell Carcinoma",
                    "name": "LUSC"
                },
                {
                    "description": "Thymoma",
                    "name": "THYM"
                },
                {
                    "description": "Head and Neck Squamous Cell Carcinoma",
                    "name": "HNSC"
                },
                {
                    "description": "Testicular Germ Cell Tumors",
                    "name": "TGCT"
                },
                {
                    "description": "Bladder Urothelial Carcinoma",
                    "name": "BLCA"
                },
                {
                    "description": "Thyroid Carcinoma",
                    "name": "THCA"
                },
                {
                    "description": "Uterine Carcinosarcoma",
                    "name": "UCS"
                },
                {
                    "description": "Cholangiocarcinoma",
                    "name": "CHOL"
                },
                {
                    "description": "Multiple Myeloma",
                    "name": "MM"
                },
                {
                    "description": "Breast Invasive Carcinoma",
                    "name": "BRCA"
                },
                {
                    "description": "Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma",
                    "name": "CESC"
                },
                {
                    "description": "Lymphoid Neoplasm Diffuse Large B-cell Lymphoma",
                    "name": "DLBC"
                },
                {
                    "description": "Uveal Melanoma",
                    "name": "UVM"
                },
                {
                    "description": "Liver Hepatocellular Carcinoma",
                    "name": "LIHC"
                },
                {
                    "description": "Colon Adenocarcinoma",
                    "name": "COAD"
                },
                {
                    "description": "Rectum Adenocarcinoma",
                    "name": "READ"
                },
                {
                    "description": "Uterine Corpus Endometrial Carcinoma",
                    "name": "UCEC"
                },
                {
                    "description": "Ovarian Serous Cystadenocarcinoma",
                    "name": "OV"
                },
                {
                    "description": "Stomach Adenocarcinoma",
                    "name": "STAD"
                },
                {
                    "description": "Adrenocortical Carcinoma",
                    "name": "ACC"
                },
                {
                    "description": "Chronic Myelogenous Leukemia",
                    "name": "LCML"
                }
            ]
        },
        {
            "description": null,
            "name": "TARGET",
            "program_privacy": "Public",
            "projects": [
                {
                    "description": "Rhabdoid Tumor",
                    "name": "RT"
                },
                {
                    "description": "Neuroblastoma",
                    "name": "NBL"
                },
                {
                    "description": "Clear Cell Sarcoma of the Kidney",
                    "name": "CCSK"
                },
                {
                    "description": "High-Risk Wilms Tumor",
                    "name": "WT"
                },
                {
                    "description": "Osteosarcoma",
                    "name": "OS"
                },
                {
                    "description": "Acute Lymphoblastic Leukemia - Phase II",
                    "name": "ALL-P2"
                },
                {
                    "description": "Acute Lymphoblastic Leukemia - Phase I",
                    "name": "ALL-P1"
                },
                {
                    "description": "Acute Myeloid Leukemia",
                    "name": "AML"
                }
            ]
        }
    ]
}

We can now easily see that the information we are interested in is a combination of dictionaries and lists. Next we will iterate over the JSON object to neatly return the data sets/programs along with the projects/studies that are available.

In [17]:
# Create a variable with the dataset information
datasets = programs_req.json()['programs_for_cohorts']

# Create an empty dictionary for the program names
programs = {}

# For each dataset, create a list of available projects and add it to the
# dictionary with the program name.
for data in datasets:
  # Create a blank list for the projects
  projects = []
  # for each project, add it to the list of projects for the data set
  for project in data['projects']:
    projects.append(project['name'])
  # Add the program name and list of projects to the dictionary
  programs[data['name']] = projects
  # Print the name of the program and the number of projects available
  print("The {} program has {} projects.".format(data['name'], len(projects)))
The TCGA program has 33 projects.
The CCLE program has 39 projects.
The TARGET program has 8 projects.
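
The same program-to-projects dictionary can also be built with a dictionary comprehension. The sketch below runs on a small hand-written sample shaped like the /data/available response, so it does not need a network connection. The sample values and the variable names are illustrative only.

```python
# A cut-down sample with the same shape as the 'programs_for_cohorts' list
sample_response = {
    "programs_for_cohorts": [
        {"name": "TCGA", "projects": [{"name": "ACC"}, {"name": "BRCA"}]},
        {"name": "TARGET", "projects": [{"name": "AML"}]},
    ]
}

# Map each program name to the list of its project names
programs_by_name = {
    program["name"]: [project["name"] for project in program["projects"]]
    for program in sample_response["programs_for_cohorts"]
}

for name, project_names in programs_by_name.items():
    print("The {} program has {} projects.".format(name, len(project_names)))
```

The explicit loop and the comprehension build the same dictionary, so use whichever reads more clearly to you.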

We now have an easy dictionary of programs with lists of the projects for each program. Let us look at which projects are available in the TCGA data set.

In [18]:
for project in programs['TCGA']:
  print(project)
ACC
DLBC
READ
GBM
LGG
THCA
STAD
UCEC
PCPG
CESC
UCS
TGCT
LIHC
CHOL
HNSC
UVM
SKCM
COAD
PAAD
THYM
LUSC
MESO
OV
ESCA
SARC
KIRP
BLCA
LAML
PRAD
LUAD
BRCA
KIRC
KICH

Wow, that is a lot of projects/studies available within the TCGA data set. Descriptions of the different data sets and programs can be found in our documentation.

Example: cohort Endpoint

This last section will cover the get cohorts Endpoint, which requires authorization before submitting the request to the API. This endpoint retrieves information about cohorts that the user has created within the WebApp or with the ISB-CGC API.

Notes on Authorization and Credentials

In order to use several of the ISB-CGC APIs, you need to have authorization with ISB-CGC.

The following steps are required to use an API that requires authorization:

  1. Set up a Google Cloud Project*
  2. Register with the ISB-CGC WebApp*
  3. Create a Credential File on your local machine by using the isb_auth.py script from the ISB-CGC-API Repository
    • This script can be run from the command line or from within Python but has to occur on your local machine.
  4. Find the location of the Credential File on your local machine
    • By default, it will save the file in the users folder of your local machine with the file name: ".isb_credentials"
  5. Load the Credential file into the cloud environment you are using (if needed)

*The 'Quick Start Guide to ISB-CGC' Notebook in the Community Notebook Repository and the How to Get Started on ISB-CGC can assist you with these steps.
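
Step 4 above can be sketched in Python as follows. This assumes that the "users folder" is the home directory reported by pathlib, which may not hold on every system. The function name default_credentials_path is hypothetical.

```python
from pathlib import Path


def default_credentials_path():
    """Return the assumed default location of the credentials file."""
    # Assumption: the file is saved directly in the user's home directory
    return Path.home() / ".isb_credentials"


credentials_path = default_credentials_path()
print("Expected location:", credentials_path)
print("File found:", credentials_path.exists())
```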

In [0]:
# If you skipped earlier sections, you will need these two packages to run the
# code below
# Install requests if needed
#pip install requests

# Import the json library (part of the Python standard library)
#import json

# Import the requests library
#import requests
In [0]:
# Import files helper for Colab
from google.colab import files
In [20]:
# Upload your credentials to the cloud environment
uploaded = files.upload()
Saving .isb_credentials to .isb_credentials (2)

Now that we have the credentials file created and uploaded to the cloud environment, we can open the file to create the header information needed for the API to verify that you have authorization.

In [0]:
# Open the credentials file and parse its JSON contents into a dictionary
with open(".isb_credentials", "r") as credentials_file:
  token = json.load(credentials_file)
# Get the ID token from the credentials
creds = token['token_response']['id_token']
# Create a dictionary for the request's Authorization header
head = {'Authorization': 'Bearer ' + creds}
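
If the credentials file is incomplete, the lookup above fails with a bare KeyError. As a hedged sketch, the helper below wraps the same logic and raises a clearer error. The function name build_auth_header is hypothetical, and the example runs on a fake token rather than a real credentials file.

```python
import json


def build_auth_header(credentials_text):
    """Build the Authorization header from the text of a credentials file."""
    credentials = json.loads(credentials_text)
    try:
        id_token = credentials["token_response"]["id_token"]
    except KeyError as missing:
        # Report which expected key was absent from the file
        raise ValueError("Credentials are missing the key {}".format(missing))
    return {"Authorization": "Bearer " + id_token}


# Fake credentials with the same nesting as the file read above
fake_credentials = json.dumps({"token_response": {"id_token": "FAKE_TOKEN"}})
print(build_auth_header(fake_credentials))
```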

Note: the credentials file will expire after 1 hour and a new one will need to be generated. If a new file is not generated with the isb_auth script, you can delete the original file and try running the script again.

If you are having any issues, you can contact us at [email protected]

Finally, we can make a get request to the cohorts ISB-CGC API.

In [0]:
# Make API request
cohort_req = requests.get('https://api-dot-isb-cgc.appspot.com/v4/cohorts', headers=head)

Now we can format the response so that it is easier to read and then print it.

In [24]:
cohorts_json = json.dumps(cohort_req.json(), sort_keys=True, indent=4)
print(cohorts_json)
{
    "code": 200,
    "data": [
        {
            "filters": {
                "TCGA": [
                    {
                        "name": "gender",
                        "program": "TCGA",
                        "value": "FEMALE"
                    },
                    {
                        "name": "age_at_diagnosis",
                        "program": "TCGA",
                        "value": "70 to 79"
                    }
                ]
            },
            "id": 1962,
            "name": "Test 1",
            "permission": "OWNER"
        },
        {
            "filters": {},
            "id": 1,
            "name": "All TCGA Data",
            "permission": "READER"
        },
        {
            "filters": {
                "TCGA": [
                    {
                        "name": "tumor_tissue_site",
                        "program": "TCGA",
                        "value": "Breast"
                    },
                    {
                        "name": "sample_type",
                        "program": "TCGA",
                        "value": "01"
                    },
                    {
                        "name": "vital_status",
                        "program": "TCGA",
                        "value": "Alive"
                    }
                ]
            },
            "id": 1972,
            "name": "Test 2",
            "permission": "OWNER"
        },
        {
            "filters": {
                "TCGA": [
                    {
                        "name": "disease_code",
                        "program": "TCGA",
                        "value": "PAAD"
                    }
                ]
            },
            "id": 2217,
            "name": "Test 4",
            "permission": "OWNER"
        }
    ]
}

Then we can retrieve the contents of the response and view which cohorts have been created.

In [25]:
# Create a variable with the cohort information
cohorts = cohort_req.json()['data']
print(cohorts)
[{'filters': {'TCGA': [{'name': 'gender', 'program': 'TCGA', 'value': 'FEMALE'}, {'name': 'age_at_diagnosis', 'program': 'TCGA', 'value': '70 to 79'}]}, 'id': 1962, 'name': 'Test 1', 'permission': 'OWNER'}, {'filters': {}, 'id': 1, 'name': 'All TCGA Data', 'permission': 'READER'}, {'filters': {'TCGA': [{'name': 'tumor_tissue_site', 'program': 'TCGA', 'value': 'Breast'}, {'name': 'sample_type', 'program': 'TCGA', 'value': '01'}, {'name': 'vital_status', 'program': 'TCGA', 'value': 'Alive'}]}, 'id': 1972, 'name': 'Test 2', 'permission': 'OWNER'}, {'filters': {'TCGA': [{'name': 'disease_code', 'program': 'TCGA', 'value': 'PAAD'}]}, 'id': 2217, 'name': 'Test 4', 'permission': 'OWNER'}]

We can then list the names of the cohorts and see which filters were applied to a cohort by selecting it by its index in the list.

In [26]:
# View the names of the cohorts
for k in cohorts:
    print(k['name'])
Test 1
All TCGA Data
Test 2
Test 4
In [27]:
# View the filters that have been applied to the first cohort
cohorts[0]['filters']
Out[27]:
{'TCGA': [{'name': 'gender', 'program': 'TCGA', 'value': 'FEMALE'},
  {'name': 'age_at_diagnosis', 'program': 'TCGA', 'value': '70 to 79'}]}
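
To view the filters of every cohort at once, the nested structure can be flattened into one line per cohort. The sketch below uses two cohorts copied from the output above as sample data. The function name summarize_filters and the "name=value" format are choices made for this illustration.

```python
# Sample data copied from the cohorts response shown above
sample_cohorts = [
    {"filters": {"TCGA": [
        {"name": "gender", "program": "TCGA", "value": "FEMALE"},
        {"name": "age_at_diagnosis", "program": "TCGA", "value": "70 to 79"}]},
     "id": 1962, "name": "Test 1", "permission": "OWNER"},
    {"filters": {}, "id": 1, "name": "All TCGA Data", "permission": "READER"},
]


def summarize_filters(cohort):
    """Return the cohort's filters as a 'name=value' string."""
    parts = []
    # 'filters' maps each program name to a list of filter dictionaries
    for program_filters in cohort["filters"].values():
        for cohort_filter in program_filters:
            parts.append("{}={}".format(cohort_filter["name"], cohort_filter["value"]))
    return ", ".join(parts) if parts else "no filters"


for cohort in sample_cohorts:
    print("{}: {}".format(cohort["name"], summarize_filters(cohort)))
```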