download_data
Pulls all of the necessary data from the net and constructs the file tree and data objects used in the rest of the analysis.
get_all_MAFs
Script to download and process updated MAF files from the TCGA Data Portal.
get_updated_clinical
Script to download and process updated clinical data from the TCGA Data Portal.
(There are dependencies among these; run them in order.)
HPV_Process_Data
Compile HPV status for all patient tumors.
Calculate global variables and meta features in the HPV- background.
binarize_clinical
Process clinical variables into a binary matrix for use in prognostic screens.
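As a rough illustration of what binarizing clinical variables can look like, the sketch below expands each categorical variable into one 0/1 column per observed level. It is not the notebook's actual code: the function name `binarize`, the `variable=level` column naming, the patient IDs, and the example variables are all assumptions made for illustration, and missing values are simply carried through as `None`.

```python
def binarize(records):
    """Expand categorical clinical variables into 0/1 indicator columns.

    `records` maps a patient ID to a dict of {variable: level}, where a
    level of None means the value is missing. (Hypothetical layout.)
    """
    # Collect every (variable, level) pair observed in any patient.
    levels = sorted(
        {
            (var, val)
            for feats in records.values()
            for var, val in feats.items()
            if val is not None
        }
    )
    matrix = {}
    for patient, feats in records.items():
        row = {}
        for var, val in levels:
            observed = feats.get(var)
            # Keep missing data as None rather than coding it as 0.
            row[f"{var}={val}"] = None if observed is None else int(observed == val)
        matrix[patient] = row
    return matrix


# Made-up example patients and variables.
clinical = {
    "P1": {"gender": "male", "smoker": "yes"},
    "P2": {"gender": "female", "smoker": None},
}
binary = binarize(clinical)
for patient, row in binary.items():
    print(patient, row)
```

Each resulting column can then be tested on its own in a prognostic screen; keeping missing values distinct from 0 avoids treating "unknown" as "absent".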
Prognostic_Screen
Run the primary prognostic screen for HPV- HNSCC patients.
Secondary_Screen
Run the prognostic screen for HPV- HNSCC patients with the TP53-3p event.
HNSCC_figures
Generate some of the figure panels for the HNSCC discovery cohort. Other figures and figure panels are generated inline with the analysis.
UPMC_cohort
Validation of primary findings in an independent patient cohort from the University of Pittsburgh (Stansky et al.).
Validation of molecular associations in recent TCGA samples.
Reviewer_Response
Specific responses to reviewer comments.
HNSCC_clinical_characterization
Overview of clinical variables in the TCGA HNSCC cohort and their implications for patient prognosis.
TP53_exploration
Detailed characterization of TP53 mutations and their predicted functional impact.
HPV_characterization
Detailed characterization of the clinical and molecular correlates of HPV+ status.
copy_number_exploration
Exploration of chromosomal instability, 3p deletion, TP53 mutation and the relationships between these factors.
Clinical_Covariates
Exploration of primary subtypes within the context of a number of clinical variables.
Multivariate_Modeling
Exploration of primary subtypes within the context of a few different multivariate models including clinical variables.
This requires a number of additional dependencies for sequencing analysis, as well as function calls to proprietary software installed on our virtual machine hosted by Annai Systems. We have included all of the dependencies of this mutation-calling step in the supplement as MAF files and highly recommend starting with these as opposed to re-calling mutations.
muTect_streamline
This script is used to generate bash scripts to download and process additional TCGA data from CGHub.
new_data_process_TP53_Pancancer
Here we process the SNV and indel calls made by the variant calling tools, annotate them and consolidate them into a MAF file.
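The consolidation step can be pictured with the minimal sketch below, which merges SNV and indel calls into one tab-delimited table using a small subset of standard MAF column names. It is an assumption-laden illustration, not the notebook's code: the input record layout (`gene`, `chrom`, `start`, `end`, `ref`, `alt`, `sample`), the function names, and the example coordinates and sample name are made up, alleles are assumed to already use `-` for the empty allele, and real annotation (for example, variant classification) is left out.

```python
import csv
import io

# A subset of standard MAF columns; a full MAF file has many more.
MAF_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele2",
    "Tumor_Sample_Barcode",
]


def variant_type(ref, alt):
    """Derive the MAF Variant_Type from MAF-style alleles ('-' = empty)."""
    if ref == "-":
        return "INS"
    if alt == "-":
        return "DEL"
    # Substitutions are named by length: SNP, DNP, TNP, otherwise ONP.
    return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")


def calls_to_maf(snv_calls, indel_calls, handle):
    """Write SNV and indel calls to `handle` as one tab-delimited table."""
    writer = csv.DictWriter(
        handle, fieldnames=MAF_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    calls = sorted(
        list(snv_calls) + list(indel_calls),
        key=lambda c: (c["sample"], c["chrom"], c["start"]),
    )
    for c in calls:
        writer.writerow({
            "Hugo_Symbol": c["gene"],
            "Chromosome": c["chrom"],
            "Start_Position": c["start"],
            "End_Position": c["end"],
            "Variant_Type": variant_type(c["ref"], c["alt"]),
            "Reference_Allele": c["ref"],
            "Tumor_Seq_Allele2": c["alt"],
            "Tumor_Sample_Barcode": c["sample"],
        })


# Made-up calls; coordinates and sample name are placeholders.
snvs = [{"gene": "TP53", "chrom": "17", "start": 100, "end": 100,
         "ref": "C", "alt": "T", "sample": "SAMPLE-A"}]
indels = [{"gene": "TP53", "chrom": "17", "start": 200, "end": 201,
           "ref": "AG", "alt": "-", "sample": "SAMPLE-A"}]

buf = io.StringIO()
calls_to_maf(snvs, indels, buf)
maf_lines = buf.getvalue().splitlines()
print("\n".join(maf_lines))
```

Writing to a file-like handle keeps the sketch testable in memory; passing an open file instead of `io.StringIO` would produce a file on disk.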