HNSCC HPV- Cohort TP53 Mutation Exploration

Here we conduct a general exploration of TP53 mutations within the HNSCC discovery cohort. While we try to remain unbiased in our screen for molecular correlates of survival, we do have much more information on TP53 mutations than on most other events.

In Poeta et al., a TP53 mutation is labeled as disruptive if it is either a stop mutation, or if it is located at a binding site and induces a change in polarity of the encoded amino acid. Interestingly, we found that the polarity of the substitution had little effect on prognosis and that patients with a mutation to the L2 binding site had worse outcomes than patients with a mutation to the L3 binding site. In addition, within the framework we set forth for biomarker discovery, we chose to ignore the classification of mutations (beyond silent/non-silent) in order to keep sample size high, at the risk of false positives. For these reasons we elected to simply display the functional assignment of the mutations in Figure 1 rather than obscure these results with a classification scheme.

Import Data and Packages

For the full list of data and packages imported, see the Imports notebook.

In [1]:

import NotebookImport
from Imports import *

importing IPython notebook from Imports.ipynb
Populating the interactive namespace from numpy and matplotlib
changing to source dirctory
populating namespace with data

TP53 Mutation Clinical Correlates

In [2]:

p53_mut = mut.df.ix['TP53'].ix[keepers_o].dropna().astype(int)

In [3]:

survival_and_stats(p53_mut, surv, figsize=(5,4), order=[2,1,0])

In [4]:

screen_feature(p53_mut>0, kruskal_pandas, clinical.processed.T).head()

Out[4]:

	H	p	q
spread_inferred	7.65	0.01	0.06
smoker_inferred	7.39	0.01	0.06
drinker_inferred	6.45	0.01	0.07
invasion_inferred	4.91	0.03	0.12
post_2000	0.69	0.41	1.00

In [5]:

ecs = clinical.clinical.presenceofpathologicalnodalextracapsularspread
ecs.name = 'Extra Capsular Spread'
pd.crosstab(p53_mut>0, ecs).T.plot(kind='bar', rot=15)

Out[5]:

<matplotlib.axes.AxesSubplot at 0x79a5f50>

TP53 Mutation Functional Assessment

Classification by functional domains

It is important to note that here a patient with multiple mutations is counted multiple times, once per mutation.
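To make the counting caveat concrete, here is a minimal standalone sketch. The barcodes and class labels are hypothetical and only illustrate how per-mutation rows can outnumber patients; it does not touch the notebook's data.

```python
from collections import Counter

# Hypothetical (barcode, functional class) pairs, one per mutation record.
records = [
    ("PATIENT-A", "L2"),
    ("PATIENT-A", "Nonsense"),  # second TP53 mutation in the same tumor
    ("PATIENT-B", "L3"),
    ("PATIENT-C", "InDel"),
]

# Counting rows counts PATIENT-A twice, once in each class.
rows_per_class = Counter(cls for _, cls in records)
n_rows = len(records)
n_patients = len({barcode for barcode, _ in records})

print(n_rows, n_patients)  # rows exceed patients when anyone has >1 mutation
print(dict(rows_per_class))
```

Any per-class patient count or survival curve built from per-mutation rows inherits this double counting.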

In [6]:

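import re

# Pull every run of digits out of a string, e.g. 'p.R175H' -> ['175']
get_nums = lambda s: re.findall(r'\d+', s)

def is_disruptive(v):
    """Assign a MAF record to a functional class (a label, not True/False)."""
    c = v.Variant_Classification
    if c != 'Missense_Mutation':
        if 'Ins' in c or 'Del' in c:
            return 'InDel'
        # e.g. 'Nonsense_Mutation' -> 'Nonsense', 'Splice_Site' -> 'Splice'
        return c.split('_')[0]
    # Missense: classify by the residue number in Protein_Change
    aa = int(get_nums(v.Protein_Change)[0])
    if aa in range(163, 196):   # L2 binding site, residues 163-195
        return 'L2'
    if aa in range(236, 252):   # L3 binding site, residues 236-251
        return 'L3'
    return 'other'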
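The classifier above takes a row of the MAF table, which is not available outside the notebook. As a rough standalone sketch of the same rule, the version below takes the two fields as plain strings. The function name `functional_class` and the example records are illustrative assumptions, not part of the analysis.

```python
import re

L2_RESIDUES = range(163, 196)  # residues 163-195
L3_RESIDUES = range(236, 252)  # residues 236-251

def functional_class(variant_classification, protein_change):
    """Return the functional class label for one mutation record."""
    if variant_classification != "Missense_Mutation":
        if "Ins" in variant_classification or "Del" in variant_classification:
            return "InDel"
        # Keep the first token, e.g. 'Nonsense_Mutation' -> 'Nonsense'.
        return variant_classification.split("_")[0]
    position = int(re.findall(r"\d+", protein_change)[0])
    if position in L2_RESIDUES:
        return "L2"
    if position in L3_RESIDUES:
        return "L3"
    return "other"

# Illustrative records only.
examples = [
    ("Missense_Mutation", "p.R175H"),
    ("Missense_Mutation", "p.R248W"),
    ("Missense_Mutation", "p.R273H"),
    ("Nonsense_Mutation", "p.R213*"),
    ("Frame_Shift_Del", "p.P152fs"),
    ("Splice_Site", "p.T125_splice"),
]
labels = [functional_class(vc, pc) for vc, pc in examples]
print(labels)
```

Silent records come out labeled 'Silent' under this rule, which is why the notebook drops that label in the next cell.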

In [7]:

p53 = FH.get_submaf(run.data_path, cancer.name, ['TP53'], fields='All').ix['TP53']

In [8]:

dd = p53.apply(is_disruptive, 1)
dd = dd.replace('Silent',nan).dropna()
p53 = p53.ix[dd.index]
others = keepers_o.diff(p53.Tumor_Sample_Barcode.ix[dd.index]).intersection(mut.df.columns)
dd.index = p53.Tumor_Sample_Barcode.ix[dd.index]
dd = pd.concat([pd.Series('WT', others), dd])
dd = dd[[i in keepers_o for i in dd.index]]
pc = pd.Series(list(p53.Protein_Change), index=p53.Tumor_Sample_Barcode)
pc = pd.concat([pd.Series('WT', others), pc])
pc = pc[[i in keepers_o for i in pc.index]]

In [9]:

s2 = surv.unstack().ix[dd.index]
s2.index = range(len(dd))
s2 = s2.stack()
pats = pd.Series(dd.index, range(len(dd)))
dd.index = range(len(dd))
pc.index = range(len(dd))

In [10]:

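# Keys follow the order of the concatenated series: pc holds the protein
# change and dd holds the functional class.
df = pd.concat([pats, pc, dd, s2[:,'days'], s2[:,'event']],
               keys=['patient ID','Protein Change','Functional Class',
                     'Days to Death/Censoring', 'Death Indicator'],
               axis=1).sort(['patient ID'])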
df = df.set_index('patient ID')
df.to_csv(FIGDIR + 'fig2b.csv')

Figure 2b

In [11]:

fig, ax = subplots(figsize=(3.5,2.7))
c={'WT': 'grey', 'Splice':colors[0], 'other': colors[5], 'L3': colors[1], 'L2':colors[2],
   'Nonsense': colors[3], 'InDel': colors[4]}
draw_survival_curve(dd, s2, colors=c, ax=ax)
ax.legend().set_visible(False)
prettify_ax(ax)
fig.tight_layout()
fig.savefig(FIGDIR + 'fig2b.pdf', transparent=True)

In [12]:

survival_and_stats(dd, s2, colors=colors[:6] + ['grey'] + colors[6:], figsize=(4.5,6))

Differing survival rates were observed depending on the TP53 protein domain affected by the mutation or its predicted functional status [16] (log-rank P = .04, Extended Data Fig. 2).

In [13]:

get_surv_fit_lr(s2, dd[dd!='WT'])

Out[13]:

	Stats		Median Survival			5y Survival			Log-Rank
	# Patients	# Events	Median	Lower	Upper	Surv	Lower	Upper	chi2	p
									11.9	0.036
other	78	30	4	2.5	NaN	0.476	0.351	0.647
InDel	48	19	3.53	1.5	NaN	0.421	0.262	0.676
Nonsense	34	18	1.6	1.25	NaN	0.333	0.19	0.585
L3	31	12	2.16	1.5	NaN	0.229	0.0518	1
L2	30	18	1.08	0.986	NaN	0.251	0.124	0.512
Splice	17	10	1.43	0.767	NaN	0.247	0.0842	0.724
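As a sanity check on how the reported chi2 and p relate, the sketch below evaluates the chi-square upper tail in closed form. It assumes the six-group log-rank statistic is referred to a chi-square distribution with 5 degrees of freedom (groups minus one); the helper name `chi2_sf_df5` is made up for this illustration.

```python
import math

def chi2_sf_df5(x):
    """Upper-tail probability of a chi-square variable with 5 degrees of freedom.

    For odd degrees of freedom the tail has a closed form: a normal-tail
    term plus a density term with a short polynomial in x.
    """
    normal_part = math.erfc(math.sqrt(x / 2.0))
    density_part = (math.sqrt(2.0 * x / math.pi)
                    * math.exp(-x / 2.0)
                    * (1.0 + x / 3.0))
    return normal_part + density_part

# ASSUMPTION: 6 mutation classes -> 5 degrees of freedom.
p_value = chi2_sf_df5(11.9)
print(round(p_value, 3))
```

Under that assumption the result agrees with the p-value in the table to the precision shown.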

Bar Plot of Hazard Ratios for Supplement

In [14]:

dd = dd.replace('WT', 'aWT')
f = get_cox_ph(s2, dd, interactions=False)
ci = convert_robj(robjects.r.summary(f)[7])
ci.index = map(lambda s: s[7:], ci.index)
n = ci.ix[0]*0 +1
n.name = 'WT'
ci  = ci.append(n)

In [15]:

fig, ax = subplots(figsize=(7,4))
ci = ci.sort('exp(coef)')
haz = ci['exp(coef)']
b = haz.plot(kind='bar', ax=ax,
             yerr=[haz - ci['lower .95'], ci['upper .95'] - haz], ecolor='black', 
             rot=0, color=['grey', colors[5], colors[4], colors[0], colors[3],
                            colors[2], colors[1]])
prettify_ax(ax)
ax.set_ylabel('Hazard Ratio')

Out[15]:

<matplotlib.text.Text at 0x11358990>

P-values for Bar Comparisons

In [16]:

from itertools import combinations

In [17]:

sig = pd.Series({c: get_cox_ph_ms(s2, dd[dd.isin(c)], interactions=False)['LR']
                    for c in combinations(dd.unique(),2)})
sig.order()

Out[17]:

aWT       L2          4.06e-05
          Nonsense    1.83e-03
          Splice      2.61e-03
L2        other       4.88e-03
aWT       L3          1.18e-02
          InDel       1.68e-02
          other       2.74e-02
InDel     L2          3.58e-02
L3        L2          4.35e-02
Splice    other       8.19e-02
Nonsense  other       1.26e-01
          L2          1.57e-01
InDel     Splice      2.21e-01
Splice    L3          2.73e-01
InDel     Nonsense    3.87e-01
L3        other       4.57e-01
Nonsense  Splice      5.59e-01
          L3          5.66e-01
InDel     other       6.00e-01
Splice    L2          6.11e-01
InDel     L3          8.93e-01
dtype: float64
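The table has one row per unordered pair of classes. A small sketch of that enumeration, using the seven labels that appear in the output above:

```python
from itertools import combinations

# The seven class labels seen in the pairwise table ('aWT' is the renamed WT).
classes = ["aWT", "L2", "L3", "Nonsense", "Splice", "InDel", "other"]

# Each unordered pair appears exactly once: n * (n - 1) / 2 comparisons.
pairs = list(combinations(classes, 2))
n_pairs = len(pairs)
print(n_pairs)
```

With seven classes this gives 21 comparisons, which matches the number of rows listed.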

Classification of Mutations via Poeta et al.

http://www.ncbi.nlm.nih.gov/pubmed/18094376
In this paper the authors sequence TP53 for 420 patients.
They classify a mutation as disruptive if it:
a) is a stop mutation, or
b) occurs at a binding site AND encodes an amino acid with a different polarity.
All other mutations are classified as non-disruptive.

In [18]:

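# Amino-acid property table, indexed by the file's second column
# (looked up below with the one-letter codes taken from Protein_Change)
lo = pd.read_csv('../Extra_Data/amino_acids.csv', index_col=1)
lo = lo.groupby(level=0).first()

# Note: this redefines is_disruptive from the domain classification above
def is_disruptive(s):
    """Poeta et al. call for a Protein_Change string such as 'p.R175H'."""
    if s.endswith('*'):        # stop mutation
        return True
    if s.endswith('splice'):   # splice-site changes are not counted here
        return False
    if 'fs' in s:              # nor are frameshifts
        return False
    
    aa = s[3:-1]               # residue number between the two amino acids
    try:
        if int(aa) in range(163,196) + range(236, 252):   # L2 or L3 site
            if lo.Polarity[s[2]] != lo.Polarity[s[-1]]:
                return True
    except:
        # unparseable position or unknown amino-acid code: not disruptive
        pass
    return False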
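The rule above relies on a polarity table read from a CSV file that is not shown here. The sketch below is a self-contained approximation that swaps in a conventional four-group polarity assignment. That grouping, the name `poeta_disruptive`, and the example strings are assumptions, so individual calls may differ from the notebook's if its table groups amino acids differently.

```python
import re

# ASSUMPTION: a conventional side-chain grouping; the notebook's
# amino_acids.csv may define polarity differently.
POLARITY = {}
for group, letters in {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "basic": "KRH",
    "acidic": "DE",
}.items():
    for letter in letters:
        POLARITY[letter] = group

# L2 (163-195) and L3 (236-251) binding-site residues.
BINDING_SITE = set(range(163, 196)) | set(range(236, 252))

# Matches simple substitutions such as 'p.R175H'.
MISSENSE = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")

def poeta_disruptive(protein_change):
    """True if the change is a stop, or a polarity change in L2/L3."""
    if protein_change.endswith("*"):
        return True
    match = MISSENSE.match(protein_change)
    if match is None:
        return False  # splice, frameshift, or anything unparseable
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if position not in BINDING_SITE:
        return False
    if ref not in POLARITY or alt not in POLARITY:
        return False
    return POLARITY[ref] != POLARITY[alt]

for change in ["p.R213*", "p.R248W", "p.R175H", "p.R273H", "p.P152fs"]:
    print(change, poeta_disruptive(change))
```

Under this grouping a substitution within L2 or L3 that stays in the same group is non-disruptive, which is exactly the distinction the modified scheme later drops.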

In [19]:

p53 = FH.get_submaf(run.data_path, cancer.name, ['TP53'], fields='All').ix['TP53']
status = pd.concat([combine(p53.Protein_Change.map(is_disruptive), p53.is_silent==0), 
                    p53.Tumor_Sample_Barcode], axis=1, 
                   keys=['status','barcode'])
status = status.set_index('barcode')['status']
status = (status == 'both').groupby(level=0).sum().clip_upper(1.)
status = status.ix[mut.df.columns].fillna(-1).map({-1:'WT',0:'Non-Disruptive',1:'Disruptive'})
status = status.ix[keepers_o]

In [20]:

survival_and_stats(status, surv, colors=colors[:6] + ['grey'] + colors[6:], figsize=(7,5))

In [21]:

get_surv_fit_lr(surv, status[status.isin(['Non-Disruptive', 'WT'])])

Out[21]:

	Stats		Median Survival			5y Survival			Log-Rank
	# Patients	# Events	Median	Lower	Upper	Surv	Lower	Upper	chi2	p
									8.12	0.00437
Non-Disruptive	140	61	2.58	1.71	NaN	0.4	0.299	0.534
WT	45	10	NaN	4.71	NaN	0.664	0.494	0.893

Modified Classification of Mutations via Poeta et al.

I find that the polarity of the substitution has no apparent effect on prognosis.
I am also including splice-site mutations as disruptive.
This introduces many more disruptive mutations and results in a larger effect size.
In the manuscript we note that "Notably, patients with mutations predicted as non-disruptive of function nonetheless had worse prognosis than patients with wild-type TP53". We feel that the modified metric is the more conservative choice for this particular statistic, because the p-value would be more extreme under the traditional definition.

In [22]:

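def is_disruptive_mod(s):
    """Modified Poeta call: no polarity test, splice sites count as disruptive."""
    if s.endswith('*'):        # stop mutation
        return True
    if s.endswith('splice'):   # splice-site change, now counted as disruptive
        return True
    if 'fs' in s:              # frameshifts remain non-disruptive
        return False
    
    aa = s[3:-1]               # residue number, e.g. '175' from 'p.R175H'
    try:
        if int(aa) in range(163,196) + range(236, 252):   # any L2 or L3 change
            return True
    except:
        pass
    return False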

In [23]:

p53 = FH.get_submaf(run.data_path, cancer.name, ['TP53'], fields='All').ix['TP53']
status = pd.concat([combine(p53.Protein_Change.map(is_disruptive_mod), p53.is_silent==0), 
                    p53.Tumor_Sample_Barcode], axis=1, 
                   keys=['status','barcode'])
status = status.set_index('barcode')['status']
status = (status == 'both').groupby(level=0).sum().clip_upper(1.)
status = status.ix[mut.df.columns].fillna(-1).map({-1:'WT',0:'Non-Disruptive',1:'Disruptive'})
status = status.ix[keepers_o]

In [24]:

survival_and_stats(status, surv, colors=colors[:6] + ['grey'] + colors[6:], figsize=(7,5))

Notably, patients with mutations predicted as non-disruptive of function nonetheless had worse prognosis than patients with wild-type TP53.

In [25]:

f = get_cox_ph(surv, status[status.isin(['Non-Disruptive', 'WT'])]=='Non-Disruptive', interactions=False, 
               print_desc=True);

        coef exp(coef) se(coef)    z     p
feature 0.79       2.2    0.353 2.24 0.025

Likelihood ratio test=5.81  on 1 df, p=0.0159  n= 150, number of events= 52

In [26]:

exp(.79), exp(.79) - exp(.79 - .353)

Out[26]:

(2.2033964262559369, 0.65534034919959683)
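The cell above converts the Cox coefficient into a hazard ratio and a one-standard-error lower error. As a worked sketch using only the printed `coef` and `se(coef)`, the snippet below repeats that arithmetic and also derives a Wald z-score, a two-sided p-value, and a 95% interval. The 1.96 multiplier and the normal approximation are standard assumptions, not values taken from the fitted model object.

```python
import math

coef, se = 0.79, 0.353  # values printed in the model summary above

hazard_ratio = math.exp(coef)
# Distance from the hazard ratio down to exp(coef - se), as in the cell above.
one_se_lower = hazard_ratio - math.exp(coef - se)

# ASSUMPTION: normal approximation on the log-hazard scale.
z95 = 1.96
ci_lower = math.exp(coef - z95 * se)
ci_upper = math.exp(coef + z95 * se)

z = coef / se
p_wald = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail

print(round(hazard_ratio, 2), round(one_se_lower, 2))
print(round(ci_lower, 2), round(ci_upper, 2))
print(round(z, 2), round(p_wald, 3))
```

The interval is asymmetric around the hazard ratio because it is symmetric on the log scale, so the upper and lower error bars differ.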

TP53 Mutation Hotspots in HNSCC

I also included a plot of the most frequent individual mutations. We cannot draw much from it, but it shows that these mutations do matter, even though we lack the power to say anything more specific.
When we start looking across multiple cancers, or have 1000-patient cohorts, we may be able to extract meaningful information from the effects of individual mutations, but that is a ways off.

In [27]:

cc = p53.set_index('Tumor_Sample_Barcode').Protein_Change
cc = pd.concat([pd.Series('WT', others), cc])
cc = cc[cc.isin(true_index(cc.value_counts() > 5))] 
s2 = surv.unstack().ix[cc.index]
s2.index = range(len(cc))
s2 = s2.stack()
cc.index = range(len(cc))

In [28]:

survival_and_stats(cc, s2, colors=['grey'] + colors, figsize=(7,5))