Compare raw cause fractions to GC-redistributed cause fractions

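A scatter plot of the two sets of cause fractions shows how closely they agree: points on the dashed diagonal are unchanged by redistribution. Log-log axes keep the small fractions from piling up near the origin.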

In [216]:
plt.figure(figsize=(10,5))
plt.plot(csmf.PR, csmf.ML, 'o', ms=15, alpha=.5)
plt.plot([1e-6,1], [1e-6,1], 'k--')  # dashed y = x reference line (equal fractions)
plt.loglog()
plt.xlabel('PR CSMF')
plt.ylabel('ML CSMF')
Out[216]:
<matplotlib.text.Text at 0x2af5a66526d0>
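
As a rough illustration of why log-log axes help here: on log-log axes the vertical distance of a point from the diagonal is log10(ML / PR), so the same relative change looks the same for rare and common causes, even though the absolute differences are very different. The sketch below uses made-up pairs (not values from `csmf`), and `log_offset` is a hypothetical helper name.

```python
import math

# Made-up (PR, ML) pairs for illustration only; each ML is double its PR.
pairs = [(0.0001, 0.0002), (0.01, 0.02), (0.2, 0.4)]

def log_offset(pr, ml):
    """Vertical distance from the y = x diagonal on log-log axes."""
    return math.log10(ml) - math.log10(pr)

# Same relative change gives the same offset at every scale...
offsets = [log_offset(pr, ml) for pr, ml in pairs]
# ...while the absolute differences span several orders of magnitude.
abs_diffs = [ml - pr for pr, ml in pairs]

for (pr, ml), off, d in zip(pairs, offsets, abs_diffs):
    print(pr, ml, round(off, 3), d)
```

On linear axes the first two pairs would be nearly indistinguishable from the origin, which is the motivation for the log scale.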

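A mean-difference plot shows the same comparison from a different angle: it plots the difference ML - PR against the mean of the two fractions, so departures from the zero line stand out.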

In [217]:
plt.figure(figsize=(10,5))
mean = .5*(csmf.PR +  csmf.ML)
diff = csmf.ML - csmf.PR
plt.plot(mean, diff, 'o', ms=15, alpha=.5)
plt.plot([1e-6,1], [0,0], 'k--')  # dashed zero line (no difference)
plt.semilogx()
plt.xlabel('Mean(ML, PR) CSMF')
plt.ylabel('ML - PR CSMF')
Out[217]:
<matplotlib.text.Text at 0x2af5a7263dd0>
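
The quantities behind a mean-difference plot can also be summarized numerically. The following is a minimal sketch under stated assumptions: the fractions are made up for illustration, `mean_difference` is a hypothetical helper, and the 1.96-standard-deviation limits are the conventional summary for this kind of plot rather than anything computed in this notebook.

```python
from statistics import mean, stdev

# Made-up cause fractions for illustration only (not from the csmf table).
toy = {
    "cause_a": {"PR": 0.50, "ML": 0.40},
    "cause_b": {"PR": 0.30, "ML": 0.35},
    "cause_c": {"PR": 0.15, "ML": 0.20},
    "cause_d": {"PR": 0.05, "ML": 0.05},
}

def mean_difference(table):
    """Return {cause: (mean, diff)} with diff = ML - PR, as in the cell above."""
    return {
        cause: (0.5 * (v["PR"] + v["ML"]), v["ML"] - v["PR"])
        for cause, v in table.items()
    }

md = mean_difference(toy)
diffs = [d for _, d in md.values()]

# Average difference, and the conventional limits at +/- 1.96 sample SDs.
bias = mean(diffs)
half_width = 1.96 * stdev(diffs)
limits = (bias - half_width, bias + half_width)

print(md["cause_a"], bias, limits)
```

In this toy table both columns sum to 1, so the differences sum to zero and the average difference is zero by construction; the spread of the differences is the informative part.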

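An interactive version of the mean-difference plot, with a tooltip on each point, is quick to build with mpld3 and useful for identifying individual points: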

In [222]:
plt.figure(figsize=(10,5))

mean = .5*(csmf.PR +  csmf.ML)
diff = csmf.ML - csmf.PR
rows = (mean > .001)

points, = plt.plot(100*mean[rows], 100*diff[rows], 'o', ms=15, alpha=.5)
labels = [tooltip_for(i) for i in mean[rows].index]


#rows &= pd.Series(mean.index, index=mean.index).str.contains('neo_')
#plt.plot(100*mean[rows], 100*diff[rows], 'o', ms=15, alpha=.5)

plt.plot([1e-9,100], [0,0], 'k--')
plt.semilogx()
plt.axis(xmin=.05, xmax=50)

plt.xlabel('Mean CSMF (%)')
plt.ylabel('Redist - Raw CSMF (percentage points)')

# attach one HTML tooltip per plotted point, then render the figure with mpld3
tt = mpld3.plugins.PointHTMLTooltip(points, labels, hoffset=5)
mpld3.plugins.connect(plt.gcf(), tt)
mpld3.display()
Out[222]:
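
The cell above relies on `tooltip_for`, which is defined elsewhere in the notebook and is not shown here. `PointHTMLTooltip` expects one HTML string per plotted point, so a helper of roughly the following shape would fit. This is a hypothetical sketch, not the notebook's actual helper: the name `tooltip_for_sketch`, the toy table, and the layout of the HTML are all assumptions.

```python
from html import escape

# Made-up values standing in for rows of the csmf table.
toy_csmf = {
    "cause_a": {"PR": 0.50, "ML": 0.40},
    "cause_b": {"PR": 0.30, "ML": 0.35},
}

def tooltip_for_sketch(cause, table=toy_csmf):
    """Return a small HTML snippet for one cause (hypothetical helper)."""
    row = table[cause]
    diff = row["ML"] - row["PR"]
    return (
        "<div style='background:white; padding:4px; border:1px solid gray'>"
        f"<b>{escape(str(cause))}</b><br>"
        f"PR: {100 * row['PR']:.1f}%<br>"
        f"ML: {100 * row['ML']:.1f}%<br>"
        f"ML - PR: {100 * diff:+.1f} points"
        "</div>"
    )

# One label per point, in the same order as the plotted rows.
labels_sketch = [tooltip_for_sketch(c) for c in toy_csmf]
print(labels_sketch[0])
```

Escaping the label keeps any characters with special meaning in HTML from breaking the tooltip markup.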