In this notebook, we will query the SMASH DR1 catalog and apply constraints on the "sharp" and "prob" parameters to demonstrate ways to create samples of likely stars and galaxies.
We will use Datashader and Bokeh to make fast interactive plots.
The color-magnitude diagram is plotted upside-down from the usual sense due to a current limitation in the notebook's use of Datashader.
We need modules from the Bokeh library, Datashader, NumPy, and Pandas, as well as the Data Lab modules to connect to and query the database.
print "Start"
from cStringIO import StringIO
from dl import authClient
from dl import queryClient
import pandas as pd
import datashader as ds
import datashader.glyphs
import datashader.transfer_functions as tf
import bokeh.plotting as bp
from datashader.bokeh_ext import InteractiveImage
# Get the security token for the datalab demo user
token = authClient.login('anonymous')
print "Got token",token
Start
Got token anonymous.0.0.anon_access
We will query the SMASH catalog over a range of fields to sample a variety of field properties. We set a constraint on the depthflag parameter to insist on detection in the deep exposures. To make the query go faster, one can restrict the range of fieldid in the query. The default field range will return roughly 6 million objects.
%%time
depth = 1 # minimum depth
raname = 'ra'
decname = 'dec'
mags = 'gmag,rmag'
dbase='smash_dr1.object'
# Create the query string.
query = ('select '+raname+','+decname+','+mags+',sharp,chi,prob from '+dbase+ \
' where (depthflag > %d and ' + \
' (gmag is not null) and ' + \
' (fieldid>55 and fieldid<70))') % \
(depth)
print "Your query is:", query
print "Making query"
# Call the Query Manager Service
response = queryClient.query(token, adql = query, fmt = 'csv')
df = pd.read_csv(StringIO(response))
print len(df), "objects found."
Your query is: select ra,dec,gmag,rmag,sharp,chi,prob from smash_dr1.object where (depthflag > 1 and (gmag is not null) and (fieldid>55 and fieldid<70))
Making query
6392274 objects found.
CPU times: user 6.57 s, sys: 1.33 s, total: 7.9 s
Wall time: 46.5 s
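The note above about restricting fieldid to speed up the query can be sketched as a small helper. This is only an illustration: `build_query` and its argument names are hypothetical, not part of the Data Lab client, and the narrower range 55 to 58 is an arbitrary choice rather than a recommended set of fields.

```python
def build_query(field_min=55, field_max=70, depth=1,
                columns="ra,dec,gmag,rmag,sharp,chi,prob",
                table="smash_dr1.object"):
    """Assemble the query string for an exclusive fieldid range.

    Hypothetical helper: the defaults reproduce the query used above.
    """
    return (
        "select {cols} from {table} "
        "where (depthflag > {depth} and (gmag is not null) and "
        "(fieldid>{fmin} and fieldid<{fmax}))"
    ).format(cols=columns, table=table, depth=depth,
             fmin=field_min, fmax=field_max)


# A narrower field range should return fewer rows and run faster.
narrow_query = build_query(field_min=55, field_max=58)
print(narrow_query)
```

The resulting string could then be passed to queryClient.query in place of query in the cell above.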
We will first make a single cut on the sharp parameter, and classify objects with abs(sharp)>0.7 as galaxies; everything else is labeled a star. Pandas allows us to add a Class column to the dataframe and specify that it should be considered a category. We will also add a g_r column to the Pandas dataframe.
sharpthresh=0.7
df["Class"]='Star'
df.loc[(abs(df["sharp"])>sharpthresh),"Class"]='Galaxy'
df["Class"]=df["Class"].astype('category')
df["g_r"]=df["gmag"]-df["rmag"]
df.tail()
index | ra | dec | gmag | rmag | sharp | chi | prob | Class | g_r |
---|---|---|---|---|---|---|---|---|---|
6392269 | 107.060693 | -54.163880 | 99.99 | 99.99 | -0.846 | 0.78 | 0.73 | Galaxy | 0.0 |
6392270 | 107.059618 | -54.166219 | 99.99 | 99.99 | 0.965 | 0.71 | 0.92 | Galaxy | 0.0 |
6392271 | 107.058524 | -54.166433 | 99.99 | 99.99 | 0.778 | 0.83 | 0.56 | Galaxy | 0.0 |
6392272 | 107.058557 | -54.166684 | 99.99 | 99.99 | 3.260 | 0.97 | 0.96 | Galaxy | 0.0 |
6392273 | 107.059356 | -54.160246 | 99.99 | 99.99 | 0.335 | 0.59 | 0.72 | Star | 0.0 |
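The rows in the table tail have gmag and rmag of 99.99 and therefore a g_r of 0.0. The sketch below assumes that 99.99 is a placeholder for a missing magnitude (an assumption, not something the query states) and shows one way to compute the color only where both magnitudes look like real measurements. The toy values and the `SENTINEL` cutoff are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the query result; the values are made up.
toy = pd.DataFrame({
    "gmag":  [21.3, 99.99, 18.2, 24.7],
    "rmag":  [20.9, 99.99, 99.99, 24.1],
    "sharp": [0.10, -0.846, 0.05, 1.90],
})

SENTINEL = 99.0  # assumption: magnitudes above this are placeholders

# Same rule as the cell above: abs(sharp) above the threshold means "Galaxy".
toy["Class"] = pd.Categorical(
    np.where(toy["sharp"].abs() > 0.7, "Galaxy", "Star"),
    categories=["Galaxy", "Star"],
)

# Compute the color only where both magnitudes are below the sentinel;
# the other rows get NaN instead of a misleading 0.0.
valid = (toy["gmag"] < SENTINEL) & (toy["rmag"] < SENTINEL)
toy["g_r"] = (toy["gmag"] - toy["rmag"]).where(valid)

# Number of objects in each class.
class_counts = toy["Class"].value_counts()
print(class_counts)
```

Rows with a NaN color fall outside any plotted range, whereas a color of exactly 0.0 would land inside the color-magnitude diagram's x range.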
We now use Datashader and Bokeh to make an interactive plot of g magnitude vs. chi. Datashader assigns a different color to each category, in this case blue for objects labeled stars and red for objects labeled galaxies. The plot shows that the sharp cut separates two sequences in the magnitude-chi plane, with galaxies having larger chi values. One might be tempted to use chi as an additional constraint. However, at bright magnitudes the chi for stars clearly increases, while at faint magnitudes the sequences overlap in chi. The sharp cut likely confuses some stars and galaxies, especially at faint magnitudes. One could increase the sharp threshold for a more complete star sample, or decrease it for a purer one.
bp.output_notebook()
p = bp.figure(tools='pan,wheel_zoom,box_zoom,reset',x_range=(14,26), y_range=(0,5))
p.xaxis.axis_label='gmag'
p.yaxis.axis_label='chi'
def image_callback(x_range, y_range, w, h):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.points(df, 'gmag', 'chi', ds.count_cat('Class'))
    img = tf.shade(agg, how='log')
    return tf.dynspread(img, how='add',threshold=0.5)
InteractiveImage(p, image_callback)
/dl1/sw/anaconda2/lib/python2.7/site-packages/datashader/transfer_functions.py:258: RuntimeWarning: invalid value encountered in true_divide
  r = (data.dot(rs)/total).astype(np.uint8)
/dl1/sw/anaconda2/lib/python2.7/site-packages/datashader/transfer_functions.py:259: RuntimeWarning: invalid value encountered in true_divide
  g = (data.dot(gs)/total).astype(np.uint8)
/dl1/sw/anaconda2/lib/python2.7/site-packages/datashader/transfer_functions.py:260: RuntimeWarning: invalid value encountered in true_divide
  b = (data.dot(bs)/total).astype(np.uint8)
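To see how the choice of sharp threshold shifts the balance between the two classes, one can scan a few thresholds and record the fraction of objects labeled as stars. This is a minimal sketch: `star_fraction` is a hypothetical helper, the first five sharp values are copied from the table tail above, and the last three are made up. Without independent truth labels it only shows how the labels move, not the actual completeness or purity.

```python
import numpy as np


def star_fraction(sharp, thresh):
    """Fraction of objects with abs(sharp) <= thresh, i.e. labeled 'Star'."""
    sharp = np.asarray(sharp, dtype=float)
    return float(np.mean(np.abs(sharp) <= thresh))


# First five values come from the table tail; the rest are illustrative.
toy_sharp = [-0.846, 0.965, 0.778, 3.260, 0.335, 0.05, -0.2, 0.6]

# Raising the threshold labels more objects as stars.
fractions = {t: star_fraction(toy_sharp, t) for t in (0.5, 0.7, 1.0)}
for t in sorted(fractions):
    print("thresh=%.1f  star fraction=%.3f" % (t, fractions[t]))
```

On the real dataframe the same function could be called as star_fraction(df["sharp"], sharpthresh).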
Here we make an additional interactive plot featuring the color-magnitude diagram (CMD), with g-r on the x axis and g on the y axis. Note that the CMD is upside down from the usual sense.
p = bp.figure(tools='pan,wheel_zoom,box_zoom,reset',x_range=(-2,3), y_range=(14,26))
p.xaxis.axis_label='gmag-rmag'
p.yaxis.axis_label='gmag'
def image_callback(x_range, y_range, w, h):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.points(df, 'g_r', 'gmag', ds.count_cat('Class'))
    img = tf.shade(agg, how='log')
    return tf.dynspread(img, how='add',threshold=0.5)
InteractiveImage(p, image_callback)
The simple abs(sharp)>0.7 cut potentially labels faint junk objects as galaxies. We will add a second cut on the prob parameter, which derives from SExtractor's stellarity index, to get a cleaner sample of galaxies: an object is now labeled a galaxy only if it passes the sharp cut and also has prob<0.2.
sharpthresh=0.7
probthresh=0.2
df["Class"]='Star'
df.loc[(abs(df["sharp"])>sharpthresh) & (df["prob"]<probthresh),"Class"]='Galaxy'
df["Class"]=df["Class"].astype('category')
df.tail()
index | ra | dec | gmag | rmag | sharp | chi | prob | Class | g_r |
---|---|---|---|---|---|---|---|---|---|
6392269 | 107.060693 | -54.163880 | 99.99 | 99.99 | -0.846 | 0.78 | 0.73 | Star | 0.0 |
6392270 | 107.059618 | -54.166219 | 99.99 | 99.99 | 0.965 | 0.71 | 0.92 | Star | 0.0 |
6392271 | 107.058524 | -54.166433 | 99.99 | 99.99 | 0.778 | 0.83 | 0.56 | Star | 0.0 |
6392272 | 107.058557 | -54.166684 | 99.99 | 99.99 | 3.260 | 0.97 | 0.96 | Star | 0.0 |
6392273 | 107.059356 | -54.160246 | 99.99 | 99.99 | 0.335 | 0.59 | 0.72 | Star | 0.0 |
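The effect of adding the prob cut can be checked directly by labeling the same rows both ways and counting how many labels change. The sketch below uses the sharp and prob values from the five rows in the table tail; `classify` is a hypothetical helper written for this comparison, not a function from the notebook or from any library.

```python
import numpy as np
import pandas as pd


def classify(frame, sharpthresh=0.7, probthresh=None):
    """Return 'Galaxy'/'Star' labels without modifying frame.

    With probthresh=None only the sharp cut is applied.
    """
    is_galaxy = frame["sharp"].abs() > sharpthresh
    if probthresh is not None:
        is_galaxy = is_galaxy & (frame["prob"] < probthresh)
    return pd.Series(np.where(is_galaxy, "Galaxy", "Star"), index=frame.index)


# sharp and prob copied from the five rows shown in the table tail.
tail = pd.DataFrame({
    "sharp": [-0.846, 0.965, 0.778, 3.260, 0.335],
    "prob":  [0.73, 0.92, 0.56, 0.96, 0.72],
}, index=[6392269, 6392270, 6392271, 6392272, 6392273])

single = classify(tail)                  # sharp cut only
double = classify(tail, probthresh=0.2)  # sharp and prob cuts

# Number of rows whose label differs between the two schemes.
n_changed = int((single != double).sum())
print(n_changed, "of", len(tail), "labels changed")
```

For these five rows every prob value is above 0.2, so the four objects that the sharp cut alone called galaxies revert to stars, in agreement with the two tables.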
We will remake our gmag vs. chi and gmag-rmag vs. gmag plots with the new cuts. Most of the objects at the faint end are now considered stars rather than galaxies. Again, one can adjust the cuts in the hope of getting more complete or more pure samples.
bp.output_notebook()
p = bp.figure(tools='pan,wheel_zoom,box_zoom,reset',x_range=(14,26), y_range=(0,5))
p.xaxis.axis_label='gmag'
p.yaxis.axis_label='chi'
def image_callback(x_range, y_range, w, h):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.points(df, 'gmag', 'chi', ds.count_cat('Class'))
    img = tf.shade(agg, how='log')
    return tf.dynspread(img, how='add',threshold=0.5)
InteractiveImage(p, image_callback)
p = bp.figure(tools='pan,wheel_zoom,box_zoom,reset',x_range=(-2,3), y_range=(14,26))
p.xaxis.axis_label='gmag-rmag'
p.yaxis.axis_label='gmag'
def image_callback(x_range, y_range, w, h):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.points(df, 'g_r', 'gmag', ds.count_cat('Class'))
    img = tf.shade(agg, how='log')
    return tf.dynspread(img, how='add',threshold=0.5)
InteractiveImage(p, image_callback)