Reading data into Astropy Tables

Objectives

  • Read ASCII files with a defined format
  • Learn basic operations with astropy.tables
  • Ingest header information
  • VOTables

Reading data

Our first task with python was to read a csv file using np.loadtxt(). That function has few properties to define the dlimiter of the columns, skip rows, read commented lines, convert values while reading, etc.

However, the result is an array, without the information of the metadata that file may have included (name, units, ...).

Astropy offers a ascii reader that improves many of these steps while provides templates to read common ascii files in astronomy.

In [1]:
from astropy.io import ascii
In [2]:
# Read a sample file: sources.dat
data = ascii.read("sources.dat")
data
Out[2]:
<Table length=2>
obsidredshiftXYobject
int64float64int64int64str11
31020.3241674085Q1250+568-A
8770.2243783892Source 82

Automatically, read has identified the header and the format of each column. The result is a Table object, and that brings some additional properties.

In [3]:
# Show the info of the data read
data.info
/home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/table/column.py:268: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return self.data.__eq__(other)
Out[3]:
<Table length=2>
  name    dtype 
-------- -------
   obsid   int64
redshift float64
       X   int64
       Y   int64
  object   str11
In [4]:
# Get the name of the columns
data.colnames
Out[4]:
['obsid', 'redshift', 'X', 'Y', 'object']
In [5]:
# Get just the values of a particular column
data['obsid']
Out[5]:
<Column name='obsid' dtype='int64' length=2>
3102
877
In [6]:
# get the first element
data['obsid', 'redshift'][0]
Out[6]:
<Row index=0>
obsidredshift
int64float64
31020.32

Astropy can read a variety of formats easily. The following example uses a quite

In [7]:
# Read the data from the source
table = ascii.read("ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/snrs.dat",
                   readme="ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/ReadMe")
Downloading ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/snrs.dat [Done]
Downloading ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/ReadMe [Done]
In [8]:
# See the stats of the table
table.info('stats')
<Table masked=True length=274>
   name         mean           std       min  max   n_bad
---------- -------------- -------------- --- ------ -----
       SNR             --             --  --     --     0
       RAh  16.0547445255  4.15229196762   0     23     0
       RAm  28.8576642336  16.9123382708   0     59     0
       RAs   28.102189781  18.5923556505   0     59     0
       DE-             --             --  --     --     0
       DEd   33.602189781  19.4333634671   0     72     0
       DEm  29.6459854015  17.7768672558   0     59     0
   MajDiam  30.9124087591  42.1254815567 1.5  310.0     0
       ---             --             --  --     --   164
   MinDiam  23.4909090909  33.3816758266 2.0  240.0   164
 u_MinDiam             --             --  --     --   243
      type             --             --  --     --     0
 l_S(1GHz)             --             --  --     --   270
   S(1GHz)  42.6488549618  212.136906631 0.3 2720.0    12
 u_S(1GHz)             --             --  --     --   157
  Sp-Index 0.486981132075 0.158350398486 0.0    1.0    62
u_Sp-Index             --             --  --     --   154
     Names             --             --  --     --   194
/home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/table/info.py:94: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if np.all(info[name] == ''):
In [9]:
# If we want to see the first 10 entries
table[0:10]
Out[9]:
<Table masked=True length=10>
SNRRAhRAmRAsDE-DEdDEmMajDiam---MinDiamu_MinDiamtypel_S(1GHz)S(1GHz)u_S(1GHz)Sp-Indexu_Sp-IndexNames
hminsdegarcminarcminarcminJy
str11int64int64int64str1int64int64float64str1float64str1str2str1float64str1float64str1str26
G000.0+00.0174544-2903.5x2.5--S--100.0?0.8?Sgr A East
G000.3+00.0174615-283815.0x8.0--S--22.0--0.6----
G000.9+00.1174721-2898.0------C--18.0?--v--
G001.0-00.1174830-2898.0------S--15.0--0.6?--
G001.4-00.1174939-274610.0------S--2.0?--?--
G001.9+00.3174845-27101.5------S--0.6--0.6----
G003.7-00.2175526-255014.0x11.0--S--2.3--0.65----
G003.8+00.3175255-252818.0------S?--3.0?0.6----
G004.2-03.518855-27328.0------S--3.2?0.6?--
G004.5+06.8173042-21293.0------S--19.0--0.64--Kepler, SN1604, 3C358
In [10]:
# the units are also stored, we can extract them too
table['MajDiam'].quantity.to('rad')[0:3]
Out[10]:
$[0.0010181087,~0.0043633231,~0.0023271057] \; \mathrm{rad}$
In [11]:
# Adding values of different columns
(table['RAh'] + table['RAm'] + table['RAs'])[0:3]
Out[11]:
<MaskedColumn name='RAh' dtype='int64' unit='h' description='*Right Ascension J2000 hours' length=3>
106
78
85
In [12]:
# adding values of different columns but being aware of column units
(table['RAh'].quantity + table['RAm'].quantity + table['RAs'].quantity)[0:3]
Out[12]:
$[17.762222,~17.770833,~17.789167] \; \mathrm{h}$
In [13]:
# Create a new column in the table
table['RA'] = table['RAh'].quantity + table['RAm'].quantity + table['RAs'].quantity
In [14]:
# Show table's new column 
table['RA'][0:3]
Out[14]:
<MaskedColumn name='RA' dtype='float64' unit='h' length=3>
17.7622222222
17.7708333333
17.7891666667
In [15]:
# add a description to the new column
table['RA'].description = table['RAh'].description
In [16]:
# Now it does show the values
table['RA'][0:3]
Out[16]:
<MaskedColumn name='RA' dtype='float64' unit='h' description='*Right Ascension J2000 hours' length=3>
17.7622222222
17.7708333333
17.7891666667
In [17]:
# Using numpy to calculate the sin of the RA
import numpy as np
np.sin(table['RA'].quantity)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-56bceb528a84> in <module>()
      1 import numpy as np
----> 2 np.sin(table['RA'].quantity)

/home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/units/quantity.pyc in __array_prepare__(self, obj, context)
    321         # the unit the output from the ufunc will have.
    322         if function in UFUNC_HELPERS:
--> 323             converters, result_unit = UFUNC_HELPERS[function](function, *units)
    324         else:
    325             raise TypeError("Unknown ufunc {0}.  Please raise issue on "

/home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/units/quantity_helper.pyc in helper_radian_to_dimensionless(f, unit)
    174         raise TypeError("Can only apply '{0}' function to "
    175                         "quantities with angle units"
--> 176                         .format(f.__name__))
    177 
    178 UFUNC_HELPERS[np.cos] = helper_radian_to_dimensionless

TypeError: Can only apply 'sin' function to quantities with angle units
In [18]:
# Let's change the units...
import astropy.units as u
table['RA'].unit = u.hourangle
In [19]:
# does the sin now works?
np.sin(table['RA'].quantity)
Out[19]:
$[-0.99806309,~-0.9982008,~-0.99847709,~\dots, -0.99835018,~-0.99799924,~-0.99604107] \; \mathrm{}$

Properties when reading

the reading of the table has many properties, let's imagine the following easy example:

In [20]:
weather_data = """
# Country = Finland
# City = Helsinki
# Longitud = 24.9375
# Latitud = 60.170833
# Week = 32
# Year = 2015
day, precip, type
Mon,1.5,rain
Tues,,
Wed,1.1,snow
Thur,2.3,rain
Fri,0.2,
Sat,1.1,snow
Sun,5.4,snow
"""
In [21]:
# Read the table
weather = ascii.read(weather_data)
In [22]:
# Blank values are interpreted by default as bad/missing values
weather.info('stats')
<Table masked=True length=7>
 name       mean          std      min max n_bad
------ ------------- ------------- --- --- -----
   day            --            --  --  --     0
precip 1.93333333333 1.66999667332 0.2 5.4     1
  type            --            --  --  --     2
In [23]:
# Let's define missing values for the columns we want:
weather['type'].fill_value = 'N/A'
weather['precip'].fill_value = -999
In [24]:
# Use filled to show the value filled.
weather.filled()
Out[24]:
<Table length=7>
daypreciptype
str4float64str4
Mon1.5rain
Tues-999.0N/A
Wed1.1snow
Thur2.3rain
Fri0.2N/A
Sat1.1snow
Sun5.4snow
In [25]:
# We can see the meta as a dictionary, but not as key, value pairs
weather.meta
Out[25]:
OrderedDict([('comments',
              ['Country = Finland',
               'City = Helsinki',
               'Longitud = 24.9375',
               'Latitud = 60.170833',
               'Week = 32',
               'Year = 2015'])])
In [26]:
# To get it the header as a table
header = ascii.read(weather.meta['comments'], delimiter='=',
                    format='no_header', names=['key', 'val'])
print(header)
  key       val   
-------- ---------
 Country   Finland
    City  Helsinki
Longitud   24.9375
 Latitud 60.170833
    Week        32
    Year      2015

When the values are not empty, then the keyword fill_values on read has to be used.

Reading VOTables

VOTables are an special type of tables which should be self-consistent and can be tied to a particular scheme. This mean the file will contain where the data comes from (and which query produced it) and the properties for each field, making it easier to ingest by a machine.

In [27]:
from astropy.io.votable import parse_single_table
In [28]:
# Read the example table from HELIO (hfc_ar.xml)
table = parse_single_table("hfc_ar.xml")
WARNING: W03: hfc_ar.xml:10:0: W03: Implicitly generating an ID from a name 'OBSERVATORY,VIEW_AR_GUI' -> 'OBSERVATORY_VIEW_AR_GUI' [astropy.io.votable.exceptions]
WARNING:astropy:W03: hfc_ar.xml:10:0: W03: Implicitly generating an ID from a name 'OBSERVATORY,VIEW_AR_GUI' -> 'OBSERVATORY_VIEW_AR_GUI'
In [29]:
# See the fields of the table
table.fields
Out[29]:
[<FIELD ID="ID_AR" datatype="int" name="ID_AR" ucd="meta.id" utype="ar.id"/>,
 <FIELD ID="DATE_OBS" arraysize="*" datatype="char" name="DATE_OBS" ucd="time.start;obs" utype="image.time_obs | time_period.time_start.first_observation_time"/>,
 <FIELD ID="NOAA_NUMBER" datatype="int" name="NOAA_NUMBER" ucd="meta.id" utype="ar.noaa_solar_region_id"/>,
 <FIELD ID="FEAT_HG_LAT_DEG" datatype="float" name="FEAT_HG_LAT_DEG" ucd="pos.heliographic;pos.barycenter;pos.bodyrc.lat" unit="deg" utype="feature.centre.lat_hg"/>,
 <FIELD ID="FEAT_AREA_DEG2" datatype="float" name="FEAT_AREA_DEG2" ucd="phys.area" unit="deg2" utype="feature.area.area_deg_sq"/>,
 <FIELD ID="FEAT_X_ARCSEC" datatype="double" name="FEAT_X_ARCSEC" ucd="pos.wcs;pos.barycenter;Qualifier.DirectionAngle.AzimuthAngle" unit="arcs" utype="feature.centre.x_cart"/>,
 <FIELD ID="FEAT_Y_ARCSEC" datatype="double" name="FEAT_Y_ARCSEC" ucd="pos.wcs;pos.barycenter;Qualifier.DirectionAngle.ElevationAngle" unit="arcs" utype="feature.centre.y_cart"/>,
 <FIELD ID="FEAT_MAX_INT" datatype="float" name="FEAT_MAX_INT" ucd="MeasurementType.ImageIntensity;stat.max" unit="gauss" utype="feature.max_intensity"/>,
 <FIELD ID="FEAT_MIN_INT" datatype="float" name="FEAT_MIN_INT" ucd="MeasurementType.ImageIntensity;stat.min" unit="gauss" utype="feature.min_intensity"/>,
 <FIELD ID="FEAT_MEAN_INT" datatype="float" name="FEAT_MEAN_INT" ucd="MeasurementType.ImageIntensity;stat.mean" unit="gauss" utype="feature.mean_intensity"/>]
In [30]:
# extract one  (NOAA_NUMBER) or all of the columns
NOAA = table.array['NOAA_NUMBER']
In [31]:
# Show the data
NOAA.data
Out[31]:
array([      10321, -2147483648,       10325, ..., -2147483648,
             10332, -2147483648], dtype=int32)
In [32]:
# See the mask
NOAA.mask
Out[32]:
array([False,  True, False, ...,  True, False,  True], dtype=bool)
In [33]:
# Shee the whole array.
NOAA
Out[33]:
masked_array(data = [10321 -- 10325 ..., -- 10332 --],
             mask = [False  True False ...,  True False  True],
       fill_value = 999999)
In [34]:
# Convert the table to an astropy table
asttable = table.to_table()
In [35]:
# See the table
asttable
Out[35]:
<Table masked=True length=3037>
ID_ARDATE_OBSNOAA_NUMBERFEAT_HG_LAT_DEGFEAT_AREA_DEG2FEAT_X_ARCSECFEAT_Y_ARCSECFEAT_MAX_INTFEAT_MIN_INTFEAT_MEAN_INT
degdeg2arcsarcsgaussgaussgauss
int32objectint32float32float32float64float64float32float32float32
2475232003-04-01T00:00:00103214.87889190.33273.78399999999999185.9941789.54-2573.3201-0.46051401
2475282003-04-01T00:00:00---12.1797160.16-188.178-96.15412147.6201-1722.37-46.435699
2475392003-04-01T00:00:001032512.3932147.224-458.50900000000001298.3583202.8701-1377.66-2.20612
2475452003-04-01T00:00:00---15.9363129.17799-577.27200000000005-179.56999999999999766.06-2342.429932.803101
2475492003-04-01T00:00:0010323-7.57478108.752472.27600000000001-31.26532495.7-2757.1101-4.3433499
2475632003-04-01T00:00:0010318-14.978977.475998705.50699999999995-177.7261379.92-1706.51-18.2155
2475812003-04-01T00:00:001031912.003162.118799.56500000000005254.836000000000012780.72-1257.1-11.4169
2475912003-04-01T00:00:00--9.385919646.102001-251.09299999999999260.08199999999999902.163020.051.014099
2475952003-04-01T00:00:00---25.29835.77222.72-311.7950000000000280.717697-727.44299-37.111
..............................
2685972003-04-16T00:00:0010334-8.7012198.573997-17.854299999999999-52.3541000000000031070.77-2614.1399-8.5475798
2685992003-04-16T00:00:00--18.06800159.891998-175.74000000000001381.72199999999998840.00299-831.5900325.4533
2686022003-04-16T00:00:0010335-21.75709953.382-469.75299999999999-279.742000000000021158.71-1231.86-15.6804
2686072003-04-16T00:00:00---8.295880343.610001-238.803-48.5501773.19897-896.87598-4.4306302
2686102003-04-16T00:00:00---25.850529.035999-698.45000000000005-366.28100000000001884.698-856.461-6.8229899
2686112003-04-16T00:00:00--9.769459712.628750.81700000000001216.76499999999999766.763-150.49141.4245
2686152003-04-16T00:00:00---5.1010912.18-492.81999999999999-5.59067000000000030.0-517.284-44.7612
2686242003-04-16T00:00:00--16.019611.508-420.512341.9490000000000184.997002-611.87402-33.491901
2686372003-04-16T00:00:001033211.56119.0860004704.524250.667532.935-733.26202-6.8816299
2686422003-04-16T00:00:00--11.94258.5539999-381.387279.821000000000030.0-640.14502-37.867901
In [36]:
# Different results because quantities are not 
print(np.sin(asttable['FEAT_HG_LAT_DEG'][0:5]))
print(np.sin(asttable['FEAT_HG_LAT_DEG'][0:5].quantity))
FEAT_HG_LAT_DEG
---------------
      -0.986171
       0.377107
      -0.172306
       0.226358
      -0.961276
[ 0.08504982 -0.21097848  0.21461941 -0.27456847 -0.13182007]
In [37]:
# And it can also be converted to other units
print(asttable[0:5]['FEAT_AREA_DEG2'].quantity.to('arcmin2'))
[ 685188.       576576.       530006.375    465040.78125  391507.1875 ] arcmin2