Our first task with python was to read a csv
file using np.loadtxt()
.
That function has few properties to define the dlimiter of the columns, skip rows, read commented lines, convert values while reading, etc.
However, the result is an array, without the information of the metadata that file may have included (name, units, ...).
Astropy offers a ascii reader that improves many of these steps while provides templates to read common ascii files in astronomy.
from astropy.io import ascii
# Read a sample file: sources.dat
data = ascii.read("sources.dat")
data
obsid | redshift | X | Y | object |
---|---|---|---|---|
int64 | float64 | int64 | int64 | str11 |
3102 | 0.32 | 4167 | 4085 | Q1250+568-A |
877 | 0.22 | 4378 | 3892 | Source 82 |
Automatically, read has identified the header and the format of each column. The result is a Table
object, and that brings some additional properties.
# Show the info of the data read
data.info
/home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/table/column.py:268: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison return self.data.__eq__(other)
<Table length=2> name dtype -------- ------- obsid int64 redshift float64 X int64 Y int64 object str11
# Get the name of the columns
data.colnames
['obsid', 'redshift', 'X', 'Y', 'object']
# Get just the values of a particular column
data['obsid']
3102 |
877 |
# get the first element
data['obsid', 'redshift'][0]
obsid | redshift |
---|---|
int64 | float64 |
3102 | 0.32 |
Astropy can read a variety of formats easily. The following example uses a quite
# Read the data from the source
table = ascii.read("ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/snrs.dat",
readme="ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/ReadMe")
Downloading ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/snrs.dat [Done] Downloading ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/ReadMe [Done]
# See the stats of the table
table.info('stats')
<Table masked=True length=274> name mean std min max n_bad ---------- -------------- -------------- --- ------ ----- SNR -- -- -- -- 0 RAh 16.0547445255 4.15229196762 0 23 0 RAm 28.8576642336 16.9123382708 0 59 0 RAs 28.102189781 18.5923556505 0 59 0 DE- -- -- -- -- 0 DEd 33.602189781 19.4333634671 0 72 0 DEm 29.6459854015 17.7768672558 0 59 0 MajDiam 30.9124087591 42.1254815567 1.5 310.0 0 --- -- -- -- -- 164 MinDiam 23.4909090909 33.3816758266 2.0 240.0 164 u_MinDiam -- -- -- -- 243 type -- -- -- -- 0 l_S(1GHz) -- -- -- -- 270 S(1GHz) 42.6488549618 212.136906631 0.3 2720.0 12 u_S(1GHz) -- -- -- -- 157 Sp-Index 0.486981132075 0.158350398486 0.0 1.0 62 u_Sp-Index -- -- -- -- 154 Names -- -- -- -- 194
/home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/table/info.py:94: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal if np.all(info[name] == ''):
# If we want to see the first 10 entries
table[0:10]
SNR | RAh | RAm | RAs | DE- | DEd | DEm | MajDiam | --- | MinDiam | u_MinDiam | type | l_S(1GHz) | S(1GHz) | u_S(1GHz) | Sp-Index | u_Sp-Index | Names |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
h | min | s | deg | arcmin | arcmin | arcmin | Jy | ||||||||||
str11 | int64 | int64 | int64 | str1 | int64 | int64 | float64 | str1 | float64 | str1 | str2 | str1 | float64 | str1 | float64 | str1 | str26 |
G000.0+00.0 | 17 | 45 | 44 | - | 29 | 0 | 3.5 | x | 2.5 | -- | S | -- | 100.0 | ? | 0.8 | ? | Sgr A East |
G000.3+00.0 | 17 | 46 | 15 | - | 28 | 38 | 15.0 | x | 8.0 | -- | S | -- | 22.0 | -- | 0.6 | -- | -- |
G000.9+00.1 | 17 | 47 | 21 | - | 28 | 9 | 8.0 | -- | -- | -- | C | -- | 18.0 | ? | -- | v | -- |
G001.0-00.1 | 17 | 48 | 30 | - | 28 | 9 | 8.0 | -- | -- | -- | S | -- | 15.0 | -- | 0.6 | ? | -- |
G001.4-00.1 | 17 | 49 | 39 | - | 27 | 46 | 10.0 | -- | -- | -- | S | -- | 2.0 | ? | -- | ? | -- |
G001.9+00.3 | 17 | 48 | 45 | - | 27 | 10 | 1.5 | -- | -- | -- | S | -- | 0.6 | -- | 0.6 | -- | -- |
G003.7-00.2 | 17 | 55 | 26 | - | 25 | 50 | 14.0 | x | 11.0 | -- | S | -- | 2.3 | -- | 0.65 | -- | -- |
G003.8+00.3 | 17 | 52 | 55 | - | 25 | 28 | 18.0 | -- | -- | -- | S? | -- | 3.0 | ? | 0.6 | -- | -- |
G004.2-03.5 | 18 | 8 | 55 | - | 27 | 3 | 28.0 | -- | -- | -- | S | -- | 3.2 | ? | 0.6 | ? | -- |
G004.5+06.8 | 17 | 30 | 42 | - | 21 | 29 | 3.0 | -- | -- | -- | S | -- | 19.0 | -- | 0.64 | -- | Kepler, SN1604, 3C358 |
# the units are also stored, we can extract them too
table['MajDiam'].quantity.to('rad')[0:3]
# Adding values of different columns
(table['RAh'] + table['RAm'] + table['RAs'])[0:3]
106 |
78 |
85 |
# adding values of different columns but being aware of column units
(table['RAh'].quantity + table['RAm'].quantity + table['RAs'].quantity)[0:3]
# Create a new column in the table
table['RA'] = table['RAh'].quantity + table['RAm'].quantity + table['RAs'].quantity
# Show table's new column
table['RA'][0:3]
17.7622222222 |
17.7708333333 |
17.7891666667 |
# add a description to the new column
table['RA'].description = table['RAh'].description
# Now it does show the values
table['RA'][0:3]
17.7622222222 |
17.7708333333 |
17.7891666667 |
# Using numpy to calculate the sin of the RA
import numpy as np
np.sin(table['RA'].quantity)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-17-56bceb528a84> in <module>() 1 import numpy as np ----> 2 np.sin(table['RA'].quantity) /home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/units/quantity.pyc in __array_prepare__(self, obj, context) 321 # the unit the output from the ufunc will have. 322 if function in UFUNC_HELPERS: --> 323 converters, result_unit = UFUNC_HELPERS[function](function, *units) 324 else: 325 raise TypeError("Unknown ufunc {0}. Please raise issue on " /home/dvd/.conda/envs/swc/lib/python2.7/site-packages/astropy/units/quantity_helper.pyc in helper_radian_to_dimensionless(f, unit) 174 raise TypeError("Can only apply '{0}' function to " 175 "quantities with angle units" --> 176 .format(f.__name__)) 177 178 UFUNC_HELPERS[np.cos] = helper_radian_to_dimensionless TypeError: Can only apply 'sin' function to quantities with angle units
# Let's change the units...
import astropy.units as u
table['RA'].unit = u.hourangle
# does the sin now works?
np.sin(table['RA'].quantity)
the reading of the table has many properties, let's imagine the following easy example:
weather_data = """
# Country = Finland
# City = Helsinki
# Longitud = 24.9375
# Latitud = 60.170833
# Week = 32
# Year = 2015
day, precip, type
Mon,1.5,rain
Tues,,
Wed,1.1,snow
Thur,2.3,rain
Fri,0.2,
Sat,1.1,snow
Sun,5.4,snow
"""
# Read the table
weather = ascii.read(weather_data)
# Blank values are interpreted by default as bad/missing values
weather.info('stats')
<Table masked=True length=7> name mean std min max n_bad ------ ------------- ------------- --- --- ----- day -- -- -- -- 0 precip 1.93333333333 1.66999667332 0.2 5.4 1 type -- -- -- -- 2
# Let's define missing values for the columns we want:
weather['type'].fill_value = 'N/A'
weather['precip'].fill_value = -999
# Use filled to show the value filled.
weather.filled()
day | precip | type |
---|---|---|
str4 | float64 | str4 |
Mon | 1.5 | rain |
Tues | -999.0 | N/A |
Wed | 1.1 | snow |
Thur | 2.3 | rain |
Fri | 0.2 | N/A |
Sat | 1.1 | snow |
Sun | 5.4 | snow |
# We can see the meta as a dictionary, but not as key, value pairs
weather.meta
OrderedDict([('comments', ['Country = Finland', 'City = Helsinki', 'Longitud = 24.9375', 'Latitud = 60.170833', 'Week = 32', 'Year = 2015'])])
# To get it the header as a table
header = ascii.read(weather.meta['comments'], delimiter='=',
format='no_header', names=['key', 'val'])
print(header)
key val -------- --------- Country Finland City Helsinki Longitud 24.9375 Latitud 60.170833 Week 32 Year 2015
When the values are not empty, then the keyword fill_values
on read
has to be used.
VOTables are an special type of tables which should be self-consistent and can be tied to a particular scheme. This mean the file will contain where the data comes from (and which query produced it) and the properties for each field, making it easier to ingest by a machine.
from astropy.io.votable import parse_single_table
# Read the example table from HELIO (hfc_ar.xml)
table = parse_single_table("hfc_ar.xml")
WARNING: W03: hfc_ar.xml:10:0: W03: Implicitly generating an ID from a name 'OBSERVATORY,VIEW_AR_GUI' -> 'OBSERVATORY_VIEW_AR_GUI' [astropy.io.votable.exceptions] WARNING:astropy:W03: hfc_ar.xml:10:0: W03: Implicitly generating an ID from a name 'OBSERVATORY,VIEW_AR_GUI' -> 'OBSERVATORY_VIEW_AR_GUI'
# See the fields of the table
table.fields
[<FIELD ID="ID_AR" datatype="int" name="ID_AR" ucd="meta.id" utype="ar.id"/>, <FIELD ID="DATE_OBS" arraysize="*" datatype="char" name="DATE_OBS" ucd="time.start;obs" utype="image.time_obs | time_period.time_start.first_observation_time"/>, <FIELD ID="NOAA_NUMBER" datatype="int" name="NOAA_NUMBER" ucd="meta.id" utype="ar.noaa_solar_region_id"/>, <FIELD ID="FEAT_HG_LAT_DEG" datatype="float" name="FEAT_HG_LAT_DEG" ucd="pos.heliographic;pos.barycenter;pos.bodyrc.lat" unit="deg" utype="feature.centre.lat_hg"/>, <FIELD ID="FEAT_AREA_DEG2" datatype="float" name="FEAT_AREA_DEG2" ucd="phys.area" unit="deg2" utype="feature.area.area_deg_sq"/>, <FIELD ID="FEAT_X_ARCSEC" datatype="double" name="FEAT_X_ARCSEC" ucd="pos.wcs;pos.barycenter;Qualifier.DirectionAngle.AzimuthAngle" unit="arcs" utype="feature.centre.x_cart"/>, <FIELD ID="FEAT_Y_ARCSEC" datatype="double" name="FEAT_Y_ARCSEC" ucd="pos.wcs;pos.barycenter;Qualifier.DirectionAngle.ElevationAngle" unit="arcs" utype="feature.centre.y_cart"/>, <FIELD ID="FEAT_MAX_INT" datatype="float" name="FEAT_MAX_INT" ucd="MeasurementType.ImageIntensity;stat.max" unit="gauss" utype="feature.max_intensity"/>, <FIELD ID="FEAT_MIN_INT" datatype="float" name="FEAT_MIN_INT" ucd="MeasurementType.ImageIntensity;stat.min" unit="gauss" utype="feature.min_intensity"/>, <FIELD ID="FEAT_MEAN_INT" datatype="float" name="FEAT_MEAN_INT" ucd="MeasurementType.ImageIntensity;stat.mean" unit="gauss" utype="feature.mean_intensity"/>]
# extract one (NOAA_NUMBER) or all of the columns
NOAA = table.array['NOAA_NUMBER']
# Show the data
NOAA.data
array([ 10321, -2147483648, 10325, ..., -2147483648, 10332, -2147483648], dtype=int32)
# See the mask
NOAA.mask
array([False, True, False, ..., True, False, True], dtype=bool)
# Shee the whole array.
NOAA
masked_array(data = [10321 -- 10325 ..., -- 10332 --], mask = [False True False ..., True False True], fill_value = 999999)
# Convert the table to an astropy table
asttable = table.to_table()
# See the table
asttable
ID_AR | DATE_OBS | NOAA_NUMBER | FEAT_HG_LAT_DEG | FEAT_AREA_DEG2 | FEAT_X_ARCSEC | FEAT_Y_ARCSEC | FEAT_MAX_INT | FEAT_MIN_INT | FEAT_MEAN_INT |
---|---|---|---|---|---|---|---|---|---|
deg | deg2 | arcs | arcs | gauss | gauss | gauss | |||
int32 | object | int32 | float32 | float32 | float64 | float64 | float32 | float32 | float32 |
247523 | 2003-04-01T00:00:00 | 10321 | 4.87889 | 190.33 | 273.78399999999999 | 185.994 | 1789.54 | -2573.3201 | -0.46051401 |
247528 | 2003-04-01T00:00:00 | -- | -12.1797 | 160.16 | -188.178 | -96.1541 | 2147.6201 | -1722.37 | -46.435699 |
247539 | 2003-04-01T00:00:00 | 10325 | 12.3932 | 147.224 | -458.50900000000001 | 298.358 | 3202.8701 | -1377.66 | -2.20612 |
247545 | 2003-04-01T00:00:00 | -- | -15.9363 | 129.17799 | -577.27200000000005 | -179.56999999999999 | 766.06 | -2342.4299 | 32.803101 |
247549 | 2003-04-01T00:00:00 | 10323 | -7.57478 | 108.752 | 472.27600000000001 | -31.2653 | 2495.7 | -2757.1101 | -4.3433499 |
247563 | 2003-04-01T00:00:00 | 10318 | -14.9789 | 77.475998 | 705.50699999999995 | -177.726 | 1379.92 | -1706.51 | -18.2155 |
247581 | 2003-04-01T00:00:00 | 10319 | 12.0031 | 62.118 | 799.56500000000005 | 254.83600000000001 | 2780.72 | -1257.1 | -11.4169 |
247591 | 2003-04-01T00:00:00 | -- | 9.3859196 | 46.102001 | -251.09299999999999 | 260.08199999999999 | 902.16302 | 0.0 | 51.014099 |
247595 | 2003-04-01T00:00:00 | -- | -25.298 | 35.77 | 222.72 | -311.79500000000002 | 80.717697 | -727.44299 | -37.111 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
268597 | 2003-04-16T00:00:00 | 10334 | -8.70121 | 98.573997 | -17.854299999999999 | -52.354100000000003 | 1070.77 | -2614.1399 | -8.5475798 |
268599 | 2003-04-16T00:00:00 | -- | 18.068001 | 59.891998 | -175.74000000000001 | 381.72199999999998 | 840.00299 | -831.59003 | 25.4533 |
268602 | 2003-04-16T00:00:00 | 10335 | -21.757099 | 53.382 | -469.75299999999999 | -279.74200000000002 | 1158.71 | -1231.86 | -15.6804 |
268607 | 2003-04-16T00:00:00 | -- | -8.2958803 | 43.610001 | -238.803 | -48.5501 | 773.19897 | -896.87598 | -4.4306302 |
268610 | 2003-04-16T00:00:00 | -- | -25.8505 | 29.035999 | -698.45000000000005 | -366.28100000000001 | 884.698 | -856.461 | -6.8229899 |
268611 | 2003-04-16T00:00:00 | -- | 9.7694597 | 12.628 | 750.81700000000001 | 216.76499999999999 | 766.763 | -150.491 | 41.4245 |
268615 | 2003-04-16T00:00:00 | -- | -5.10109 | 12.18 | -492.81999999999999 | -5.5906700000000003 | 0.0 | -517.284 | -44.7612 |
268624 | 2003-04-16T00:00:00 | -- | 16.0196 | 11.508 | -420.512 | 341.94900000000001 | 84.997002 | -611.87402 | -33.491901 |
268637 | 2003-04-16T00:00:00 | 10332 | 11.5611 | 9.0860004 | 704.524 | 250.667 | 532.935 | -733.26202 | -6.8816299 |
268642 | 2003-04-16T00:00:00 | -- | 11.9425 | 8.5539999 | -381.387 | 279.82100000000003 | 0.0 | -640.14502 | -37.867901 |
# Different results because quantities are not
print(np.sin(asttable['FEAT_HG_LAT_DEG'][0:5]))
print(np.sin(asttable['FEAT_HG_LAT_DEG'][0:5].quantity))
FEAT_HG_LAT_DEG --------------- -0.986171 0.377107 -0.172306 0.226358 -0.961276 [ 0.08504982 -0.21097848 0.21461941 -0.27456847 -0.13182007]
# And it can also be converted to other units
print(asttable[0:5]['FEAT_AREA_DEG2'].quantity.to('arcmin2'))
[ 685188. 576576. 530006.375 465040.78125 391507.1875 ] arcmin2