2016 - IPGP - ObsPy Tutorial

# #

Introduction to File Formats and read/write in ObsPy

# #

# This is oftentimes not taught, but fairly important to understand, at least at a basic level. This also teaches you how to work with these in ObsPy. # In[ ]: #%matplotlib inline from __future__ import print_function import matplotlib.pylab as plt plt.switch_backend("nbagg") plt.style.use('ggplot') plt.rcParams['figure.figsize'] = 12, 8 # ## SEED Identifiers # # According to the [SEED standard](www.fdsn.org/seed_manual/SEEDManual_V2.4.pdf), which is fairly well adopted, the following nomenclature is used to identify seismic receivers: # # * **Network code**: Identifies the network/owner of the data. Assigned by the FDSN and thus unique. # * **Station code**: The station within a network. *NOT UNIQUE IN PRACTICE!* Always use together with a network code! # * **Location ID**: Identifies different data streams within one station. Commonly used to logically separate multiple instruments at a single station. # * **Channel codes**: Three character code: 1) Band and approximate sampling rate, 2) The type of instrument, 3) The orientation # # This results in full ids of the form **NET.STA.LOC.CHAN**, e.g. **IV.PRMA..HHE**. # # # --- # # # In seismology we generally distinguish between three separate types of data: # # 1. **Waveform Data** - The actual waveforms as time series. # 2. **Station Data** - Information about the stations' operators, geographical locations, and the instrument's responses. # 3. **Event Data** - Information about earthquakes. # # Some formats have elements of two or more of these. # ## Waveform Data # # ![stream](images/Stream_Trace.svg) # # There are a myriad of waveform data formats but in Europe and the USA two formats dominate: **MiniSEED** and **SAC** # # # ### MiniSEED # # * This is what you get from datacenters and also what they store, thus the original data # * Can store integers and single/double precision floats # * Integer data (e.g. counts from a digitizer) are heavily compressed: a factor of 3-5 depending on the data # * Can deal with gaps and overlaps # * Multiple components per file # * Contains only the really necessary parameters and some information for the data providers # In[ ]: import obspy # ObsPy automatically detects the file format. st = obspy.read("data/example.mseed") print(st) # Fileformat specific information is stored here. print(st[0].stats.mseed) # In[ ]: st.plot() # In[ ]: # This is a quick interlude to teach you basics about how to work # with Stream/Trace objects. # Most operations work in-place, e.g. they modify the existing # objects. We'll create a copy here. st2 = st.copy() # To use only part of a Stream, use the select() function. print(st2.select(component="Z")) # Stream objects behave like a list of Trace objects. tr = st2[0] tr.plot() # Some basic processing. Please note that these modify the # existing object. tr.detrend("linear") tr.taper(type="hann", max_percentage=0.05) tr.filter("lowpass", freq=0.5) tr.plot() # In[ ]: # You can write it again by simply specifing the format. st.write("temp.mseed", format="mseed") # ### SAC # # * Custom format of the `sac` code. # * Simple header and single precision floating point data. # * Only a single component per file and no concept of gaps/overlaps. # * Used a lot due to `sac` being very popular and the additional basic information that can be stored in the header. # In[ ]: st = obspy.read("data/example.sac") print(st) st[0].stats.sac.__dict__ # In[ ]: st.plot() # In[ ]: # You can once again write it with the write() method. st.write("temp.sac", format="sac") # ## Station Data # # ![inv](images/Inventory.svg) # # Station data contains information about the organziation that collections the data, geographical information, as well as the instrument response. It mainly comes in three formats: # # * `(dataless) SEED`: Very complete but pretty complex and binary. Still used a lot, e.g. for the Arclink protocol # * `RESP`: A strict subset of SEED. ASCII based. Contains **ONLY** the response. # * `StationXML`: Essentially like SEED but cleaner and based on XML. Most modern format and what the datacenters nowadays serve. **Use this if you can.** # # # ObsPy can work with all of them but today we will focus on StationXML. # They are XML files: # In[ ]: get_ipython().system('head data/all_stations.xml') # In[ ]: import obspy # Use the read_inventory function to open them. inv = obspy.read_inventory("data/all_stations.xml") print(inv) # You can see that they can contain an arbirary number of networks, stations, and channels. # In[ ]: # ObsPy is also able to plot a map of them. inv.plot(projection="local"); # In[ ]: # As well as a plot the instrument response. inv.select(network="IV", station="SALO", channel="BH?").plot_response(0.001); # In[ ]: # Coordinates of single channels can also be extraced. This function # also takes a datetime arguments to extract information at different # points in time. inv.get_coordinates("IV.SALO..BHZ") # In[ ]: # And it can naturally be written again, also in modified state. inv.select(channel="BHZ").write("temp.xml", format="stationxml") # ## Event Data # # ![events](./images/Event.svg) # # Event data is essentially served in either very simple formats like NDK or the CMTSOLUTION format used by many waveform solvers: # In[ ]: get_ipython().system('cat data/GCMT_2014_04_01__Mw_8_1') # Datacenters on the hand offer **QuakeML** files, which are surprisingly complex in structure but can store complex relations. # In[ ]: # Read QuakeML files with the read_events() function. cat = obspy.read_events("data/GCMT_2014_04_01__Mw_8_1.xml") print(cat) # In[ ]: print(cat[0]) # In[ ]: cat.plot(projection="ortho"); # In[ ]: # Once again they can be written with the write() function. cat.write("temp_quake.xml", format="quakeml") # To show off some more things, I added a file containing all events from 2014 in the GCMT catalog. # In[ ]: import obspy cat = obspy.read_events("data/2014.ndk") print(cat) # In[ ]: cat.plot(); # In[ ]: cat.filter("depth > 100000", "magnitude > 7") # ## Acknowledgements # # Background picture at the very top is from Matthias Meschede.