Seismic acquisition fiddling

This notebook accompanies the blog post published on January 8, 2015, at Agile Geoscience: It goes in the bin.

The idea is to replicate what we've done so far, but with three enhancements:

  • Using a Survey object to hold the various features of a survey.
  • Using more GeoPandas, and fussing less with (x, y)'s directly.
  • Making bins and assigning midpoints to them.

We'll start with the usual prelims...

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, LineString
import geopandas as gpd
import pandas as pd
from fiona.crs import from_epsg

%matplotlib inline

Survey object

In [2]:
class Survey:
    """
    A seismic survey.
    """

    def __init__(self, params):
        
        # Assign the variables from the parameter dict,
        # using dict.items() for Python 3 compatibility.
        for k, v in params.items(): 
            setattr(self, k, v)
          
        # These are just a convenience; we could use the
        # tuples directly, or make objects with attrs.
        self.xmi = self.corner[0]
        self.ymi = self.corner[1]
        
        self.x = self.size[0]
        self.y = self.size[1]
        
        self.SL = self.line_spacing[0]
        self.RL = self.line_spacing[1]
        
        self.si = self.point_spacing[0]
        self.ri = self.point_spacing[1]
        
        self.shiftx = -self.si/2.
        self.shifty = -self.ri/2.
           
    @property
    def lines(self):
        """
        Returns number of (src, rcvr) lines.
        """
        slines = int(self.x/self.SL) + 1
        rlines = int(self.y/self.RL) + 1
        return slines, rlines

    @property
    def points_per_line(self):
        """
        Returns number of (src, rcvr) points per line.
        """
        spoints = int(self.y/self.si) + 2
        rpoints = int(self.x/self.ri) + 2
        return spoints, rpoints
    
    @property
    def src(self):
        s = [Point(self.xmi+line*self.SL, self.ymi+s*self.si)
             for line in range(self.lines[0])
             for s in range(self.points_per_line[0])
             ]
        S = gpd.GeoSeries(s)
        S.crs = from_epsg(self.epsg)
        return S

    @property
    def rcvr(self):
        r = [Point(self.xmi + r*self.ri + self.shiftx, self.ymi + line*self.RL - self.shifty)
             for line in range(self.lines[1])
             for r in range(self.points_per_line[1])
             ]
        R = gpd.GeoSeries(r)
        R.crs = from_epsg(self.epsg)
        return R
    
    @property
    def layout(self):
        """
        Provide a GeoDataFrame of all points,
        labelled as columns and in hierarchical index.
        """
        # Feels like there might be a better way to do this...
        sgdf = gpd.GeoDataFrame({'geometry': self.src, 'station': 'src'})
        rgdf = gpd.GeoDataFrame({'geometry': self.rcvr, 'station': 'rcvr'})

        # Concatenate with a hierarchical index
        layout = pd.concat([sgdf,rgdf], keys=['sources','receivers'])
        layout.crs = from_epsg(self.epsg)

        return layout

Perhaps s and r should be objects too. I think you might want to have survey.receivers.x for the list of x locations, for example.
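
Here's one way that might look: a minimal sketch of a hypothetical Stations wrapper, which is not part of the Survey class above.

class Stations:
    """Wrap a GeoSeries of points and expose x and y directly."""
    def __init__(self, geoseries):
        self.geometry = geoseries

    @property
    def x(self):
        return [p.x for p in self.geometry]

    @property
    def y(self):
        return [p.y for p in self.geometry]

# Then Survey could do something like self.receivers = Stations(self.rcvr),
# and survey.receivers.x would give the list of x locations.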

Instantiate and plot

In [3]:
params = {'corner': (5750000,4710000),
          'size': (3000,1800),
          'line_spacing': (600,600),
          'point_spacing': (100,100),
          'epsg': 26911 # http://spatialreference.org/ref/epsg/26911/
          }

survey = Survey(params)
In [4]:
s = survey.src
r = survey.rcvr
r[:10]
Out[4]:
0    POINT (5749950 4710050)
1    POINT (5750050 4710050)
2    POINT (5750150 4710050)
3    POINT (5750250 4710050)
4    POINT (5750350 4710050)
5    POINT (5750450 4710050)
6    POINT (5750550 4710050)
7    POINT (5750650 4710050)
8    POINT (5750750 4710050)
9    POINT (5750850 4710050)
dtype: object
In [5]:
layout = survey.layout
layout[:10]
Out[5]:
geometry station
sources 0 POINT (5750000 4710000) src
1 POINT (5750000 4710100) src
2 POINT (5750000 4710200) src
3 POINT (5750000 4710300) src
4 POINT (5750000 4710400) src
5 POINT (5750000 4710500) src
6 POINT (5750000 4710600) src
7 POINT (5750000 4710700) src
8 POINT (5750000 4710800) src
9 POINT (5750000 4710900) src

With a hierarchical index you can do cool things, e.g. show the last five sources:

In [6]:
layout.loc['sources'][-5:]
Out[6]:
geometry station
115 POINT (5753000 4711500) src
116 POINT (5753000 4711600) src
117 POINT (5753000 4711700) src
118 POINT (5753000 4711800) src
119 POINT (5753000 4711900) src
In [7]:
layout.crs
Out[7]:
{'init': 'epsg:26911', 'no_defs': True}
In [8]:
ax = layout.plot()

Export the layout GeoDataFrame to a GIS shapefile, if desired.

In [9]:
# layout.to_file('src_and_rcvr.shp')

Midpoint calculations

We need midpoints. There is a midpoint between every source-receiver pair.

Hopefully it's not too inelegant to get to the midpoints now that we're using this layout object thing.

In [10]:
midpoint_list = [LineString([r, s]).interpolate(0.5, normalized=True)
                  for r in layout.loc['receivers'].geometry
                  for s in layout.loc['sources'].geometry
                  ]

As well as knowing the (x,y) of the midpoints, we'd also like to record the distance from each s to each live r (each r in the live patch). This is easy enough to compute:

Point(x1, y1).distance(Point(x2, y2))

Then we can make a list of all the offsets when we count the midpoints into the bins.

In [11]:
offsets = [r.distance(s)
           for r in layout.loc['receivers'].geometry
           for s in layout.loc['sources'].geometry
           ]
In [12]:
azimuths = [(180.0/np.pi) * np.arctan((r.x - s.x)/(r.y - s.y))
            for r in layout.loc['receivers'].geometry
            for s in layout.loc['sources'].geometry
            ]
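
A note in passing: the arctan divides by (r.y - s.y), which is never zero for this layout because the receiver lines are shifted off the source stations, but np.arctan2 sidesteps that risk entirely. It isn't a drop-in replacement, since it returns the full-circle angle in (-180, 180] rather than folding everything into (-90, 90); a sketch, with a throwaway variable name:

azimuths_full = [np.degrees(np.arctan2(r.x - s.x, r.y - s.y))
                 for r in layout.loc['receivers'].geometry
                 for s in layout.loc['sources'].geometry
                 ]
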
In [13]:
offsetx = np.array(offsets)*np.cos(np.array(azimuths)*np.pi/180.)
offsety = np.array(offsets)*np.sin(np.array(azimuths)*np.pi/180.)

Make a GeoDataFrame of the midpoints, offsets, and azimuths:

In [14]:
midpoints = gpd.GeoDataFrame({
                   'geometry' : midpoint_list,
                   'offset' : offsets,
                   'azimuth': azimuths,
                   'offsetx' : offsetx,
                   'offsety' : offsety
                   })
midpoints[:5]
Out[14]:
azimuth geometry offset offsetx offsety
0 -45.000000 POINT (5749975 4710025) 70.710678 50 -50
1 45.000000 POINT (5749975 4710075) 70.710678 50 50
2 18.434949 POINT (5749975 4710125) 158.113883 150 50
3 11.309932 POINT (5749975 4710175) 254.950976 250 50
4 8.130102 POINT (5749975 4710225) 353.553391 350 50
In [15]:
ax = midpoints.plot()

Save to a shapefile if desired.

In [16]:
# midpoints.to_file('CMPs.shp')

Spider plot

In [17]:
midpoints[:5].offsetx # Easy!
Out[17]:
0     50
1     50
2    150
3    250
4    350
Name: offsetx, dtype: float64
In [18]:
midpoints.loc[3].geometry.x # Less easy :(
Out[18]:
5749975.0

We need lists (or arrays) to pass into the matplotlib quiver plot. This takes four main parameters: x, y, u, and v, where x, y will be our coordinates, and u, v will be the offset vector for that midpoint.

We can get at the GeoDataFrame's attributes easily, but I can't see how to get at the coordinates in the geometry GeoSeries (probably a user error; it feels like it should be really easy), so I am resorting to this:

In [19]:
x = [m.geometry.x for i, m in midpoints.iterrows()]
y = [m.geometry.y for i, m in midpoints.iterrows()]
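
In more recent versions of GeoPandas, a GeoSeries of points has .x and .y properties, so the same thing should be possible without iterating over the rows:

x = midpoints.geometry.x
y = midpoints.geometry.y
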
In [20]:
fig = plt.figure(figsize=(12,8))
plt.quiver(x, y, midpoints.offsetx, midpoints.offsety, units='xy', width=0.5, scale=1/0.025, pivot='mid', headlength=0)
plt.axis('equal')
plt.show()

Bins

The bins are a new geometry, related to but separate from the survey itself, and the midpoints. We will model them as a GeoDataFrame of polygons. The steps are:

  1. Compute the bin centre locations with our usual list comprehension trick.
  2. Buffer the centres with a square.
  3. Gather the buffered polygons into a GeoDataFrame.
In [21]:
# Factor to shift the bins relative to source and receiver points
jig = survey.si / 4.
bin_centres = gpd.GeoSeries([Point(survey.xmi + 0.5*r*survey.ri - jig, survey.ymi + 0.5*s*survey.si + jig)
                             for r in range(2*(survey.points_per_line[1]-1))
                             for s in range(2*(survey.points_per_line[0]-1))
                            ])

# Buffers are diamond shaped so we have to scale and rotate them.
scale_factor = np.sin(np.pi/4.)/2.
bin_polys = bin_centres.buffer(scale_factor*survey.ri, 1).rotate(-45)
bins = gpd.GeoDataFrame(geometry=bin_polys)

bins[:3]
Out[21]:
geometry
0 POLYGON ((5750000 4709999.999999999, 5749950 4...
1 POLYGON ((5750000 4710049.999999999, 5749950 4...
2 POLYGON ((5750000 4710100, 5749950 4710100, 57...
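
As an aside, if the buffer-and-rotate trick feels opaque, shapely's box function builds the same axis-aligned 50 m squares directly. A sketch (bin_polys_alt is just an alternative name, assuming square bins of half the receiver interval):

from shapely.geometry import box

half = survey.ri / 4.   # 25 m half-width gives 50 m square bins
bin_polys_alt = gpd.GeoSeries([box(p.x - half, p.y - half, p.x + half, p.y + half)
                               for p in bin_centres])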

I suspect there's a much easier way to get all the midpoints into a bin polygon, without stepping over all the bins (see the sketch below).

WARNING This step is very slow for more than a few thousand midpoints.
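
Later versions of GeoPandas grew gpd.sjoin, which is probably that easier way. A sketch of how it might look, not something this notebook actually ran (it needs a spatial index backend such as rtree, and bins that catch no midpoints end up as NaN rather than an empty list):

joined = gpd.sjoin(midpoints, bins[['geometry']], how='inner', op='within')

# index_right is the index of the bin each midpoint landed in.
grouped = joined.groupby('index_right')
bins['offsets'] = grouped['offset'].apply(list)
bins['azimuths'] = grouped['azimuth'].apply(list)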

In [22]:
# Make a copy because I'm going to drop points as I
# assign them to polys, to speed up subsequent search.
midpts = midpoints.copy()

offsets, azimuths = [], [] # To hold complete list.

# Loop over bin polygons with index i.
for i, bin_i in bins.iterrows():
    
    o, a = [], [] # To hold list for this bin only.
    
    # Now loop over all midpoints with index j.
    for j, midpt_j in midpts.iterrows():
        if bin_i.geometry.contains(midpt_j.geometry):
            # Then it's a hit! Add it to the lists,
            # and drop it so we have less hunting.
            o.append(midpt_j.offset)
            a.append(midpt_j.azimuth)
            midpts = midpts.drop([j])
            
    # Add the bin_i lists to the master list
    # and go around the outer loop again.
    offsets.append(o)
    azimuths.append(a)
    
# Add everything to the dataframe.    
bins['offsets'] = pd.Series(offsets)
bins['azimuths'] = pd.Series(azimuths)
In [23]:
bins[:10]
Out[23]:
geometry offsets azimuths
0 POLYGON ((5750000 4709999.999999999, 5749950 4... [70.7106781187] [-45.0]
1 POLYGON ((5750000 4710049.999999999, 5749950 4... [70.7106781187] [45.0]
2 POLYGON ((5750000 4710100, 5749950 4710100, 57... [158.113883008] [18.4349488229]
3 POLYGON ((5750000 4710150, 5749950 4710150, 57... [254.95097568] [11.309932474]
4 POLYGON ((5750000 4710200, 5749949.999999999 4... [353.553390593] [8.13010235416]
5 POLYGON ((5750000 4710250, 5749950 4710250, 57... [452.769256907] [6.34019174591]
6 POLYGON ((5750000 4710300, 5749950 4710300, 57... [552.268050859, 651.92024052] [5.19442890773, -4.398705355]
7 POLYGON ((5750000 4710349.999999999, 5749950 4... [651.92024052, 552.268050859] [4.398705355, -5.19442890773]
8 POLYGON ((5750000 4710400, 5749950 4710400, 57... [751.664818919, 452.769256907] [3.81407483429, -6.34019174591]
9 POLYGON ((5750000 4710450, 5749950 4710450, 57... [851.469318296, 353.553390593] [3.36646066343, -8.13010235416]

We can compute the fold from the length of the list of offsets in each bin. We use a mini-function, called a lambda, to do this. This piece of code applies a lambda to each row in the GeoDataFrame. Essentially it says:

set each row in the 'fold' column in my `bins` GeoDataFrame to the length of the offsets list for that row. 
In [24]:
bins['fold'] = bins.apply(lambda row: len(row.offsets), axis=1)
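
As a small aside, applying len to the column directly should give the same fold:

bins['fold'] = bins.offsets.apply(len)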

Now we can use the GeoDataFrame's built-in plot() method to plot these:

In [25]:
ax = bins.plot(column="fold")

We can use a similar trick to compute the minimum offset, but with an added test for there being valid data in the bin:

In [26]:
bins['min_offset'] = bins.apply(lambda row: min(row.offsets) if row.fold > 0 else None, axis=1)
In [27]:
ax = bins.plot(column="min_offset")