#!/usr/bin/env python
# coding: utf-8

# # Plotting SHARP keywords and images with python

# In this notebook, we will be plotting keywords and images from data taken by the Helioseismic and Magnetic Imager (HMI) instrument on NASA's Solar Dynamics Observatory (SDO) satellite. SDO takes about a terabyte and a half of data a day, which is more data than any other satellite in the NASA Heliophysics Division.
#
# Data from the HMI and Atmospheric Imaging Assembly (AIA) instruments aboard SDO are stored at Stanford University. The metadata are stored in a PostgreSQL database called the Data Record Management System, or DRMS. The image data are stored separately, in storage units managed by the Storage Unit Management System, or SUMS. Data from both systems are merged together, upon export, as FITS files. DRMS and SUMS together constitute the Joint Science Operations Center, or JSOC.
#
# The easiest way to access SDO HMI and AIA data is via the python `drms` module, available on [PyPI](https://pypi.python.org/pypi/drms); numerous tutorials are available on both [Read the Docs](https://drms.readthedocs.io/en/stable/tutorial.html) and [GitHub](https://github.com/kbg/drms/tree/master/examples). All the examples below use the `drms` module. First we'll import the module, and some others:

# In[1]:

import drms
import json, numpy as np, matplotlib.pyplot as plt, matplotlib.ticker as mtick
from datetime import datetime as dt_obj
import urllib
from astropy.io import fits
from sunpy.visualization.colormaps import color_tables as ct
from matplotlib.dates import *
import matplotlib.image as mpimg
import sunpy.map
import sunpy.io
from IPython.display import Image
get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format='retina'")

# The first step in querying for SDO HMI and AIA data is to establish a connection to JSOC. This can be done with the `Client()` class.
# In[2]:

import drms
c = drms.Client()

# The `Client()` class allows one to access both metadata and image data simultaneously via a data series. A data series contains all data of a particular type; for example, there is a series for continuum intensity data, another for magnetic field data, and so forth. Read Section 4 of the [SDO Analysis Guide](https://www.lmsal.com/sdodocs/doc/dcur/SDOD0060.zip/zip/entry/) for more information about how to build a data series query. For example, to find all the SHARP data series, execute the following regular expression query:

# In[3]:

c.series(r'hmi\.sharp_')

# In[4]:

# Set a series
si = c.info('hmi.sharp_cea_720s')

# To find more information about the FITS header keywords that belong to any given series, we can use the following command:

# In[5]:

si.keywords

# To find more information about the FITS image data, or segments, that belong to any given series, we can use the following command:

# In[6]:

# To see all the segments associated with the series hmi.sharp_cea_720s:
si.segments

# ## Plotting the metadata

# The query below retrieves both metadata and image data for active region 11158, which produced an X2.2-class flare on February 15, 2011 at 1:56 UT, from the SHARP data series. The [SHARP data series](https://link.springer.com/article/10.1007%2Fs11207-014-0529-3) include patches of vector magnetic field data taken by the HMI instrument. These patches encapsulate automatically-detected active regions that are tracked throughout the entirety of their disk passage. The `c.query()` method takes three arguments:
# 1. The first argument is the data series. In the example below, the data series is called `hmi.sharp_cea_720s`. This series is appended with two prime keys: the HARP number (in this case, 377) and the time range (in this case, 2011.02.14_15:00:00/12h). These two prime keys appear in the first two brackets.
# The HARP number refers to the active region number (see [here](http://jsoc.stanford.edu/doc/data/hmi/harpnum_to_noaa/all_harps_with_noaa_ars.txt) for a mapping between HARP numbers and NOAA active region numbers). A prime key, or set of prime keys, is a unique identifier. The third bracket, with the argument `[? (QUALITY<65536) ?]`, filters out data where the value of the `QUALITY` keyword is greater than or equal to 65536. (See [here](http://jsoc.stanford.edu/doc/data/hmi/Quality_Bits/QUALITY.txt) for the definition of the `QUALITY` keyword.) While this third bracket is not necessary, it can be a powerful tool for filtering data based on keyword values.
# 2. The second argument in the search query is a list of keywords. In this example, we will query for the keywords `T_REC`, `USFLUX`, and `ERRVF`.
# 3. The third argument in the search query is a list of segments. In this example, we will query for the segment `Br`, or the radial component of the photospheric magnetic field.

# In[7]:

keys, segments = c.query('hmi.sharp_cea_720s[377][2011.02.14_15:00:00/12h][? (QUALITY<65536) ?]', key='T_REC, USFLUX, ERRVF', seg='Br')

# To convert the keyword `T_REC` into a datetime object, we can use the function below.

# In[8]:

def parse_tai_string(tstr, datetime=True):
    year = int(tstr[:4])
    month = int(tstr[5:7])
    day = int(tstr[8:10])
    hour = int(tstr[11:13])
    minute = int(tstr[14:16])
    if datetime:
        return dt_obj(year, month, day, hour, minute)
    else:
        return year, month, day, hour, minute

# In[9]:

t_rec = np.array([parse_tai_string(keys.T_REC[i], datetime=True) for i in range(keys.T_REC.size)])

# Now for some plotting! matplotlib.pyplot generates two objects: a figure and axes. The data are ascribed to the axes. The time axis in particular requires some formatting; to keep it free of clutter, we'll plot tick marks every three hours and label them accordingly.
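# As a quick, self-contained check of the T_REC parsing logic above (the helper is repeated here in simplified form so the snippet runs on its own; only the datetime branch is shown):

```python
from datetime import datetime

# Simplified repeat of the parse_tai_string helper for a standalone check.
# T_REC strings look like '2011.02.14_15:00:00_TAI'.
def parse_tai_string(tstr):
    return datetime(int(tstr[:4]), int(tstr[5:7]), int(tstr[8:10]),
                    int(tstr[11:13]), int(tstr[14:16]))

print(parse_tai_string('2011.02.14_15:00:00_TAI'))  # 2011-02-14 15:00:00
```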
# In[10]:

fig, ax = plt.subplots(figsize=(8,7))  # define the size of the figure
orangered = (1.0, 0.27, 0, 1.0)  # create an orange-red color

# define some style elements
marker_style = dict(linestyle='', markersize=8, fillstyle='full', color=orangered, markeredgecolor=orangered)
text_style = dict(fontsize=16, fontdict={'family': 'monospace'})
ax.tick_params(labelsize=14)
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2f'))

# ascribe the data to the axes
ax.plot(t_rec, (keys.USFLUX)/(1e22), 'o', **marker_style)
ax.errorbar(t_rec, (keys.USFLUX)/(1e22), yerr=(keys.ERRVF)/(1e22), linestyle='', color=orangered)

# format the x-axis with universal time
locator = AutoDateLocator()
locator.intervald[HOURLY] = [3]  # only show every 3 hours
formatter = DateFormatter('%H')
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

# set the y-axis range
ax.set_ylim([2.4, 2.8])

# label the axes and the plot; annotate the plot with a start time
ax.set_xlabel('time in UT', **text_style)
ax.set_ylabel('maxwells x 1e22', **text_style)
ax.set_title('total unsigned flux starting at '+str(t_rec[0])+' UT', **text_style)

# ## Querying the data

# The example above shows a simple query for 12 hours of data from one HARPNUM. But we can also perform more complex queries to identify active regions of interest. Here are a few examples.
#
# **Example 1.** Suppose we want to create a magnetic field model of an active region and we need observations of a strong-field active region near disk center. This query identifies strong-field regions near disk center during a two year period. We define strong active regions as those with a total unsigned flux (`USFLUX`) greater than $4\times10^{22}$ Mx, and near disk center as those with a Carrington longitude (`CRLN_OBS`) less than $1^{\circ}$. The two year period spans between January 2014 and January 2016.

# In[11]:

keys = c.query('hmi.sharp_cea_720s[][2014.01.01 - 2016.01.01][? \
(CRLN_OBS < 1) AND (USFLUX > 4e22) ?]', key='T_REC, HARPNUM, USFLUX, CRLT_OBS, CRLN_OBS, AREA_ACR')

# In[12]:

keys

# **Example 2.** Suppose we are doing a study on flux emergence and we want to identify active regions that live for a long time. This query identifies long-lived active regions within a six year period. We define long-lived active regions as those with a minimum number of observations (`N_PATCH1`) equal to 1800 (1800 observations at a 12 minute cadence means a minimum observation time of 15 days). The six year period spans between January 2012 and January 2018. The `T_FRST1=T_REC` clause identifies the first observation time in the sequence.

# In[13]:

keys = c.query('hmi.sharp_cea_720s[][2012.01.01 - 2018.01.01][? (N_PATCH1 > 1800) AND (T_FRST1=T_REC) ?]', key='T_REC, HARPNUM, NOAA_ARS, N_PATCH1, AREA_ACR')

# In[14]:

keys

# **Example 3.** Suppose we are doing a study on flare prediction. [Schrijver (2007)](https://doi.org/10.1086/511857) derived a parameter, called $R$, which quantifies the flux near an active region neutral line. The study concluded that an active region will flare if the log of $R$ is greater than 5. [Bobra & Couvidat (2015)](https://dx.doi.org/10.1088/0004-637X/798/2/135) also identified a large total unsigned helicity as a relevant characteristic of flaring active regions. This query identifies active regions with a log of $R$ (`R_VALUE`) between 5.5 and 6.0, or a total unsigned helicity (`TOTUSJH`) greater than or equal to 8900 $\frac{G^{2}}{m}$, during a two year period between January 2012 and January 2014.

# In[15]:

keys = c.query('hmi.sharp_cea_720s[][2012.01.01 - 2014.01.01][? (R_VALUE > 5.5 AND R_VALUE < 6.0) OR (TOTUSJH >= 8900) ?]', key='T_REC,HARPNUM,R_VALUE,TOTUSJH')

# In[16]:

keys

# ## Plotting the image data

# We can also query for and plot image data. There are two ways to do this.
#
# 1. We can download the image data, as an unmerged FITS file, and the header data separately.
# An unmerged FITS file contains the image data, but almost no header metadata (except for a few keywords). This is the quickest and easiest option, as the `drms.Client()` class can query the header and image data at the same time and store the keyword metadata and URLs to the image data in a pandas DataFrame. This eliminates the need to store FITS files locally. This method is also faster, as there is no need to wait for the exportdata system to generate FITS files. We can then download and open the unmerged FITS files via the [`astropy` package for FITS file handling](https://docs.astropy.org/en/stable/io/fits/index.html).
#
# 2. We can download the merged FITS file, which combines the header metadata and the image data, and use [SunPy Map](https://docs.sunpy.org/en/stable/code_ref/map.html) to plot the image data. This is the easiest way to put the image data into a coordinate system, as the SunPy Map object will automatically use the WCS keyword data to plot the image data in the correct coordinate system. We can also read merged FITS files via the [`astropy` package for FITS file handling](https://docs.astropy.org/en/stable/io/fits/index.html).
#
# We go through each option below, using an image of the radial component of the photospheric magnetic field as an example.
# #### Option 1: Download the image data, as an unmerged FITS file, and the header data separately

# Query the image data and header metadata using `drms`, then download and open the unmerged FITS file with `astropy`:

# In[17]:

hmi_query_string = 'hmi.sharp_cea_720s[377][2011.02.15_02:12:00_TAI]'
keys, segments = c.query(hmi_query_string, key='T_REC, USFLUX, ERRVF', seg='Br')
url = 'http://jsoc.stanford.edu' + segments.Br[0]  # prepend the jsoc.stanford.edu base URL to the segment path
photosphere_image = fits.open(url)  # download and open the unmerged FITS file via astropy

# Plot the image data with `matplotlib`:

# In[18]:

hmimag = plt.get_cmap('hmimag')
plt.imshow(photosphere_image[1].data, cmap=hmimag, origin='lower', vmin=-3000, vmax=3000)
print('The dimensions of this image are', photosphere_image[1].data.shape[0], 'by', photosphere_image[1].data.shape[1], '.')

# There are only a few keywords associated with the unmerged FITS file:

# In[19]:

photosphere_image[1].header

# But we can get all the header metadata we like from the `drms` query:

# In[20]:

keys

# #### Option 2: Download the merged FITS file and use [SunPy Map](https://docs.sunpy.org/en/stable/code_ref/map.html) to plot the image data

# First we will download the FITS file from JSOC.
#
# n.b. The code below will only work with a valid e-mail address. To obtain one, register on the [JSOC exportdata website](http://jsoc.stanford.edu/ajax/exportdata.html).

# In[21]:

email = 'your@email.address'

# In[22]:

c = drms.Client(email=email, verbose=True)

# In[23]:

# Export the magnetogram as a FITS image
r = c.export(hmi_query_string+'{Br}', protocol='fits', email=email)
fits_url_hmi = r.urls['url'][0]
hmi_map = sunpy.map.Map(fits_url_hmi)

# In[24]:

fig = plt.figure()
hmi_map.plot(cmap=hmimag, vmin=-3000, vmax=3000)

# The image is now in the correct coordinate system!
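# The Br array behind this map is the same kind of data summarized by keywords like USFLUX. As a toy, self-contained sketch (synthetic field values and a hypothetical pixel area, not the JSOC pipeline computation), the total unsigned flux is conceptually the sum of |Br| over all pixels times the pixel area:

```python
import numpy as np

# Synthetic 2x2 radial-field patch in gauss (not real HMI values).
br = np.array([[100.0, -200.0],
               [-50.0, 150.0]])

pixel_area_cm2 = 1.3e15  # hypothetical CEA pixel area in cm^2

# Total unsigned flux in maxwells: sum of |Br| times the pixel area.
usflux = np.sum(np.abs(br)) * pixel_area_cm2
print(usflux)  # ~6.5e17 Mx for this toy patch
```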
# We can also inspect the map like this:

# In[25]:

hmi_map

# And look at the header metadata like this:

# In[26]:

hmi_map.fits_header['T_REC']

# If we only want to read the FITS file, we can use the [`astropy` package for FITS file handling](https://docs.astropy.org/en/stable/io/fits/index.html).

# In[27]:

photosphere_image = fits.open(fits_url_hmi)

# The file is opened as an [Astropy HDUList object](https://docs.astropy.org/en/stable/io/fits/api/hdulists.html#astropy.io.fits.HDUList); all of the header information is in `photosphere_image[1].header`, and the image data are in `photosphere_image[1].data` as a numpy array.

# In[28]:

photosphere_image[1].header['T_REC']

# ## Exporting data

# JSOC allows users to export data as FITS files, jpg images, and movies in mp4 or mpg format.

# ### Exporting a jpg image

# We can easily export any image data as a jpg using one of several color tables (see the [exportdata website](http://jsoc.stanford.edu/ajax/exportdata.html) under the jpg protocol for a list of color tables). Here is an example with NOAA Active Region 11158 at the time of the X2.2-class flare.

# In[29]:

protocol_args = {'ct': 'HMI_mag.lut', 'min': -3000, 'max': 3000}

# In[30]:

r = c.export('hmi.sharp_cea_720s[377][2011.02.14_02:00:00]{Br}', protocol='jpg', protocol_args=protocol_args, email=email)

# In[31]:

print(r.request_url)

# In[32]:

image_url = r.urls['url'][0]
print('The image is at this url: ', image_url, '.')

# ### Exporting a movie

# We can also easily create a movie using any image data! The movie below tracks NOAA Active Region 11158 over the same time interval as the plot above. In this case, the segment name, Br, is noted in the curly brackets within the data series query.

# In[33]:

r = c.export('hmi.sharp_cea_720s[377][2011.02.14_15:00:00/12h]{Br}', protocol='mpg', protocol_args=protocol_args, email=email)

# In[34]:

print(r.request_url)

# In[35]:

print("You can download the movie from here:", r.urls['url'][0])
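# To keep the export options used in this notebook in one place, here is a small hypothetical helper (it is not part of the drms API; the mapping simply mirrors the protocols and protocol_args used above) that returns the arguments to pass to `c.export()`:

```python
# Hypothetical convenience mapping (not part of drms) of the export kinds used
# above to the (protocol, protocol_args) passed to
# c.export(query, protocol=..., protocol_args=..., email=...).
HMI_MAG_ARGS = {'ct': 'HMI_mag.lut', 'min': -3000, 'max': 3000}

def export_settings(kind):
    settings = {
        'fits': ('fits', None),          # merged FITS file
        'jpg': ('jpg', HMI_MAG_ARGS),    # single jpg image
        'movie': ('mpg', HMI_MAG_ARGS),  # mpg movie
    }
    return settings[kind]

protocol, protocol_args = export_settings('movie')
print(protocol)  # mpg
```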