%matplotlib inline
Start here >> https://portal.diskos.cgg.com/whereoil-data/
Bit confusing as there are a lot of places to get data.
I'll start with FactPages because I've heard of that...
Right-click and copy URL for CSV from this link:
csv = "http://factpages.npd.no/ReportServer?/FactPages/TableView/field_production_monthly&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=CSV&Top100=false&IpAddress=213.236.218.114&CultureCode=nb-no"
import pandas as pd
df = pd.read_csv(csv)
df.head()
prfInformationCarrier | prfYear | prfMonth | prfPrdOilNetMillSm3 | prfPrdGasNetBillSm3 | prfPrdNGLNetMillSm3 | prfPrdCondensateNetMillSm3 | prfPrdOeNetMillSm3 | prfPrdProducedWaterInFieldMillSm3 | prfNpdidInformationCarrier | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 33/9-6 DELTA | 2009 | 7 | 0.00025 | 0.0 | -0.00004 | 0.0 | 0.00021 | 0.00051 | 44576 |
1 | 33/9-6 DELTA | 2009 | 8 | 0.00325 | 0.0 | 0.00014 | 0.0 | 0.00339 | 0.00063 | 44576 |
2 | 33/9-6 DELTA | 2009 | 9 | 0.00317 | 0.0 | 0.00010 | 0.0 | 0.00327 | 0.00316 | 44576 |
3 | 33/9-6 DELTA | 2009 | 10 | 0.00294 | 0.0 | 0.00012 | 0.0 | 0.00306 | 0.00535 | 44576 |
4 | 33/9-6 DELTA | 2009 | 11 | -0.00001 | 0.0 | -0.00004 | 0.0 | -0.00005 | 0.00000 | 44576 |
len(df)
20003
We could save this if we wanted to...
#df.to_csv('npd.csv', index=False)
Give this a 'DS' (datestamp) column with pandas
datetimes. To do this easily, we need columns named like 'year'
, 'month'
, 'day'
.
#df = pd.read_csv('npd.csv')
columns = {'prfYear': 'year',
'prfMonth': 'month',
'prfInformationCarrier': 'field',
'prfPrdOilNetMillSm3': 'oil',
'prfPrdOeNetMillSm3': 'OE',
'prfPrdProducedWaterInFieldMillSm3': 'water',
}
df = df.rename(columns=columns)
df['day'] = 1
df['DS'] = pd.to_datetime(df[['year', 'month', 'day']])
df = df.set_index('DS')
df.head()
field | year | month | oil | prfPrdGasNetBillSm3 | prfPrdNGLNetMillSm3 | prfPrdCondensateNetMillSm3 | OE | water | prfNpdidInformationCarrier | day | |
---|---|---|---|---|---|---|---|---|---|---|---|
DS | |||||||||||
2009-07-01 | 33/9-6 DELTA | 2009 | 7 | 0.00025 | 0.0 | -0.00004 | 0.0 | 0.00021 | 0.00051 | 44576 | 1 |
2009-08-01 | 33/9-6 DELTA | 2009 | 8 | 0.00325 | 0.0 | 0.00014 | 0.0 | 0.00339 | 0.00063 | 44576 | 1 |
2009-09-01 | 33/9-6 DELTA | 2009 | 9 | 0.00317 | 0.0 | 0.00010 | 0.0 | 0.00327 | 0.00316 | 44576 | 1 |
2009-10-01 | 33/9-6 DELTA | 2009 | 10 | 0.00294 | 0.0 | 0.00012 | 0.0 | 0.00306 | 0.00535 | 44576 | 1 |
2009-11-01 | 33/9-6 DELTA | 2009 | 11 | -0.00001 | 0.0 | -0.00004 | 0.0 | -0.00005 | 0.00000 | 44576 | 1 |
df = df[['field', 'oil', 'OE', 'water']]
df.OE -= df.oil
df.head()
field | oil | OE | water | |
---|---|---|---|---|
DS | ||||
2009-07-01 | 33/9-6 DELTA | 0.00025 | -0.00004 | 0.00051 |
2009-08-01 | 33/9-6 DELTA | 0.00325 | 0.00014 | 0.00063 |
2009-09-01 | 33/9-6 DELTA | 0.00317 | 0.00010 | 0.00316 |
2009-10-01 | 33/9-6 DELTA | 0.00294 | 0.00012 | 0.00535 |
2009-11-01 | 33/9-6 DELTA | -0.00001 | -0.00004 | 0.00000 |
import matplotlib.pyplot as plt
plt.figure(figsize=(15,3))
df.water[df.field=='TROLL'].plot()
df.oil[df.field=='TROLL'].plot()
df.OE[df.field=='TROLL'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f652cbdd8>
Once you're used to the pandas
syntax, it's fairly easy to make the mean monthly production for a time range:
df.loc[df.field=='TROLL'].loc['2000':'2010'].resample('Y').mean().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f6515bac8>
Or to get the summed annual production:
df.loc[df.field=='TROLL'].loc['2000':'2010'].resample('Y').sum()
oil | OE | water | |
---|---|---|---|
DS | |||
2000-12-31 | 18.36151 | 25.41618 | 15.81369 |
2001-12-31 | 19.50288 | 22.07272 | 18.49151 |
2002-12-31 | 21.19424 | 26.34914 | 18.91939 |
2003-12-31 | 20.82079 | 27.68133 | 21.91749 |
2004-12-31 | 17.76382 | 27.09299 | 23.63560 |
2005-12-31 | 14.09268 | 29.64932 | 22.59443 |
2006-12-31 | 10.81995 | 32.10073 | 21.25846 |
2007-12-31 | 9.42551 | 37.14892 | 21.66064 |
2008-12-31 | 8.16830 | 30.81331 | 21.53865 |
2009-12-31 | 7.84924 | 25.46659 | 20.95653 |
2010-12-31 | 6.86951 | 29.62816 | 19.98388 |
Throw .plot()
on the end:
df.loc[df.field=='TROLL'].loc['2000':'2010'].resample('Y').sum().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f650ea358>
Or we can get totals for ALL fields in the database:
df.loc['2010':].resample('M').sum().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f64589358>
Let's look at the contribution TROLL made to NCS production since 1993:
fig, ax = plt.subplots()
df.loc['1993':'2018', 'oil'].resample('Y').sum().plot(ax=ax)
df.loc[df.field!='TROLL'].loc['1993':'2018', 'oil'].resample('Y').sum().plot(ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f644fb4a8>