In [1]:

%matplotlib inline

NPD data¶

Start here >> https://portal.diskos.cgg.com/whereoil-data/

Bit confusing as there are a lot of places to get data.

I'll start with FactPages because I've heard of that...

FactPages... Use pandas to read CSV directly¶

Right-click and copy URL for CSV from this link:

http://factpages.npd.no/factpages/Default.aspx?culture=nb-no&nav1=field&nav2=TableView%7CProduction%7CSaleable%7CMonthly

In [2]:

csv = "http://factpages.npd.no/ReportServer?/FactPages/TableView/field_production_monthly&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f&rs:Format=CSV&Top100=false&IpAddress=213.236.218.114&CultureCode=nb-no"

In [3]:

import pandas as pd

df = pd.read_csv(csv)

In [4]:

df.head()

Out[4]:

	prfInformationCarrier	prfYear	prfMonth	prfPrdOilNetMillSm3	prfPrdNGLNetMillSm3	prfPrdOeNetMillSm3	prfPrdProducedWaterInFieldMillSm3	prfNpdidInformationCarrier
0	33/9-6 DELTA	2009	7	0.00025	-0.00004	0.00021	0.00051	44576
1	33/9-6 DELTA	2009	8	0.00325	0.00014	0.00339	0.00063	44576
2	33/9-6 DELTA	2009	9	0.00317	0.00010	0.00327	0.00316	44576
3	33/9-6 DELTA	2009	10	0.00294	0.00012	0.00306	0.00535	44576
4	33/9-6 DELTA	2009	11	-0.00001	-0.00004	-0.00005	0.00000	44576

In [5]:

len(df)

Out[5]:

We could save this if we wanted to...

In [6]:

#df.to_csv('npd.csv', index=False)

Add a datetime¶

Give this a 'DS' (datestamp) column with pandas datetimes. To do this easily, we need columns named like 'year', 'month', 'day'.

In [26]:

#df = pd.read_csv('npd.csv')

In [20]:

columns = {'prfYear': 'year',
           'prfMonth': 'month',
           'prfInformationCarrier': 'field',
           'prfPrdOilNetMillSm3': 'oil',
           'prfPrdOeNetMillSm3': 'OE',
           'prfPrdProducedWaterInFieldMillSm3': 'water',
          }

df = df.rename(columns=columns)

In [21]:

df['day'] = 1

In [22]:

df['DS'] = pd.to_datetime(df[['year', 'month', 'day']])
df = df.set_index('DS')

In [23]:

df.head()

Out[23]:

	field	year	month	oil	prfPrdGasNetBillSm3	prfPrdNGLNetMillSm3	prfPrdCondensateNetMillSm3	OE	water	prfNpdidInformationCarrier	day
DS
2009-07-01	33/9-6 DELTA	2009	7	0.00025	0.0	-0.00004	0.0	0.00021	0.00051	44576	1
2009-08-01	33/9-6 DELTA	2009	8	0.00325	0.0	0.00014	0.0	0.00339	0.00063	44576	1
2009-09-01	33/9-6 DELTA	2009	9	0.00317	0.0	0.00010	0.0	0.00327	0.00316	44576	1
2009-10-01	33/9-6 DELTA	2009	10	0.00294	0.0	0.00012	0.0	0.00306	0.00535	44576	1
2009-11-01	33/9-6 DELTA	2009	11	-0.00001	0.0	-0.00004	0.0	-0.00005	0.00000	44576	1

In [24]:

df = df[['field', 'oil', 'OE', 'water']]
df.OE -= df.oil

In [25]:

df.head()

Out[25]:

	field	oil	OE	water
DS
2009-07-01	33/9-6 DELTA	0.00025	-0.00004	0.00051
2009-08-01	33/9-6 DELTA	0.00325	0.00014	0.00063
2009-09-01	33/9-6 DELTA	0.00317	0.00010	0.00316
2009-10-01	33/9-6 DELTA	0.00294	0.00012	0.00535
2009-11-01	33/9-6 DELTA	-0.00001	-0.00004	0.00000

Pandas is very useful for time series!¶

In [27]:

import matplotlib.pyplot as plt

plt.figure(figsize=(15,3))

df.water[df.field=='TROLL'].plot()
df.oil[df.field=='TROLL'].plot()
df.OE[df.field=='TROLL'].plot()

Out[27]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9f652cbdd8>

Once you're used to the pandas syntax, it's fairly easy to make the mean monthly production for a time range:

In [28]:

df.loc[df.field=='TROLL'].loc['2000':'2010'].resample('Y').mean().plot()

Out[28]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9f6515bac8>

Or to get the summed annual production:

In [29]:

df.loc[df.field=='TROLL'].loc['2000':'2010'].resample('Y').sum()

Out[29]:

	oil	OE	water
DS
2000-12-31	18.36151	25.41618	15.81369
2001-12-31	19.50288	22.07272	18.49151
2002-12-31	21.19424	26.34914	18.91939
2003-12-31	20.82079	27.68133	21.91749
2004-12-31	17.76382	27.09299	23.63560
2005-12-31	14.09268	29.64932	22.59443
2006-12-31	10.81995	32.10073	21.25846
2007-12-31	9.42551	37.14892	21.66064
2008-12-31	8.16830	30.81331	21.53865
2009-12-31	7.84924	25.46659	20.95653
2010-12-31	6.86951	29.62816	19.98388

Throw .plot() on the end:

In [30]:

df.loc[df.field=='TROLL'].loc['2000':'2010'].resample('Y').sum().plot()

Out[30]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9f650ea358>

Or we can get totals for ALL fields in the database:

In [31]:

df.loc['2010':].resample('M').sum().plot()

Out[31]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9f64589358>

Let's look at the contribution TROLL made to NCS production since 1993:

In [32]:

fig, ax = plt.subplots()
df.loc['1993':'2018', 'oil'].resample('Y').sum().plot(ax=ax)
df.loc[df.field!='TROLL'].loc['1993':'2018', 'oil'].resample('Y').sum().plot(ax=ax)

Out[32]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f9f644fb4a8>

In [ ]: