A quick test of textualising nomis data.
nomis is a source of UK labour market statistics. Data is published via an API in a variety of forms, including JSON and XML using SDMX. Data is also published in CSV format (select the item on the API page referring to help on using HTML documents).
This is a quick and crude attempt at generating textualisations of the data (data2text) using simple string formatting.
This allows us to generate text from data such as the following:
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013), a considerable decrease of 1218 since then. Last month saw a total amount of 1624, of which 1101 applicants were male and 523 female. Of the current total amount, 986 applicants were male and 516 were female. At the same time last year, 1845 applicants were male and 875 female.
(Note - thousands separators are easily added.)
We can also generate sentences of the following form, where the italicised elements are all inserted from the original data source:
Figures came out for *August 2014* for unemployed claimants which show that *1,502* people on the *Isle of Wight* were claiming Job Seekers Allowance (JSA) in *August 2014*. Unemployment Benefit figures released by the Office for National Statistics show *a fall of 122* since *July 2014*, which reported *1,624* JSA claimants, and *a fall of 1,218* from *August 2013* (*2,720*).
That means *1.9%* of the resident *Isle of Wight* population *aged 16-64* are *persons claiming JSA* – *0.5% more* than the rest of the *South East* (*1.4%*), and *0.5% less* than the whole of the UK (*2.4%*).
import pandas as pd
#We can download a CSV file containing measures for a particular region
#For example, male (sex=5) Jobseeker's Allowance claimants on the Isle of Wight
tmp=pd.read_csv('http://www.nomisweb.co.uk/api/v01/dataset/NM_1_1.data.csv?geography=2038431803&sex=5&item=1&measures=20100')
tmp[:3]
|  | DATE | DATE_NAME | DATE_CODE | DATE_TYPE | DATE_TYPECODE | DATE_SORTORDER | GEOGRAPHY | GEOGRAPHY_NAME | GEOGRAPHY_CODE | GEOGRAPHY_TYPE | ... | MEASURES | MEASURES_NAME | OBS_VALUE | OBS_STATUS | OBS_STATUS_NAME | OBS_CONF | OBS_CONF_NAME | URN | RECORD_OFFSET | RECORD_COUNT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1983-06 | June 1983 | 1983-06 | date | 0 | 0 | 2038431803 | Isle of Wight | E06000046 | pre-2009 local authorities: district / unitary | ... | 20100 | Persons claiming JSA | 3504 | A | Normal Value | F | Free (free for publication) | Nm-1d1d31734e1d2038431803d5d1d20100 | 0 | 375 |
| 1 | 1983-07 | July 1983 | 1983-07 | date | 0 | 1 | 2038431803 | Isle of Wight | E06000046 | pre-2009 local authorities: district / unitary | ... | 20100 | Persons claiming JSA | 3458 | A | Normal Value | F | Free (free for publication) | Nm-1d1d31735e1d2038431803d5d1d20100 | 1 | 375 |
| 2 | 1983-08 | August 1983 | 1983-08 | date | 0 | 2 | 2038431803 | Isle of Wight | E06000046 | pre-2009 local authorities: district / unitary | ... | 20100 | Persons claiming JSA | 3409 | A | Normal Value | F | Free (free for publication) | Nm-1d1d31736e1d2038431803d5d1d20100 | 2 | 375 |
3 rows × 34 columns
tmp.columns
Index(['DATE', 'DATE_NAME', 'DATE_CODE', 'DATE_TYPE', 'DATE_TYPECODE', 'DATE_SORTORDER', 'GEOGRAPHY', 'GEOGRAPHY_NAME', 'GEOGRAPHY_CODE', 'GEOGRAPHY_TYPE', 'GEOGRAPHY_TYPECODE', 'GEOGRAPHY_SORTORDER', 'SEX', 'SEX_NAME', 'SEX_CODE', 'SEX_TYPE', 'SEX_TYPECODE', 'SEX_SORTORDER', 'ITEM', 'ITEM_NAME', 'ITEM_CODE', 'ITEM_TYPE', 'ITEM_TYPECODE', 'ITEM_SORTORDER', 'MEASURES', 'MEASURES_NAME', 'OBS_VALUE', 'OBS_STATUS', 'OBS_STATUS_NAME', 'OBS_CONF', 'OBS_CONF_NAME', 'URN', 'RECORD_OFFSET', 'RECORD_COUNT'], dtype='object')
#We can project just a subset of the columns using the select parameter
baseURL='http://www.nomisweb.co.uk/api/v01/dataset/NM_1_1.data.csv?'
url=baseURL+'geography=2038431803&sex=5,6,7&item=1&measures=20100'
#Projection
url+='&select=sex_name,geography_name,measures_name,date_code,date_name,obs_value'
tmp=pd.read_csv(url)
tmp[:9]
|  | SEX_NAME | GEOGRAPHY_NAME | MEASURES_NAME | DATE_CODE | DATE_NAME | OBS_VALUE |
|---|---|---|---|---|---|---|
| 0 | Male | Isle of Wight | Persons claiming JSA | 1983-06 | June 1983 | 3504 |
| 1 | Female | Isle of Wight | Persons claiming JSA | 1983-06 | June 1983 | 1336 |
| 2 | Total | Isle of Wight | Persons claiming JSA | 1983-06 | June 1983 | 4840 |
| 3 | Male | Isle of Wight | Persons claiming JSA | 1983-07 | July 1983 | 3458 |
| 4 | Female | Isle of Wight | Persons claiming JSA | 1983-07 | July 1983 | 1302 |
| 5 | Total | Isle of Wight | Persons claiming JSA | 1983-07 | July 1983 | 4760 |
| 6 | Male | Isle of Wight | Persons claiming JSA | 1983-08 | August 1983 | 3409 |
| 7 | Female | Isle of Wight | Persons claiming JSA | 1983-08 | August 1983 | 1264 |
| 8 | Total | Isle of Wight | Persons claiming JSA | 1983-08 | August 1983 | 4673 |
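Building query strings by hand gets error-prone as parameters pile up. As a minimal sketch, the parameters could instead be kept as keyword arguments and encoded with the standard library's urllib.parse.urlencode. The helper name nomis_csv_url is hypothetical, and it assumes, as the hand-built URLs in this notebook do, that the API accepts unescaped commas in parameter values.

```python
from urllib.parse import urlencode

# CSV endpoint pattern used throughout this notebook, e.g. dataset NM_1_1 (JSA claimants)
NOMIS_BASE = 'http://www.nomisweb.co.uk/api/v01/dataset/{dataset}.data.csv'

def nomis_csv_url(dataset, **params):
    """Build a nomis CSV query URL from keyword parameters (hypothetical helper).

    List or tuple values are joined with commas (sex=5,6,7, select=...);
    safe=',' stops urlencode from escaping those commas as %2C.
    """
    query = {key: ','.join(str(v) for v in value) if isinstance(value, (list, tuple)) else value
             for key, value in params.items()}
    return NOMIS_BASE.format(dataset=dataset) + '?' + urlencode(query, safe=',')

example_url = nomis_csv_url('NM_1_1',
                            geography=2038431803,
                            sex=[5, 6, 7],
                            item=1,
                            measures=20100,
                            select=['sex_name', 'geography_name', 'measures_name',
                                    'date_code', 'date_name', 'obs_value'])
print(example_url)
```

The result is the same string as the url assembled by concatenation above, so it could be handed straight to pd.read_csv.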
#We can also reshape the projected columns using the rows and cols parameters
url+='&rows=date_code,date_name,geography_name,measures_name&cols=sex_name'
tmp=pd.read_csv(url)
tmp[:3]
|  | DATE_CODE | DATE_NAME | GEOGRAPHY_NAME | MEASURES_NAME | Female | Male | Total |
|---|---|---|---|---|---|---|---|
| 0 | 1983-06 | June 1983 | Isle of Wight | Persons claiming JSA | 1336 | 3504 | 4840 |
| 1 | 1983-07 | July 1983 | Isle of Wight | Persons claiming JSA | 1302 | 3458 | 4760 |
| 2 | 1983-08 | August 1983 | Isle of Wight | Persons claiming JSA | 1264 | 3409 | 4673 |
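The rows and cols parameters ask the API to do the reshaping. If you already have the long-format result of the previous query, pandas can produce the same wide layout locally. A sketch using the first six rows of that table (the variable names are illustrative only):

```python
import pandas as pd

# Long-format rows as returned by the sex=5,6,7 query (values copied from the earlier table)
long_df = pd.DataFrame({
    'DATE_CODE': ['1983-06'] * 3 + ['1983-07'] * 3,
    'SEX_NAME': ['Male', 'Female', 'Total'] * 2,
    'OBS_VALUE': [3504, 1336, 4840, 3458, 1302, 4760],
})

# One row per date and one column per sex, the shape that rows=...&cols=sex_name requests
wide_df = long_df.pivot(index='DATE_CODE', columns='SEX_NAME', values='OBS_VALUE').reset_index()
wide_df.columns.name = None  # drop the leftover 'SEX_NAME' label on the column index
print(wide_df)
```

Letting the API reshape keeps the download small; pivoting locally is handy when the long table is already cached.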
#We can also limit the time period - for example, to the previous year
url+='&time=latest,prevyear'
df=pd.read_csv(url)
df
|  | DATE_CODE | DATE_NAME | GEOGRAPHY_NAME | MEASURES_NAME | Female | Male | Total |
|---|---|---|---|---|---|---|---|
| 0 | 2013-08 | August 2013 | Isle of Wight | Persons claiming JSA | 875 | 1845 | 2720 |
| 1 | 2013-09 | September 2013 | Isle of Wight | Persons claiming JSA | 877 | 1810 | 2687 |
| 2 | 2013-10 | October 2013 | Isle of Wight | Persons claiming JSA | 914 | 1867 | 2781 |
| 3 | 2013-11 | November 2013 | Isle of Wight | Persons claiming JSA | 1016 | 1996 | 3012 |
| 4 | 2013-12 | December 2013 | Isle of Wight | Persons claiming JSA | 1027 | 2053 | 3080 |
| 5 | 2014-01 | January 2014 | Isle of Wight | Persons claiming JSA | 1065 | 2094 | 3159 |
| 6 | 2014-02 | February 2014 | Isle of Wight | Persons claiming JSA | 996 | 2017 | 3013 |
| 7 | 2014-03 | March 2014 | Isle of Wight | Persons claiming JSA | 888 | 1781 | 2669 |
| 8 | 2014-04 | April 2014 | Isle of Wight | Persons claiming JSA | 731 | 1554 | 2285 |
| 9 | 2014-05 | May 2014 | Isle of Wight | Persons claiming JSA | 645 | 1387 | 2032 |
| 10 | 2014-06 | June 2014 | Isle of Wight | Persons claiming JSA | 560 | 1195 | 1755 |
| 11 | 2014-07 | July 2014 | Isle of Wight | Persons claiming JSA | 523 | 1101 | 1624 |
| 12 | 2014-08 | August 2014 | Isle of Wight | Persons claiming JSA | 516 | 986 | 1502 |
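The cells below pick out the latest, previous-month and year-ago rows by position (df.iloc[-1], df.iloc[-2] and df.iloc[0]), which relies on the API returning exactly thirteen consecutive months. A slightly more defensive alternative, sketched here with a hypothetical row_for helper and a few rows copied from the table above, looks rows up by DATE_CODE using pandas monthly periods:

```python
import pandas as pd

def row_for(df, period):
    """Return the single row of df whose DATE_CODE matches a monthly pd.Period.

    Raises KeyError if that month is missing, instead of silently
    returning a neighbouring row as a positional lookup would.
    """
    code = str(period)  # a monthly Period prints as 'YYYY-MM', the DATE_CODE format
    matches = df[df['DATE_CODE'] == code]
    if len(matches) != 1:
        raise KeyError('expected one row for {0}, found {1}'.format(code, len(matches)))
    return matches.iloc[0]

# Three rows copied from the table above
sample = pd.DataFrame({
    'DATE_CODE': ['2013-08', '2014-07', '2014-08'],
    'DATE_NAME': ['August 2013', 'July 2014', 'August 2014'],
    'Total': [2720, 1624, 1502],
})

latest = pd.Period(sample['DATE_CODE'].max(), freq='M')
latest_row = row_for(sample, latest)
prev_month_row = row_for(sample, latest - 1)   # one month earlier
year_ago_row = row_for(sample, latest - 12)    # same month, previous year
print(latest_row['DATE_NAME'], prev_month_row['DATE_NAME'], year_ago_row['DATE_NAME'])
```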
Let's see if we can use that data to tell a story.
mostRecent=df.iloc[-1]
txt1='The most recent figure ({0}) for {1} for the {2} area is {3}.'.format(mostRecent['DATE_NAME'],
mostRecent['MEASURES_NAME'],
mostRecent['GEOGRAPHY_NAME'],
mostRecent['Total'])
print(txt1)
The most recent figure (August 2014) for Persons claiming JSA for the Isle of Wight area is 1502.
Let's just tidy up the capitalisation of the measure name.
#http://stackoverflow.com/a/3847369/454773
decase = lambda s: s[:1].lower() + s[1:] if s else ''
txt1='The most recent figure ({0}) for {1} for the {2} area is {3}.'.format(mostRecent['DATE_NAME'],
decase(mostRecent['MEASURES_NAME']),
mostRecent['GEOGRAPHY_NAME'],
mostRecent['Total'])
print(txt1)
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502.
Let's make a comparison with a year ago.
yearago=df.iloc[0]
txt2='This compares with a figure of {0} from a year ago ({1}).'.format(yearago['Total'],yearago['DATE_NAME'])
print(txt1)
print(txt2)
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013).
We can tweak this further to calculate the difference.
txt3=txt2[:-1]
yeardelta= mostRecent['Total'] - yearago['Total']
if yeardelta==0:
    txt3+=', exactly the same amount.'
else:
    txt3+=', a change of {0}.'.format(yeardelta)
print(txt1)
print(txt3)
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013), a change of -1218.
We can also make it clear in which direction the change took place.
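#inflect's p.a() prefixes 'a' or 'an' as appropriate (the library is introduced properly below)
import inflect
p = inflect.engine()
txt4=txt2[:-1]
if yeardelta==0:
    txt4+=', exactly the same amount.'
else:
    if yeardelta <0: direction='decrease'
    else: direction='increase'
    txt4+=', {0} of {1}.'.format(p.a(direction),abs(yeardelta))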
print(txt1)
print(txt4)
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013), a decrease of 1218.
We might want to add a bit of flexibility to the text, so that its form can change slightly each time it is generated.
#Library to support natural language text generation
#!pip3 install git+https://github.com/pwdyson/inflect.py
import inflect
p = inflect.engine()
import random
def _txt5():
    txt5=txt2[:-1]
    if yeardelta==0:
        txt5+=', exactly the same amount.'
    else:
        if yeardelta <0: direction=random.choice(['decrease', 'fall'])
        else: direction=random.choice(['increase','rise'])
        txt5+=', and represents {0} of {1} since then.'.format(p.a(direction),abs(yeardelta))
    return txt5
print(txt1)
print(_txt5())
print(_txt5())
print(_txt5())
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013), and represents a decrease of 1218 since then. This compares with a figure of 2720 from a year ago (August 2013), and represents a fall of 1218 since then. This compares with a figure of 2720 from a year ago (August 2013), and represents a fall of 1218 since then.
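Because the wording is chosen with random.choice, each run can produce different text, which makes the generator awkward to test. One option, sketched below with a hypothetical describe_change helper, is to pass in a random.Random instance so that a fixed seed gives repeatable choices. The helper uses a crude vowel test for 'a'/'an', which is enough for these four nouns; inflect's p.a() remains the more general tool.

```python
import random

def describe_change(delta, rng=random):
    """Return a phrase such as 'a fall of 1218', choosing the noun with rng."""
    if delta == 0:
        return 'exactly the same amount'
    nouns = ['decrease', 'fall'] if delta < 0 else ['increase', 'rise']
    noun = rng.choice(nouns)
    # Crude article choice: fine for these nouns, not for English in general
    article = 'an' if noun[0] in 'aeiou' else 'a'
    return '{0} {1} of {2}'.format(article, noun, abs(delta))

# Two generators with the same seed make the same sequence of choices
rng1 = random.Random(42)
rng2 = random.Random(42)
first = [describe_change(-1218, rng1) for _ in range(5)]
second = [describe_change(-1218, rng2) for _ in range(5)]
print(first == second)  # True
```

Leaving rng at its default keeps the varied wording for normal use; a seeded generator is only needed when the output must be checked.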
We might also want to add an interpretation of how large the change has been.
def _txt6(filler=','):
    def _magnitude(term):
        #A heuristic here
        if propDelta<0.05:
            mod=random.choice(['slight'])
        elif propDelta<0.10:
            mod=random.choice(['significant'])
        else:
            mod=random.choice(['considerable','large' ])
        term=' '.join([mod,term])
        return p.a(term)
    txt6=txt2[:-1]
    propDelta= abs(yeardelta)/mostRecent['Total']
    if yeardelta==0:
        txt6+=', exactly the same amount.'
    else:
        if yeardelta <0: direction=_magnitude(random.choice([ 'decrease','fall']))
        else: direction=_magnitude(random.choice(['increase','rise']))
        txt6+='{_filler} {0} of {1} since then.'.format(p.a(direction),
                                                        abs(yeardelta),
                                                        _filler= filler )
    return txt6
print(txt1)
print(_txt6())
print(_txt6())
print(_txt6(' and represents'))
print(_txt6())
print(_txt6())
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013), a large fall of 1218 since then. This compares with a figure of 2720 from a year ago (August 2013), a considerable fall of 1218 since then. This compares with a figure of 2720 from a year ago (August 2013) and represents a considerable decrease of 1218 since then. This compares with a figure of 2720 from a year ago (August 2013), a considerable fall of 1218 since then. This compares with a figure of 2720 from a year ago (August 2013), a considerable decrease of 1218 since then.
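The magnitude heuristic above is a chain of if/elif thresholds. The same idea can be written as a lookup with the standard bisect module, which keeps the cut-offs and labels together. This sketch (change_size is a hypothetical name) reuses the 5% and 10% thresholds, but measures the change relative to the earlier figure, the usual base for a proportional change, whereas _txt6 divides by the most recent total.

```python
from bisect import bisect_right

# Cut-offs on the proportional change, mirroring the 5% / 10% heuristic above;
# they are a judgement call, not an established standard
THRESHOLDS = [0.05, 0.10]
LABELS = ['slight', 'significant', 'considerable']

def change_size(now, then):
    """Label the size of a change relative to the earlier figure."""
    prop = abs(now - then) / then
    return LABELS[bisect_right(THRESHOLDS, prop)]

print(change_size(1502, 2720))  # year on year: 1218 / 2720 is about 0.45
print(change_size(1502, 1624))  # month on month: 122 / 1624 is about 0.075
```

A value exactly on a threshold (say 0.05) falls into the higher band, matching the strict < comparisons in _magnitude.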
How about a monthly comparison too?
lastMonth=df.iloc[-2]
def updown(now, then,amount=False, magnitude=False):
    txt=[]
    delta=now - then
    #Size of the change relative to the current figure; the thresholds apply to this, not to the raw delta
    propdelta = abs(delta)/now
    if magnitude and delta!=0:
        if propdelta<0.05:
            txt.append(random.choice(['slightly']))
        elif propdelta<0.10:
            txt.append(random.choice(['significantly']))
        else:
            txt.append(random.choice(['considerably']))
    if now>then: txt.append('up')
    elif now<then: txt.append('down')
    else: txt.append('no change')
    txt=' '.join(txt)
    if amount:
        txt='{0} {1}'.format(txt, abs(now-then) )
    return txt
def _txt7(amount=False,magnitude=False):
    txt7=txt6[:-1]
    txt7+=', {0} on the previous month.'.format(updown(mostRecent['Total'],lastMonth['Total'],amount,magnitude))
    return txt7
txt6=_txt6()
print(_txt7())
txt6=_txt6()
print(_txt7(amount=True,magnitude=False))
txt6=_txt6()
print(_txt7(amount=False,magnitude=True))
txt6=_txt6()
print(_txt7(amount=True,magnitude=True))
This compares with a figure of 2720 from a year ago (August 2013), a large fall of 1218, down on the previous month. This compares with a figure of 2720 from a year ago (August 2013), a considerable decrease of 1218, down 122 on the previous month. This compares with a figure of 2720 from a year ago (August 2013), a decrease of 1218, slightly down on the previous month. This compares with a figure of 2720 from a year ago (August 2013), a decrease of 1218, slightly down 122 on the previous month.
The table also contained information about the gender split. So what might we let the machine write about that?
def _txt8():
    txt8='''\
Of the current total{_figure}, {maleCount} applicants \
were male and {femaleCount} female.\
'''.format(femaleCount=mostRecent['Female'],
           maleCount=mostRecent['Male'],
           _figure=random.choice(['',' figure']))
    return txt8
print(_txt8())
print(_txt8())
print(_txt8())
print(_txt8())
Of the current total, 986 applicants were male and 516 female. Of the current total figure, 986 applicants were male and 516 female. Of the current total figure, 986 applicants were male and 516 female. Of the current total figure, 986 applicants were male and 516 female.
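Raw counts do not immediately show the balance between the groups, so proportions might read better. A sketch with a hypothetical share_sentence helper, assuming (as holds for the rows shown here) that the total is simply the male count plus the female count:

```python
def share_sentence(male, female):
    """Describe the male/female split of a claimant total as percentages."""
    total = male + female  # assumes Total == Male + Female, as in the table above
    return ('Of the {0:,} claimants, {1:.1%} were male and {2:.1%} female.'
            .format(total, male / total, female / total))

print(share_sentence(986, 516))
# Of the 1,502 claimants, 65.6% were male and 34.4% female.
```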
We could perhaps introduce some variation in the order?
def _txt9():
    genderCount=[ ('male',mostRecent['Male']),('female',mostRecent['Female'])]
    if random.choice([True, False]):
        genderCount=[genderCount[1],genderCount[0]]
    txt9='''\
Of {_period} total{_figure}, {A_count} applicants \
were {A_gender} and {B_count}{_space_were}{B_gender}.\
'''.format(A_count=genderCount[0][1], A_gender=genderCount[0][0],
           B_count=genderCount[1][1], B_gender=genderCount[1][0],
           _figure=random.choice(['',' figure',' amount']),
           _space_were=random.choice([' ',' were ']),
           _period=random.choice(["this month's",'the current'])
           )
    return txt9
print(_txt9())
print(_txt9())
print(_txt9())
print(_txt9())
print(_txt9())
Of the current total amount, 516 applicants were female and 986 male. Of this month's total figure, 986 applicants were male and 516 were female. Of the current total figure, 516 applicants were female and 986 male. Of the current total amount, 516 applicants were female and 986 were male. Of this month's total, 516 applicants were female and 986 were male.
We can then try to generalise that function a little more, so that it can also describe other months.
I iterated on the following function several times, increasing its complexity and checking the result by inspecting multiple generated sentences at each step.
def _txt10(data, period,randGender=False, opener='Of',total=True,showTotal=False,filler=False):
    genderCount=[ ('male',data['Male']),('female',data['Female'])]
    if randGender and random.choice([True, False]):
        genderCount=[genderCount[1],genderCount[0]]
    txt10='''\
{_opener} {_period}{_total},{_filler} {A_count} applicants \
were {A_gender} and {B_count}{_space_were}{B_gender}.\
'''.format(A_count=genderCount[0][1], A_gender=genderCount[0][0],
           B_count=genderCount[1][1], B_gender=genderCount[1][0],
           _space_were=random.choice([' ',' were ']),
           _period=random.choice(period),
           _opener=opener,
           _total = ' total{_figure}{_showTotal}'.format(_figure=random.choice(['',' figure',' amount']),
                                                         _showTotal= '' if not showTotal else ' of {0}'.format(data['Total'])) if total else '',
           _filler='' if not filler else filler,
           )
    return txt10
print(_txt10(mostRecent,["this month's",'the current','the most recent']))
print(_txt10(mostRecent,["this month's",'the current','the most recent'],randGender=True))
print(_txt10(mostRecent,["this month's",'the current','the most recent'],randGender=True))
print(_txt10(yearago,["the same month last year's",],opener='From',showTotal=True))
print(_txt10(lastMonth,["last month's"]))
print(_txt10(mostRecent,['the most recent'],opener='From',showTotal=True))
print()
print(_txt10(mostRecent,["this month's",'the current','the most recent']))
print(_txt10(lastMonth,["last month's"],opener='This compares with',filler=' of which'))
print(_txt10(yearago,["the same time last year"],opener='At',total=False))
Of the most recent total figure, 986 applicants were male and 516 were female. Of this month's total amount, 986 applicants were male and 516 female. Of this month's total figure, 986 applicants were male and 516 female. From the same month last year's total amount of 2720, 1845 applicants were male and 875 were female. Of last month's total figure, 1101 applicants were male and 523 were female. From the most recent total amount of 1502, 986 applicants were male and 516 female. Of the current total figure, 986 applicants were male and 516 were female. This compares with last month's total amount, of which 1101 applicants were male and 523 were female. At the same time last year, 1845 applicants were male and 875 were female.
So what have we got? Something like this perhaps?
print(txt1)
print(_txt6())
print(_txt10(lastMonth,["month saw a"],opener='Last',filler=' of which',showTotal=True))
print(_txt10(mostRecent,["this month's",'the current','the most recent']))
print(_txt10(yearago,["the same time last year"],opener='At',total=False))
The most recent figure (August 2014) for persons claiming JSA for the Isle of Wight area is 1502. This compares with a figure of 2720 from a year ago (August 2013), a considerable decrease of 1218 since then. Last month saw a total amount of 1624, of which 1101 applicants were male and 523 female. Of the current total amount, 986 applicants were male and 516 were female. At the same time last year, 1845 applicants were male and 875 female.
Charts are often useful too...
from ggplot import *
df_long=pd.melt(df, id_vars=['DATE_CODE','DATE_NAME','GEOGRAPHY_NAME','MEASURES_NAME'], value_vars= ['Female','Male','Total'] )
df_long['date']=pd.to_datetime(df_long['DATE_CODE'], format='%Y-%m')
df_long[:5]
|  | DATE_CODE | DATE_NAME | GEOGRAPHY_NAME | MEASURES_NAME | variable | value | date |
|---|---|---|---|---|---|---|---|
| 0 | 2013-08 | August 2013 | Isle of Wight | Persons claiming JSA | Female | 875 | 2013-08-01 |
| 1 | 2013-09 | September 2013 | Isle of Wight | Persons claiming JSA | Female | 877 | 2013-09-01 |
| 2 | 2013-10 | October 2013 | Isle of Wight | Persons claiming JSA | Female | 914 | 2013-10-01 |
| 3 | 2013-11 | November 2013 | Isle of Wight | Persons claiming JSA | Female | 1016 | 2013-11-01 |
| 4 | 2013-12 | December 2013 | Isle of Wight | Persons claiming JSA | Female | 1027 | 2013-12-01 |
ggplot(df_long,aes(x='date',y='value',colour='variable')) + geom_line() + ylim(0,3500) + ggtitle('JSA Claims, Isle of Wight')
[Chart: JSA Claims, Isle of Wight (monthly Female, Male and Total claimant counts plotted against date)]
Hyperlocal blog OnTheWight publishes JSA updates in the following style:
Figures for unemployed claimants came out this morning which show that 1,502 people on the Isle of Wight were claiming Job Seekers Allowance (JSA) in August. Unemployment Benefit figures released by the Office for National Statistics show a fall of 122 since July, which reported 1,624 JSA claimants, and a fall of 1,218 from August 2013 (2,720).
That means 1.9% of the resident population of area aged 16-64 is claiming JSA – 0.6% more than the rest of the South East (1.3%), and 0.4% than the whole of the UK (2.3%).
Let's see if we can replicate that...
Let's also tidy up the presentation of numbers, using commas as a thousands separator.
#Thousands separator
def c(amount):
    return '{:,}'.format(amount)
def otwRiseFall(now,then,amount=False):
    delta=now-then
    #Without this branch, an unchanged figure would leave txt undefined
    if delta==0:
        return 'no change'
    if delta>0:
        txt=p.a(random.choice(['rise','increase']))
    else:
        txt=p.a(random.choice(['fall','decrease']))
    if amount:
        txt+=' of {0}'.format(c(abs(delta)))
    return txt
otw1='''
Figures came out for {date} for unemployed claimants which show that {currNum} people on {place} \
were claiming Job Seekers Allowance (JSA) in {currDate}. \
Unemployment Benefit figures released by the Office for National Statistics show \
{monthChange} since {lastMonthDate}, which reported {lastMonthNum} JSA claimants, \
and {yearChange} from {yearagoDate} ({yearagoNum}).
'''.format(date=mostRecent['DATE_NAME'],
currNum=c(mostRecent['Total']),
place='the '+mostRecent['GEOGRAPHY_NAME'],
currDate=mostRecent['DATE_NAME'],
monthChange=otwRiseFall(mostRecent['Total'],lastMonth['Total'],True),
lastMonthDate=lastMonth['DATE_NAME'],
lastMonthNum=c(lastMonth['Total']),
yearChange=otwRiseFall(mostRecent['Total'],yearago['Total'],True),
yearagoDate=yearago['DATE_NAME'],
yearagoNum=c(yearago['Total'])
)
print(otw1)
Figures came out for August 2014 for unemployed claimants which show that 1,502 people on the Isle of Wight were claiming Job Seekers Allowance (JSA) in August 2014. Unemployment Benefit figures released by the Office for National Statistics show a fall of 122 since July 2014, which reported 1,624 JSA claimants, and a fall of 1,218 from August 2013 (2,720).
The next paragraph in the OnTheWight report requires more data, about the JSA percentage rates.
Calculating the rates requires dividing the JSA count by a population count - but what count?
Simple population estimates can be found on the ONS website (ONS population estimates), but the nomis API provides access to a wide range of grouped population figures. The nomis dataset listing points to the annual population survey (NM_17_1) family of indicators, which details the figures that are available.
The nomis monthly JSA report for the Isle of Wight is presumably the basis for the OnTheWight report, although it does not give the JSA rate for the Island. The rates it does give are calculated relative to the following population basis: % is a proportion of resident population of area aged 16-64. On nomis, that's the indicator 402720769 T01:22 (Aged 16-64 - All : All People ).
nomis reports for the UK and the South East provide various sets of figures for JSA claimants. The figure that appears to be used as the basis for the national and regional rates on the Isle of Wight pages is the one for JSA claimants by age and duration (All ages). Rates in that table are calculated on the following basis: % is number of persons claiming JSA as a proportion of resident population of the same age. The age ranges broken out are 18-24, 25-49 and 50-64, although the corresponding figures sum to less than the All ages figure. The Working Age Client Group table uses the following population basis: % is a proportion of resident population of area aged 16-64. (Perhaps the 16-17 age group accounts for the shortfall between the summed age-group figures and the All ages figure? Or can people aged 65+ claim JSA?) So my guess is that the population basis to use is 402720769 T01:22 (Aged 16-64 - All : All People ).
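Written out, the rate under that guessed population basis is as follows, using the Isle of Wight claimant and population figures returned further down (1,502 claimants and 80,700 residents aged 16-64):

```latex
\begin{aligned}
\text{rate} &= 100 \times \frac{\text{persons claiming JSA (all ages)}}{\text{resident population aged 16--64}} \\
\text{rate}_{\text{Isle of Wight, Aug 2014}} &= 100 \times \frac{1502}{80\,700} \approx 1.86\%
\end{aligned}
```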
(An alternative rate calculation expresses claimants as a proportion of the claimant count plus the workforce jobs total, either seasonally adjusted or not. Unsurprisingly this gives a higher rate, but it does use the potential workforce as the basis, which is perhaps more reasonable?)
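The two population bases can be written side by side. A minimal sketch with hypothetical function names; no workforce jobs figure is fetched in this notebook, so only the 16-64 basis is evaluated here, using the Isle of Wight figures reported below.

```python
def rate_16_64(claimants, population_16_64):
    """Claimants as a proportion of the resident population aged 16-64."""
    return claimants / population_16_64

def rate_workforce(claimants, workforce_jobs):
    """Claimants as a proportion of claimants plus workforce jobs (the alternative basis)."""
    return claimants / (claimants + workforce_jobs)

print('{:.2%}'.format(rate_16_64(1502, 80700)))  # 1.86%
```

Whenever claimants plus workforce jobs is smaller than the 16-64 resident population, rate_workforce returns the larger of the two figures.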
def getLatestJSA(code):
    baseURL='http://www.nomisweb.co.uk/api/v01/dataset/NM_1_1.data.csv?'
    url=baseURL+'geography={code}&date=latest&sex=7&item=1&measures=20100'.format(code=code)
    #Projection
    url+='&select=sex_name,geography_name,measures_name,date_code,date_name,obs_value'
    return pd.read_csv(url).dropna()
def get16_64Population(code):
    url='http://www.nomisweb.co.uk/api/v01/dataset/NM_17_1.data.csv?date=latest&geography={code}&measures=20100&cell=402720769'.format(code=code)
    url+='&select=date_code,date_name,geography_name,measures_name,cell_name,obs_value,cell_code'#&item=1&measures=20100
    return pd.read_csv(url).dropna()
def JSA_rate(code):
    return getLatestJSA(code)['OBS_VALUE'].iloc[0]/get16_64Population(code)['OBS_VALUE'].iloc[0]
iwCode=2038431803
ukCode=2092957697
seCode=2013265928
getLatestJSA(iwCode)
|  | SEX_NAME | GEOGRAPHY_NAME | MEASURES_NAME | DATE_CODE | DATE_NAME | OBS_VALUE |
|---|---|---|---|---|---|---|
| 0 | Total | Isle of Wight | Persons claiming JSA | 2014-08 | August 2014 | 1502 |
get16_64Population(iwCode)
|  | DATE_CODE | DATE_NAME | GEOGRAPHY_NAME | MEASURES_NAME | CELL_NAME | OBS_VALUE | CELL_CODE |
|---|---|---|---|---|---|---|---|
| 0 | 2014-03 | Apr 2013-Mar 2014 | Isle of Wight | Value | T01:22 (Aged 16-64 - All : All People ) | 80700 | T01:22 |
def rateGetter(code):
    jsa=getLatestJSA(code)
    pop=get16_64Population(code)
    txt='''
The percentage rate of {claim} for the {loc} region in {date} was {rate}, based on {num} claimants and \
population {poptype} of {popnum} ({popDate} figure).
'''.format(claim=decase(jsa['MEASURES_NAME'].iloc[0]),
           loc=jsa['GEOGRAPHY_NAME'].iloc[0],
           date=jsa['DATE_NAME'].iloc[0],
           rate='{0:.2f}%'.format(100 * JSA_rate(code)),
           num=c(int(jsa['OBS_VALUE'].iloc[0])),
           poptype=decase(pop['CELL_NAME'].iloc[0].split('(')[1].split(' -')[0]),
           popnum=c(int(pop['OBS_VALUE'].iloc[0])),
           popDate=pop['DATE_NAME'].iloc[0])
    print(txt)
rateGetter(iwCode)
rateGetter(seCode)
rateGetter(ukCode)
The percentage rate of persons claiming JSA for the Isle of Wight region in August 2014 was 1.86%, based on 1,502 claimants and population aged 16-64 of 80,700 (Apr 2013-Mar 2014 figure). The percentage rate of persons claiming JSA for the South East region in August 2014 was 1.35%, based on 73,563 claimants and population aged 16-64 of 5,445,000 (Apr 2013-Mar 2014 figure). The percentage rate of persons claiming JSA for the United Kingdom region in August 2014 was 2.39%, based on 961,149 claimants and population aged 16-64 of 40,298,700 (Apr 2013-Mar 2014 figure).
How about the percentage rate comparison?
Note that the rounding method has an effect: should we round to nearest, or round down?
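The OnTheWight paragraph quotes 1.3% for the South East and 2.3% for the UK. Applying two rounding modes to the rates computed in this notebook (see the JSA_rate(seCode) and JSA_rate(ukCode) outputs further down) suggests those two figures are consistent with rounding down, whereas rounding to nearest gives 1.4% and 2.4%. The Island's quoted 1.9%, however, matches rounding to nearest (1.86% rounded down would be 1.8%), so the source may be mixing methods or using different inputs. A small sketch with a hypothetical pct helper:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def pct(rate, rounding):
    """Format a proportion as a percentage to one decimal place with the given rounding mode."""
    return '{0}%'.format(Decimal(100 * rate).quantize(Decimal('0.1'), rounding=rounding))

# Rates from this notebook: 1502 / 80700, plus the JSA_rate outputs for the South East and UK
rates = {'Isle of Wight': 1502 / 80700,
         'South East': 0.013510192837465565,
         'United Kingdom': 0.023850620491479874}
for name, rate in rates.items():
    print(name, 'nearest:', pct(rate, ROUND_HALF_UP), 'down:', pct(rate, ROUND_DOWN))
```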
localCode=iwCode
regionCode=seCode
jsaLocal=getLatestJSA(localCode)
jsaRegion=getLatestJSA(regionCode)
jsaLocal_rate=JSA_rate(localCode)
jsaRegion_rate=JSA_rate(regionCode)
jsaUK_rate=JSA_rate(ukCode)
import decimal
'''
ROUND_CEILING (towards Infinity),
ROUND_DOWN (towards zero),
ROUND_FLOOR (towards -Infinity),
ROUND_HALF_DOWN (to nearest with ties going towards zero),
ROUND_HALF_EVEN (to nearest with ties going to nearest even integer),
ROUND_HALF_UP (to nearest with ties going away from zero), or
ROUND_UP (away from zero).
ROUND_05UP (away from zero if last digit after rounding towards zero would have been 0 or 5; otherwise towards zero)
'''
def pc(amount,rounding=''):
    if rounding=='down': rounding=decimal.ROUND_DOWN
    elif rounding=='up': rounding=decimal.ROUND_UP
    else: rounding=decimal.ROUND_HALF_UP
    ramount=float(decimal.Decimal(100 * amount).quantize(decimal.Decimal('.1'), rounding=rounding))
    return '{0:.1f}%'.format(ramount)
def otwMoreLess(now,then):
    delta=now-then
    if delta>0:
        txt=random.choice(['more'])
    elif delta<0:
        txt=random.choice(['less'])
    return txt
def otwPCmoreLess(this,that):
    delta=this-that
    return '{delta} {diff}'.format(delta=pc(abs(delta)),diff=otwMoreLess(this,that))
otw3='''
That means {localrate} of the resident {localarea} population {poptype} are {claim} \
– {regiondiff} than the rest of the {region} ({regionrate}), \
and {ukdiff} than the whole of the UK ({ukrate}).
'''.format(localrate=pc(jsaLocal_rate),
localarea=jsaLocal['GEOGRAPHY_NAME'].iloc[0],
poptype=decase(get16_64Population(localCode)['CELL_NAME'].iloc[0].split('(')[1].split(' -')[0]),
claim=decase(jsaLocal['MEASURES_NAME'].iloc[0]),
regiondiff=otwPCmoreLess(jsaLocal_rate,jsaRegion_rate),
region=jsaRegion['GEOGRAPHY_NAME'].iloc[0],
regionrate=pc(jsaRegion_rate),
ukdiff=otwPCmoreLess(jsaLocal_rate,jsaUK_rate),
ukrate=pc(jsaUK_rate))
print(otw3)
That means 1.9% of the resident Isle of Wight population aged 16-64 are persons claiming JSA – 0.5% more than the rest of the South East (1.4%), and 0.5% less than the whole of the UK (2.4%).
JSA_rate(seCode)
0.013510192837465565
JSA_rate(ukCode)
0.023850620491479874
#.....doodles and fragments...
#Geographies codelist
#http://www.nomisweb.co.uk/api/v01/dataset/NM_1_1/geography/2092957697.def.htm
#http://www.nomisweb.co.uk/api/v01/dataset/NM_1_1/geography/2092957697TYPE480.def.htm
'''
CL_1_1_GEOGRAPHY
value Description
2013265921 North East
2013265922 North West
2013265923 Yorkshire and The Humber
2013265924 East Midlands
2013265925 West Midlands
2013265926 East
2013265927 London
2013265928 South East
2013265929 South West
2013265930 Wales
2013265931 Scotland
2013265932 Northern Ireland
'''