datatable_demo¶

This notebook demonstrates the use of the DataTable object in the ukds package.

This demonstration uses the following dataset as an example: Gershuny, J., Sullivan, O. (2017). United Kingdom Time Use Survey, 2014-2015. Centre for Time Use Research, University of Oxford. [data collection]. UK Data Service. SN: 8128, http://doi.org/10.5255/UKDA-SN-8128-1

Import the ukds package¶

This demonstration uses the ukds package, which is available on PyPI.

In [1]:

import ukds

Set up a filepath to a .tab data table file¶

The filepath to the data table under study is specified here. This can be changed as needed.

In [2]:

# Adjacent raw strings inside parentheses are joined into a single path string
fp_tab = (
    r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015'
    r'\UKDA-8128-tab\tab\uktus15_household.tab'
)

Set up a filepath to a UKDS .rtf data dictionary file¶

The filepath to the associated data dictionary is specified here. This can be changed as needed.

In [3]:

# Adjacent raw strings inside parentheses are joined into a single path string
fp_dd = (
    r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015'
    r'\UKDA-8128-tab\mrdoc\allissue\uktus15_household_ukda_data_dictionary.rtf'
)
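Both filepaths share the same download folder, so they could also be built from a single base directory. The following is a minimal sketch using pathlib from the standard library. The name base_dir and its value are hypothetical placeholders, and it is an assumption that ukds.DataTable accepts pathlib.Path objects; if it does not, wrap each path in str() before passing it on.

```python
from pathlib import Path

# Hypothetical base folder: replace with the location of the unzipped UKDS download
base_dir = Path('UKDA-8128-tab')

# Build both paths relative to the base folder
fp_tab = base_dir / 'tab' / 'uktus15_household.tab'
fp_dd = base_dir / 'mrdoc' / 'allissue' / 'uktus15_household_ukda_data_dictionary.rtf'

# Report whether each file exists before the paths are used
for fp in (fp_tab, fp_dd):
    print(fp, fp.exists())
```

Keeping the base folder in one variable means only one line needs to change when the data is moved.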

Create a DataTable object¶

A DataTable object is created. The filepaths are supplied as arguments and the files are read into the DataTable object.

In [4]:

dt=ukds.DataTable(fp_tab,fp_dd)
print(dt.__doc__)
print(dt)

A class for reading a UK Data Service .tab data table file
    
<ukds.data_table.DataTable object at 0x000001DBFB2F92B0>

The data table .tab file is stored in the tab attribute as a pandas DataFrame:

In [5]:

dt.tab.head()

Out[5]:

	serial	strata	psu	HhOut	hh_wt	IMonth	IYear	DM014	DM016	...	Relate10_P1	Relate10_P2	Relate10_P3	Relate10_P4	Relate10_P5	Relate10_P6	Relate10_P7	Relate10_P8	Relate10_P9	Relate10_P10
0	11010903	-2	-2	598	NaN	9	2014	0	0	...	-2	-2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	11010904	-2	-2	598	NaN	9	2014	0	0	...	-2	-2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	11010906	-2	-2	598	NaN	10	2014	0	0	...	-2	-2.0	-2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	11010907	-2	-2	598	NaN	9	2014	1	1	...	-2	-2.0	-2.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	11010908	-2	-2	598	NaN	9	2014	0	0	...	-2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 335 columns
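For comparison, a tab-delimited file can be read directly with pandas. The sketch below is not taken from the ukds source code and may differ from what DataTable does internally. It uses a small made-up sample rather than the real file, and shows why a column containing empty fields appears as floats with NaN, as in the Relate10_P2 column above.

```python
import io
import pandas as pd

# Made-up tab-delimited sample; the second row has an empty last field
sample = "serial\tstrata\tRelate\n1\t-2\t-2\n2\t-2\t\n"

# sep='\t' tells pandas that fields are separated by tab characters
tab = pd.read_csv(io.StringIO(sample), sep='\t')

print(tab)
print(tab.dtypes)  # the column with the empty field is read as float64
```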

The data dictionary .rtf file is stored in the datadictionary attribute as a ukds.DataDictionary object:

In [6]:

dt.datadictionary

Out[6]:

<ukds.data_dictionary.DataDictionary at 0x1dbfb2f9358>

Get dataframe¶

The information in the tab and datadictionary attributes can be combined using the get_dataframe method.

This method returns a new pandas DataFrame in which:

the columns are a multi-level index that holds the data dictionary information (variable, variable_label, variable_type, SPSS_measurement_level, SPSS_user_missing_values and pos)
the table values are converted from numeric codes to their label values, where applicable

In [7]:

df=dt.get_dataframe()
df.head()

Out[7]:

variable	serial	strata	psu	HhOut	hh_wt	IMonth	IYear	DM014	DM016	DM510	...	Relate10_P1	Relate10_P2	Relate10_P3	Relate10_P4	Relate10_P5	Relate10_P6	Relate10_P7	Relate10_P8	Relate10_P9	Relate10_P10
variable_label	Household number	Strata	Primary sampling unit	Final outcome - household	Household weight	Interview month	Interview Year	Number of children aged 0-14	Number of children aged 0-16	Number of children aged 5-10	...	Relate10_P1: How related to person 10	Relate10_P2: How related to person 10	Relate10_P3: How related to person 10	Relate10_P4: How related to person 10	Relate10_P5: How related to person 10	Relate10_P6: How related to person 10	Relate10_P7: How related to person 10	Relate10_P8: How related to person 10	Relate10_P9: How related to person 10	Relate10_P10: How related to person 10
variable_type	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	...	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric
SPSS_measurement_level	SCALE	SCALE	SCALE	SCALE	SCALE	NOMINAL	NOMINAL	NOMINAL	NOMINAL	NOMINAL	...	SCALE	SCALE	SCALE	SCALE	SCALE	SCALE	SCALE	SCALE	SCALE	NOMINAL
SPSS_user_missing_values											...
pos	1	2	3	4	5	6	7	8	9	10	...	326	327	328	329	330	331	332	333	334	335
0	11010903	Schedule not applicable	Schedule not applicable	Other reasons why unproductive	NaN	September	2014	0	0	0	...	Schedule not applicable	Schedule not applicable	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	11010904	Schedule not applicable	Schedule not applicable	Other reasons why unproductive	NaN	September	2014	0	0	0	...	Schedule not applicable	Schedule not applicable	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	11010906	Schedule not applicable	Schedule not applicable	Other reasons why unproductive	NaN	October	2014	0	0	0	...	Schedule not applicable	Schedule not applicable	Schedule not applicable	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	11010907	Schedule not applicable	Schedule not applicable	Other reasons why unproductive	NaN	September	2014	1	1	0	...	Schedule not applicable	Schedule not applicable	Schedule not applicable	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	11010908	Schedule not applicable	Schedule not applicable	Other reasons why unproductive	NaN	September	2014	0	0	0	...	Schedule not applicable	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 335 columns
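The two transformations performed by get_dataframe can be illustrated with plain pandas. This is a minimal sketch, not the ukds implementation: the dictionaries variable_labels and value_labels are hypothetical stand-ins for the data dictionary content, and only two of the column levels are reproduced.

```python
import pandas as pd

# Toy stand-in for dt.tab, using two columns and two rows from the output above
tab = pd.DataFrame({'serial': [11010903, 11010906], 'IMonth': [9, 10]})

# Hypothetical data dictionary content
variable_labels = {'serial': 'Household number', 'IMonth': 'Interview month'}
value_labels = {'IMonth': {9: 'September', 10: 'October'}}

# Replace numeric codes with labels; values without a label are left unchanged
labelled = tab.copy()
for name, mapping in value_labels.items():
    labelled[name] = labelled[name].map(lambda v: mapping.get(v, v))

# Attach a two-level column index holding the variable name and its label
labelled.columns = pd.MultiIndex.from_tuples(
    [(name, variable_labels[name]) for name in tab.columns],
    names=['variable', 'variable_label'],
)

print(labelled)

# A single column is selected with a tuple covering every level
months = labelled[('IMonth', 'Interview month')]
```

With a multi-level column index, selecting by the first level only (for example labelled['IMonth']) returns a DataFrame keyed by the remaining levels, whereas the full tuple returns a Series.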
