This notebook demonstrates the use of the DataTable object in the ukds package.
This demonstration uses the following dataset as an example: Gershuny, J., Sullivan, O. (2017). United Kingdom Time Use Survey, 2014-2015. Centre for Time Use Research, University of Oxford. [data collection]. UK Data Service. SN: 8128, http://doi.org/10.5255/UKDA-SN-8128-1
This demonstration uses the ukds package, which is available on PyPI.
import ukds
The filepath to the data table under study is specified here. This can be changed as needed.
fp_tab=r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015'+\
r'\UKDA-8128-tab\tab\uktus15_household.tab'
The filepath to the associated data dictionary is specified here. This can be changed as needed.
fp_dd=r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015' + \
r'\UKDA-8128-tab\mrdoc\allissue\uktus15_household_ukda_data_dictionary.rtf'
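The two paths above are hard-coded to one machine. As a more portable alternative, both paths could be built from a single base folder with Python's standard pathlib module, and checked before the files are read. This is a minimal sketch: the base folder data_dir is a hypothetical placeholder, and converting the paths to plain strings with str() is a precaution, since it is an assumption here (not something the ukds documentation is quoted on) that DataTable expects strings.

```python
from pathlib import Path

# Hypothetical base folder: change this to wherever the UKDA-8128 download was unzipped
data_dir = Path("_Data") / "United_Kingdom_Time_Use_Survey_2014-2015"

# Build both file paths from the folder layout used in this notebook
fp_tab = str(data_dir / "UKDA-8128-tab" / "tab" / "uktus15_household.tab")
fp_dd = str(
    data_dir / "UKDA-8128-tab" / "mrdoc" / "allissue"
    / "uktus15_household_ukda_data_dictionary.rtf"
)

# Report any file that cannot be found before trying to read it
for fp in (fp_tab, fp_dd):
    if not Path(fp).is_file():
        print(f"File not found: {fp}")
```

Because pathlib inserts the correct separator for the operating system, the same cell should work on Windows, macOS and Linux once data_dir is set.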
A DataTable object is created. The filepaths are supplied as arguments and the files are read into the DataTable object.
dt=ukds.DataTable(fp_tab,fp_dd)
print(dt.__doc__)
print(dt)
A class for reading a UK Data Service .tab data table file
<ukds.data_table.DataTable object at 0x000001DBFB2F92B0>
The data table .tab file is stored in the tab attribute as a pandas DataFrame:
dt.tab.head()
| | serial | strata | psu | HhOut | hh_wt | IMonth | IYear | DM014 | DM016 | DM510 | ... | Relate10_P1 | Relate10_P2 | Relate10_P3 | Relate10_P4 | Relate10_P5 | Relate10_P6 | Relate10_P7 | Relate10_P8 | Relate10_P9 | Relate10_P10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 11010903 | -2 | -2 | 598 | NaN | 9 | 2014 | 0 | 0 | 0 | ... | -2 | -2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 11010904 | -2 | -2 | 598 | NaN | 9 | 2014 | 0 | 0 | 0 | ... | -2 | -2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 11010906 | -2 | -2 | 598 | NaN | 10 | 2014 | 0 | 0 | 0 | ... | -2 | -2.0 | -2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 11010907 | -2 | -2 | 598 | NaN | 9 | 2014 | 1 | 1 | 0 | ... | -2 | -2.0 | -2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 11010908 | -2 | -2 | 598 | NaN | 9 | 2014 | 0 | 0 | 0 | ... | -2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 335 columns
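Because the tab attribute is an ordinary pandas DataFrame, the usual pandas tools apply to it. The sketch below uses a small stand-in DataFrame holding four columns copied from the five rows shown above, so that it runs without the UKDA files; with the real data, dt.tab would take the place of the hypothetical variable tab.

```python
import pandas as pd

# Stand-in for dt.tab: four columns copied from the first five rows shown above
tab = pd.DataFrame({
    "serial": [11010903, 11010904, 11010906, 11010907, 11010908],
    "IMonth": [9, 9, 10, 9, 9],
    "IYear": [2014, 2014, 2014, 2014, 2014],
    "DM014": [0, 0, 0, 1, 0],
})

# Number of rows and columns (the full table has 335 columns)
print(tab.shape)

# Select a subset of columns by name
print(tab[["serial", "IMonth"]])

# Count households by interview month, in month order
print(tab["IMonth"].value_counts().sort_index())
```

Note that at this stage the values are still the raw numeric codes from the .tab file, for example 9 rather than a month name.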
The data dictionary .rtf file is stored in the datadictionary attribute as a ukds.DataDictionary object:
dt.datadictionary
<ukds.data_dictionary.DataDictionary at 0x1dbfb2f9358>
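The information in the tab and datadictionary attributes can be combined by the get_dataframe method.
This method returns a new pandas DataFrame in which:
- the column header has several levels, combining each variable name with its label, type, SPSS measurement level, SPSS user-missing values and position, all taken from the data dictionary;
- coded values are replaced by their text labels where the data dictionary defines them (for example, an IMonth value of 9 becomes September), while values without labels, such as the household serial numbers, are kept as they are.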
df=dt.get_dataframe()
df.head()
variable | serial | strata | psu | HhOut | hh_wt | IMonth | IYear | DM014 | DM016 | DM510 | ... | Relate10_P1 | Relate10_P2 | Relate10_P3 | Relate10_P4 | Relate10_P5 | Relate10_P6 | Relate10_P7 | Relate10_P8 | Relate10_P9 | Relate10_P10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
variable_label | Household number | Strata | Primary sampling unit | Final outcome - household | Household weight | Interview month | Interview Year | Number of children aged 0-14 | Number of children aged 0-16 | Number of children aged 5-10 | ... | Relate10_P1: How related to person 10 | Relate10_P2: How related to person 10 | Relate10_P3: How related to person 10 | Relate10_P4: How related to person 10 | Relate10_P5: How related to person 10 | Relate10_P6: How related to person 10 | Relate10_P7: How related to person 10 | Relate10_P8: How related to person 10 | Relate10_P9: How related to person 10 | Relate10_P10: How related to person 10 |
variable_type | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | ... | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric |
SPSS_measurement_level | SCALE | SCALE | SCALE | SCALE | SCALE | NOMINAL | NOMINAL | NOMINAL | NOMINAL | NOMINAL | ... | SCALE | SCALE | SCALE | SCALE | SCALE | SCALE | SCALE | SCALE | SCALE | NOMINAL |
SPSS_user_missing_values | | | | | | | | | | | ... | | | | | | | | | | |
pos | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 |
0 | 11010903 | Schedule not applicable | Schedule not applicable | Other reasons why unproductive | NaN | September | 2014 | 0 | 0 | 0 | ... | Schedule not applicable | Schedule not applicable | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 11010904 | Schedule not applicable | Schedule not applicable | Other reasons why unproductive | NaN | September | 2014 | 0 | 0 | 0 | ... | Schedule not applicable | Schedule not applicable | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 11010906 | Schedule not applicable | Schedule not applicable | Other reasons why unproductive | NaN | October | 2014 | 0 | 0 | 0 | ... | Schedule not applicable | Schedule not applicable | Schedule not applicable | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 11010907 | Schedule not applicable | Schedule not applicable | Other reasons why unproductive | NaN | September | 2014 | 1 | 1 | 0 | ... | Schedule not applicable | Schedule not applicable | Schedule not applicable | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 11010908 | Schedule not applicable | Schedule not applicable | Other reasons why unproductive | NaN | September | 2014 | 0 | 0 | 0 | ... | Schedule not applicable | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 335 columns
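The output above suggests that the columns of df form a multi-level pandas header (a MultiIndex), with one level per row of metadata (variable, variable_label, and so on). The sketch below rebuilds that shape in plain pandas for two columns, to show how such a frame can be queried. It is only an illustration, not the ukds implementation: the code-to-label dictionary is copied from the values visible in the output above, and only three of the six header levels are reproduced.

```python
import pandas as pd

# Coded values for two variables, copied from the first five rows of dt.tab
codes = pd.DataFrame({
    "serial": [11010903, 11010904, 11010906, 11010907, 11010908],
    "IMonth": [9, 9, 10, 9, 9],
})

# Value labels for IMonth, as they appear in the output of get_dataframe above
imonth_labels = {9: "September", 10: "October"}

# Replace codes with labels; serial has no value labels, so it is left as-is
labelled = codes.copy()
labelled["IMonth"] = labelled["IMonth"].map(imonth_labels)

# Multi-level column header: variable name, label and type
labelled.columns = pd.MultiIndex.from_arrays(
    [
        ["serial", "IMonth"],
        ["Household number", "Interview month"],
        ["numeric", "numeric"],
    ],
    names=["variable", "variable_label", "variable_type"],
)

# Select a column by variable name only, ignoring the other header levels
months = labelled.xs("IMonth", axis=1, level="variable")
print(months)
```

Assuming the real df names its first header level variable, as the output indicates, the same df.xs("IMonth", axis=1, level="variable") call should pick out a single variable without spelling out every header level.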