An Investigation of Social Class Inequalities in General Cognitive Ability in Two British Birth Cohorts

Roxanne Connelly ([email protected])

Vernon Gayle ([email protected])


Abstract

The ‘Flynn effect’ describes the substantial and long-standing increase in average cognitive ability test scores, which has been observed in numerous psychological studies. Flynn makes an appeal for researchers to move beyond psychology’s standard disciplinary boundaries and to consider sociological contexts, in order to develop a more comprehensive understanding of cognitive inequalities. In this article we respond to this appeal and investigate social class inequalities in general cognitive ability test scores over time. We analyse data from the National Child Development Study (1958) and the British Cohort Study (1970). These two British birth cohorts are suitable nationally representative large-scale data resources for studying inequalities in general cognitive ability.

We observe a large parental social class effect, net of parental education and gender in both cohorts. The overall finding is that large social class divisions in cognitive ability can be observed when children are still at primary school, and similar patterns are observed in each cohort. Notably, pupils with fathers at the lower end of the class structure are at a distinct disadvantage. This is a disturbing finding and it is especially important because cognitive ability is known to influence individuals later in the lifecourse.

Keywords

Social Class, Cognitive Ability, Longitudinal, Cohort Studies, Social Stratification, Inequality.

Acknowledgements

We are indebted to the National Child Development Study and 1970 British Cohort Study participants. We are grateful to The Centre for Longitudinal Studies, UCL Institute of Education for the use of these data and to the UK Data Archive and Economic and Social Data Service for making them available. These organizations bear no responsibility for the analysis or interpretation of these data.

Funding

This work was funded by the Economic and Social Research Council [Grant Number: ES/N011783/1].


Introduction to the Notebook

There is an increasing desire and requirement to make sociological research more transparent, and to actively render it reproducible.

Jupyter notebooks are increasingly used in high-profile big science applications (e.g. see here). Using Jupyter notebooks for large-scale social science data analysis in sociology is still at an embryonic stage.

Publishing a Jupyter notebook allows third parties to fully reproduce the complete workflow for the article and to duplicate the empirical results. In addition to increasing transparency, this approach greatly extends the possibility for other researchers to build on the work, for example with alternative measures or additional data. This is an attractive feature and likely to make a major contribution to quantitative sociology.

This is a very early example of undertaking a complete analytical workflow within a Jupyter notebook.

As the practice of using Jupyter notebooks becomes more widespread, it is likely that there will be improvements in how the notebooks are used, and best practices will become much more evident.

Rendering a complete workflow open and accessible is a new departure. Therefore we would ask that you consider the amount of extra work that has gone into rendering our workflow open and accessible and developing this notebook. As a safeguard against being overcritical, we also invite you to reflect on how much of your own work is transparent.

There are hundreds of analytical decisions that are made in the process of data enabling (e.g. which measure of social class to use, how to code education). When dealing with complex, messy real world data there is often no single 'correct' way to organise the data. Researchers are unable to describe these analytical decisions within the confines of a standard journal article. This is one of the reasons why improved transparency is required in quantitative sociology.

An overarching goal when producing this notebook was to ensure that a third party could follow the workflow. In developing an open and published workflow we have drawn upon ideas advanced in computer science, especially the concept of ‘literate computing’, which is the weaving of a narrative directly into live computation, interleaving text with code and results in order to construct a complete piece that achieves the goals of communicating results (see here).

Data analysis software can be operated in various ways. In some parts of this notebook we have deliberately chosen simpler forms of code rather than more complex programming in order to assist the reader. But as far as is practicable we have tried to annotate the work in a fashion that is least likely to obstruct the reader.

A further innovation within this work has been the adoption of ‘pair programming’ which is a technique from software development in which two programmers work together in the development of code. In addition we have also used ‘code peer review’ and each author has run the complete workflow independently using different computers and different software set-ups. This has enabled us to undertake an in-depth test of the reproducibility of the work. These practices are currently unknown in sociological research.

Please remember that this approach to transparent research is very exploratory.

Positive comments are always appreciated, but brickbats improve work.

Here is how to contact us:
Roxanne Connelly ([email protected])
Vernon Gayle ([email protected])


Using Stata

The Jupyter notebook is an open-source web application that allows researchers to create and share documents that contain live code, equations, visualizations and explanatory text.

An introduction to using Jupyter notebooks in social science research is available here.

Jupyter is 'language agnostic' and at the current time over forty languages are supported including those popular in data science such as Python, Stata, R and Julia.

In this notebook we use Stata. Stata is proprietary software, and researchers MUST have access to Stata in order to undertake data analyses within the Jupyter notebook.

There are currently two approaches to undertaking analyses using Stata within a Jupyter notebook.

1. The Stata Kernel

The first approach is to use a Stata kernel. The Stata kernel can be downloaded and installed from this GitHub repository.

This kernel currently only works in Windows.

You need a recent version of Stata, and if you have not already used Stata automation, register its type library by following these instructions.

Once you have registered Stata you can install the kernel.

At the command prompt you need to type:

pip install git+https://github.com/jrfiedler/stata-kernel
python -m stata_kernel.install

Now when you open a new Jupyter notebook you should be able to switch to the Stata kernel from the kernel menu option at the top of the notebook.

2. Using Stata via Magic Cells

The second approach is to use Stata via magic cells. This facility can be downloaded and installed from this GitHub repository.

At the command prompt you need to type:

pip install ipystata

In a code cell before using Stata you must type:

import ipystata

and then run the cell.

Each cell will now be a Stata code cell as long as you start your syntax with:

%%stata

For example to get a summary of the variables in Stata the cell should include the following code:

%%stata
summarize

Further information on using Stata via magic is available here.



Introduction

The ‘Flynn effect’ describes the substantial and long-standing increase in average cognitive test scores, which has been observed in numerous psychological studies 1 (Flynn, 2012). Flynn makes an appeal for researchers to move beyond psychology’s standard disciplinary boundaries and to consider sociological contexts, in order to develop a more comprehensive understanding of the influence of the social on cognitive inequalities. In this article we investigate social class inequalities in general cognitive ability through the examination of data from two British birth cohort studies.

The focus of this article is general cognitive ability in childhood, which is understood to be socially stratified from a very young age (Feinstein, 2003; Sullivan et al., 2013; Cunha and Heckman, 2009; Duncan et al., 1998; Gottfried et al., 2003). Childhood general cognitive ability is important because it is associated with later educational attainment, occupational attainment, and health and wellbeing across the lifecourse (Deary et al., 2007; Nettle, 2003; Vanhanen, 2011). Understanding social class inequalities in childhood cognitive test scores can therefore contribute to the wider sociological understanding of the reproduction of social inequalities.

Background

Neisser et al. (1995: 77) describe cognitive ability as the ‘ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought.’ Cognitive ability tests are well validated measures of individual differences in cognitive capability (Deary et al., 2007; Sternberg et al., 2001). The association between parental social class and children’s cognitive test performance has been consistently documented, and a wealth of empirical evidence demonstrates that children from more advantaged families generally have better cognitive test scores (McCulloch and Joshi, 2001; Feinstein, 2003; Goodman and Gregg, 2010; Blanden et al., 2007; Schoon et al., 2011; Schoon et al., 2010; Dickerson and Popli, 2016; Sullivan et al., 2013). Shenkin et al. (2001) describe social class inequalities in the cognitive ability test performance of 11 year olds born in 1921. Lawlor et al. (2005) found that father’s social class was an important predictor of cognitive ability test scores at ages 7, 9 and 11 for a cohort of children born between 1950 and 1956. Feinstein (2003) demonstrated socio-economic inequalities in cognitive skills at ages as young as 22 months for a cohort of children born in 1970. Similar inequalities were also found at ages 42 months, and at 5 and 10 years (Feinstein, 2003). Using data from the UK Millennium Cohort Study (MCS) a series of more recent investigations have shown that children from less advantaged social backgrounds perform worse on cognitive ability tests than their more advantaged peers throughout childhood (see Blanden and Machin, 2010; Blanden et al., 2007; Schoon et al., 2011; Schoon et al., 2010; Dickerson and Popli, 2012; Sullivan et al., 2013).

The overall motivation for this article is to directly respond to Flynn’s appeal for researchers to move beyond psychology’s standard disciplinary boundaries, and to consider sociological contexts with the aim of developing a more comprehensive understanding of cognitive inequalities. There has been a dearth of research investigating the extent to which social class inequalities in childhood cognitive test scores have changed between birth cohorts. This stands in stark contrast to the vast quantity of research that has investigated trends in educational test scores, and the formal educational outcomes of children and young people (see for example Bradbury et al., 2015; Blanden and Gregg, 2004; Erikson et al., 2005).

The analyses within this article use data from two long running British birth cohort studies, the National Child Development Study (NCDS) and the 1970 British Cohort Study (BCS). These large-scale longitudinal surveys are ongoing and follow infants born in 1958 and 1970 respectively (Power and Elliott, 2006; Elliott and Shepherd, 2006). These two studies have proven to be invaluable sociological data resources. A sizable canon of research regarding social mobility trends in the UK is based on comparisons between these two birth cohorts (e.g. Blanden and Machin, 2004; Blanden et al., 2005; Blanden et al., 2004; Machin and Vignoles, 2004; Goldthorpe and Jackson, 2007; Tampubolon and Savage, 2012; Blanden et al., 2013). A key concern in these projects is measuring changes between birth cohorts. For example, studies have investigated changes in educational inequalities (Breen et al., 2010; Shavit and Blossfeld, 1991; Shavit et al., 2007), and changes in inequalities in access to advantaged occupational positions (Erikson and Goldthorpe, 1992; Breen, 2004). Building on the tradition of cross-cohort comparisons, this work compares social class inequalities in childhood cognitive ability test scores in these two cohorts.


Data

The UK data portfolio is well endowed with large-scale nationally representative birth cohort datasets. The National Child Development Study (NCDS) follows the lives of babies born in England, Scotland and Wales from the 3rd to the 9th of March 1958 (see Power and Elliott, 2006). The British Cohort Study (BCS) follows babies born in England, Scotland and Wales from the 5th to the 11th of April 1970 (see Elliott and Shepherd, 2006) 2. Childhood data were collected at birth, age 7 and age 11 in the NCDS (SN5565, University of London, 2014), and at birth, age 5 and age 10 in the BCS (SN2666, SN2699, SN3723, University of London, 2013; University of London, 2016a; University of London, 2016b).

The UK also has a more recent nationally representative birth cohort, the Millennium Cohort Study (MCS) (see Connelly and Platt, 2014). The overall design, the selection strategy, and the content of the MCS differ substantially from the previous British birth cohorts. The MCS 5th sweep (age 11) only contains one subtest of the British Ability Scales, the Verbal Similarities Test. This single test would not be sufficient to compute an overall general ability test score that is suitably comparable with the tests included in the NCDS and BCS. The MCS 5th sweep (age 11) also contains two cognitive tests drawn from the Cambridge Neuropsychological Test Automated Battery; however, these tests are very different in nature to the tests completed in the NCDS and BCS (see Atkinson, 2015).

Goisis et al. (2017) undertook a comparative analysis of the effects of low birth weight in the NCDS, BCS and MCS. They operationalised a measure by using only the verbal test scores within the NCDS and the BCS, and then compared them with the single Verbal Similarities Test in the MCS. We do not adopt this strategy because psychometricians have warned against the use of isolated subtests for the measurement of general cognitive ability (McDermott et al., 1990). Ensuring the comparability of cognitive tests is challenging, especially when studying test scores over time (see Must et al., 2009). Flynn (2012) highlights that performances on different cognitive ability subtests have improved at different rates. In particular, the similarities subtest has shown some of the largest increases. Therefore, the use of the similarities subtest from the MCS cohort in isolation is likely to result in misleading comparisons.

The NCDS and BCS data were downloaded from the UK Data Archive. The dates and times of download are provided below. New versions of the data are uploaded periodically. If you are using a different version of the data, it is possible that slight variations in the results will occur. To identify changes between data versions you can consult the documentation provided with the datasets.

National Child Development Study (NCDS)
British Cohort Study (BCS)
Parental Occupational Information

The detailed parental occupational information is provided in a joint file for the NCDS and BCS:


Preparation of Stata

In [8]:
global path1 "F:\Data\RAWDATA"
global path2 "F:\Data\MYDATA\WORK"
global path3 "F:\Data\MYDATA\TEMP"
global path4 "F:\Data\MYDATA\FINAL"

clear

*return to jupyter
. global path1 "F:\Data\RAWDATA"

. global path2 "F:\Data\MYDATA\WORK"

. global path3 "F:\Data\MYDATA\TEMP"

. global path4 "F:\Data\MYDATA\FINAL"

. 
. clear

. 
. *return to jupyter
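
The path globals above are specific to one Windows machine. As a minimal sketch, assuming a hypothetical project root stored in a global named root (this global is not part of the original workflow), the same four globals can be built from a single location so that only one line changes between computers. Forward slashes are used because Stata accepts them on Windows, macOS and Linux. The file check at the end is an illustrative safeguard, not something the original workflow performs.

```stata
* Hypothetical project root: change this one line on a different machine
global root "F:/Data"

* The four working folders used in the notebook, built from the root
global path1 "$root/RAWDATA"
global path2 "$root/MYDATA/WORK"
global path3 "$root/MYDATA/TEMP"
global path4 "$root/MYDATA/FINAL"

* Illustrative safeguard: warn early if the raw NCDS file is not found
capture confirm file "$path1/ARCHIVE/NCDS/S1-3/ncds0123.dta"
if _rc != 0 {
    display as error "Raw NCDS file not found under $path1"
}
```

Enclosing file paths in double quotes, as in the check above, also protects against folder names that contain spaces.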

You will need the following user-written commands installed in Stata:

  • fitstat
  • estpost (part of the estout package)
  • mibeta

You can check if these are already installed on your machine using the 'which' command (i.e. which fitstat).

In [9]:
which fitstat
which estout
which mibeta

*return to jupyter
. which fitstat
c:\ado\plus\f\fitstat.ado
*! version 1.6.4 2/22/01 add warning messages

. which estout
c:\ado\plus\e\estout.ado
*! version 3.17  02jun2014  Ben Jann

. which mibeta
c:\ado\plus\m\mibeta.ado
*! version 1.0.2  19jun2014

. 
. *return to jupyter

If these programs are not installed you will have to install them using 'ssc install' (e.g. ssc install fitstat). For more details on how to install programs from ssc see here.
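
The check-then-install routine described above can be sketched as a short loop. This is an illustration rather than part of the original workflow. It assumes that fitstat and estout are obtained from SSC, as the text suggests, and it only reports on mibeta rather than installing it, because the download location for mibeta is not confirmed in this notebook.

```stata
* Install fitstat and estout from SSC only if they are not already present
* (estpost is supplied by the estout package)
foreach pkg in fitstat estout {
    capture which `pkg'
    if _rc != 0 {
        ssc install `pkg'
    }
}

* mibeta is checked but not installed automatically (see the note above)
capture which mibeta
if _rc != 0 {
    display as error "mibeta is not installed"
}
```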

We extend our thanks for these programs to their authors:

Long, J. S., & Freese, J. (2001). FITSTAT: Stata module to compute fit statistics for single equation regression models. Statistical Software Components.

Jann, B. (2017). ESTOUT: Stata module to make regression tables. Statistical Software Components.

mibeta - Yulia Marchenko, StataCorp.


Preparation of NCDS Datasets

Open the raw NCDS data file. This file contains information from the first four sweeps (age 0, age 7, age 11 and age 16) of the NCDS.

In [10]:
use $path1\ARCHIVE\NCDS\S1-3\ncds0123.dta, clear
count

*return to jupyter
. use $path1\ARCHIVE\NCDS\S1-3\ncds0123.dta, clear

. count
  18,558

. 
. *return to jupyter
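
Because new versions of the data are deposited periodically, it can help to record the expected number of cases and to stop if a different file has been loaded. The sketch below illustrates the idea with the auto dataset that ships with Stata, so that it runs without access to the NCDS. The final commented line shows how the same check might look for the NCDS file, where 18,558 is the count reported above. This check is a suggestion and is not part of the original workflow.

```stata
* Toy illustration using a dataset shipped with Stata
sysuse auto, clear

* Stop with an error if the number of cases is not the expected one
assert _N == 74

* Report a signature of the data currently in memory
datasignature

* For the NCDS file opened above, the equivalent check would be:
* assert _N == 18558
```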

Recode the negative values used as missing-data codes (-9, -8, -7, -3, -2 and -1) to Stata's system missing value (.).

In [11]:
quietly mvdecode _all, mv(-9=. \-8=. \-2=. \-1=. \-7=. \-3=.)

*return to jupyter
. quietly mvdecode _all, mv(-9=. \-8=. \-2=. \-1=. \-7=. \-3=.)

. 
. *return to jupyter
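
To make the effect of mvdecode concrete, the following sketch applies it to a small toy dataset. The variable names and values are hypothetical. The second call shows an alternative that the notebook does not use: mapping codes to extended missing values (.a, .b) so that different reasons for missingness remain distinguishable after recoding.

```stata
* Toy data: negative values stand in for survey missing-data codes
clear
input id score
1 12
2 -1
3 -9
4 30
end

* Keep a copy so two recoding strategies can be compared
generate score_ext = score

* Strategy used in the notebook: every code becomes system missing (.)
mvdecode score, mv(-9=. \ -8=. \ -2=. \ -1=. \ -7=. \ -3=.)

* Alternative (an assumption, not used by the authors): extended
* missing values keep the codes distinguishable after recoding
mvdecode score_ext, mv(-9=.a \ -1=.b)

* Both variables now have two missing cases and two valid scores
count if missing(score)
assert r(N) == 2
count if missing(score_ext)
assert r(N) == 2
count if score_ext == .a
assert r(N) == 1
list
```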

Cohort member's gender

Gender is derived from variable n622.

This variable comes from the age 0 (birth) survey (question 53). This question asks: Sex of infant - Male/Female. Variable n622 also appears in other sweeps of the survey so it is possible that this variable includes information collected in multiple surveys.

This variable is coded (1) Male (2) Female. We recode the variable into a 1/0 dummy variable for male.

In [12]:
numlabel n622, add
tab n622, mi
codebook n622
capture drop ncds_male
    gen ncds_male = .
    replace ncds_male = 1 if (n622==1)
    replace ncds_male = 0 if (n622==2)
    label variable ncds_male "NCDS Cohort member Male"
    label define yesno 1 "Yes" 0 "No", replace
    label values ncds_male yesno
    tab ncds_male, mi

tab n622 ncds_male

*return to jupyter
. numlabel n622, add

. tab n622, mi

  0-3D Sex of |
        child |      Freq.     Percent        Cum.
--------------+-----------------------------------
      1. Male |      9,595       51.70       51.70
    2. Female |      8,959       48.28       99.98
            . |          4        0.02      100.00
--------------+-----------------------------------
        Total |     18,558      100.00

. codebook n622

--------------------------------------------------------------------------------------------------------------------------------------------
n622                                                                                                                       0-3D Sex of child
--------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  n622

                 range:  [1,2]                        units:  1
         unique values:  2                        missing .:  4/18,558

            tabulation:  Freq.   Numeric  Label
                         9,595         1  1. Male
                         8,959         2  2. Female
                             4         .  

. capture drop ncds_male

.     gen ncds_male = .
(18,558 missing values generated)

.     replace ncds_male = 1 if (n622==1)
(9,595 real changes made)

.     replace ncds_male = 0 if (n622==2)
(8,959 real changes made)

.     label variable ncds_male "NCDS Cohort member Male"

.     label define yesno 1 "Yes" 0 "No", replace

.     label values ncds_male yesno

.     tab ncds_male, mi

NCDS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |      8,959       48.28       48.28
        Yes |      9,595       51.70       99.98
          . |          4        0.02      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. tab n622 ncds_male

              |  NCDS Cohort member
  0-3D Sex of |         Male
        child |        No        Yes |     Total
--------------+----------------------+----------
      1. Male |         0      9,595 |     9,595 
    2. Female |     8,959          0 |     8,959 
--------------+----------------------+----------
        Total |     8,959      9,595 |    18,554 


. 
. *return to jupyter
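
The dummy variable above is built in three explicit steps to keep the code easy to follow. For readers who prefer a more compact form, the sketch below shows a one-line alternative on hypothetical toy data and checks that it gives the same result. One difference is worth noting: the one-line version codes every non-missing value other than 1 as 0, so it is only equivalent when the source variable contains no values besides 1 and 2, as is the case for n622 above.

```stata
* Toy data mimicking the coding of n622: 1 = male, 2 = female
clear
input n622
1
2
.
1
end

* Step-by-step version, as in the notebook
generate male_long = .
replace male_long = 1 if (n622 == 1)
replace male_long = 0 if (n622 == 2)

* Compact version: a logical expression returns 1 or 0, and the
* if qualifier leaves missing cases as missing
generate male_short = (n622 == 1) if !missing(n622)

* The two versions agree on every row, including the missing case
assert male_long == male_short
list
```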

Parents Education

Information on the parents' educational qualifications was collected in the age 10 survey of the BCS. This information is not available in the NCDS, so we have chosen to use parents' years of education in both cohorts to facilitate comparability.

We code parental education using the method described in:

Cheung, S.Y. & Egerton, M. (2007). Great Britain: Higher Education Expansion and Reform: Changing Educational Inequalities. Stratification in higher education: A comparative study, 195-219.

Page 206 to 207

'Information on parent's education in the NCDS was available only as the age at which the respondent's father and mother left full-time education. We have no information on whether they left school with any qualifications. As most parents left school at the age of 14 to 15, we coded them as having completed intermediate secondary qualifications. Those who left school at age 13 or below would have only had school minimum, low-level school qualifications or basic vocational qualifications. However, the number in this category is very low and was therefore combined with those who left school at age 14 to 15. We took the highest of the parents' education and recoded it into four categories following the CASMIN schema'

  1. Left at age 15 or below
  2. Left at age 16 to 18
  3. Left at age 19 to 20
  4. Left at age 21 or above
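
The final step in the quoted passage, taking the higher of the two parents' categories, can be sketched as follows. The variable names paed_cat, moed_cat and hiparent_cat and the toy values are hypothetical, and this section does not show whether or how the notebook itself combines the two parental measures. The sketch relies on the rowmax() function of egen, which ignores missing values and returns missing only when both parents' categories are missing.

```stata
* Toy data: education categories (1 to 4) for father and mother
clear
input paed_cat moed_cat
1 2
3 1
. 2
. .
end

* Highest category across the two parents, ignoring missing values
egen hiparent_cat = rowmax(paed_cat moed_cat)

* Check each row of the toy data
assert hiparent_cat == 2 in 1
assert hiparent_cat == 3 in 2
assert hiparent_cat == 2 in 3
assert missing(hiparent_cat) in 4
list
```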

For fathers we use variable n195 that comes from the age 7 survey.

The survey respondent is asked: Did the father stay at school after the minimum school leaving age? Yes/No

Then they are asked the follow up question: If yes, at what age did he finish full-time education? (n195 comes from this question).

In [13]:
*Father's Education

numlabel n195, add
tab n195, mi

capture drop ncds_paed_cat
    gen ncds_paed_cat = .
*Category 1 if they left education at age 15 or below
    replace ncds_paed_cat = 1 if (n195<=15)
*Category 2 if they left education at age 16, 17 or 18
    replace ncds_paed_cat = 2 if ((n195>=16)&(n195<=18))
*Category 3 if they left education at age 19 or 20
    replace ncds_paed_cat = 3 if ((n195>=19)&(n195<=20))
*Category 4 if they left education at age 21+ (39 is highest year observed)
    replace ncds_paed_cat = 4 if ((n195>=21)&(n195<=39))
    tab ncds_paed_cat
    label define ed_cat 1 "Comp" 2 "Comp+1-3" 3 "Comp+4-5" 4 "Comp+6+", replace
    label values ncds_paed_cat ed_cat
    label variable ncds_paed_cat "NCDS Father's Education Categories"
    tab ncds_paed_cat
    tab n195 ncds_paed_cat 
    tab n195 ncds_paed_cat, mi
    
*return to jupyter
. *Father's Education

. 
. numlabel n195, add

. tab n195, mi

    1P Dads age |
      finishing |
 school if aftr |
            min |      Freq.     Percent        Cum.
----------------+-----------------------------------
              0 |     11,068       59.64       59.64
              2 |          2        0.01       59.65
              3 |          4        0.02       59.67
              6 |          3        0.02       59.69
             10 |          1        0.01       59.69
             11 |          3        0.02       59.71
             12 |          4        0.02       59.73
             13 |          2        0.01       59.74
             14 |         91        0.49       60.23
             15 |        380        2.05       62.28
             16 |      1,153        6.21       68.49
             17 |        641        3.45       71.95
             18 |        419        2.26       74.21
             19 |         83        0.45       74.65
             20 |         70        0.38       75.03
             21 |        102        0.55       75.58
             22 |         90        0.48       76.06
             23 |         75        0.40       76.47
             24 |         98        0.53       77.00
             25 |         58        0.31       77.31
             26 |         32        0.17       77.48
             27 |         16        0.09       77.57
             28 |         12        0.06       77.63
             29 |          7        0.04       77.67
             30 |          8        0.04       77.71
             31 |          2        0.01       77.72
             32 |          1        0.01       77.73
             33 |          2        0.01       77.74
             35 |          2        0.01       77.75
             39 |          1        0.01       77.76
              . |      4,128       22.24      100.00
----------------+-----------------------------------
          Total |     18,558      100.00

. 
. capture drop ncds_paed_cat

.     gen ncds_paed_cat = .
(18,558 missing values generated)

. *Category 1 if they left education at age 15 or below

.     replace ncds_paed_cat = 1 if (n195<=15)
(11,558 real changes made)

. *Category 2 if they left education at age 16, 17 or 18

.     replace ncds_paed_cat = 2 if ((n195>=16)&(n195<=18))
(2,213 real changes made)

. *Category 3 if they left education at age 19 or 20

.     replace ncds_paed_cat = 3 if ((n195>=19)&(n195<=20))
(153 real changes made)

. *Category 4 if they left education at age 21+ (39 is highest year observed)

.     replace ncds_paed_cat = 4 if ((n195>=21)&(n195<=39))
(506 real changes made)

.     tab ncds_paed_cat

ncds_paed_c |
         at |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     11,558       80.10       80.10
          2 |      2,213       15.34       95.43
          3 |        153        1.06       96.49
          4 |        506        3.51      100.00
------------+-----------------------------------
      Total |     14,430      100.00

.     label define ed_cat 1 "Comp" 2 "Comp+1-3" 3 "Comp+4-5" 4 "Comp+6+", replace

.     label values ncds_paed_cat ed_cat

.     label variable ncds_paed_cat "NCDS Father's Education Categories"

.     tab ncds_paed_cat

       NCDS |
   Father's |
  Education |
 Categories |      Freq.     Percent        Cum.
------------+-----------------------------------
       Comp |     11,558       80.10       80.10
   Comp+1-3 |      2,213       15.34       95.43
   Comp+4-5 |        153        1.06       96.49
    Comp+6+ |        506        3.51      100.00
------------+-----------------------------------
      Total |     14,430      100.00

.     tab n195 ncds_paed_cat 

    1P Dads age |
      finishing |
 school if aftr |     NCDS Father's Education Categories
            min |      Comp   Comp+1-3   Comp+4-5    Comp+6+ |     Total
----------------+--------------------------------------------+----------
              0 |    11,068          0          0          0 |    11,068 
              2 |         2          0          0          0 |         2 
              3 |         4          0          0          0 |         4 
              6 |         3          0          0          0 |         3 
             10 |         1          0          0          0 |         1 
             11 |         3          0          0          0 |         3 
             12 |         4          0          0          0 |         4 
             13 |         2          0          0          0 |         2 
             14 |        91          0          0          0 |        91 
             15 |       380          0          0          0 |       380 
             16 |         0      1,153          0          0 |     1,153 
             17 |         0        641          0          0 |       641 
             18 |         0        419          0          0 |       419 
             19 |         0          0         83          0 |        83 
             20 |         0          0         70          0 |        70 
             21 |         0          0          0        102 |       102 
             22 |         0          0          0         90 |        90 
             23 |         0          0          0         75 |        75 
             24 |         0          0          0         98 |        98 
             25 |         0          0          0         58 |        58 
             26 |         0          0          0         32 |        32 
             27 |         0          0          0         16 |        16 
             28 |         0          0          0         12 |        12 
             29 |         0          0          0          7 |         7 
             30 |         0          0          0          8 |         8 
             31 |         0          0          0          2 |         2 
             32 |         0          0          0          1 |         1 
             33 |         0          0          0          2 |         2 
             35 |         0          0          0          2 |         2 
             39 |         0          0          0          1 |         1 
----------------+--------------------------------------------+----------
          Total |    11,558      2,213        153        506 |    14,430 


.     tab n195 ncds_paed_cat, mi

    1P Dads age |
      finishing |
 school if aftr |           NCDS Father's Education Categories
            min |      Comp   Comp+1-3   Comp+4-5    Comp+6+          . |     Total
----------------+-------------------------------------------------------+----------
              0 |    11,068          0          0          0          0 |    11,068 
              2 |         2          0          0          0          0 |         2 
              3 |         4          0          0          0          0 |         4 
              6 |         3          0          0          0          0 |         3 
             10 |         1          0          0          0          0 |         1 
             11 |         3          0          0          0          0 |         3 
             12 |         4          0          0          0          0 |         4 
             13 |         2          0          0          0          0 |         2 
             14 |        91          0          0          0          0 |        91 
             15 |       380          0          0          0          0 |       380 
             16 |         0      1,153          0          0          0 |     1,153 
             17 |         0        641          0          0          0 |       641 
             18 |         0        419          0          0          0 |       419 
             19 |         0          0         83          0          0 |        83 
             20 |         0          0         70          0          0 |        70 
             21 |         0          0          0        102          0 |       102 
             22 |         0          0          0         90          0 |        90 
             23 |         0          0          0         75          0 |        75 
             24 |         0          0          0         98          0 |        98 
             25 |         0          0          0         58          0 |        58 
             26 |         0          0          0         32          0 |        32 
             27 |         0          0          0         16          0 |        16 
             28 |         0          0          0         12          0 |        12 
             29 |         0          0          0          7          0 |         7 
             30 |         0          0          0          8          0 |         8 
             31 |         0          0          0          2          0 |         2 
             32 |         0          0          0          1          0 |         1 
             33 |         0          0          0          2          0 |         2 
             35 |         0          0          0          2          0 |         2 
             39 |         0          0          0          1          0 |         1 
              . |         0          0          0          0      4,128 |     4,128 
----------------+-------------------------------------------------------+----------
          Total |    11,558      2,213        153        506      4,128 |    18,558 


.     

For mother's years of education we use variable n2397, which comes from the age 16 survey.

From the age 0 survey we have variable n537 available to us. This comes from the question: Did the patient stay at school after the minimum school-leaving age? Yes/No

In the survey this question is followed by: At what age did she finish her full-time education? However, the answer to this question does not appear to be deposited with the data, so we have to take this information from a later sweep of the survey (the age 16 survey). This is suboptimal, but the information does not appear to be available from the earlier sweeps.

Variable n2397 comes from the question: At what age did mother or mother figure leave full-time education?

This is not a continuous variable but is grouped into categories. We categorise the mother's years of education using the same method outlined above.

In [14]:
*Mother's Education

numlabel n2397, add
tab n2397, mi

capture drop ncds_moed_cat
    gen ncds_moed_cat = .
*Category 1 if they left education at age 15 or below
    replace ncds_moed_cat = 1 if (n2397<=4)
*Category 2 if they left education at age 16, 17 or 18
    replace ncds_moed_cat = 2 if ((n2397>=5)&(n2397<=7))
*Category 3 if they left education at age 19 or 20
    replace ncds_moed_cat = 3 if (n2397==8)
*Category 4 if they left education at age 21+ 
    replace ncds_moed_cat = 4 if ((n2397>=9)&(n2397<=10))
    tab ncds_moed_cat
    label values ncds_moed_cat ed_cat
    label variable ncds_moed_cat "NCDS Mother's Education Categories"
    tab ncds_moed_cat
    tab n2397 ncds_moed_cat 
    tab n2397 ncds_moed_cat, mi

*return to jupyter
. *Mother's Education

. 
. numlabel n2397, add

. tab n2397, mi

  3P Age mother figr |
 left full-time educ |      Freq.     Percent        Cum.
---------------------+-----------------------------------
     1. under 13 yrs |        157        0.85        0.85
   2. 13 to 14 years |        102        0.55        1.40
   3. 14 to 15 years |      5,258       28.33       29.73
   4. 15 to 16 years |      3,438       18.53       48.25
   5. 16 to 17 years |      1,307        7.04       55.30
   6. 17 to 18 years |        504        2.72       58.01
   7. 18 to 19 years |        278        1.50       59.51
   8. 19 to 21 years |        153        0.82       60.34
   9. 21 to 23 years |        187        1.01       61.34
10. 23 or more years |         47        0.25       61.60
       11. Not known |         43        0.23       61.83
                   . |      7,084       38.17      100.00
---------------------+-----------------------------------
               Total |     18,558      100.00

. 
. capture drop ncds_moed_cat

.     gen ncds_moed_cat = .
(18,558 missing values generated)

. *Category 1 if they left education at age 15 or below

.     replace ncds_moed_cat = 1 if (n2397<=4)
(8,955 real changes made)

. *Category 2 if they left education at age 16, 17 or 18

.     replace ncds_moed_cat = 2 if ((n2397>=5)&(n2397<=7))
(2,089 real changes made)

. *Category 3 if they left education at age 19 or 20

.     replace ncds_moed_cat = 3 if (n2397==8)
(153 real changes made)

. *Category 4 if they left education at age 21+ 

.     replace ncds_moed_cat = 4 if ((n2397>=9)&(n2397<=10))
(234 real changes made)

.     tab ncds_moed_cat

ncds_moed_c |
         at |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      8,955       78.34       78.34
          2 |      2,089       18.27       96.61
          3 |        153        1.34       97.95
          4 |        234        2.05      100.00
------------+-----------------------------------
      Total |     11,431      100.00

.     label values ncds_moed_cat ed_cat

.     label variable ncds_moed_cat "NCDS Mother's Education Categories"

.     tab ncds_moed_cat

       NCDS |
   Mother's |
  Education |
 Categories |      Freq.     Percent        Cum.
------------+-----------------------------------
       Comp |      8,955       78.34       78.34
   Comp+1-3 |      2,089       18.27       96.61
   Comp+4-5 |        153        1.34       97.95
    Comp+6+ |        234        2.05      100.00
------------+-----------------------------------
      Total |     11,431      100.00

.     tab n2397 ncds_moed_cat 

  3P Age mother figr |     NCDS Mother's Education Categories
 left full-time educ |      Comp   Comp+1-3   Comp+4-5    Comp+6+ |     Total
---------------------+--------------------------------------------+----------
     1. under 13 yrs |       157          0          0          0 |       157 
   2. 13 to 14 years |       102          0          0          0 |       102 
   3. 14 to 15 years |     5,258          0          0          0 |     5,258 
   4. 15 to 16 years |     3,438          0          0          0 |     3,438 
   5. 16 to 17 years |         0      1,307          0          0 |     1,307 
   6. 17 to 18 years |         0        504          0          0 |       504 
   7. 18 to 19 years |         0        278          0          0 |       278 
   8. 19 to 21 years |         0          0        153          0 |       153 
   9. 21 to 23 years |         0          0          0        187 |       187 
10. 23 or more years |         0          0          0         47 |        47 
---------------------+--------------------------------------------+----------
               Total |     8,955      2,089        153        234 |    11,431 


.     tab n2397 ncds_moed_cat, mi

  3P Age mother figr |           NCDS Mother's Education Categories
 left full-time educ |      Comp   Comp+1-3   Comp+4-5    Comp+6+          . |     Total
---------------------+-------------------------------------------------------+----------
     1. under 13 yrs |       157          0          0          0          0 |       157 
   2. 13 to 14 years |       102          0          0          0          0 |       102 
   3. 14 to 15 years |     5,258          0          0          0          0 |     5,258 
   4. 15 to 16 years |     3,438          0          0          0          0 |     3,438 
   5. 16 to 17 years |         0      1,307          0          0          0 |     1,307 
   6. 17 to 18 years |         0        504          0          0          0 |       504 
   7. 18 to 19 years |         0        278          0          0          0 |       278 
   8. 19 to 21 years |         0          0        153          0          0 |       153 
   9. 21 to 23 years |         0          0          0        187          0 |       187 
10. 23 or more years |         0          0          0         47          0 |        47 
       11. Not known |         0          0          0          0         43 |        43 
                   . |         0          0          0          0      7,084 |     7,084 
---------------------+-------------------------------------------------------+----------
               Total |     8,955      2,089        153        234      7,127 |    18,558 


. 
. *return to jupyter
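
The recoding above relies on each condition having an upper bound, because Stata treats missing values as larger than any number. As a minimal sketch on toy data (the variable names leave_code and ed_toy and the scalar n_ed_missing are hypothetical, not part of the NCDS files), the same grouping can be written with inrange(), which makes both bounds explicit and leaves "Not known" (code 11) and missing values uncategorised:

```stata
* Toy data only: leave_code mimics the 1-11 coding of n2397
preserve
clear
input leave_code
1
4
5
7
8
9
10
11
.
end

* Same grouping as the cell above, written with explicit lower and upper bounds
generate ed_toy = .
replace ed_toy = 1 if inrange(leave_code, 1, 4)
replace ed_toy = 2 if inrange(leave_code, 5, 7)
replace ed_toy = 3 if leave_code == 8
replace ed_toy = 4 if inrange(leave_code, 9, 10)

* "Not known" (code 11) and system missing should both remain missing
assert missing(ed_toy) if leave_code == 11 | missing(leave_code)
count if missing(ed_toy)
scalar n_ed_missing = r(N)
restore
```

The preserve and restore pair is used so that the toy data do not replace the survey data in memory.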

Again, in line with Cheung and Egerton (2007, pp. 206-207), we take the higher of the two parents' education categories to create a parental education level variable.

In [15]:
*Highest of the parent's education 

capture drop ncds_parented
*Highest of father's education and mother's education
egen ncds_parented = rmax(ncds_paed_cat ncds_moed_cat)
tab ncds_parented
label values ncds_parented ed_cat
label variable ncds_parented "NCDS Parent's Highest Education"
tab ncds_parented

tab ncds_parented ncds_paed_cat
tab ncds_parented ncds_moed_cat

* return to jupyter
. *Highest of the parent's education 

. 
. capture drop ncds_parented

. *Highest of father's education and mother's education

. egen ncds_parented = rmax(ncds_paed_cat ncds_moed_cat)
(2631 missing values generated)

. tab ncds_parented

ncds_parent |
         ed |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     11,659       73.20       73.20
          2 |      3,384       21.25       94.45
          3 |        246        1.54       95.99
          4 |        638        4.01      100.00
------------+-----------------------------------
      Total |     15,927      100.00

. label values ncds_parented ed_cat

. label variable ncds_parented "NCDS Parent's Highest Education"

. tab ncds_parented

       NCDS |
   Parent's |
    Highest |
  Education |      Freq.     Percent        Cum.
------------+-----------------------------------
       Comp |     11,659       73.20       73.20
   Comp+1-3 |      3,384       21.25       94.45
   Comp+4-5 |        246        1.54       95.99
    Comp+6+ |        638        4.01      100.00
------------+-----------------------------------
      Total |     15,927      100.00

. 
. tab ncds_parented ncds_paed_cat

      NCDS |
  Parent's |
   Highest |     NCDS Father's Education Categories
 Education |      Comp   Comp+1-3   Comp+4-5    Comp+6+ |     Total
-----------+--------------------------------------------+----------
      Comp |    10,505          0          0          0 |    10,505 
  Comp+1-3 |       972      2,111          0          0 |     3,083 
  Comp+4-5 |        36         48        143          0 |       227 
   Comp+6+ |        45         54         10        506 |       615 
-----------+--------------------------------------------+----------
     Total |    11,558      2,213        153        506 |    14,430 


. tab ncds_parented ncds_moed_cat

      NCDS |
  Parent's |
   Highest |     NCDS Mother's Education Categories
 Education |      Comp   Comp+1-3   Comp+4-5    Comp+6+ |     Total
-----------+--------------------------------------------+----------
      Comp |     8,041          0          0          0 |     8,041 
  Comp+1-3 |       807      1,907          0          0 |     2,714 
  Comp+4-5 |        39         43        115          0 |       197 
   Comp+6+ |        68        139         38        234 |       479 
-----------+--------------------------------------------+----------
     Total |     8,955      2,089        153        234 |    11,431 


. 
. * return to jupyter
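
The row maximum ignores missing values, which is why ncds_parented has more valid cases (15,927) than either parent's variable alone, and why only 2,631 cases are missing. The sketch below illustrates this behaviour on toy data; the variable names fa_ed, mo_ed and parent_ed and the scalar n_both_missing are hypothetical. It uses rowmax(), which, as far as we are aware, is the current documented name for the egen function written as rmax() above.

```stata
* Toy data only (hypothetical values) to show how the row maximum treats missing values
preserve
clear
input fa_ed mo_ed
1 2
3 1
. 2
4 .
. .
end

* Highest of the two columns; missing values are ignored where possible
egen parent_ed = rowmax(fa_ed mo_ed)
list, clean

* Only the row where both parents are missing stays missing
count if missing(parent_ed)
scalar n_both_missing = r(N)
restore
```

A case with one parent's education missing therefore takes the value of the other parent, which is a design choice worth keeping in mind when interpreting the combined measure.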

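Here we identify the country at interview for each of the first three sweeps of the survey. Note that the variable derived from n1region is named ncds5_country and labelled "Age 5", although the tabulation of n1region below shows that this sweep took place in 1965, when the cohort members were 7 years old.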

In [16]:
tab n0region
capture drop ncds0_country
    gen ncds0_country = .
    replace ncds0_country = 1 if (n0region>=1)&(n0region<=9)
    replace ncds0_country = 2 if (n0region==10)
    replace ncds0_country = 3 if (n0region==11)
    label define country 1 "England" 2 "Wales" 3 "Scotland"
    label values ncds0_country country
    label variable ncds0_country "NCDS Age 0 Country"
    tab ncds0_country

tab n1region
capture drop ncds5_country
    gen ncds5_country = .
    replace ncds5_country = 1 if (n1region>=1)&(n1region<=9)
    replace ncds5_country = 2 if (n1region==10)
    replace ncds5_country = 3 if (n1region==11)
    label values ncds5_country country
    label variable ncds5_country "NCDS Age 5 Country"
    tab ncds5_country

tab n2region
capture drop ncds11_country
    gen ncds11_country = .
    replace ncds11_country = 1 if (n2region>=1)&(n2region<=9)
    replace ncds11_country = 2 if (n2region==10)
    replace ncds11_country = 3 if (n2region==11)
    label values ncds11_country country
    label variable ncds11_country "NCDS Age 11 Country"
    tab ncds11_country
 
*return to jupyter
. tab n0region

 Region at PMS |
(1958) - Birth |      Freq.     Percent        Cum.
---------------+-----------------------------------
         North |      1,234        7.09        7.09
    North West |      2,295       13.18       20.26
  E & W.Riding |      1,433        8.23       28.49
North Midlands |      1,299        7.46       35.95
      Midlands |      1,648        9.46       45.41
          East |      1,242        7.13       52.54
    South East |      3,445       19.78       72.32
         South |        955        5.48       77.81
    South West |        966        5.55       83.35
         Wales |        914        5.25       88.60
      Scotland |      1,985       11.40      100.00
---------------+-----------------------------------
         Total |     17,416      100.00

. capture drop ncds0_country

.     gen ncds0_country = .
(18,558 missing values generated)

.     replace ncds0_country = 1 if (n0region>=1)&(n0region<=9)
(14,517 real changes made)

.     replace ncds0_country = 2 if (n0region==10)
(914 real changes made)

.     replace ncds0_country = 3 if (n0region==11)
(1,985 real changes made)

.     label define country 1 "England" 2 "Wales" 3 "Scotland"

.     label values ncds0_country country

.     label variable ncds0_country "NCDS Age 0 Country"

.     tab ncds0_country

 NCDS Age 0 |
    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
    England |     14,517       83.35       83.35
      Wales |        914        5.25       88.60
   Scotland |      1,985       11.40      100.00
------------+-----------------------------------
      Total |     17,416      100.00

. 
. tab n1region

     Region at |
NCDS1 (1965) - |
       7 years |      Freq.     Percent        Cum.
---------------+-----------------------------------
         North |      1,126        7.31        7.31
    North West |      1,980       12.85       20.16
  E & W.Riding |      1,286        8.35       28.51
North Midlands |      1,180        7.66       36.17
      Midlands |      1,499        9.73       45.89
          East |      1,181        7.67       53.56
    South East |      2,815       18.27       71.83
         South |        948        6.15       77.98
    South West |        930        6.04       84.02
         Wales |        822        5.34       89.36
      Scotland |      1,640       10.64      100.00
---------------+-----------------------------------
         Total |     15,407      100.00

. capture drop ncds5_country

.     gen ncds5_country = .
(18,558 missing values generated)

.     replace ncds5_country = 1 if (n1region>=1)&(n1region<=9)
(12,945 real changes made)

.     replace ncds5_country = 2 if (n1region==10)
(822 real changes made)

.     replace ncds5_country = 3 if (n1region==11)
(1,640 real changes made)

.     label values ncds5_country country

.     label variable ncds5_country "NCDS Age 5 Country"

.     tab ncds5_country

 NCDS Age 5 |
    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
    England |     12,945       84.02       84.02
      Wales |        822        5.34       89.36
   Scotland |      1,640       10.64      100.00
------------+-----------------------------------
      Total |     15,407      100.00

. 
. tab n2region

     Region at |
NCDS2 (1969) - |
      11 years |      Freq.     Percent        Cum.
---------------+-----------------------------------
         North |      1,063        6.92        6.92
    North West |      1,943       12.65       19.58
  E & W.Riding |      1,303        8.49       28.06
North Midlands |      1,181        7.69       35.75
      Midlands |      1,438        9.36       45.12
          East |      1,310        8.53       53.65
    South East |      2,793       18.19       71.84
         South |        962        6.26       78.10
    South West |        962        6.26       84.36
         Wales |        817        5.32       89.68
      Scotland |      1,584       10.32      100.00
---------------+-----------------------------------
         Total |     15,356      100.00

. capture drop ncds11_country

.     gen ncds11_country = .
(18,558 missing values generated)

.     replace ncds11_country = 1 if (n2region>=1)&(n2region<=9)
(12,955 real changes made)

.     replace ncds11_country = 2 if (n2region==10)
(817 real changes made)

.     replace ncds11_country = 3 if (n2region==11)
(1,584 real changes made)

.     label values ncds11_country country

.     label variable ncds11_country "NCDS Age 11 Country"

.     tab ncds11_country

NCDS Age 11 |
    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
    England |     12,955       84.36       84.36
      Wales |        817        5.32       89.68
   Scotland |      1,584       10.32      100.00
------------+-----------------------------------
      Total |     15,356      100.00

.  
. *return to jupyter
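
The three blocks above repeat the same mapping from region to country. As a sketch of a more compact alternative, assuming the same coding (1 to 9 for the English regions, 10 for Wales, 11 for Scotland), the mapping could be applied in a loop with recode. The toy variables region0, region1 and region2, the label country_toy and the scalar n_england0 are hypothetical names used only for illustration.

```stata
* Toy data only: three region variables coded like n0region, n1region and n2region
preserve
clear
input region0 region1 region2
1 1 2
9 10 10
10 10 11
11 11 .
. 5 5
end

label define country_toy 1 "England" 2 "Wales" 3 "Scotland"

foreach v of varlist region0 region1 region2 {
    * codes 1-9 are English regions, 10 is Wales, 11 is Scotland
    recode `v' (1/9 = 1) (10 = 2) (11 = 3) (else = .), generate(`v'_country)
    label values `v'_country country_toy
}
list, clean

* Number of toy cases in England at the first sweep
count if region0_country == 1
scalar n_england0 = r(N)
restore
```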

Here we clean the general ability test variable. We use variable n920, taken from the third survey (age 11). This variable is described in the NCDS documentation. The original test materials can be viewed on the NCDS webpage.

This is the ability test score variable that has been used in previous social stratification studies. For example see: Breen, R., & Yaish, M. (2006). Testing the Breen-Goldthorpe model of educational decision making. Mobility and inequality, 232-258.
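
The cell that follows standardises the raw score in two steps: first to a z-score (mean 0, standard deviation 1), then to a scale with mean 100 and standard deviation 15. The sketch below works through the arithmetic for a raw score of 0, using the mean (42.94041) and standard deviation (16.14388) that summ reports for the raw score in the output below. The scalar names raw_mean, raw_sd, z0 and iq0 are hypothetical.

```stata
* Worked check of the two rescaling steps for a single raw score
scalar raw_mean = 42.94041
scalar raw_sd   = 16.14388

* Step 1: z-score for a raw score of 0 (the lowest observed value)
scalar z0 = (0 - scalar(raw_mean)) / scalar(raw_sd)

* Step 2: rescale to mean 100, standard deviation 15
scalar iq0 = 100 + 15 * scalar(z0)

display "z-score for raw score 0:        " %9.6f scalar(z0)
display "rescaled value for raw score 0: " %9.5f scalar(iq0)
```

These values should agree, up to rounding, with the first rows of the tabulations of sncds11_bastotalscore and ncds11_stdbastotalscore in the output.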

In [17]:
tab n920
rename n920 ncds11_bastotalscore
summ ncds11_bastotalscore

*standardise to mean 0 sd 1
tab ncds11_bastotalscore
capture drop sncds11_bastotalscore
egen sncds11_bastotalscore = std(ncds11_bastotalscore)
tab sncds11_bastotalscore
summ sncds11_bastotalscore

*standardise to mean 100 sd 15
capture drop ncds11_stdbastotalscore
gen ncds11_stdbastotalscore = (sncds11_bastotalscore*15)+100
tab ncds11_stdbastotalscore
summ ncds11_stdbastotalscore

label variable ncds11_stdbastotalscore "NCDS Age 11 BAS Total Score std"

*return to jupyter
. tab n920

   2T Total |
   score on |
    general |
    ability |
       test |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         66        0.47        0.47
          1 |          4        0.03        0.50
          2 |          7        0.05        0.54
          3 |          9        0.06        0.61
          4 |          7        0.05        0.66
          5 |          6        0.04        0.70
          6 |         16        0.11        0.81
          7 |         18        0.13        0.94
          8 |         23        0.16        1.10
          9 |         30        0.21        1.32
         10 |         50        0.35        1.67
         11 |         47        0.33        2.00
         12 |         56        0.40        2.40
         13 |         81        0.57        2.97
         14 |        107        0.76        3.73
         15 |        113        0.80        4.53
         16 |        124        0.88        5.41
         17 |        149        1.05        6.46
         18 |        131        0.93        7.39
         19 |        154        1.09        8.48
         20 |        182        1.29        9.77
         21 |        174        1.23       11.00
         22 |        181        1.28       12.28
         23 |        194        1.37       13.65
         24 |        196        1.39       15.04
         25 |        203        1.44       16.47
         26 |        228        1.61       18.09
         27 |        220        1.56       19.64
         28 |        214        1.51       21.16
         29 |        223        1.58       22.74
         30 |        255        1.80       24.54
         31 |        263        1.86       26.40
         32 |        250        1.77       28.17
         33 |        250        1.77       29.94
         34 |        218        1.54       31.48
         35 |        290        2.05       33.54
         36 |        284        2.01       35.55
         37 |        273        1.93       37.48
         38 |        287        2.03       39.51
         39 |        279        1.97       41.48
         40 |        272        1.92       43.41
         41 |        283        2.00       45.41
         42 |        290        2.05       47.46
         43 |        246        1.74       49.20
         44 |        299        2.12       51.32
         45 |        328        2.32       53.64
         46 |        280        1.98       55.62
         47 |        301        2.13       57.75
         48 |        332        2.35       60.10
         49 |        324        2.29       62.39
         50 |        295        2.09       64.48
         51 |        306        2.17       66.65
         52 |        282        2.00       68.64
         53 |        271        1.92       70.56
         54 |        286        2.02       72.59
         55 |        289        2.05       74.63
         56 |        302        2.14       76.77
         57 |        306        2.17       78.93
         58 |        276        1.95       80.89
         59 |        278        1.97       82.85
         60 |        247        1.75       84.60
         61 |        233        1.65       86.25
         62 |        220        1.56       87.81
         63 |        203        1.44       89.24
         64 |        216        1.53       90.77
         65 |        200        1.42       92.19
         66 |        175        1.24       93.43
         67 |        176        1.25       94.67
         68 |        143        1.01       95.68
         69 |        115        0.81       96.50
         70 |        110        0.78       97.28
         71 |         81        0.57       97.85
         72 |         89        0.63       98.48
         73 |         57        0.40       98.88
         74 |         60        0.42       99.31
         75 |         34        0.24       99.55
         76 |         27        0.19       99.74
         77 |         20        0.14       99.88
         78 |         10        0.07       99.95
         79 |          6        0.04       99.99
         80 |          1        0.01      100.00
------------+-----------------------------------
      Total |     14,131      100.00

. rename n920 ncds11_bastotalscore

. summ ncds11_bastotalscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds11_bas~e |     14,131    42.94041    16.14388          0         80

. 
. *standardise to mean 0 sd 1

. tab ncds11_bastotalscore

   2T Total |
   score on |
    general |
    ability |
       test |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         66        0.47        0.47
          1 |          4        0.03        0.50
          2 |          7        0.05        0.54
          3 |          9        0.06        0.61
          4 |          7        0.05        0.66
          5 |          6        0.04        0.70
          6 |         16        0.11        0.81
          7 |         18        0.13        0.94
          8 |         23        0.16        1.10
          9 |         30        0.21        1.32
         10 |         50        0.35        1.67
         11 |         47        0.33        2.00
         12 |         56        0.40        2.40
         13 |         81        0.57        2.97
         14 |        107        0.76        3.73
         15 |        113        0.80        4.53
         16 |        124        0.88        5.41
         17 |        149        1.05        6.46
         18 |        131        0.93        7.39
         19 |        154        1.09        8.48
         20 |        182        1.29        9.77
         21 |        174        1.23       11.00
         22 |        181        1.28       12.28
         23 |        194        1.37       13.65
         24 |        196        1.39       15.04
         25 |        203        1.44       16.47
         26 |        228        1.61       18.09
         27 |        220        1.56       19.64
         28 |        214        1.51       21.16
         29 |        223        1.58       22.74
         30 |        255        1.80       24.54
         31 |        263        1.86       26.40
         32 |        250        1.77       28.17
         33 |        250        1.77       29.94
         34 |        218        1.54       31.48
         35 |        290        2.05       33.54
         36 |        284        2.01       35.55
         37 |        273        1.93       37.48
         38 |        287        2.03       39.51
         39 |        279        1.97       41.48
         40 |        272        1.92       43.41
         41 |        283        2.00       45.41
         42 |        290        2.05       47.46
         43 |        246        1.74       49.20
         44 |        299        2.12       51.32
         45 |        328        2.32       53.64
         46 |        280        1.98       55.62
         47 |        301        2.13       57.75
         48 |        332        2.35       60.10
         49 |        324        2.29       62.39
         50 |        295        2.09       64.48
         51 |        306        2.17       66.65
         52 |        282        2.00       68.64
         53 |        271        1.92       70.56
         54 |        286        2.02       72.59
         55 |        289        2.05       74.63
         56 |        302        2.14       76.77
         57 |        306        2.17       78.93
         58 |        276        1.95       80.89
         59 |        278        1.97       82.85
         60 |        247        1.75       84.60
         61 |        233        1.65       86.25
         62 |        220        1.56       87.81
         63 |        203        1.44       89.24
         64 |        216        1.53       90.77
         65 |        200        1.42       92.19
         66 |        175        1.24       93.43
         67 |        176        1.25       94.67
         68 |        143        1.01       95.68
         69 |        115        0.81       96.50
         70 |        110        0.78       97.28
         71 |         81        0.57       97.85
         72 |         89        0.63       98.48
         73 |         57        0.40       98.88
         74 |         60        0.42       99.31
         75 |         34        0.24       99.55
         76 |         27        0.19       99.74
         77 |         20        0.14       99.88
         78 |         10        0.07       99.95
         79 |          6        0.04       99.99
         80 |          1        0.01      100.00
------------+-----------------------------------
      Total |     14,131      100.00

. capture drop sncds11_bastotalscore

. egen sncds11_bastotalscore = std(ncds11_bastotalscore)
(4427 missing values generated)

. tab sncds11_bastotalscore

Standardize |
d values of |
(ncds11_bas |
totalscore) |      Freq.     Percent        Cum.
------------+-----------------------------------
  -2.659858 |         66        0.47        0.47
  -2.597915 |          4        0.03        0.50
  -2.535972 |          7        0.05        0.54
  -2.474029 |          9        0.06        0.61
  -2.412086 |          7        0.05        0.66
  -2.350143 |          6        0.04        0.70
    -2.2882 |         16        0.11        0.81
  -2.226257 |         18        0.13        0.94
  -2.164314 |         23        0.16        1.10
  -2.102371 |         30        0.21        1.32
  -2.040428 |         50        0.35        1.67
  -1.978485 |         47        0.33        2.00
  -1.916542 |         56        0.40        2.40
  -1.854599 |         81        0.57        2.97
  -1.792656 |        107        0.76        3.73
  -1.730713 |        113        0.80        4.53
   -1.66877 |        124        0.88        5.41
  -1.606827 |        149        1.05        6.46
  -1.544884 |        131        0.93        7.39
  -1.482941 |        154        1.09        8.48
  -1.420998 |        182        1.29        9.77
  -1.359055 |        174        1.23       11.00
  -1.297112 |        181        1.28       12.28
  -1.235169 |        194        1.37       13.65
  -1.173226 |        196        1.39       15.04
  -1.111283 |        203        1.44       16.47
   -1.04934 |        228        1.61       18.09
   -.987397 |        220        1.56       19.64
   -.925454 |        214        1.51       21.16
   -.863511 |        223        1.58       22.74
   -.801568 |        255        1.80       24.54
   -.739625 |        263        1.86       26.40
   -.677682 |        250        1.77       28.17
   -.615739 |        250        1.77       29.94
   -.553796 |        218        1.54       31.48
   -.491853 |        290        2.05       33.54
    -.42991 |        284        2.01       35.55
   -.367967 |        273        1.93       37.48
   -.306024 |        287        2.03       39.51
  -.2440811 |        279        1.97       41.48
  -.1821381 |        272        1.92       43.41
  -.1201951 |        283        2.00       45.41
  -.0582521 |        290        2.05       47.46
   .0036909 |        246        1.74       49.20
   .0656339 |        299        2.12       51.32
   .1275769 |        328        2.32       53.64
   .1895199 |        280        1.98       55.62
   .2514628 |        301        2.13       57.75
   .3134058 |        332        2.35       60.10
   .3753488 |        324        2.29       62.39
   .4372918 |        295        2.09       64.48
   .4992348 |        306        2.17       66.65
   .5611778 |        282        2.00       68.64
   .6231208 |        271        1.92       70.56
   .6850638 |        286        2.02       72.59
   .7470068 |        289        2.05       74.63
   .8089498 |        302        2.14       76.77
   .8708928 |        306        2.17       78.93
   .9328358 |        276        1.95       80.89
   .9947788 |        278        1.97       82.85
   1.056722 |        247        1.75       84.60
   1.118665 |        233        1.65       86.25
   1.180608 |        220        1.56       87.81
   1.242551 |        203        1.44       89.24
   1.304494 |        216        1.53       90.77
   1.366437 |        200        1.42       92.19
    1.42838 |        175        1.24       93.43
   1.490323 |        176        1.25       94.67
   1.552266 |        143        1.01       95.68
   1.614209 |        115        0.81       96.50
   1.676152 |        110        0.78       97.28
   1.738095 |         81        0.57       97.85
   1.800038 |         89        0.63       98.48
   1.861981 |         57        0.40       98.88
   1.923924 |         60        0.42       99.31
   1.985867 |         34        0.24       99.55
    2.04781 |         27        0.19       99.74
   2.109753 |         20        0.14       99.88
   2.171695 |         10        0.07       99.95
   2.233639 |          6        0.04       99.99
   2.295582 |          1        0.01      100.00
------------+-----------------------------------
      Total |     14,131      100.00

. summ sncds11_bastotalscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
sncds11_ba~e |     14,131    4.08e-09           1  -2.659858   2.295582

. 
. *standardise to mean 100 sd 15

. capture drop ncds11_stdbastotalscore

. gen ncds11_stdbastotalscore = (sncds11_bastotalscore*15)+100
(4,427 missing values generated)

. tab ncds11_stdbastotalscore

ncds11_stdb |
astotalscor |
          e |      Freq.     Percent        Cum.
------------+-----------------------------------
   60.10213 |         66        0.47        0.47
   61.03128 |          4        0.03        0.50
   61.96043 |          7        0.05        0.54
   62.88957 |          9        0.06        0.61
   63.81871 |          7        0.05        0.66
   64.74786 |          6        0.04        0.70
     65.677 |         16        0.11        0.81
   66.60615 |         18        0.13        0.94
   67.53529 |         23        0.16        1.10
   68.46444 |         30        0.21        1.32
   69.39359 |         50        0.35        1.67
   70.32273 |         47        0.33        2.00
   71.25187 |         56        0.40        2.40
   72.18102 |         81        0.57        2.97
   73.11016 |        107        0.76        3.73
   74.03931 |        113        0.80        4.53
   74.96845 |        124        0.88        5.41
    75.8976 |        149        1.05        6.46
   76.82674 |        131        0.93        7.39
   77.75589 |        154        1.09        8.48
   78.68504 |        182        1.29        9.77
   79.61417 |        174        1.23       11.00
   80.54332 |        181        1.28       12.28
   81.47247 |        194        1.37       13.65
   82.40161 |        196        1.39       15.04
   83.33076 |        203        1.44       16.47
    84.2599 |        228        1.61       18.09
   85.18905 |        220        1.56       19.64
   86.11819 |        214        1.51       21.16
   87.04733 |        223        1.58       22.74
   87.97648 |        255        1.80       24.54
   88.90562 |        263        1.86       26.40
   89.83477 |        250        1.77       28.17
   90.76392 |        250        1.77       29.94
   91.69306 |        218        1.54       31.48
   92.62221 |        290        2.05       33.54
   93.55135 |        284        2.01       35.55
   94.48049 |        273        1.93       37.48
   95.40964 |        287        2.03       39.51
   96.33878 |        279        1.97       41.48
   97.26793 |        272        1.92       43.41
   98.19707 |        283        2.00       45.41
   99.12622 |        290        2.05       47.46
   100.0554 |        246        1.74       49.20
   100.9845 |        299        2.12       51.32
   101.9137 |        328        2.32       53.64
   102.8428 |        280        1.98       55.62
   103.7719 |        301        2.13       57.75
   104.7011 |        332        2.35       60.10
   105.6302 |        324        2.29       62.39
   106.5594 |        295        2.09       64.48
   107.4885 |        306        2.17       66.65
   108.4177 |        282        2.00       68.64
   109.3468 |        271        1.92       70.56
    110.276 |        286        2.02       72.59
   111.2051 |        289        2.05       74.63
   112.1342 |        302        2.14       76.77
   113.0634 |        306        2.17       78.93
   113.9925 |        276        1.95       80.89
   114.9217 |        278        1.97       82.85
   115.8508 |        247        1.75       84.60
     116.78 |        233        1.65       86.25
   117.7091 |        220        1.56       87.81
   118.6383 |        203        1.44       89.24
   119.5674 |        216        1.53       90.77
   120.4966 |        200        1.42       92.19
   121.4257 |        175        1.24       93.43
   122.3548 |        176        1.25       94.67
    123.284 |        143        1.01       95.68
   124.2131 |        115        0.81       96.50
   125.1423 |        110        0.78       97.28
   126.0714 |         81        0.57       97.85
   127.0006 |         89        0.63       98.48
   127.9297 |         57        0.40       98.88
   128.8589 |         60        0.42       99.31
    129.788 |         34        0.24       99.55
   130.7171 |         27        0.19       99.74
   131.6463 |         20        0.14       99.88
   132.5754 |         10        0.07       99.95
   133.5046 |          6        0.04       99.99
   134.4337 |          1        0.01      100.00
------------+-----------------------------------
      Total |     14,131      100.00

. summ ncds11_stdbastotalscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds11_std~e |     14,131         100          15   60.10213   134.4337

. 
. label variable ncds11_stdbastotalscore "NCDS Age 11 BAS Total Score std"

. 
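
The rescaling above is a linear transformation of the z-score, so it can be checked by hand. As a worked check (using only the values reported in the summ output above, and writing z for sncds11_bastotalscore and x for ncds11_stdbastotalscore):

$$
x = 100 + 15z, \qquad \bar{x} = 100 + 15\bar{z} = 100, \qquad s_x = 15\,s_z = 15
$$

$$
x_{\min} = 100 + 15(-2.659858) \approx 60.102, \qquad x_{\max} = 100 + 15(2.295582) \approx 134.434
$$

These values agree with the minimum and maximum that Stata reports for ncds11_stdbastotalscore.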

We could also produce an overall cognitive ability test score using principal components analysis (see, for example, Schoon 2010). Here we compute a general ability test score using the method described in Schoon (2010).

In [18]:
* We use the two cognitive ability sub-tests which make up the general ability
* test described above.

tab n914, mi 
tab n917, mi



* We only want to include those cohort members who completed both tests.
* We create a variable which indicates how many of the sub-tests
* we have information on.

capture drop rmiss
egen rmiss = rmiss(n914 n917)
tab rmiss

* We examine the correlation between these two test scores:

pwcorr n914 n917 if (rmiss==0), sig

* Principal components analysis of the tests that make up the 
* general ability test

pca n914 n917 if (rmiss==0)

* Only the first component has an eigenvalue greater than 1.

screeplot

* The screeplot leads to the same conclusion

* Here we predict the score for each individual on the first principal
* component. This score is obtained by applying the elements of the 
* corresponding eigenvector to the standardised values of the original
* observations for an individual.

predict ncds11_pc1 if (rmiss==0), score
label variable ncds11_pc1 "NCDS Age 11 PCA Score"

summ ncds11_pc1

* We standardise this variable:

capture drop ncds11_stdpc1
egen ncds11_stdpc1 = std(ncds11_pc1)
summ ncds11_stdpc1
label variable ncds11_stdpc1 "NCDS Age 11 standardised PCA Score"

*return to jupyter
. *return to jupyter

. * We use the two cognitive ability sub-tests which make up the general ability

. * test described above.

. 
. tab n914, mi 

  2T Verbal |
   score on |
    general |
    ability |
       test |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         97        0.52        0.52
          1 |         17        0.09        0.61
          2 |         38        0.20        0.82
          3 |         57        0.31        1.13
          4 |         87        0.47        1.59
          5 |        132        0.71        2.31
          6 |        207        1.12        3.42
          7 |        279        1.50        4.93
          8 |        327        1.76        6.69
          9 |        338        1.82        8.51
         10 |        409        2.20       10.71
         11 |        412        2.22       12.93
         12 |        372        2.00       14.94
         13 |        361        1.95       16.88
         14 |        380        2.05       18.93
         15 |        400        2.16       21.09
         16 |        412        2.22       23.31
         17 |        402        2.17       25.47
         18 |        401        2.16       27.63
         19 |        456        2.46       30.09
         20 |        428        2.31       32.40
         21 |        436        2.35       34.75
         22 |        502        2.71       37.45
         23 |        515        2.78       40.23
         24 |        497        2.68       42.90
         25 |        520        2.80       45.71
         26 |        513        2.76       48.47
         27 |        470        2.53       51.00
         28 |        516        2.78       53.78
         29 |        485        2.61       56.40
         30 |        480        2.59       58.98
         31 |        514        2.77       61.75
         32 |        485        2.61       64.37
         33 |        486        2.62       66.98
         34 |        440        2.37       69.36
         35 |        336        1.81       71.17
         36 |        300        1.62       72.78
         37 |        255        1.37       74.16
         38 |        183        0.99       75.14
         39 |        131        0.71       75.85
         40 |         55        0.30       76.15
          . |      4,427       23.85      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. tab n917, mi

     2T Non |
     verbal |
   score on |
gen ability |
       test |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         76        0.41        0.41
          1 |         10        0.05        0.46
          2 |         31        0.17        0.63
          3 |         42        0.23        0.86
          4 |         60        0.32        1.18
          5 |         94        0.51        1.69
          6 |        139        0.75        2.44
          7 |        177        0.95        3.39
          8 |        216        1.16        4.55
          9 |        242        1.30        5.86
         10 |        298        1.61        7.46
         11 |        363        1.96        9.42
         12 |        371        2.00       11.42
         13 |        429        2.31       13.73
         14 |        456        2.46       16.19
         15 |        485        2.61       18.80
         16 |        575        3.10       21.90
         17 |        577        3.11       25.01
         18 |        593        3.20       28.20
         19 |        644        3.47       31.67
         20 |        671        3.62       35.29
         21 |        711        3.83       39.12
         22 |        706        3.80       42.92
         23 |        698        3.76       46.69
         24 |        691        3.72       50.41
         25 |        626        3.37       53.78
         26 |        641        3.45       57.24
         27 |        571        3.08       60.31
         28 |        529        2.85       63.16
         29 |        480        2.59       65.75
         30 |        431        2.32       68.07
         31 |        375        2.02       70.09
         32 |        332        1.79       71.88
         33 |        245        1.32       73.20
         34 |        178        0.96       74.16
         35 |        143        0.77       74.93
         36 |        110        0.59       75.53
         37 |         57        0.31       75.83
         38 |         33        0.18       76.01
         39 |         22        0.12       76.13
         40 |          3        0.02       76.15
          . |      4,427       23.85      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. 
. 
. * We only want to include those cohort members who completed both tests.

. * We create a variable which indicates how many of the sub-tests

. * we have information on.

. 
. capture drop rmiss

. egen rmiss = rmiss(n914 n917)

. tab rmiss

      rmiss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     14,131       76.15       76.15
          2 |      4,427       23.85      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. * We examine the correlation between these two test scores:

. 
. pwcorr n914 n917 if (rmiss==0), sig

             |     n914     n917
-------------+------------------
        n914 |   1.0000 
             |
             |
        n917 |   0.8074   1.0000 
             |   0.0000
             |

. 
. * Principal components analysis of the tests that make up the 

. * general ability test

. 
. pca n914 n917 if (rmiss==0)

Principal components/correlation                 Number of obs    =     14,131
                                                 Number of comp.  =          2
                                                 Trace            =          2
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.80738      1.61476             0.9037       0.9037
           Comp2 |      .192618            .             0.0963       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    ------------------------------------------------
        Variable |    Comp1     Comp2 | Unexplained 
    -------------+--------------------+-------------
            n914 |   0.7071    0.7071 |           0 
            n917 |   0.7071   -0.7071 |           0 
    ------------------------------------------------

. 
. * Only the first component has an eigenvalue greater than 1.

. 
. screeplot

. 
. * The screeplot leads to the same conclusion

. 
. * Here we predict the score for each individual on the first principal

. * component. This score is obtained by applying the elements of the 

. * corresponding eigenvector to the standardised values of the original

. * observations for an individual.

. 
. predict ncds11_pc1 if (rmiss==0), score
(1 components skipped)

Scoring coefficients 
    sum of squares(column-loading) = 1

    ----------------------------------
        Variable |    Comp1     Comp2 
    -------------+--------------------
            n914 |   0.7071           
            n917 |   0.7071   -0.7071 
    ----------------------------------

. label variable ncds11_pc1 "NCDS Age 11 PCA Score"

. 
. summ ncds11_pc1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  ncds11_pc1 |     14,131    2.37e-09    1.344389  -3.606081   3.131255

. 
. * We standardise this variable:

. 
. capture drop ncds11_stdpc1

. egen ncds11_stdpc1 = std(ncds11_pc1)
(4427 missing values generated)

. summ ncds11_stdpc1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds11_std~1 |     14,131   -1.45e-10           1   -2.68232   2.329129

. label variable ncds11_stdpc1 "NCDS Age 11 standardised PCA Score"

. 
. *return to jupyter
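
With only two variables, the PCA output above can be verified by hand. The following is a worked check rather than part of the original analysis. It relies on the analysis being run on the correlation matrix, as the "Principal components/correlation" header indicates. Here r denotes the correlation of 0.8074 reported by pwcorr, and z_1 and z_2 denote the standardised values of n914 and n917 (this notation is introduced for illustration only).

$$
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\lambda_1 = 1 + r, \quad \lambda_2 = 1 - r, \qquad
v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
$$

$$
\lambda_1 = 1.8074, \qquad \lambda_2 = 0.1926, \qquad
\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1.8074}{2} = 0.9037
$$

$$
\mathrm{PC1} = \frac{z_1 + z_2}{\sqrt{2}}, \qquad
\mathrm{SD}(\mathrm{PC1}) = \sqrt{\lambda_1} = \sqrt{1.8074} \approx 1.3444
$$

These figures match the reported eigenvalues (1.80738 and .192618), the loadings of 0.7071, and the standard deviation of ncds11_pc1 (1.344389). With two positively correlated variables the first eigenvalue is always above 1 and the second always below 1, so the eigenvalue rule and the screeplot cannot by themselves discriminate here. The first component is simply the equally weighted sum of the two standardised sub-test scores.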

In [19]:
keep ncdsid ncds_male ncds_paed ncds_paed_cat ncds_moed ncds_parented ncds_moed_cat ncds0_country ncds5_country ncds11_country ncds11_bastotalscore ncds11_stdbastotalscore ncds11_stdpc1

sort ncdsid

save $path3\temp1.dta, replace

*return to jupyter
. keep ncdsid ncds_male ncds_paed ncds_paed_cat ncds_moed ncds_parented ncds_moed_cat ncds0_country ncds5_country ncds11_country ncds11_bast
> otalscore ncds11_stdbastotalscore ncds11_stdpc1

. 
. sort ncdsid

. 
. save $path3\temp1.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp1.dta not found)
file F:\Data\MYDATA\TEMP\temp1.dta saved

. 
. *return to jupyter

We intend to use father's NS-SEC, based on the new occupational coding, as our parental social class measure. However, here we prepare some of the older parental occupation-based social class measures which are available in the deposited datasets. These were used in initial sensitivity analyses and were considered when preparing the inverse probability weights.

In [20]:
use $path1\ARCHIVE\NCDS\S1-3\ncds0123.dta, clear

quietly mvdecode _all, mv(-9=. \-8=. \-2=. \-1=. \-7=. \-3=.)

*return to jupyter
. use $path1\ARCHIVE\NCDS\S1-3\ncds0123.dta, clear

. 
. quietly mvdecode _all, mv(-9=. \-8=. \-2=. \-1=. \-7=. \-3=.)

. 
. *return to jupyter

In [21]:
*Social Class of mother's husband - Age 0 (birth)
numlabel n236 n492, add

tab n236
tab n492

tab n236 n492
*The difference between n236 and n492 seems to be that:
*n236 includes the manual non-manual distinction in class III
*but n492 does not.

*Create father's RGSC at birth survey using variable n236
capture drop ncds0_olddadrgsc
gen ncds0_olddadrgsc = n236
*Set "Unemployed,sick" (1) to missing, then shift codes 2-7 down to 1-6
recode ncds0_olddadrgsc (1=.)
replace ncds0_olddadrgsc = (ncds0_olddadrgsc-1)
label variable ncds0_olddadrgsc "NCDS Birth Dad RGSC Old Coding"
label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"
label values ncds0_olddadrgsc rgsc

tab ncds0_olddadrgsc
tab ncds0_olddadrgsc n236, mi


*Social class of father or male head - Age 7

numlabel n190, add
tab n190

capture drop ncds7_olddadrgsc
gen ncds7_olddadrgsc = n190
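*Set "No male head" (1) to missing, combine IV non-manual and IV manual,
*then shift the remaining codes down to 1-6
recode ncds7_olddadrgsc (1=.) (7=6) (8=7) 
replace ncds7_olddadrgsc = (ncds7_olddadrgsc-1)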
label variable ncds7_olddadrgsc "NCDS Age 7 Dad RGSC Old Coding"
label values ncds7_olddadrgsc rgsc

tab ncds7_olddadrgsc n190, mi
tab ncds7_olddadrgsc, mi

*Social class of father or male head - Age 11

tab n1171

tab n1687

tab n1687 n1171

numlabel n1687, add

capture drop ncds11_olddadrgsc
gen ncds11_olddadrgsc = n1687
*Set "No male head" (7) to missing; codes 1-6 already match the rgsc labels
recode ncds11_olddadrgsc (7=.)
label variable ncds11_olddadrgsc "NCDS Age 11 Dad RGSC Old Coding"
label values ncds11_olddadrgsc rgsc

tab ncds11_olddadrgsc n1687, mi
tab ncds11_olddadrgsc, mi

*Social class of father or male head - Age 16

tab n2384

numlabel n2384, add

capture drop ncds16_olddadrgsc
gen ncds16_olddadrgsc = n2384
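*Set "Unclear" (8) to missing, combine IV non-manual (5) and IV manual (6),
*then move V from 7 to 6
recode ncds16_olddadrgsc (8=.) 
recode ncds16_olddadrgsc (6=5) 
recode ncds16_olddadrgsc (7=6) 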
label variable ncds16_olddadrgsc "NCDS Age 16 Dad RGSC Old Coding"
label values ncds16_olddadrgsc rgsc

tab ncds16_olddadrgsc n2384, mi
tab ncds16_olddadrgsc, mi

*return to jupyter
. *Social Class of mother's husband - Age 0 (birth)

. numlabel n236 n492, add

. 
. tab n236

0P Social class of |
  mother's husband |
        (GRO 1951) |      Freq.     Percent        Cum.
-------------------+-----------------------------------
1. Unemployed,sick |          5        0.03        0.03
              2. I |        746        4.53        4.56
             3. II |      2,133       12.96       17.52
 4. III non-manual |      1,592        9.67       27.19
     5. III manual |      8,376       50.88       78.07
             6. IV |      1,995       12.12       90.18
              7. V |      1,616        9.82      100.00
-------------------+-----------------------------------
             Total |     16,463      100.00

. tab n492

      0 Social class |
    mother's husband |
          (GRO 1951) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
  1. Unemployed,sick |          5        0.03        0.03
                2. I |        746        4.38        4.41
               3. II |      2,133       12.53       16.94
              4. III |      9,981       58.62       75.56
               5. IV |      1,995       11.72       87.28
                6. V |      1,616        9.49       96.77
         9. Students |         35        0.21       96.98
    10. Dead or away |          3        0.02       96.99
         11. Retired |          2        0.01       97.00
12. Single,no husbnd |        510        3.00      100.00
---------------------+-----------------------------------
               Total |     17,026      100.00

. 
. tab n236 n492

0P Social class of |
  mother's husband |            0 Social class mother's husband (GRO 1951)
        (GRO 1951) | 1. Unempl       2. I      3. II     4. III      5. IV       6. V |     Total
-------------------+------------------------------------------------------------------+----------
1. Unemployed,sick |         5          0          0          0          0          0 |         5 
              2. I |         0        746          0          0          0          0 |       746 
             3. II |         0          0      2,133          0          0          0 |     2,133 
 4. III non-manual |         0          0          0      1,592          0          0 |     1,592 
     5. III manual |         0          0          0      8,375          0          0 |     8,375 
             6. IV |         0          0          0          0      1,995          0 |     1,995 
              7. V |         0          0          0          0          0      1,616 |     1,616 
-------------------+------------------------------------------------------------------+----------
             Total |         5        746      2,133      9,967      1,995      1,616 |    16,462 


. *The difference between n236 and n492 seems to be that:

. *n236 includes the manual non-manual distinction in class III

. *but n492 does not.

. 
. *Create father's RGSC at birth survey using variable n236

. capture drop ncds0_olddadrgsc

. gen ncds0_olddadrgsc = n236
(2,095 missing values generated)

. recode ncds0_olddadrgsc (1=.)
(ncds0_olddadrgsc: 5 changes made)

. replace ncds0_olddadrgsc = (ncds0_olddadrgsc-1)
(16,458 real changes made)

. label variable ncds0_olddadrgsc "NCDS Birth Dad RGSC Old Coding"

. label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"

. label values ncds0_olddadrgsc rgsc

. 
. tab ncds0_olddadrgsc

 NCDS Birth |
   Dad RGSC |
 Old Coding |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |        746        4.53        4.53
         II |      2,133       12.96       17.49
     III NM |      1,592        9.67       27.17
      III M |      8,376       50.89       78.06
         IV |      1,995       12.12       90.18
          V |      1,616        9.82      100.00
------------+-----------------------------------
      Total |     16,458      100.00

. tab ncds0_olddadrgsc n236, mi

NCDS Birth |
  Dad RGSC |                     0P Social class of mother's husband (GRO 1951)
Old Coding | 1. Unempl       2. I      3. II  4. III no  5. III ma      6. IV       7. V          . |     Total
-----------+----------------------------------------------------------------------------------------+----------
         I |         0        746          0          0          0          0          0          0 |       746 
        II |         0          0      2,133          0          0          0          0          0 |     2,133 
    III NM |         0          0          0      1,592          0          0          0          0 |     1,592 
     III M |         0          0          0          0      8,376          0          0          0 |     8,376 
        IV |         0          0          0          0          0      1,995          0          0 |     1,995 
         V |         0          0          0          0          0          0      1,616          0 |     1,616 
         . |         5          0          0          0          0          0          0      2,095 |     2,100 
-----------+----------------------------------------------------------------------------------------+----------
     Total |         5        746      2,133      1,592      8,376      1,995      1,616      2,095 |    18,558 


. 
. 
. *Social class of father or male head - Age 7

. 
. numlabel n190, add

. tab n190

  1P Social class |
   of father,male |
  head (GRO 1960) |      Freq.     Percent        Cum.
------------------+-----------------------------------
  1. No male head |        421        2.90        2.90
             2. I |        750        5.16        8.06
            3. II |      2,079       14.30       22.36
4. III non-manual |      1,408        9.69       32.05
    5. III manual |      6,416       44.14       76.19
 6. IV non-manual |        258        1.78       77.96
     7. IV manual |      2,272       15.63       93.59
             8. V |        931        6.41      100.00
------------------+-----------------------------------
            Total |     14,535      100.00

. 
. capture drop ncds7_olddadrgsc

. gen ncds7_olddadrgsc = n190
(4,023 missing values generated)

. recode ncds7_olddadrgsc (1=.) (7=6) (8=7) 
(ncds7_olddadrgsc: 3624 changes made)

. replace ncds7_olddadrgsc = (ncds7_olddadrgsc-1)
(14,114 real changes made)

. label variable ncds7_olddadrgsc "NCDS Age 7 Dad RGSC Old Coding"

. label values ncds7_olddadrgsc rgsc

. 
. tab ncds7_olddadrgsc n190, mi

NCDS Age 7 |
  Dad RGSC |                           1P Social class of father,male head (GRO 1960)
Old Coding | 1. No mal       2. I      3. II  4. III no  5. III ma  6. IV non  7. IV man       8. V          . |     Total
-----------+---------------------------------------------------------------------------------------------------+----------
         I |         0        750          0          0          0          0          0          0          0 |       750 
        II |         0          0      2,079          0          0          0          0          0          0 |     2,079 
    III NM |         0          0          0      1,408          0          0          0          0          0 |     1,408 
     III M |         0          0          0          0      6,416          0          0          0          0 |     6,416 
        IV |         0          0          0          0          0        258      2,272          0          0 |     2,530 
         V |         0          0          0          0          0          0          0        931          0 |       931 
         . |       421          0          0          0          0          0          0          0      4,023 |     4,444 
-----------+---------------------------------------------------------------------------------------------------+----------
     Total |       421        750      2,079      1,408      6,416        258      2,272        931      4,023 |    18,558 


. tab ncds7_olddadrgsc, mi

 NCDS Age 7 |
   Dad RGSC |
 Old Coding |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |        750        4.04        4.04
         II |      2,079       11.20       15.24
     III NM |      1,408        7.59       22.83
      III M |      6,416       34.57       57.40
         IV |      2,530       13.63       71.04
          V |        931        5.02       76.05
          . |      4,444       23.95      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. *Social class of father or male head - Age 11

. 
. tab n1171

 2P Social Class |
    of father or |
  male head (GRO |
           1966) |      Freq.     Percent        Cum.
-----------------+-----------------------------------
  Social class I |        738        5.52        5.52
 Social class II |      2,432       18.19       23.71
 SC III non-man. |      1,245        9.31       33.02
   SC III manual |      5,721       42.78       75.80
SC IV non-manual |        285        2.13       77.93
    SC IV manual |      2,064       15.44       93.37
  Social class V |        827        6.18       99.55
  Unclassifiable |         60        0.45      100.00
-----------------+-----------------------------------
           Total |     13,372      100.00

. 
. tab n1687

    2PD Social |
      class of |
father or male |
     head (GRO |
         1966) |      Freq.     Percent        Cum.
---------------+-----------------------------------
             I |        726        5.34        5.34
            II |      2,363       17.39       22.73
III non manual |      1,202        8.84       31.57
    III manual |      5,564       40.94       72.52
            IV |      2,257       16.61       89.12
             V |        776        5.71       94.83
  No male head |        702        5.17      100.00
---------------+-----------------------------------
         Total |     13,590      100.00

. 
. tab n1687 n1171

    2PD Social |
      class of |
father or male |
     head (GRO |                    2P Social Class of father or male head (GRO 1966)
         1966) | Social cl  Social cl  SC III no  SC III ma  SC IV non  SC IV man  Social cl  Unclassif |     Total
---------------+----------------------------------------------------------------------------------------+----------
             I |       726          0          0          0          0          0          0          0 |       726 
            II |         0      2,363          0          0          0          0          0          0 |     2,363 
III non manual |         0          0      1,202          0          0          0          0          0 |     1,202 
    III manual |         0          0          0      5,564          0          0          0          0 |     5,564 
            IV |         0          0          0          0        266      1,991          0          0 |     2,257 
             V |         0          0          0          0          0          0        776          0 |       776 
  No male head |        12         69         43        157         19         73         51          9 |       433 
---------------+----------------------------------------------------------------------------------------+----------
         Total |       738      2,432      1,245      5,721        285      2,064        827          9 |    13,321 


. 
. numlabel n1687, add

. 
. capture drop ncds11_olddadrgsc

. gen ncds11_olddadrgsc = n1687
(4,968 missing values generated)

. recode ncds11_olddadrgsc (7=.)
(ncds11_olddadrgsc: 702 changes made)

. label variable ncds11_olddadrgsc "NCDS Age 11 Dad RGSC Old Coding"

. label values ncds11_olddadrgsc rgsc

. 
. tab ncds11_olddadrgsc n1687, mi

  NCDS Age |
    11 Dad |
  RGSC Old |                   2PD Social class of father or male head (GRO 1966)
    Coding |      1. I      2. II  3. III no  4. III ma      5. IV       6. V  7. No mal          . |     Total
-----------+----------------------------------------------------------------------------------------+----------
         I |       726          0          0          0          0          0          0          0 |       726 
        II |         0      2,363          0          0          0          0          0          0 |     2,363 
    III NM |         0          0      1,202          0          0          0          0          0 |     1,202 
     III M |         0          0          0      5,564          0          0          0          0 |     5,564 
        IV |         0          0          0          0      2,257          0          0          0 |     2,257 
         V |         0          0          0          0          0        776          0          0 |       776 
         . |         0          0          0          0          0          0        702      4,968 |     5,670 
-----------+----------------------------------------------------------------------------------------+----------
     Total |       726      2,363      1,202      5,564      2,257        776        702      4,968 |    18,558 


. tab ncds11_olddadrgsc, mi

NCDS Age 11 |
   Dad RGSC |
 Old Coding |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |        726        3.91        3.91
         II |      2,363       12.73       16.65
     III NM |      1,202        6.48       23.12
      III M |      5,564       29.98       53.10
         IV |      2,257       12.16       65.27
          V |        776        4.18       69.45
          . |      5,670       30.55      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. *Social class of father or male head - Age 16

. 
. tab n2384

     3P Social |
         class |
   father,male |
     head (GRO |
         1970) |      Freq.     Percent        Cum.
---------------+-----------------------------------
             I |        569        5.36        5.36
            II |      2,100       19.77       25.13
III non-manual |      1,004        9.45       34.58
    III manual |      4,661       43.88       78.47
 IV non-manual |        155        1.46       79.93
     IV manual |      1,403       13.21       93.14
             V |        608        5.72       98.86
       Unclear |        121        1.14      100.00
---------------+-----------------------------------
         Total |     10,621      100.00

. 
. numlabel n2384, add

. 
. capture drop ncds16_olddadrgsc

. gen ncds16_olddadrgsc = n2384
(7,937 missing values generated)

. recode ncds16_olddadrgsc (8=.) 
(ncds16_olddadrgsc: 121 changes made)

. recode ncds16_olddadrgsc (6=5) 
(ncds16_olddadrgsc: 1403 changes made)

. recode ncds16_olddadrgsc (7=6) 
(ncds16_olddadrgsc: 608 changes made)

. label variable ncds16_olddadrgsc "NCDS Age 16 Dad RGSC Old Coding"

. label values ncds16_olddadrgsc rgsc

. 
. tab ncds16_olddadrgsc n2384, mi

  NCDS Age |
    16 Dad |
  RGSC Old |                            3P Social class father,male head (GRO 1970)
    Coding |      1. I      2. II  3. III no  4. III ma  5. IV non  6. IV man       7. V  8. Unclea          . |     Total
-----------+---------------------------------------------------------------------------------------------------+----------
         I |       569          0          0          0          0          0          0          0          0 |       569 
        II |         0      2,100          0          0          0          0          0          0          0 |     2,100 
    III NM |         0          0      1,004          0          0          0          0          0          0 |     1,004 
     III M |         0          0          0      4,661          0          0          0          0          0 |     4,661 
        IV |         0          0          0          0        155      1,403          0          0          0 |     1,558 
         V |         0          0          0          0          0          0        608          0          0 |       608 
         . |         0          0          0          0          0          0          0        121      7,937 |     8,058 
-----------+---------------------------------------------------------------------------------------------------+----------
     Total |       569      2,100      1,004      4,661        155      1,403        608        121      7,937 |    18,558 


. tab ncds16_olddadrgsc, mi

NCDS Age 16 |
   Dad RGSC |
 Old Coding |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |        569        3.07        3.07
         II |      2,100       11.32       14.38
     III NM |      1,004        5.41       19.79
      III M |      4,661       25.12       44.91
         IV |      1,558        8.40       53.30
          V |        608        3.28       56.58
          . |      8,058       43.42      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. *return to jupyter

The mother's occupational information is available in a much more limited format than that of fathers.

Elliott and Lawrence (2014) discuss the availability of mother's occupational information in the NCDS.

Elliott and Lawrence (2014) Refining childhood social class measures in the 1958 British Cohort Study. London: CLS.

Further discussion of our decision not to use mother's occupational information is provided in endnote 7.
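
The Age 16 class variables in this notebook are collapsed from eight source codes to six RGSC categories with three successive `recode` calls. Those calls rely on their order (6 is moved to 5 before 7 is moved to 6). A minimal sketch of an order-independent alternative puts all the rules into a single call. It uses a hypothetical toy variable `toyclass` and a hypothetical label `toy_rgsc_lbl`, not the NCDS data:

```stata
* Toy data standing in for an eight-code class variable (1-8 plus missing)
clear
input byte toyclass
1
2
3
4
5
6
7
8
.
end

* All rules are evaluated against the original value in one pass,
* so the result does not depend on the order in which they are written
recode toyclass (5 6 = 5) (7 = 6) (8 = .), generate(toy_rgsc)

* Hypothetical label mirroring the six RGSC categories used in the notebook
label define toy_rgsc_lbl 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"
label values toy_rgsc toy_rgsc_lbl

* Cross-tabulate source against result to check the collapse
tab toy_rgsc toyclass, mi
```

Using `generate()` leaves the source variable untouched, which makes the checking cross-tabulation possible without a separate `gen` step.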

In [22]:
*Mother's job when pregnant
tab n540, mi

*Mother's job (when starting this baby) age 0 (birth)
tab n539, mi

*Mother's occupation aged 11
tab n1225, mi


*Age 16 Social class of mother 
numlabel n2393, add
tab n2393, mi

capture drop ncds16_oldmumrgsc
gen ncds16_oldmumrgsc = n2393
recode ncds16_oldmumrgsc (8=.) 
recode ncds16_oldmumrgsc (6=5) 
recode ncds16_oldmumrgsc (7=6) 
label variable ncds16_oldmumrgsc "NCDS Age 16 Mum RGSC Old Coding"
label values ncds16_oldmumrgsc rgsc

tab ncds16_oldmumrgsc n2384, mi
tab ncds16_oldmumrgsc, mi

*return to jupyter
. *Mother's job when pregnant

. tab n540, mi

 0 Mums paid job |
during pregnancy |
      (GRO 1951) |      Freq.     Percent        Cum.
-----------------+-----------------------------------
        Teachers |        261        1.41        1.41
Nurses qualified |         92        0.50        1.90
 Bank clerks etc |        239        1.29        3.19
 Shopkeepers etc |         60        0.32        3.51
Others in SCI,II |        100        0.54        4.05
Nurses- not qual |        107        0.58        4.63
  Clerks,typists |      1,538        8.29       12.92
Shop asst,hairdr |        786        4.24       17.15
 Garment workers |        149        0.80       17.95
Textile wkr skld |        281        1.51       19.47
Personal service |        222        1.20       20.66
Others in SC III |        545        2.94       23.60
      Machinists |        286        1.54       25.14
Textile wkr SCIV |        102        0.55       25.69
   Personal-SCIV |        374        2.02       27.71
 Others in SC IV |        982        5.29       33.00
Textile-labourer |        349        1.88       34.88
   Personal-SC V |        118        0.64       35.52
 No job dur preg |     10,621       57.23       92.75
               . |      1,346        7.25      100.00
-----------------+-----------------------------------
           Total |     18,558      100.00

. 
. *Mother's job (when starting this baby) age 0 (birth)

. tab n539, mi

 0 Mums paid job |
   when starting |
  this baby (GRO |
           1951) |      Freq.     Percent        Cum.
-----------------+-----------------------------------
        Teachers |        269        1.45        1.45
Nurses qualified |         92        0.50        1.95
 Bank clerks etc |        246        1.33        3.27
 Shopkeepers etc |         60        0.32        3.59
Others in SCI,II |        101        0.54        4.14
Nurses- not qual |        109        0.59        4.73
  Clerks,typists |      1,559        8.40       13.13
Shop asst,hairdr |        799        4.31       17.43
 Garment workers |        152        0.82       18.25
Textile wkr skld |        281        1.51       19.77
Personal service |        224        1.21       20.97
Others in SC III |        553        2.98       23.95
      Machinists |        287        1.55       25.50
Textile wkr SCIV |        104        0.56       26.06
   Personal-SCIV |        379        2.04       28.10
 Others in SC IV |        988        5.32       33.42
Textile-labourer |        356        1.92       35.34
   Personal-SC V |        122        0.66       36.00
               . |     11,877       64.00      100.00
-----------------+-----------------------------------
           Total |     18,558      100.00

. 
. *Mother's occupation aged 11

. tab n1225, mi

    2P Mothers's |
most recent work |
    and SEG (GRO |
           1966) |      Freq.     Percent        Cum.
-----------------+-----------------------------------
      Prof,manag |        268        1.44        1.44
Intermed non-man |        798        4.30        5.74
 Typist,clerical |      1,308        7.05       12.79
  Shop assistant |        839        4.52       17.31
Telephonists etc |        183        0.99       18.30
Personal service |      1,784        9.61       27.91
Forewomen,manual |        124        0.67       28.58
  Manual workers |      2,839       15.30       43.88
     Own account |         70        0.38       44.26
    Farm workers |        148        0.80       45.05
 Inadequate info |         49        0.26       45.32
               . |     10,148       54.68      100.00
-----------------+-----------------------------------
           Total |     18,558      100.00

. 
. 
. *Age 16 Social class of mother 

. numlabel n2393, add

. tab n2393, mi

  3P Mother-s social |
 class,if works (GRO |
               1970) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
                1. I |         37        0.20        0.20
               2. II |      1,196        6.44        6.64
      3. III non-man |      2,320       12.50       19.15
       4. III manual |        545        2.94       22.08
       5. IV non-man |      1,344        7.24       29.32
        6. IV manual |      1,192        6.42       35.75
                7. V |        771        4.15       39.90
     8. Unclassified |        113        0.61       40.51
                   . |     11,040       59.49      100.00
---------------------+-----------------------------------
               Total |     18,558      100.00

. 
. capture drop ncds16_oldmumrgsc

. gen ncds16_oldmumrgsc = n2393
(11,040 missing values generated)

. recode ncds16_oldmumrgsc (8=.) 
(ncds16_oldmumrgsc: 113 changes made)

. recode ncds16_oldmumrgsc (6=5) 
(ncds16_oldmumrgsc: 1192 changes made)

. recode ncds16_oldmumrgsc (7=6) 
(ncds16_oldmumrgsc: 771 changes made)

. label variable ncds16_oldmumrgsc "NCDS Age 16 Mum RGSC Old Coding"

. label values ncds16_oldmumrgsc rgsc

. 
. tab ncds16_oldmumrgsc n2384, mi

  NCDS Age |
    16 Mum |
  RGSC Old |                            3P Social class father,male head (GRO 1970)
    Coding |      1. I      2. II  3. III no  4. III ma  5. IV non  6. IV man       7. V  8. Unclea          . |     Total
-----------+---------------------------------------------------------------------------------------------------+----------
         I |        20         11          0          5          0          0          0          0          1 |        37 
        II |       101        436        120        321         18         69         22          8        101 |     1,196 
    III NM |       132        523        346        867         20        179         56         21        176 |     2,320 
     III M |         5         42         36        312          4         71         26          4         45 |       545 
        IV |        30        213        180      1,236         38        462        133         15        229 |     2,536 
         V |         3         19         41        408         15        134         81          3         67 |       771 
         . |       278        856        281      1,512         60        488        290         70      7,318 |    11,153 
-----------+---------------------------------------------------------------------------------------------------+----------
     Total |       569      2,100      1,004      4,661        155      1,403        608        121      7,937 |    18,558 


. tab ncds16_oldmumrgsc, mi

NCDS Age 16 |
   Mum RGSC |
 Old Coding |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |         37        0.20        0.20
         II |      1,196        6.44        6.64
     III NM |      2,320       12.50       19.15
      III M |        545        2.94       22.08
         IV |      2,536       13.67       35.75
          V |        771        4.15       39.90
          . |     11,153       60.10      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. *return to jupyter

We convert the available Socio-Economic Group information to an approximation of the Goldthorpe Schema using the method outlined in Goldthorpe and Jackson (2007).

Goldthorpe, J. H., & Jackson, M. (2007). Intergenerational class mobility in contemporary Britain: political concerns and empirical findings. The British Journal of Sociology, 58(4), 525-546.

This method builds on an approximation developed by Heath and McDonald (1987).

Heath, A., & McDonald, S. K. (1987). Social change and the future of the left. The Political Quarterly, 58(4), 364-377.
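
As an illustration of the SEG-to-class mapping applied in this notebook, the sketch below expresses the same groupings as a single `recode` call on toy data. The variable `toyseg` and the label name `toy_egp_lbl` are hypothetical. The groupings are copied from the notebook's own `replace` statements, not independently from the cited papers:

```stata
* Toy data: one row per SEG code, 1 to 17
clear
set obs 17
generate byte toyseg = _n

* One rule per approximate class; armed forces (16) and
* inadequate information (17) are set to missing
recode toyseg (1 3 4 = 1 "I") (2 5 = 2 "II+IVa") (6 7 = 3 "III") (12 13 14 = 4 "IVb+c") (8 = 5 "V") (9 = 6 "VI") (10 11 15 = 7 "VII") (16 17 = .), generate(toy_egp) label(toy_egp_lbl)

* Check the mapping against the source codes
tab toy_egp toyseg, mi
```

Attaching the labels inside the rules keeps the code-to-class correspondence in one place, so the same call can be reused for each SEG variable by changing only the variable names.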

In [23]:
*Father's SEG
numlabel n2385 n1175, add

*Father's SEG measured age 16
tab n2385
capture drop ncds16_dadseg2egp
gen ncds16_dadseg2egp = .
replace ncds16_dadseg2egp = 1 if (n2385==1)|(n2385==3)|(n2385==4)
replace ncds16_dadseg2egp = 2 if (n2385==2)|(n2385==5)
replace ncds16_dadseg2egp = 3 if (n2385==6)|(n2385==7)
replace ncds16_dadseg2egp = 4 if (n2385==12)|(n2385==13)|(n2385==14)
replace ncds16_dadseg2egp = 5 if (n2385==8)
replace ncds16_dadseg2egp = 6 if (n2385==9)
replace ncds16_dadseg2egp = 7 if (n2385==10)|(n2385==11)|(n2385==15)
replace ncds16_dadseg2egp = . if (n2385==17)|(n2385==16)
label define egp 1 "I" 2 "II+IVa" 3 "III" 4 "IVb+c" 5 "V" 6 "VI" 7 "VII"
label values ncds16_dadseg2egp egp
label variable ncds16_dadseg2egp "NCDS Age 16 Dad's EGP from SEG"

*armed forces are coded as missing
tab ncds16_dadseg2egp n2385, mi

*Father's SEG measured age 11
tab n1175
capture drop ncds11_dadseg2egp
gen ncds11_dadseg2egp = .
replace ncds11_dadseg2egp = 1 if (n1175==1)|(n1175==3)|(n1175==4)
replace ncds11_dadseg2egp = 2 if (n1175==2)|(n1175==5)
replace ncds11_dadseg2egp = 3 if (n1175==6)|(n1175==7)
replace ncds11_dadseg2egp = 4 if (n1175==12)|(n1175==13)|(n1175==14)
replace ncds11_dadseg2egp = 5 if (n1175==8)
replace ncds11_dadseg2egp = 6 if (n1175==9)
replace ncds11_dadseg2egp = 7 if (n1175==10)|(n1175==11)|(n1175==15)
replace ncds11_dadseg2egp = . if (n1175==16)
label values ncds11_dadseg2egp egp
label variable ncds11_dadseg2egp "NCDS Age 11 Dad's EGP from SEG"
tab ncds11_dadseg2egp n1175, mi

*Mother's SEG (Age 16)
tab n2394
capture drop ncds16_mumseg2egp
gen ncds16_mumseg2egp = .
replace ncds16_mumseg2egp = 1 if (n2394==1)|(n2394==3)|(n2394==4)
replace ncds16_mumseg2egp = 2 if (n2394==2)|(n2394==5)
replace ncds16_mumseg2egp = 3 if (n2394==6)|(n2394==7)
replace ncds16_mumseg2egp = 4 if (n2394==12)|(n2394==13)|(n2394==14)
replace ncds16_mumseg2egp = 5 if (n2394==8)
replace ncds16_mumseg2egp = 6 if (n2394==9)
replace ncds16_mumseg2egp = 7 if (n2394==10)|(n2394==11)|(n2394==15)
replace ncds16_mumseg2egp = . if (n2394==16)
label values ncds16_mumseg2egp egp
label variable ncds16_mumseg2egp "NCDS Age 16 Mum's EGP from SEG"
tab ncds16_mumseg2egp n2394, mi

*return to jupyter
. *Father's SEG

. numlabel n2385 n1175, add

. 
. *Father's SEG measured age 16

. tab n2385

3P Father,male heads |
  socio-economic grp |
          (GRO 1970) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
   1. Emp,mana,large |        584        5.45        5.45
  2. Emp,manag,small |      1,155       10.78       16.24
    3. Prof-self-emp |         89        0.83       17.07
  4. Prof. employees |        483        4.51       21.58
 5. Intermed non-man |        698        6.52       28.10
   6. Junior non-man |        723        6.75       34.85
 7. Personal service |         62        0.58       35.42
   8. Foremen-manual |      1,045        9.76       45.18
   9. Skilled manual |      2,996       27.97       73.16
10. Semi skld manual |      1,268       11.84       85.00
11. Unskilled manual |        586        5.47       90.47
12. Work own account |        491        4.58       95.05
  13. Farm emp,manag |        105        0.98       96.03
14. Farm-own account |        100        0.93       96.97
    15. Agric worker |        114        1.06       98.03
    16. Armed forces |         89        0.83       98.86
 17. Inadequate info |        122        1.14      100.00
---------------------+-----------------------------------
               Total |     10,710      100.00

. capture drop ncds16_dadseg2egp

. gen ncds16_dadseg2egp = .
(18,558 missing values generated)

. replace ncds16_dadseg2egp = 1 if (n2385==1)|(n2385==3)|(n2385==4)
(1,156 real changes made)

. replace ncds16_dadseg2egp = 2 if (n2385==2)|(n2385==5)
(1,853 real changes made)

. replace ncds16_dadseg2egp = 3 if (n2385==6)|(n2385==7)
(785 real changes made)

. replace ncds16_dadseg2egp = 4 if (n2385==12)|(n2385==13)|(n2385==14)
(696 real changes made)

. replace ncds16_dadseg2egp = 5 if (n2385==8)
(1,045 real changes made)

. replace ncds16_dadseg2egp = 6 if (n2385==9)
(2,996 real changes made)

. replace ncds16_dadseg2egp = 7 if (n2385==10)|(n2385==11)|(n2385==15)
(1,968 real changes made)

. replace ncds16_dadseg2egp = . if (n2385==17)|(n2385==16)
(0 real changes made)

. label define egp 1 "I" 2 "II+IVa" 3 "III" 4 "IVb+c" 5 "V" 6 "VI" 7 "VII"

. label values ncds16_dadseg2egp egp

. label variable ncds16_dadseg2egp "NCDS Age 16 Dad's EGP from SEG"

. 
. *armed forces are coded as missing

. tab ncds16_dadseg2egp n2385, mi

  NCDS Age |
  16 Dad's |
  EGP from |                              3P Father,male heads socio-economic grp (GRO 1970)
       SEG | 1. Emp,ma  2. Emp,ma  3. Prof-s  4. Prof.   5. Interm  6. Junior  7. Person  8. Foreme  9. Skille  10. Semi  |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
         I |       584          0         89        483          0          0          0          0          0          0 |     1,156 
    II+IVa |         0      1,155          0          0        698          0          0          0          0          0 |     1,853 
       III |         0          0          0          0          0        723         62          0          0          0 |       785 
     IVb+c |         0          0          0          0          0          0          0          0          0          0 |       696 
         V |         0          0          0          0          0          0          0      1,045          0          0 |     1,045 
        VI |         0          0          0          0          0          0          0          0      2,996          0 |     2,996 
       VII |         0          0          0          0          0          0          0          0          0      1,268 |     1,968 
         . |         0          0          0          0          0          0          0          0          0          0 |     8,059 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |       584      1,155         89        483        698        723         62      1,045      2,996      1,268 |    18,558 


  NCDS Age |
  16 Dad's |
  EGP from |                   3P Father,male heads socio-economic grp (GRO 1970)
       SEG | 11. Unski  12. Work   13. Farm   14. Farm-  15. Agric  16. Armed  17. Inade          . |     Total
-----------+----------------------------------------------------------------------------------------+----------
         I |         0          0          0          0          0          0          0          0 |     1,156 
    II+IVa |         0          0          0          0          0          0          0          0 |     1,853 
       III |         0          0          0          0          0          0          0          0 |       785 
     IVb+c |         0        491        105        100          0          0          0          0 |       696 
         V |         0          0          0          0          0          0          0          0 |     1,045 
        VI |         0          0          0          0          0          0          0          0 |     2,996 
       VII |       586          0          0          0        114          0          0          0 |     1,968 
         . |         0          0          0          0          0         89        122      7,848 |     8,059 
-----------+----------------------------------------------------------------------------------------+----------
     Total |       586        491        105        100        114         89        122      7,848 |    18,558 


. 
. *Father's SEG measured age 11

. tab n1175

      2P Father,male |
              head's |
  socio-economic grp |
          (GRO 1966) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
  1. Emp,manag,large |        582        4.32        4.32
  2. Emp,manag,small |      1,365       10.12       14.44
    3. Prof-self-emp |        123        0.91       15.35
   4. Prof-employees |        615        4.56       19.91
 5. Intermed non-man |        720        5.34       25.25
   6. Junior non-man |      1,214        9.00       34.26
 7. Personal service |         78        0.58       34.84
   8. Foremen-manual |        820        6.08       40.92
   9. Skilled manual |      4,273       31.69       72.61
10. Semi skld manual |      1,808       13.41       86.02
11. Unskilled manual |        783        5.81       91.83
12. Work-own account |        480        3.56       95.39
13. Farmer-emp,manag |        150        1.11       96.50
14. Farm-own account |        113        0.84       97.34
    15. Agric worker |        179        1.33       98.66
    16. Armed forces |        180        1.34      100.00
---------------------+-----------------------------------
               Total |     13,483      100.00

. capture drop ncds11_dadseg2egp

. gen ncds11_dadseg2egp = .
(18,558 missing values generated)

. replace ncds11_dadseg2egp = 1 if (n1175==1)|(n1175==3)|(n1175==4)
(1,320 real changes made)

. replace ncds11_dadseg2egp = 2 if (n1175==2)|(n1175==5)
(2,085 real changes made)

. replace ncds11_dadseg2egp = 3 if (n1175==6)|(n1175==7)
(1,292 real changes made)

. replace ncds11_dadseg2egp = 4 if (n1175==12)|(n1175==13)|(n1175==14)
(743 real changes made)

. replace ncds11_dadseg2egp = 5 if (n1175==8)
(820 real changes made)

. replace ncds11_dadseg2egp = 6 if (n1175==9)
(4,273 real changes made)

. replace ncds11_dadseg2egp = 7 if (n1175==10)|(n1175==11)|(n1175==15)
(2,770 real changes made)

. replace ncds11_dadseg2egp = . if (n1175==16)
(0 real changes made)

. label values ncds11_dadseg2egp egp

. label variable ncds11_dadseg2egp "NCDS Age 11 Dad's EGP from SEG"

. tab ncds11_dadseg2egp n1175, mi

  NCDS Age |
  11 Dad's |
  EGP from |                              2P Father,male head's socio-economic grp (GRO 1966)
       SEG | 1. Emp,ma  2. Emp,ma  3. Prof-s  4. Prof-e  5. Interm  6. Junior  7. Person  8. Foreme  9. Skille  10. Semi  |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
         I |       582          0        123        615          0          0          0          0          0          0 |     1,320 
    II+IVa |         0      1,365          0          0        720          0          0          0          0          0 |     2,085 
       III |         0          0          0          0          0      1,214         78          0          0          0 |     1,292 
     IVb+c |         0          0          0          0          0          0          0          0          0          0 |       743 
         V |         0          0          0          0          0          0          0        820          0          0 |       820 
        VI |         0          0          0          0          0          0          0          0      4,273          0 |     4,273 
       VII |         0          0          0          0          0          0          0          0          0      1,808 |     2,770 
         . |         0          0          0          0          0          0          0          0          0          0 |     5,255 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |       582      1,365        123        615        720      1,214         78        820      4,273      1,808 |    18,558 


  NCDS Age |
  11 Dad's |
  EGP from |             2P Father,male head's socio-economic grp (GRO 1966)
       SEG | 11. Unski  12. Work-  13. Farme  14. Farm-  15. Agric  16. Armed          . |     Total
-----------+-----------------------------------------------------------------------------+----------
         I |         0          0          0          0          0          0          0 |     1,320 
    II+IVa |         0          0          0          0          0          0          0 |     2,085 
       III |         0          0          0          0          0          0          0 |     1,292 
     IVb+c |         0        480        150        113          0          0          0 |       743 
         V |         0          0          0          0          0          0          0 |       820 
        VI |         0          0          0          0          0          0          0 |     4,273 
       VII |       783          0          0          0        179          0          0 |     2,770 
         . |         0          0          0          0          0        180      5,075 |     5,255 
-----------+-----------------------------------------------------------------------------+----------
     Total |       783        480        150        113        179        180      5,075 |    18,558 


. 
. *Mother's SEG (Age 16)

. tab n2394

      3P Mothers |
  Socio-economic |
  group,if works |
      (GRO 1970) |      Freq.     Percent        Cum.
-----------------+-----------------------------------
 Emp,manag large |         46        0.61        0.61
 Emp,manag small |        220        2.93        3.54
   Prof-self-emp |          7        0.09        3.63
  Prof-employees |         42        0.56        4.19
Intermed non-man |        987       13.13       17.32
  Junior non-man |      2,327       30.96       48.28
Personal service |      1,347       17.92       66.20
    Foremen-man. |         55        0.73       66.93
  Skilled manual |        294        3.91       70.84
Semi skld manual |      1,121       14.91       85.75
Unskilled manual |        772       10.27       96.02
Work own account |        115        1.53       97.55
Farmer-emp,manag |          2        0.03       97.58
Farm-own account |          6        0.08       97.66
    Agric worker |         62        0.82       98.48
    Armed forces |          1        0.01       98.50
 Inadequate info |        113        1.50      100.00
-----------------+-----------------------------------
           Total |      7,517      100.00

. capture drop ncds16_mumseg2egp

. gen ncds16_mumseg2egp = .
(18,558 missing values generated)

. replace ncds16_mumseg2egp = 1 if (n2394==1)|(n2394==3)|(n2394==4)
(95 real changes made)

. replace ncds16_mumseg2egp = 2 if (n2394==2)|(n2394==5)
(1,207 real changes made)

. replace ncds16_mumseg2egp = 3 if (n2394==6)|(n2394==7)
(3,674 real changes made)

. replace ncds16_mumseg2egp = 4 if (n2394==12)|(n2394==13)|(n2394==14)
(123 real changes made)

. replace ncds16_mumseg2egp = 5 if (n2394==8)
(55 real changes made)

. replace ncds16_mumseg2egp = 6 if (n2394==9)
(294 real changes made)

. replace ncds16_mumseg2egp = 7 if (n2394==10)|(n2394==11)|(n2394==15)
(1,955 real changes made)

. replace ncds16_mumseg2egp = . if (n2394==16)
(0 real changes made)

. label values ncds16_mumseg2egp egp

. label variable ncds16_mumseg2egp "NCDS Age 16 Mum's EGP from SEG"

. tab ncds16_mumseg2egp n2394, mi

  NCDS Age |
  16 Mum's |
  EGP from |                              3P Mothers Socio-economic group,if works (GRO 1970)
       SEG | Emp,manag  Emp,manag  Prof-self  Prof-empl  Intermed   Junior no  Personal   Foremen-m  Skilled m  Semi skld |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
         I |        46          0          7         42          0          0          0          0          0          0 |        95 
    II+IVa |         0        220          0          0        987          0          0          0          0          0 |     1,207 
       III |         0          0          0          0          0      2,327      1,347          0          0          0 |     3,674 
     IVb+c |         0          0          0          0          0          0          0          0          0          0 |       123 
         V |         0          0          0          0          0          0          0         55          0          0 |        55 
        VI |         0          0          0          0          0          0          0          0        294          0 |       294 
       VII |         0          0          0          0          0          0          0          0          0      1,121 |     1,955 
         . |         0          0          0          0          0          0          0          0          0          0 |    11,155 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |        46        220          7         42        987      2,327      1,347         55        294      1,121 |    18,558 


  NCDS Age |
  16 Mum's |
  EGP from |                   3P Mothers Socio-economic group,if works (GRO 1970)
       SEG | Unskilled  Work own   Farmer-em  Farm-own   Agric wor  Armed for  Inadequat          . |     Total
-----------+----------------------------------------------------------------------------------------+----------
         I |         0          0          0          0          0          0          0          0 |        95 
    II+IVa |         0          0          0          0          0          0          0          0 |     1,207 
       III |         0          0          0          0          0          0          0          0 |     3,674 
     IVb+c |         0        115          2          6          0          0          0          0 |       123 
         V |         0          0          0          0          0          0          0          0 |        55 
        VI |         0          0          0          0          0          0          0          0 |       294 
       VII |       772          0          0          0         62          0          0          0 |     1,955 
         . |         0          0          0          0          0          1        113     11,041 |    11,155 
-----------+----------------------------------------------------------------------------------------+----------
     Total |       772        115          2          6         62          1        113     11,041 |    18,558 


. 
. *return to jupyter

In [24]:
keep ncdsid ncds0_olddadrgsc ncds7_olddadrgsc ncds11_olddadrgsc ncds16_olddadrgsc ncds16_oldmumrgsc ncds16_dadseg2egp ncds11_dadseg2egp ncds16_mumseg2egp n539 n1225 n2393 n2394

*return to jupyter
. keep ncdsid ncds0_olddadrgsc ncds7_olddadrgsc ncds11_olddadrgsc ncds16_olddadrgsc ncds16_oldmumrgsc ncds16_dadseg2egp ncds11_dadseg2egp nc
> ds16_mumseg2egp n539 n1225 n2393 n2394

. 
In [25]:
save $path3\temp2.dta, replace

*return to jupyter
. save $path3\temp2.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp2.dta not found)
file F:\Data\MYDATA\TEMP\temp2.dta saved

. 
. *return to jupyter

The occupational information we are going to use for our parental social class measure comes from the new occupational coding files (SN7023).

Gregg, P. (2012). Occupational Coding for the National Child Development Study (1969, 1991-2008) and the 1970 British Cohort Study (1980, 2000-2008). [data collection]. University of London. Institute of Education. Centre for Longitudinal Studies, [original data producer(s)]. UK Data Service. SN: 7023.

"Researchers from the Avon Longitudinal Study of Parents and Children (ALSPAC), based at the University of Bristol, worked on data from selected waves of the NCDS and BCS70. To create occupational code classifications, the computerised questionnaire response text strings were converted into comma separated value (CSV) files and processed using the CASCOT (Computer Assisted Structured COding Tool) software programme, which used automatic and semi-automatic processing to assign Standard Occupational Classification 2000 (SOC2000) codes (SOC2000) to entries."

The NS-SEC Full Version is recoded to the 8-category version following the NS-SEC documentation on the classes and collapses of NS-SEC.
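
The collapse relies on range comparisons against decimal category codes such as 3.1 and 7.4. If the NS-SEC variable is stored as `float` (its storage type is not shown here, so this is an assumption), comparing it with a decimal literal can miss boundary values, because Stata evaluates the literal in double precision. The sketch below shows one possible guard on toy data, wrapping the bounds in `float()`. The names `toynssec` and `toy_nssec8` are hypothetical:

```stata
* Toy data: the lower and upper operational category of each analytic class
clear
input float toynssec
1
2
3.1
3.4
4.1
6
7.1
7.4
8.1
9.2
10
11.2
12.1
12.7
13.1
13.5
end

* A naive comparison with double-precision literals may not match
* float-stored boundary values
count if toynssec >= 3.1 & toynssec <= 3.4

* Guarded version: float() rounds each bound to the precision of the data
generate byte toy_nssec8 = .
replace toy_nssec8 = 1 if inrange(toynssec, 1, 2)
replace toy_nssec8 = 2 if inrange(toynssec, float(3.1), float(3.4))
replace toy_nssec8 = 3 if inrange(toynssec, float(4.1), 6)
replace toy_nssec8 = 4 if inrange(toynssec, float(7.1), float(7.4))
replace toy_nssec8 = 5 if inrange(toynssec, float(8.1), float(9.2))
replace toy_nssec8 = 6 if inrange(toynssec, 10, float(11.2))
replace toy_nssec8 = 7 if inrange(toynssec, float(12.1), float(12.7))
replace toy_nssec8 = 8 if inrange(toynssec, float(13.1), float(13.5))

* Every toy value should land in one of the eight classes
tab toy_nssec8 toynssec, mi
```

If the variable is stored as `double`, the guard is unnecessary but harmless for these values. Running `describe` on the variable shows which case applies.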

In [26]:
use $path1\ARCHIVE\NCDSBCS_OCCS\ncds2_occupation_coding_father.dta, clear

describe

*Age 11 Father's Occupational Information

keep NCDSID N2SNSSEC N2SSOCC 

*N2SSOCC is father's SOC2000

*N2SNSSEC is father's NS-SEC (simplified version, no employment status
*information used in its preparation.)

tab N2SNSSEC
capture drop ncds_panssec
gen ncds_panssec = .
    replace ncds_panssec = 1 if (N2SNSSEC>=1)&(N2SNSSEC<=2) 
        *1.1 Large Employers and Higher Managerial
    replace ncds_panssec = 2 if (N2SNSSEC>=3.1)&(N2SNSSEC<=3.4) 
        *1.2 Higher Professional
    replace ncds_panssec = 3 if (N2SNSSEC>=4.1)&(N2SNSSEC<=6) 
        *lower managerial and professional
    replace ncds_panssec = 4 if (N2SNSSEC>=7.1)&(N2SNSSEC<=7.4) 
        *intermediate
    replace ncds_panssec = 5 if (N2SNSSEC>=8.1)&(N2SNSSEC<=9.2) 
        *small employers and own account
    replace ncds_panssec = 6 if (N2SNSSEC>=10)&(N2SNSSEC<=11.2) 
        *lower supervisory and technical
    replace ncds_panssec = 7 if (N2SNSSEC>=12.1)&(N2SNSSEC<=12.7) 
        *semiroutine
    replace ncds_panssec = 8 if (N2SNSSEC>=13.1)&(N2SNSSEC<=13.5) 
        *routine
tab ncds_panssec
label variable ncds_panssec "NCDS Age 11 Father's NSSEC"
label define nssec 1 "Large Employers and Higher Managerial" 2 "Higher Professional" 3 "Lower managerial and professional" 4 "Intermediate" 5 "Small employers and own account" 6 "Lower Supervisory and Technical" 7 "Semi-Routine" 8 "Routine" 
label values ncds_panssec nssec
tab ncds_panssec, mi


rename NCDSID ncdsid
sort ncdsid

duplicates report ncdsid

*return to jupyter
. use $path1\ARCHIVE\NCDSBCS_OCCS\ncds2_occupation_coding_father.dta, clear

. 
. describe

Contains data from F:\Data\RAWDATA\ARCHIVE\NCDSBCS_OCCS\ncds2_occupation_coding_father.dta
  obs:        15,337                          
 vars:            22                          
 size:     1,901,788                          
--------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
NCDSID          str7    %7s                   ncdsid
N2ASOCC         str4    %4s                   NCDS 1969 Father: FULL automatic SOC2000
N2ASOCS         byte    %8.0g                 NCDS 1969 Father: FULL automatic Score
N2ASOC90        int     %8.0g                 NCDS 1969 Father: Full auto SOC90
N2SSOCC         str4    %4s                   NCDS 1969 Father: SEMI auto SOC2000
N2SSOCS         byte    %8.0g                 NCDS 1969 Father: SEMI auto Score
N2SSOC90        int     %8.0g                 NCDS 1969 Father: Semi auto SOC90
N2VSOCC         str4    %4s                   NCDS 1969 Father: VERIFICATION SOC2000
N2VSOCS         byte    %8.0g                 NCDS 1969 Father: VERIFICATION Score
N2VSOC90        int     %8.0g                 NCDS 1969 Father: VERIFICATION SOC90
N2ANSSEC        double  %10.0g     N2ANSSEC   NCDS 1969 Father: NS-SEC social class code AUTO processing
N2SNSSEC        double  %10.0g     N2SNSSEC   NCDS 1969 Father: NS-SEC social class code SEMI processing
N2VNSSEC        double  %10.0g     N2VNSSEC   NCDS 1969 Father: NS-SEC social class code VERIFICATION
N2ACMSIS        double  %10.0g                NCDS 1969 Father: CAMSIS code AUTO processing
N2SCMSIS        double  %10.0g                NCDS 1969 Father: CAMSIS code SEMI processing
N2VCMSIS        double  %10.0g                NCDS 1969 Father: CAMSIS code VERIFICATION processing
N2ARGSC         double  %10.0g     N2ARGSC    NCDS 1969 Father: RGSC social class code AUTO processing
N2SRGSC         double  %10.0g     N2SRGSC    NCDS 1969 Father: RGSC social class code SEMI processing
N2VRGSC         double  %10.0g     N2VRGSC    NCDS 1969 Father: RGSC social class code VERIFICATION processing
N2ASEG          double  %10.0g     N2ASEG     NCDS 1969 Father: SEG social class code AUTO processing
N2SSEG          double  %10.0g     N2SSEG     NCDS 1969 Father: SEG social class code SEMI processing
N2VSEG          double  %10.0g     N2VSEG     NCDS 1969 Father: SEG social class code VERIFICATION processing
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: 

. 
. *Age 11 Father's Occupational Information

. 
. keep NCDSID N2SNSSEC N2SSOCC 

. 
. *N2SSOCC is father's SOC2000

. 
. *N2SNSSEC is father's NS-SEC (simplified version, no employment status

. *information used in its preparation.)

. 
. tab N2SNSSEC

  NCDS 1969 Father: NS-SEC social class |
                 code SEMI processing   |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                      Higher managerial |        367        3.29        3.29
                                    3.1 |        461        4.13        7.42
                                    3.2 |         63        0.56        7.99
                                    3.3 |         12        0.11        8.09
                                    4.1 |        674        6.04       14.14
                                    4.2 |        147        1.32       15.45
                                    4.3 |         26        0.23       15.69
                       Lower managerial |        476        4.27       19.95
                                    7.1 |        347        3.11       23.06
                                    7.2 |        467        4.19       27.25
                                    7.3 |        122        1.09       28.34
                                    7.4 |        122        1.09       29.44
                                    8.1 |        229        2.05       31.49
                                    9.1 |        882        7.91       39.40
                                    9.2 |        263        2.36       41.75
                      Lower supervisory |        185        1.66       43.41
                                   11.1 |      1,302       11.67       55.08
                                   11.2 |        330        2.96       58.04
                                   12.1 |        155        1.39       59.43
                                   12.2 |        267        2.39       61.82
                                   12.3 |        696        6.24       68.06
                                   12.4 |        661        5.93       73.99
                                   12.5 |        130        1.17       75.15
                                   12.6 |         61        0.55       75.70
                                   12.7 |          2        0.02       75.72
                                   13.1 |         35        0.31       76.03
                                   13.2 |        193        1.73       77.76
                                   13.3 |      1,469       13.17       90.93
                                   13.4 |        991        8.88       99.81
                                   13.5 |         21        0.19      100.00
----------------------------------------+-----------------------------------
                                  Total |     11,156      100.00

. capture drop ncds_panssec

. gen ncds_panssec = .
(15,337 missing values generated)

.     replace ncds_panssec = 1 if (N2SNSSEC>=1)&(N2SNSSEC<=2) 
(367 real changes made)

.         *1.1 Large Employers and Higher Managerial

.     replace ncds_panssec = 2 if (N2SNSSEC>=3.1)&(N2SNSSEC<=3.4) 
(536 real changes made)

.         *1.2 Higher Professional

.     replace ncds_panssec = 3 if (N2SNSSEC>=4.1)&(N2SNSSEC<=6) 
(1,323 real changes made)

.         *lower managerial and professional

.     replace ncds_panssec = 4 if (N2SNSSEC>=7.1)&(N2SNSSEC<=7.4) 
(1,058 real changes made)

.         *intermediate

.     replace ncds_panssec = 5 if (N2SNSSEC>=8.1)&(N2SNSSEC<=9.2) 
(1,374 real changes made)

.         *small employers and own account

.     replace ncds_panssec = 6 if (N2SNSSEC>=10)&(N2SNSSEC<=11.2) 
(1,817 real changes made)

.         *lower supervisory and technical

.     replace ncds_panssec = 7 if (N2SNSSEC>=12.1)&(N2SNSSEC<=12.7) 
(1,972 real changes made)

.         *semiroutine

.     replace ncds_panssec = 8 if (N2SNSSEC>=13.1)&(N2SNSSEC<=13.5) 
(2,709 real changes made)

.         *routine

. tab ncds_panssec

ncds_pansse |
          c |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        367        3.29        3.29
          2 |        536        4.80        8.09
          3 |      1,323       11.86       19.95
          4 |      1,058        9.48       29.44
          5 |      1,374       12.32       41.75
          6 |      1,817       16.29       58.04
          7 |      1,972       17.68       75.72
          8 |      2,709       24.28      100.00
------------+-----------------------------------
      Total |     11,156      100.00

. label variable ncds_panssec "NCDS Age 11 Father's NSSEC"

. label define nssec 1 "Large Employers and Higher Managerial" 2 "Higher Professional" 3 "Lower managerial and professional" 4 "Intermediate
> " 5 "Small employers and own account" 6 "Lower Supervisory and Technical" 7 "Semi-Routine" 8 "Routine" 

. label values ncds_panssec nssec

. tab ncds_panssec, mi

           NCDS Age 11 Father's NSSEC |      Freq.     Percent        Cum.
--------------------------------------+-----------------------------------
Large Employers and Higher Managerial |        367        2.39        2.39
                  Higher Professional |        536        3.49        5.89
    Lower managerial and professional |      1,323        8.63       14.51
                         Intermediate |      1,058        6.90       21.41
      Small employers and own account |      1,374        8.96       30.37
      Lower Supervisory and Technical |      1,817       11.85       42.22
                         Semi-Routine |      1,972       12.86       55.08
                              Routine |      2,709       17.66       72.74
                                    . |      4,181       27.26      100.00
--------------------------------------+-----------------------------------
                                Total |     15,337      100.00

. 
. 
. rename NCDSID ncdsid

. sort ncdsid

. 
. duplicates report ncdsid

Duplicates in terms of ncdsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        15337             0
--------------------------------------

. 
. *return to jupyter
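
The collapse above compares a decimal-coded variable against literals such as 3.1, which works here because N2SNSSEC is stored as a double. As a minimal sketch on toy data (the variable names `opcat`, `major`, `nssec8` and `nssec8_range` are hypothetical and not part of the NCDS file), the same eight classes can also be obtained from the integer part of the operational category, which avoids decimal comparisons altogether. The final `assert` checks that the two approaches agree on the toy data.

```stata
* Toy data: hypothetical operational NS-SEC codes (not the NCDS file)
clear
input double opcat
1
2
3.1
4.3
6
7.2
9.1
10
11.2
12.7
13.5
.
end

* The integer part of the operational category identifies its group
gen int major = floor(opcat)

* Collapse to the eight classes used in this notebook
recode major (1/2=1) (3=2) (4/6=3) (7=4) (8/9=5) (10/11=6) (12=7) (13=8), gen(nssec8)

* Range-based version, mirroring the cell above, for comparison
gen nssec8_range = .
replace nssec8_range = 1 if inrange(opcat, 1, 2)
replace nssec8_range = 2 if inrange(opcat, 3.1, 3.4)
replace nssec8_range = 3 if inrange(opcat, 4.1, 6)
replace nssec8_range = 4 if inrange(opcat, 7.1, 7.4)
replace nssec8_range = 5 if inrange(opcat, 8.1, 9.2)
replace nssec8_range = 6 if inrange(opcat, 10, 11.2)
replace nssec8_range = 7 if inrange(opcat, 12.1, 12.7)
replace nssec8_range = 8 if inrange(opcat, 13.1, 13.5)

* Both versions should agree, including on the missing value
assert nssec8 == nssec8_range
```

If the source variable were stored as a float rather than a double, comparisons such as `>=3.1` could fail because of floating-point precision, so the integer-part approach may be the safer choice in that situation.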

As a double check, we recode the SOC2000 codes in this file to NS-SEC ourselves.

A look-up table for coding NS-SEC from SOC2000 is available here. It was produced using the NS-SEC documentation.

NS-SEC is ideally computed using both occupational and employment status information. In this analysis we compute NS-SEC using the simplified method (i.e. occupational information only).

Some employment status information is available in the datasets. However, it is not sufficient to construct the standardised employment status variable that the full version of NS-SEC requires, in the prescribed manner and comparably across the two cohorts.

For example in the NCDS first survey we have information on:

  • Whether the father is self-employed
  • If self-employed, whether he employs more than 10 people
  • If not self-employed, whether he supervises others (e.g. foreman, manager, charge hand)

NS-SEC requires the following information:

  • Whether self-employed with no employees (i.e. own account worker)
  • If an employer, whether they employ fewer than 25, or 25 or more, employees
  • If an employee, whether they are a supervisor or not
  • If a supervisor, how many employees they supervise

The NCDS question on whether employees supervise others includes foremen and managers in the same response. However, the NS-SEC documentation explicitly defines managers as separate from supervisors.

Due to these differences we have been cautious and not used the employment status information to ensure that the coding of our social class measure can be as standardised as possible.

Furthermore, there are different employment status questions used in the BCS which would provide slightly different employment status information. This would potentially hinder fair comparisons between the two cohorts. As a result, we use the simplified NS-SEC coding method in both cohorts.

More information on NS-SEC and the simplified method is available in the NS-SEC documentation.

In [27]:
*We create an employment status variable (==0) "missing" as we
*are using the simplified coding method.
capture drop ukempst
gen ukempst = 0

*The SOC2000 variable (N2SSOCC) is a string and contains non-numeric values,
*so we convert it to a numeric variable (non-numeric values become missing).
describe 
capture drop soc2000
gen soc2000 = real(N2SSOCC)

duplicates report ncdsid

sort soc2000 ukempst
merge m:m soc2000 ukempst using $path1\OTHER\SOC2000_to_NSSEC_20160527_RC_V1.dta

drop if _merge==2

tab nssec

tab nssec ncds_panssec

*All 11,156 cases coded on both measures agree; 2 further cases have an NS-SEC from the look-up but no N2SNSSEC value.

rename soc2000 ncds_dadsoc2000 

drop ukempst nssec _merge

*return to jupyter
. *We create an employment status variable (==0) "missing" as we

. *are using the simplified coding method.

. capture drop ukempst

. gen ukempst = 0

. 
. *The SOC2000 variable (N2SSOCC) is a string and contains non-numeric values,

. *so we convert it to a numeric variable (non-numeric values become missing).

. describe 

Contains data from F:\Data\RAWDATA\ARCHIVE\NCDSBCS_OCCS\ncds2_occupation_coding_father.dta
  obs:        15,337                          
 vars:             5                          
 size:       414,099                          
--------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
ncdsid          str7    %7s                   ncdsid
N2SSOCC         str4    %4s                   NCDS 1969 Father: SEMI auto SOC2000
N2SNSSEC        double  %10.0g     N2SNSSEC   NCDS 1969 Father: NS-SEC social class code SEMI processing
ncds_panssec    float   %37.0g     nssec      NCDS Age 11 Father's NSSEC
ukempst         float   %9.0g                 
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: ncdsid
     Note: Dataset has changed since last saved.

. capture drop soc2000

. gen soc2000 = real(N2SSOCC)
(4,179 missing values generated)

. 
. duplicates report ncdsid

Duplicates in terms of ncdsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        15337             0
--------------------------------------

. 
. sort soc2000 ukempst

. merge m:m soc2000 ukempst using $path1\OTHER\SOC2000_to_NSSEC_20160527_RC_V1.dta
(label nssec already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         6,692
        from master                     4,179  (_merge==1)
        from using                      2,513  (_merge==2)

    matched                            11,158  (_merge==3)
    -----------------------------------------

. 
. drop if _merge==2
(2,513 observations deleted)

. 
. tab nssec

                                nssec |      Freq.     Percent        Cum.
--------------------------------------+-----------------------------------
Large Employers and Higher Managerial |        369        3.31        3.31
                  Higher Professional |        536        4.80        8.11
    Lower managerial and professional |      1,323       11.86       19.97
                         Intermediate |      1,058        9.48       29.45
      Small employers and own account |      1,374       12.31       41.76
      Lower Supervisory and Technical |      1,817       16.28       58.05
                         Semi-Routine |      1,972       17.67       75.72
                              Routine |      2,709       24.28      100.00
--------------------------------------+-----------------------------------
                                Total |     11,158      100.00

. 
. tab nssec ncds_panssec

                      |                               NCDS Age 11 Father's NSSEC
                nssec | Large Emp  Higher Pr  Lower man  Intermedi  Small emp  Lower Sup  Semi-Rout    Routine |     Total
----------------------+----------------------------------------------------------------------------------------+----------
Large Employers and H |       367          0          0          0          0          0          0          0 |       367 
  Higher Professional |         0        536          0          0          0          0          0          0 |       536 
Lower managerial and  |         0          0      1,323          0          0          0          0          0 |     1,323 
         Intermediate |         0          0          0      1,058          0          0          0          0 |     1,058 
Small employers and o |         0          0          0          0      1,374          0          0          0 |     1,374 
Lower Supervisory and |         0          0          0          0          0      1,817          0          0 |     1,817 
         Semi-Routine |         0          0          0          0          0          0      1,972          0 |     1,972 
              Routine |         0          0          0          0          0          0          0      2,709 |     2,709 
----------------------+----------------------------------------------------------------------------------------+----------
                Total |       367        536      1,323      1,058      1,374      1,817      1,972      2,709 |    11,156 


. 
. *All 11,156 cases coded on both measures agree; 2 further cases have an NS-SEC from the look-up but no N2SNSSEC value.

. 
. rename soc2000 ncds_dadsoc2000 

. 
. drop ukempst nssec _merge

. 
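
The look-up step above can also be written as a many-to-one merge. The following is a minimal sketch on toy data, assuming the look-up table has exactly one row per (soc2000, ukempst) pair. That is an assumption about the look-up file, not something shown in the log, and the identifiers, string codes and NS-SEC values below are invented for illustration. `isid` stops with an error if the key is not unique, and `keep(master match)` replaces the separate `drop if _merge==2` step.

```stata
* Hypothetical look-up table: one row per (soc2000, ukempst) pair
clear
input int soc2000 byte ukempst byte nssec
1111 0 1
2211 0 2
9233 0 8
end

* Stop with an error if the key does not uniquely identify rows
isid soc2000 ukempst
tempfile lookup
save `lookup'

* Hypothetical master file: one row per cohort member, SOC2000 held as a string
clear
input str4 id str4 soccode
"A001" "1111"
"A002" "9233"
"A003" "n/a"
"A004" "XXXX"
end

* real() returns missing for strings that are not valid numbers
gen soc2000 = real(soccode)

* Employment status set to 0, as in the simplified method used above
gen byte ukempst = 0

* Many-to-one merge; rows found only in the look-up table are not kept
merge m:1 soc2000 ukempst using `lookup', keep(master match)
tab _merge
```

With a unique key, `m:1` makes the intended join explicit and fails loudly if the look-up table ever contains duplicate rows.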
In [28]:
save $path3\temp3.dta, replace

*return to jupyter
. *return to jupyter

. save $path3\temp3.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp3.dta not found)
file F:\Data\MYDATA\TEMP\temp3.dta saved

. 
. *return to jupyter

We use the response datasets to include information on the outcome at each sweep of the survey (e.g. productive or not productive).

In [29]:
use $path1\ARCHIVE\NCDS\response\ncds_response.dta, clear

keep NCDSID OUTCME00 OUTCME01 OUTCME02
    numlabel, add

rename NCDSID ncdsid

*NCDS AGE 0  - This variable indicates the outcome of the first survey (i.e. productive or other outcome)
tab OUTCME00
rename OUTCME00 ncds_0outcome
    label variable ncds_0outcome "NCDS response outcome 1958 (age 0)"
    tab ncds_0outcome
    
*NCDS AGE 7 - This variable indicates the outcome of the age 7 survey
tab OUTCME01
rename OUTCME01 ncds_7outcome
    label variable ncds_7outcome "NCDS response outcome 1965 (age 7)"
    tab ncds_7outcome
    
*NCDS AGE 11 - This variable indicates the outcome of the age 11 survey
tab OUTCME02
rename OUTCME02 ncds_11outcome
    label variable ncds_11outcome "NCDS response outcome 1969 (age 11)"
    tab ncds_11outcome
    
*Here we create a simple dummy variable indicating if the cohort member had a productive
*interview at the age 11 survey (or not)
tab ncds_11outcome
    gen sweeptestoutcome = 0
    replace sweeptestoutcome = 1 if (ncds_11outcome==1)
    label define yesno 1 "Yes" 0 "No"
    label values sweeptestoutcome yesno
    tab ncds_11outcome sweeptestoutcome
    label variable sweeptestoutcome "Productive at age 11 survey"

sort ncdsid
save $path3\temp4.dta, replace

*return to jupyter
. use $path1\ARCHIVE\NCDS\response\ncds_response.dta, clear

. 
. keep NCDSID OUTCME00 OUTCME01 OUTCME02

.     numlabel, add

. 
. rename NCDSID ncdsid

. 
. *NCDS AGE 0  - This variable indicates the outcome of the first survey (i.e. productive or other outcome)

. tab OUTCME00

Outcome to PMS (1958)    |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     17,415       93.84       93.84
          3. Non-contact |        218        1.17       95.02
           6. Not Issued |        925        4.98      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

. rename OUTCME00 ncds_0outcome

.     label variable ncds_0outcome "NCDS response outcome 1958 (age 0)"

.     tab ncds_0outcome

   NCDS response outcome |
            1958 (age 0) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     17,415       93.84       93.84
          3. Non-contact |        218        1.17       95.02
           6. Not Issued |        925        4.98      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

.     
. *NCDS AGE 7 - This variable indicates the outcome of the age 7 survey

. tab OUTCME01

Outcome to NCDS1 (1965)  |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,425       83.12       83.12
              2. Refusal |         80        0.43       83.55
          3. Non-contact |      1,036        5.58       89.13
   4. Other unproductive |        173        0.93       90.06
           6. Not Issued |        548        2.95       93.02
7. Not Issued - Emigrant |        475        2.56       95.58
    8. Not Issued - Dead |        821        4.42      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

. rename OUTCME01 ncds_7outcome

.     label variable ncds_7outcome "NCDS response outcome 1965 (age 7)"

.     tab ncds_7outcome

   NCDS response outcome |
            1965 (age 7) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,425       83.12       83.12
              2. Refusal |         80        0.43       83.55
          3. Non-contact |      1,036        5.58       89.13
   4. Other unproductive |        173        0.93       90.06
           6. Not Issued |        548        2.95       93.02
7. Not Issued - Emigrant |        475        2.56       95.58
    8. Not Issued - Dead |        821        4.42      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

.     
. *NCDS AGE 11 - This variable indicates the outcome of the age 11 survey

. tab OUTCME02

Outcome to NCDS2 (1969)  |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,337       82.64       82.64
              2. Refusal |        797        4.29       86.94
          3. Non-contact |        406        2.19       89.13
   4. Other unproductive |        202        1.09       90.21
           6. Not Issued |        275        1.48       91.70
7. Not Issued - Emigrant |        701        3.78       95.47
    8. Not Issued - Dead |        840        4.53      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

. rename OUTCME02 ncds_11outcome

.     label variable ncds_11outcome "NCDS response outcome 1969 (age 11)"

.     tab ncds_11outcome

   NCDS response outcome |
           1969 (age 11) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,337       82.64       82.64
              2. Refusal |        797        4.29       86.94
          3. Non-contact |        406        2.19       89.13
   4. Other unproductive |        202        1.09       90.21
           6. Not Issued |        275        1.48       91.70
7. Not Issued - Emigrant |        701        3.78       95.47
    8. Not Issued - Dead |        840        4.53      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

.     
. *Here we create a simple dummy variable indicating if the cohort member had a productive

. *interview at the age 11 survey (or not)

. tab ncds_11outcome

   NCDS response outcome |
           1969 (age 11) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,337       82.64       82.64
              2. Refusal |        797        4.29       86.94
          3. Non-contact |        406        2.19       89.13
   4. Other unproductive |        202        1.09       90.21
           6. Not Issued |        275        1.48       91.70
7. Not Issued - Emigrant |        701        3.78       95.47
    8. Not Issued - Dead |        840        4.53      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

.     gen sweeptestoutcome = 0

.     replace sweeptestoutcome = 1 if (ncds_11outcome==1)
(15,337 real changes made)

.     label define yesno 1 "Yes" 0 "No"

.     label values sweeptestoutcome yesno

.     tab ncds_11outcome sweeptestoutcome

NCDS response outcome |   sweeptestoutcome
        1969 (age 11) |        No        Yes |     Total
----------------------+----------------------+----------
        1. Productive |         0     15,337 |    15,337 
           2. Refusal |       797          0 |       797 
       3. Non-contact |       406          0 |       406 
4. Other unproductive |       202          0 |       202 
        6. Not Issued |       275          0 |       275 
7. Not Issued - Emigr |       701          0 |       701 
 8. Not Issued - Dead |       840          0 |       840 
----------------------+----------------------+----------
                Total |     3,221     15,337 |    18,558 


.     label variable sweeptestoutcome "Productive at age 11 survey"

. 
. sort ncdsid

. save $path3\temp4.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp4.dta not found)
file F:\Data\MYDATA\TEMP\temp4.dta saved

. 
. *return to jupyter
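
In the cell above every cohort member has a recorded outcome, so initialising the dummy at 0 is safe. As a minimal sketch on toy data (the variable names `outcome` and `productive` are hypothetical), an indicator can also be built in one line so that a missing outcome code stays missing instead of being counted as "No".

```stata
* Toy data: hypothetical outcome codes, including one missing value
clear
input byte outcome
1
2
3
.
end

* 1 if productive, 0 for any other recorded outcome, missing if outcome is missing
gen byte productive = (outcome == 1) if !missing(outcome)

label define yesno 1 "Yes" 0 "No"
label values productive yesno
tab outcome productive, mi
```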

We have also brought together some additional variables that are not used in the main data analysis, but which could potentially be used to produce the inverse probability weights and in the multiple imputation.

These are the types of variables that have previously been used in models of missing data in the cohort studies (see Mostafa & Wiggins, 2015; Plewis, Calderwood, Hawkes, & Nathan, 2004).
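
To illustrate how variables of this kind could feed into inverse probability weights, the following is a minimal sketch on simulated data. It is not the weighting model used in this analysis: the predictors (`mumage`, `married`), the simulated response process and the model specification are all assumptions made for illustration. A logit model of response is fitted, and each respondent is weighted by the inverse of their predicted probability of responding.

```stata
* Simulated data: hypothetical predictors of survey response
clear
set seed 12345
set obs 1000
gen mumage = 18 + floor(25*runiform())
gen byte married = runiform() < 0.95

* Hypothetical response process (invented for illustration)
gen byte responded = runiform() < invlogit(-0.5 + 0.05*mumage + 0.5*married)

* Model the probability of response
logit responded mumage i.married

* Predicted probability of response for every case
predict double p_resp, pr

* Inverse probability weight, defined for respondents only
gen double ipw = 1/p_resp if responded == 1
summarize ipw
```

Respondents with a low predicted probability of responding receive larger weights, so that they stand in for similar cases that were lost.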

In [30]:
use $path1\ARCHIVE\NCDS\S1-3\ncds0123.dta, clear

keep ncdsid n0region n553 n545 n504 

numlabel n0region n553 n545 n504 , add

*Whether the cohort member's mother is married.
tab n545
capture drop ncds_married
    gen ncds_married = .
    replace ncds_married = 1 if (n545==4)
    replace ncds_married = 0 if (n545==1)|(n545==2)|(n545==3)|(n545==5)
    label variable ncds_married "NCDS Mother married at Cohort Member's Birth"
    label define yesno 1 "Yes" 0 "No"
    label values ncds_married yesno
    tab ncds_married
    drop n545

*The cohort member's mother's age at the cohort member's birth
tab n553
recode n553 (-1=.)
rename n553 ncds_mumagebirth
label variable ncds_mumagebirth "NCDS Mother's Age at Cohort Member's Birth"
tab ncds_mumagebirth

*Parity at the cohort member's birth
tab n504
recode n504 (-1=.)
    rename n504 ncds_parity
    label variable ncds_parity "NCDS Parity at Birth"
    _strip_labels ncds_parity
    replace ncds_parity = (ncds_parity-1)
    tab ncds_parity

*Region at the first survey
tab n0region
    rename n0region ncds_region
    recode ncds_region (-2=.)
    tab ncds_region

*return to jupyter
. use $path1\ARCHIVE\NCDS\S1-3\ncds0123.dta, clear

. 
. keep ncdsid n0region n553 n545 n504 

. 
. numlabel n0region n553 n545 n504 , add

. 
. *Whether the cohort member's mother is married.

. tab n545

   0 Mother's present |
       marital status |      Freq.     Percent        Cum.
----------------------+-----------------------------------
-1. NA, incomplt info |         10        0.06        0.06
   1. Sep,Div,Widowed |        161        0.92        0.98
      2. Stable union |         39        0.22        1.21
     3. Twice married |         33        0.19        1.40
           4. Married |     16,662       95.68       97.07
         5. Unmarried |        510        2.93      100.00
----------------------+-----------------------------------
                Total |     17,415      100.00

. capture drop ncds_married

.     gen ncds_married = .
(18,558 missing values generated)

.     replace ncds_married = 1 if (n545==4)
(16,662 real changes made)

.     replace ncds_married = 0 if (n545==1)|(n545==2)|(n545==3)|(n545==5)
(743 real changes made)

.     label variable ncds_married "NCDS Mother married at Cohort Member's Birth"

.     label define yesno 1 "Yes" 0 "No"

.     label values ncds_married yesno

.     tab ncds_married

NCDS Mother |
 married at |
     Cohort |
   Member's |
      Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |        743        4.27        4.27
        Yes |     16,662       95.73      100.00
------------+-----------------------------------
      Total |     17,405      100.00

.     drop n545

. 
. *The cohort member's mother's age at the cohort member's birth

. tab n553

 0 Mother's |
   age last |
birthday,in |
      years |      Freq.     Percent        Cum.
------------+-----------------------------------
     -1. NA |         13        0.07        0.07
          8 |          1        0.01        0.08
         14 |          1        0.01        0.09
         15 |          8        0.05        0.13
         16 |         41        0.24        0.37
         17 |        142        0.82        1.18
         18 |        306        1.76        2.94
         19 |        494        2.84        5.78
         20 |        772        4.43       10.21
         21 |        951        5.46       15.67
         22 |      1,033        5.93       21.60
         23 |      1,122        6.44       28.04
         24 |      1,139        6.54       34.59
         25 |      1,141        6.55       41.14
         26 |      1,118        6.42       47.56
         27 |      1,255        7.21       54.76
         28 |      1,025        5.89       60.65
         29 |      1,046        6.01       66.66
         30 |        915        5.25       71.91
         31 |        724        4.16       76.07
         32 |        700        4.02       80.09
         33 |        633        3.63       83.72
         34 |        560        3.22       86.94
         35 |        452        2.60       89.53
         36 |        407        2.34       91.87
         37 |        469        2.69       94.56
         38 |        295        1.69       96.26
         39 |        202        1.16       97.42
         40 |        122        0.70       98.12
         41 |        120        0.69       98.81
         42 |         85        0.49       99.29
         43 |         63        0.36       99.66
         44 |         27        0.16       99.81
         45 |         17        0.10       99.91
         46 |         11        0.06       99.97
         47 |          4        0.02       99.99
         48 |          1        0.01      100.00
------------+-----------------------------------
      Total |     17,415      100.00

. recode n553 (-1=.)
(n553: 13 changes made)

. rename n553 ncds_mumagebirth

. label variable ncds_mumagebirth "NCDS Mother's Age at Cohort Member's Birth"

. tab ncds_mumagebirth

       NCDS |
   Mother's |
     Age at |
     Cohort |
   Member's |
      Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
          8 |          1        0.01        0.01
         14 |          1        0.01        0.01
         15 |          8        0.05        0.06
         16 |         41        0.24        0.29
         17 |        142        0.82        1.11
         18 |        306        1.76        2.87
         19 |        494        2.84        5.71
         20 |        772        4.44       10.14
         21 |        951        5.46       15.61
         22 |      1,033        5.94       21.54
         23 |      1,122        6.45       27.99
         24 |      1,139        6.55       34.54
         25 |      1,141        6.56       41.09
         26 |      1,118        6.42       47.52
         27 |      1,255        7.21       54.73
         28 |      1,025        5.89       60.62
         29 |      1,046        6.01       66.63
         30 |        915        5.26       71.89
         31 |        724        4.16       76.05
         32 |        700        4.02       80.07
         33 |        633        3.64       83.71
         34 |        560        3.22       86.93
         35 |        452        2.60       89.52
         36 |        407        2.34       91.86
         37 |        469        2.70       94.56
         38 |        295        1.70       96.25
         39 |        202        1.16       97.41
         40 |        122        0.70       98.12
         41 |        120        0.69       98.80
         42 |         85        0.49       99.29
         43 |         63        0.36       99.66
         44 |         27        0.16       99.81
         45 |         17        0.10       99.91
         46 |         11        0.06       99.97
         47 |          4        0.02       99.99
         48 |          1        0.01      100.00
------------+-----------------------------------
      Total |     17,402      100.00

. 
. *Parity at the cohort member's birth

. tab n504

            0 Parity |      Freq.     Percent        Cum.
---------------------+-----------------------------------
              -1. NA |          3        0.02        0.02
1. No prev aft 28wks |      6,396       36.73       36.74
    2. 1 after 28wks |      5,364       30.80       67.55
    3. 2 after 28wks |      2,730       15.68       83.22
    4. 3 after 28wks |      1,357        7.79       91.01
    5. 4 after 28wks |        705        4.05       95.06
    6. 5 after 28wks |        391        2.25       97.31
    7. 6 after 28wks |        216        1.24       98.55
    8. 7 after 28wks |        129        0.74       99.29
    9. 8 after 28wks |         67        0.38       99.67
   10. 9 after 28wks |         57        0.33      100.00
---------------------+-----------------------------------
               Total |     17,415      100.00

. recode n504 (-1=.)
(n504: 3 changes made)

.     rename n504 ncds_parity

.     label variable ncds_parity "NCDS Parity at Birth"

.     _strip_labels ncds_parity

.     replace ncds_parity = (ncds_parity-1)
(17,412 real changes made)

.     tab ncds_parity

NCDS Parity |
   at Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      6,396       36.73       36.73
          1 |      5,364       30.81       67.54
          2 |      2,730       15.68       83.22
          3 |      1,357        7.79       91.01
          4 |        705        4.05       95.06
          5 |        391        2.25       97.31
          6 |        216        1.24       98.55
          7 |        129        0.74       99.29
          8 |         67        0.38       99.67
          9 |         57        0.33      100.00
------------+-----------------------------------
      Total |     17,412      100.00

. 
. *Region at the first survey

. tab n0region

     Region at PMS |
    (1958) - Birth |      Freq.     Percent        Cum.
-------------------+-----------------------------------
    -2. Not in PMS |      1,141        6.15        6.15
          1. North |      1,234        6.65       12.80
     2. North West |      2,295       12.37       25.17
   3. E & W.Riding |      1,433        7.72       32.89
 4. North Midlands |      1,299        7.00       39.89
       5. Midlands |      1,648        8.88       48.77
           6. East |      1,242        6.69       55.46
     7. South East |      3,445       18.56       74.03
          8. South |        955        5.15       79.17
     9. South West |        966        5.21       84.38
         10. Wales |        914        4.93       89.30
      11. Scotland |      1,985       10.70      100.00
-------------------+-----------------------------------
             Total |     18,557      100.00

.     rename n0region ncds_region

.     recode ncds_region (-2=.)
(ncds_region: 1141 changes made)

.     tab ncds_region

     Region at PMS |
    (1958) - Birth |      Freq.     Percent        Cum.
-------------------+-----------------------------------
          1. North |      1,234        7.09        7.09
     2. North West |      2,295       13.18       20.26
   3. E & W.Riding |      1,433        8.23       28.49
 4. North Midlands |      1,299        7.46       35.95
       5. Midlands |      1,648        9.46       45.41
           6. East |      1,242        7.13       52.54
     7. South East |      3,445       19.78       72.32
          8. South |        955        5.48       77.81
     9. South West |        966        5.55       83.35
         10. Wales |        914        5.25       88.60
      11. Scotland |      1,985       11.40      100.00
-------------------+-----------------------------------
             Total |     17,416      100.00

. 
. *return to jupyter

In [31]:
sort ncdsid

save $path3\temp5.dta, replace

*return to jupyter
. sort ncdsid

. 
. save $path3\temp5.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp5.dta not found)
file F:\Data\MYDATA\TEMP\temp5.dta saved

. 
. *return to jupyter

Merge all these pieces of information together to make a single NCDS working data file.

In [32]:
use $path3\temp1.dta, clear
    sort ncdsid
    merge 1:1 ncdsid using $path3\temp2.dta
    drop _merge
    duplicates report ncdsid
    sort ncdsid
    merge 1:1 ncdsid using $path3\temp3.dta
    drop _merge
    duplicates report ncdsid
    sort ncdsid
    merge 1:1 ncdsid using $path3\temp4.dta
    drop _merge
    duplicates report ncdsid
    sort ncdsid
    merge 1:1 ncdsid using $path3\temp5.dta
    drop _merge
    duplicates report ncdsid
    sort ncdsid 
    
capture drop cohort
    gen cohort=1
    label variable cohort "Cohort"
    label define cohort 1 "NCDS" 2 "BCS", replace
    label values cohort cohort
    tab cohort, mi

save $path2\NCDS_MAIN.dta, replace

*return to jupyter
. use $path3\temp1.dta, clear

.     sort ncdsid

.     merge 1:1 ncdsid using $path3\temp2.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                            18,558  (_merge==3)
    -----------------------------------------

.     drop _merge

.     duplicates report ncdsid

Duplicates in terms of ncdsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18558             0
--------------------------------------

.     sort ncdsid

.     merge 1:1 ncdsid using $path3\temp3.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                         3,221
        from master                     3,221  (_merge==1)
        from using                          0  (_merge==2)

    matched                            15,337  (_merge==3)
    -----------------------------------------

.     drop _merge

.     duplicates report ncdsid

Duplicates in terms of ncdsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18558             0
--------------------------------------

.     sort ncdsid

.     merge 1:1 ncdsid using $path3\temp4.dta
(label yesno already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                            18,558  (_merge==3)
    -----------------------------------------

.     drop _merge

.     duplicates report ncdsid

Duplicates in terms of ncdsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18558             0
--------------------------------------

.     sort ncdsid

.     merge 1:1 ncdsid using $path3\temp5.dta
(label yesno already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                            18,558  (_merge==3)
    -----------------------------------------

.     drop _merge

.     duplicates report ncdsid

Duplicates in terms of ncdsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18558             0
--------------------------------------

.     sort ncdsid 

.     
. capture drop cohort

.     gen cohort=1

.     label variable cohort "Cohort"

.     label define cohort 1 "NCDS" 2 "BCS", replace

.     label values cohort cohort

.     tab cohort, mi

     Cohort |      Freq.     Percent        Cum.
------------+-----------------------------------
       NCDS |     18,558      100.00      100.00
------------+-----------------------------------
      Total |     18,558      100.00

. 
. save $path2\NCDS_MAIN.dta, replace
file F:\Data\MYDATA\WORK\NCDS_MAIN.dta saved

. 
. *return to jupyter
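
The four merges above follow the same pattern, so they could also be written as a loop. The following is only a sketch, assuming that temp1.dta to temp5.dta each hold one record per ncdsid, as the log above indicates. It uses forward slashes in the paths, which Stata also accepts on Windows and which avoid the case where a backslash placed directly before a macro suppresses its expansion.

```stata
* Sketch only: assumes temp1-temp5 each contain one record per ncdsid
use "$path3/temp1.dta", clear
forvalues i = 2/5 {
    * nogenerate suppresses _merge, so no separate "drop _merge" is needed
    merge 1:1 ncdsid using "$path3/temp`i'.dta", nogenerate
    * isid stops with an error if ncdsid is missing or not unique
    isid ncdsid
}
```

Because merge 1:1 already stops with an error when the key is not unique in either file, the isid line is a belt-and-braces check rather than a requirement.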

Delete the temporary data files.

In [33]:
erase $path3\temp1.dta
erase $path3\temp2.dta
erase $path3\temp3.dta
erase $path3\temp4.dta
erase $path3\temp5.dta

*return to jupyter
. erase $path3\temp1.dta

. erase $path3\temp2.dta

. erase $path3\temp3.dta

. erase $path3\temp4.dta

. erase $path3\temp5.dta

. 
. *return to jupyter


Preparation of BCS Datasets

Open raw BCS data file.

In [34]:
use $path1\ARCHIVE\BCS\S1\bcs7072a.dta, clear
keep bcsid a0009 a0010 a0014 a0018 a0255

count

*return to jupyter
. use $path1\ARCHIVE\BCS\S1\bcs7072a.dta, clear

. keep bcsid a0009 a0010 a0014 a0018 a0255

. 
. count
  17,196

. 
. *return to jupyter

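Here we code father's and mother's RGSC (information collected in the age 0 survey). We do not use this RGSC measure in our main analysis, but we code it here because it may be used in producing the weights and in the multiple imputation. Note that the mother's variable combines classes I and II into a single category, so its codes do not correspond one-to-one with the six-category rgsc value label defined for the father's variable.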

In [35]:
*Age 0 Parental Social Class

*father's RGSC Age 0
tab a0014
capture drop bcs0_olddadrgsc
    gen bcs0_olddadrgsc = a0014
    recode bcs0_olddadrgsc (-2=.) (7=.) (8=.)
    label variable bcs0_olddadrgsc "BCS Age 0 Dad RGSC Old Coding"
    label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"
    label values bcs0_olddadrgsc rgsc
    tab bcs0_olddadrgsc a0014
    drop a0014

*mother's RGSC Age 0
tab a0018
capture drop bcs0_oldmumrgsc
    gen bcs0_oldmumrgsc = a0018
    recode bcs0_oldmumrgsc  (-2=.) (6=.) (7=.)
    label variable bcs0_oldmumrgsc "BCS Age 0 Mum RGSC Old Coding"
    label values bcs0_oldmumrgsc rgsc
    tab bcs0_oldmumrgsc a0018
    drop a0018

sort bcsid
save $path3\temp1.dta, replace

*return to jupyter
. *Age 0 Parental Social Class

. 
. *father's RGSC Age 0

. tab a0014

     Social |
   Class of |
  Father in |
       1970 |      Freq.     Percent        Cum.
------------+-----------------------------------
    NK / NS |         97        0.56        0.56
       SC 1 |        820        4.77        5.33
       SC 2 |      1,906       11.08       16.42
    SC 3 NM |      1,924       11.19       27.61
     SC 3 M |      7,544       43.87       71.48
       SC 4 |      2,473       14.38       85.86
       SC 5 |      1,106        6.43       92.29
      Other |        502        2.92       95.21
Unsupported |        824        4.79      100.00
------------+-----------------------------------
      Total |     17,196      100.00

. capture drop bcs0_olddadrgsc

.     gen bcs0_olddadrgsc = a0014

.     recode bcs0_olddadrgsc (-2=.) (7=.) (8=.)
(bcs0_olddadrgsc: 1423 changes made)

.     label variable bcs0_olddadrgsc "BCS Age 0 Dad RGSC Old Coding"

.     label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"

.     label values bcs0_olddadrgsc rgsc

.     tab bcs0_olddadrgsc a0014

 BCS Age 0 |
  Dad RGSC |                  Social Class of Father in 1970
Old Coding |      SC 1       SC 2    SC 3 NM     SC 3 M       SC 4       SC 5 |     Total
-----------+------------------------------------------------------------------+----------
         I |       820          0          0          0          0          0 |       820 
        II |         0      1,906          0          0          0          0 |     1,906 
    III NM |         0          0      1,924          0          0          0 |     1,924 
     III M |         0          0          0      7,544          0          0 |     7,544 
        IV |         0          0          0          0      2,473          0 |     2,473 
         V |         0          0          0          0          0      1,106 |     1,106 
-----------+------------------------------------------------------------------+----------
     Total |       820      1,906      1,924      7,544      2,473      1,106 |    15,773 


.     drop a0014

. 
. *mother's RGSC Age 0

. tab a0018

    Mothers |
     Social |
   Class in |
       1970 |      Freq.     Percent        Cum.
------------+-----------------------------------
  Not Known |      1,508        8.77        8.77
   SC 1 & 2 |      1,466        8.53       17.29
    SC 3 NM |      4,682       27.23       44.52
     SC 3 M |        841        4.89       49.41
       SC 4 |      3,276       19.05       68.46
       SC 5 |        211        1.23       69.69
      Other |        108        0.63       70.32
 Housewives |      5,104       29.68      100.00
------------+-----------------------------------
      Total |     17,196      100.00

. capture drop bcs0_oldmumrgsc

.     gen bcs0_oldmumrgsc = a0018

.     recode bcs0_oldmumrgsc  (-2=.) (6=.) (7=.)
(bcs0_oldmumrgsc: 6720 changes made)

.     label variable bcs0_oldmumrgsc "BCS Age 0 Mum RGSC Old Coding"

.     label values bcs0_oldmumrgsc rgsc

.     tab bcs0_oldmumrgsc a0018

 BCS Age 0 |
  Mum RGSC |              Mothers Social Class in 1970
Old Coding |  SC 1 & 2    SC 3 NM     SC 3 M       SC 4       SC 5 |     Total
-----------+-------------------------------------------------------+----------
         I |     1,466          0          0          0          0 |     1,466 
        II |         0      4,682          0          0          0 |     4,682 
    III NM |         0          0        841          0          0 |       841 
     III M |         0          0          0      3,276          0 |     3,276 
        IV |         0          0          0          0        211 |       211 
-----------+-------------------------------------------------------+----------
     Total |     1,466      4,682        841      3,276        211 |    10,476 


.     drop a0018

. 
. sort bcsid

. save $path3\temp1.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp1.dta not found)
file F:\Data\MYDATA\TEMP\temp1.dta saved

. 
. *return to jupyter
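
The cross-tabulation of bcs0_oldmumrgsc against a0018 above shows that the rgsc value label is offset for the mother's variable (for example, "SC 1 & 2" is displayed as "I" and "SC 5" as "IV"). A minimal sketch of one way to give the mother's variable its own label is shown below. The label name mumrgsc is hypothetical, and the category order is assumed from the cross-tabulation above. These lines would need to run before temp1.dta is saved.

```stata
* Hypothetical value label for the five-category mother's RGSC variable
* Codes 1-5 are assumed to follow the order shown in the tab of a0018
label define mumrgsc 1 "I & II" 2 "III NM" 3 "III M" 4 "IV" 5 "V", replace
label values bcs0_oldmumrgsc mumrgsc
tab bcs0_oldmumrgsc, missing
```

This changes only how the values are displayed. The underlying codes are untouched.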

Here we code father's education. As above, we follow the method used in Cheung and Egerton (2007, pp. 206-207).

We use variable e196, which comes from the age 5 survey. This is question E3, which asks: "How many completed years of full-time education did the present parents have after leaving school? (e.g. college of education, polytechnic, university etc.)"

This variable is coded in the data as the number of completed years after age 15, so the implied age on leaving full-time education is 15 plus the recorded value.

In [36]:
use $path1\ARCHIVE\BCS\S2\f699b.dta, clear
keep bcsid e008 e009 e189a e189b e190 e191 e192 e193 e194 e195 e196 e245
numlabel, add

quietly mvdecode e008 e009 e189a e189b e190 e191 e192 e193 e194 e195 e196 e245, mv(-2=. \ -1=. \ -6=. \ -3=. \ 9=.)

*Father's Education

tab e196
capture drop bcs_paed_cat
gen bcs_paed_cat = .
*Left school at age 15
*No years completed after age 15
replace bcs_paed_cat = 1 if (e196==0)
*Left school at age 16, 17 or 18
*This is one to three years after age 15
replace bcs_paed_cat = 2 if ((e196>=1)&(e196<=3))
*Left school at age 19 or 20
*This is 4 or 5 years after age 15
replace bcs_paed_cat = 3 if ((e196>=4)&(e196<=5))
*Left school at age 21+
*This is 6 or more years after age 15
replace bcs_paed_cat = 4 if ((e196>=6)&(e196<=19))
tab bcs_paed_cat
label define ed_cat 1 "Comp" 2 "Comp+1-3" 3 "Comp+4-5" 4 "Comp+6+", replace
label values bcs_paed_cat ed_cat
label variable bcs_paed_cat "BCS Father's Education Categories"
tab bcs_paed_cat

tab e196 bcs_paed_cat, mi

*return to jupyter
. use $path1\ARCHIVE\BCS\S2\f699b.dta, clear

. keep bcsid e008 e009 e189a e189b e190 e191 e192 e193 e194 e195 e196 e245

. numlabel, add

. 
. quietly mvdecode e008 e009 e189a e189b e190 e191 e192 e193 e194 e195 e196 e245, mv(-2=. \ -1=. \ -6=. \ -3=. \ 9=.)

. 
. *Father's Education

. 
. tab e196

  Years of Ft Educ |
         After Age |
         15-Father |      Freq.     Percent        Cum.
-------------------+-----------------------------------
           0. None |      8,018       65.03       65.03
                 1 |      1,615       13.10       78.13
                 2 |        784        6.36       84.49
                 3 |        543        4.40       88.90
                 4 |        253        2.05       90.95
                 5 |        293        2.38       93.32
                 6 |        346        2.81       96.13
                 7 |        278        2.25       98.39
                 8 |        152        1.23       99.62
                10 |         41        0.33       99.95
                11 |          3        0.02       99.98
                12 |          1        0.01       99.98
                13 |          1        0.01       99.99
                19 |          1        0.01      100.00
-------------------+-----------------------------------
             Total |     12,329      100.00

. capture drop bcs_paed_cat

. gen bcs_paed_cat = .
(13,135 missing values generated)

. *Left school at age 15

. *No years completed after age 15

. replace bcs_paed_cat = 1 if (e196==0)
(8,018 real changes made)

. *Left school at age 16, 17 or 18

. *This is one to three years after age 15

. replace bcs_paed_cat = 2 if ((e196>=1)&(e196<=3))
(2,942 real changes made)

. *Left school at age 19 or 20

. *This is 4 or 5 years after age 15

. replace bcs_paed_cat = 3 if ((e196>=4)&(e196<=5))
(546 real changes made)

. *Left school at age 21+

. *This is 6 or more years after age 15

. replace bcs_paed_cat = 4 if ((e196>=6)&(e196<=19))
(823 real changes made)

. tab bcs_paed_cat

bcs_paed_ca |
          t |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      8,018       65.03       65.03
          2 |      2,942       23.86       88.90
          3 |        546        4.43       93.32
          4 |        823        6.68      100.00
------------+-----------------------------------
      Total |     12,329      100.00

. label define ed_cat 1 "Comp" 2 "Comp+1-3" 3 "Comp+4-5" 4 "Comp+6+", replace

. label values bcs_paed_cat ed_cat

. label variable bcs_paed_cat "BCS Father's Education Categories"

. tab bcs_paed_cat

        BCS |
   Father's |
  Education |
 Categories |      Freq.     Percent        Cum.
------------+-----------------------------------
       Comp |      8,018       65.03       65.03
   Comp+1-3 |      2,942       23.86       88.90
   Comp+4-5 |        546        4.43       93.32
    Comp+6+ |        823        6.68      100.00
------------+-----------------------------------
      Total |     12,329      100.00

. 
. tab e196 bcs_paed_cat, mi

  Years of Ft Educ |
         After Age |           BCS Father's Education Categories
         15-Father |      Comp   Comp+1-3   Comp+4-5    Comp+6+          . |     Total
-------------------+-------------------------------------------------------+----------
           0. None |     8,018          0          0          0          0 |     8,018 
                 1 |         0      1,615          0          0          0 |     1,615 
                 2 |         0        784          0          0          0 |       784 
                 3 |         0        543          0          0          0 |       543 
                 4 |         0          0        253          0          0 |       253 
                 5 |         0          0        293          0          0 |       293 
                 6 |         0          0          0        346          0 |       346 
                 7 |         0          0          0        278          0 |       278 
                 8 |         0          0          0        152          0 |       152 
                10 |         0          0          0         41          0 |        41 
                11 |         0          0          0          3          0 |         3 
                12 |         0          0          0          1          0 |         1 
                13 |         0          0          0          1          0 |         1 
                19 |         0          0          0          1          0 |         1 
                 . |         0          0          0          0        806 |       806 
-------------------+-------------------------------------------------------+----------
             Total |     8,018      2,942        546        823        806 |    13,135 


. 
. *return to jupyter

Here we code mother's education. Again we use the method described in Cheung and Egerton (2007, pp. 206-207).

We use variable e195, which comes from the age 5 survey. This is question E3, which asks: "How many completed years of full-time education did the present parents have after leaving school? (e.g. college of education, polytechnic, university etc.)"

This variable is coded in the data as the number of completed years after age 15, so the implied age on leaving full-time education is 15 plus the recorded value.

In [37]:
*Mother's Education Categories
tab e195
capture drop bcs_moed_cat
gen bcs_moed_cat = .
*Left school at age 15
*No years completed after age 15
replace bcs_moed_cat = 1 if (e195==0)
*Left school at age 16, 17 or 18
*This is one to three years after age 15
replace bcs_moed_cat = 2 if ((e195>=1)&(e195<=3))
*Left school at age 19 or 20
*This is 4 or 5 years after age 15
replace bcs_moed_cat = 3 if ((e195>=4)&(e195<=5))
*Left school at age 21+
*This is 6 or more years after age 15
replace bcs_moed_cat = 4 if ((e195>=6)&(e195<=19))
tab bcs_moed_cat
label values bcs_moed_cat ed_cat
label variable bcs_moed_cat "BCS Mother's Education Categories"
tab bcs_moed_cat

tab e195 bcs_moed_cat, mi

*return to jupyter
. *Mother's Education Categories

. tab e195

  Years of Ft Educ |
         After Age |
         15-Mother |      Freq.     Percent        Cum.
-------------------+-----------------------------------
           0. None |      8,515       65.34       65.34
                 1 |      2,015       15.46       80.80
                 2 |      1,020        7.83       88.63
                 3 |        547        4.20       92.83
                 4 |        237        1.82       94.64
                 5 |        245        1.88       96.52
                 6 |        280        2.15       98.67
                 7 |        122        0.94       99.61
                 8 |         45        0.35       99.95
                10 |          5        0.04       99.99
                11 |          1        0.01      100.00
-------------------+-----------------------------------
             Total |     13,032      100.00

. capture drop bcs_moed_cat

. gen bcs_moed_cat = .
(13,135 missing values generated)

. *Left school at age 15

. *No years completed after age 15

. replace bcs_moed_cat = 1 if (e195==0)
(8,515 real changes made)

. *Left school at age 16, 17 or 18

. *This is one to three years after age 15

. replace bcs_moed_cat = 2 if ((e195>=1)&(e195<=3))
(3,582 real changes made)

. *Left school at age 19 or 20

. *This is 4 or 5 years after age 15

. replace bcs_moed_cat = 3 if ((e195>=4)&(e195<=5))
(482 real changes made)

. *Left school at age 21+

. *This is 6 or more years after age 15

. replace bcs_moed_cat = 4 if ((e195>=6)&(e195<=19))
(453 real changes made)

. tab bcs_moed_cat

bcs_moed_ca |
          t |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      8,515       65.34       65.34
          2 |      3,582       27.49       92.83
          3 |        482        3.70       96.52
          4 |        453        3.48      100.00
------------+-----------------------------------
      Total |     13,032      100.00

. label values bcs_moed_cat ed_cat

. label variable bcs_moed_cat "BCS Mother's Education Categories"

. tab bcs_moed_cat

        BCS |
   Mother's |
  Education |
 Categories |      Freq.     Percent        Cum.
------------+-----------------------------------
       Comp |      8,515       65.34       65.34
   Comp+1-3 |      3,582       27.49       92.83
   Comp+4-5 |        482        3.70       96.52
    Comp+6+ |        453        3.48      100.00
------------+-----------------------------------
      Total |     13,032      100.00

. 
. tab e195 bcs_moed_cat, mi

  Years of Ft Educ |
         After Age |           BCS Mother's Education Categories
         15-Mother |      Comp   Comp+1-3   Comp+4-5    Comp+6+          . |     Total
-------------------+-------------------------------------------------------+----------
           0. None |     8,515          0          0          0          0 |     8,515 
                 1 |         0      2,015          0          0          0 |     2,015 
                 2 |         0      1,020          0          0          0 |     1,020 
                 3 |         0        547          0          0          0 |       547 
                 4 |         0          0        237          0          0 |       237 
                 5 |         0          0        245          0          0 |       245 
                 6 |         0          0          0        280          0 |       280 
                 7 |         0          0          0        122          0 |       122 
                 8 |         0          0          0         45          0 |        45 
                10 |         0          0          0          5          0 |         5 
                11 |         0          0          0          1          0 |         1 
                 . |         0          0          0          0        103 |       103 
-------------------+-------------------------------------------------------+----------
             Total |     8,515      3,582        482        453        103 |    13,135 


. 
. *return to jupyter
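
The two education recodes map years completed after age 15 into four bands: 0 years (left at 15), 1 to 3 years (left at 16 to 18), 4 to 5 years (left at 19 to 20) and 6 or more years (left at 21 or older). As a cross-check, the same mapping can be written with recode and compared with the variables created above. This is a sketch only. The names chk_paed and chk_moed are hypothetical and exist only for the comparison.

```stata
* Rebuild each category variable with recode (chk_* names are hypothetical)
recode e196 (0 = 1) (1/3 = 2) (4/5 = 3) (6/19 = 4), generate(chk_paed)
recode e195 (0 = 1) (1/3 = 2) (4/5 = 3) (6/19 = 4), generate(chk_moed)

* assert stops with an error if any record disagrees
* (missing values compare as equal to missing values)
assert chk_paed == bcs_paed_cat
assert chk_moed == bcs_moed_cat
drop chk_paed chk_moed
```

With generate(), recode copies any value not covered by a rule unchanged, so the assert would also flag a value above 19 if one were present in the data.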

Again in line with Cheung and Egerton (2007, pp. 206-207), we take the higher of the two parents' education categories to create a parental education level variable. Where only one parent's education is recorded, that parent's category is used.

In [38]:
*Highest of the parent's education 

capture drop bcs_parented
egen bcs_parented = rmax(bcs_paed_cat bcs_moed_cat)
tab bcs_parented
label values bcs_parented ed_cat
label variable bcs_parented "BCS Parent's Highest Education"
tab bcs_parented

tab bcs_parented bcs_paed_cat
tab bcs_parented bcs_moed_cat

sort bcsid
save $path3\temp2.dta, replace

*return to jupyter
. *Highest of the parent's education 

. 
. capture drop bcs_parented

. egen bcs_parented = rmax(bcs_paed_cat bcs_moed_cat)
(47 missing values generated)

. tab bcs_parented

bcs_parente |
          d |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      6,840       52.26       52.26
          2 |      4,413       33.72       85.98
          3 |        775        5.92       91.90
          4 |      1,060        8.10      100.00
------------+-----------------------------------
      Total |     13,088      100.00

. label values bcs_parented ed_cat

. label variable bcs_parented "BCS Parent's Highest Education"

. tab bcs_parented

        BCS |
   Parent's |
    Highest |
  Education |      Freq.     Percent        Cum.
------------+-----------------------------------
       Comp |      6,840       52.26       52.26
   Comp+1-3 |      4,413       33.72       85.98
   Comp+4-5 |        775        5.92       91.90
    Comp+6+ |      1,060        8.10      100.00
------------+-----------------------------------
      Total |     13,088      100.00

. 
. tab bcs_parented bcs_paed_cat

       BCS |
  Parent's |
   Highest |      BCS Father's Education Categories
 Education |      Comp   Comp+1-3   Comp+4-5    Comp+6+ |     Total
-----------+--------------------------------------------+----------
      Comp |     6,342          0          0          0 |     6,342 
  Comp+1-3 |     1,544      2,691          0          0 |     4,235 
  Comp+4-5 |        92        150        491          0 |       733 
   Comp+6+ |        40        101         55        823 |     1,019 
-----------+--------------------------------------------+----------
     Total |     8,018      2,942        546        823 |    12,329 


. tab bcs_parented bcs_moed_cat

       BCS |
  Parent's |
   Highest |      BCS Mother's Education Categories
 Education |      Comp   Comp+1-3   Comp+4-5    Comp+6+ |     Total
-----------+--------------------------------------------+----------
      Comp |     6,805          0          0          0 |     6,805 
  Comp+1-3 |     1,371      3,035          0          0 |     4,406 
  Comp+4-5 |       187        232        352          0 |       771 
   Comp+6+ |       152        315        130        453 |     1,050 
-----------+--------------------------------------------+----------
     Total |     8,515      3,582        482        453 |    13,032 


. 
. sort bcsid

. save $path3\temp2.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp2.dta not found)
file F:\Data\MYDATA\TEMP\temp2.dta saved

. 
. *return to jupyter
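
The egen function rmax() used above is the older name for rowmax(). It takes the maximum across the listed variables while ignoring missing values, so the result is missing only when both parents' categories are missing (the 47 cases reported in the log, which is 13,135 minus 13,088). A short sketch of how this behaviour could be checked is given below. The name chk_parented is hypothetical.

```stata
* rowmax() is the current name for rmax(); both ignore missing values
egen chk_parented = rowmax(bcs_paed_cat bcs_moed_cat)
assert chk_parented == bcs_parented

* The combined variable should be missing only when both inputs are missing
assert missing(bcs_parented) == (missing(bcs_paed_cat) & missing(bcs_moed_cat))
drop chk_parented
```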

Here we produce variables indicating which UK country the cohort members lived in at each sweep.

In [39]:
*Country at each sweep
use $path1\ARCHIVE\BCS\S1\bcs1derived.dta, clear
keep BCSID BD1CNTRY
rename BCSID bcsid
tab BD1CNTRY
rename BD1CNTRY bcs0_country

duplicates report bcsid

sort bcsid
save $path3\temp3.dta, replace

use $path1\ARCHIVE\BCS\S2\bcs2derived.dta, clear
keep BCSID BD2CNTRY
rename BCSID bcsid
tab BD2CNTRY
rename BD2CNTRY bcs5_country

duplicates report bcsid

sort bcsid
save $path3\temp4.dta, replace

use $path1\ARCHIVE\BCS\S3\bcs3derived.dta, clear
keep BCSID BD3CNTRY
rename BCSID bcsid
tab BD3CNTRY
rename BD3CNTRY bcs10_country

duplicates report bcsid
*There are some duplicates of BCSID - these have missing info in one version
*Drop all records with missing country (this also removes the duplicate versions)
drop if (bcs10_country==.)
duplicates report bcsid

sort bcsid
save $path3\temp5.dta, replace

*return to jupyter
. *Country at each sweep

. use $path1\ARCHIVE\BCS\S1\bcs1derived.dta, clear

. keep BCSID BD1CNTRY

. rename BCSID bcsid

. tab BD1CNTRY

1970: Country of |
     Interview   |      Freq.     Percent        Cum.
-----------------+-----------------------------------
         England |     14,072       81.83       81.83
           Wales |        879        5.11       86.94
        Scotland |      1,617        9.40       96.35
Northern Ireland |        628        3.65      100.00
-----------------+-----------------------------------
           Total |     17,196      100.00

. rename BD1CNTRY bcs0_country

. 
. duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        17196             0
--------------------------------------

. 
. sort bcsid

. save $path3\temp3.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp3.dta not found)
file F:\Data\MYDATA\TEMP\temp3.dta saved

. 
. use $path1\ARCHIVE\BCS\S2\bcs2derived.dta, clear

. keep BCSID BD2CNTRY

. rename BCSID bcsid

. tab BD2CNTRY

1975: Country of |
     Interview   |      Freq.     Percent        Cum.
-----------------+-----------------------------------
         England |     11,157       84.94       84.94
           Wales |        748        5.69       90.64
        Scotland |      1,166        8.88       99.51
        Overseas |         64        0.49      100.00
-----------------+-----------------------------------
           Total |     13,135      100.00

. rename BD2CNTRY bcs5_country

. 
. duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        13135             0
--------------------------------------

. 
. sort bcsid

. save $path3\temp4.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp4.dta not found)
file F:\Data\MYDATA\TEMP\temp4.dta saved

. 
. use $path1\ARCHIVE\BCS\S3\bcs3derived.dta, clear

. keep BCSID BD3CNTRY

. rename BCSID bcsid

. tab BD3CNTRY

1980: Country of |
     Interview   |      Freq.     Percent        Cum.
-----------------+-----------------------------------
         Unknown |         81        0.54        0.54
         England |     12,514       84.13       84.67
           Wales |        825        5.55       90.22
        Scotland |      1,455        9.78      100.00
-----------------+-----------------------------------
           Total |     14,875      100.00

. rename BD3CNTRY bcs10_country

. 
. duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18937             0
        2 |          166            83
--------------------------------------

. *There are some duplicates of BCSID - these have missing info in one version

. *Drop these

. drop if (bcs10_country==.)
(4,228 observations deleted)

. duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        14875             0
--------------------------------------

. 
. sort bcsid

. save $path3\temp5.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp5.dta not found)
file F:\Data\MYDATA\TEMP\temp5.dta saved

. 
. *return to jupyter
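
The duplicate handling for the age 10 country file can be illustrated on toy data. This is a minimal sketch rather than part of the original workflow: the four rows and their values are invented, and `isid` is suggested here as a stricter alternative to reading the `duplicates report` table, since it stops with an error if `bcsid` does not uniquely identify the rows.

```stata
* Toy illustration only: the IDs and country codes below are invented
clear
input str4 bcsid bcs10_country
"A001" 1
"A002" .
"A002" 3
"A003" 2
end

* A002 appears twice, once with missing country information
duplicates report bcsid

* Drop the rows with missing country, as in the cell above
drop if missing(bcs10_country)

* isid exits with an error unless bcsid uniquely identifies each row
isid bcsid
assert _N == 3
```

Note that `drop if` removes every row with a missing country, not only the duplicated ones. In the log above, 4,228 observations are deleted although only 83 are surplus copies.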

Here we code additional parental occupational information: the older RGSC (Registrar General's Social Class) and SEG (Socio-Economic Group) measures deposited in the BCS datasets. We do not use these variables in our main analyses. They are prepared for possible use in producing the weights and in the multiple imputation.

As above, we convert the available Socio-Economic Group information to an approximation of the Goldthorpe Schema using the method outlined in Goldthorpe and Jackson (2007).

Goldthorpe, J. H., & Jackson, M. (2007). Intergenerational class mobility in contemporary Britain: Political concerns and empirical findings. The British Journal of Sociology, 58(4), 525-546.

This method builds on an approximation developed by Heath and McDonald (1987).

Heath, A., & McDonald, S. K. (1987). Social change and the future of the left. The Political Quarterly, 58(4), 364-377.
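
The grouping of SEG codes into the approximate class categories can also be written as a single `recode`. The sketch below is illustrative only: the variable names `seg` and `seg2egp`, the label name `egp_demo`, and the nine test values are assumptions made for the example, while the groupings themselves are the ones used in the `replace` statements in the next cell.

```stata
* Toy illustration only: the SEG values below are invented test cases
clear
input seg
11
51
60
80
90
100
120
160
-9
end

* One recode with the same groupings as the replace statements in the next cell
recode seg (11 12 30 40 = 1) (21 22 51 52 = 2) (60 70 = 3) (120 130 140 = 4) (80 = 5) (90 = 6) (100 110 150 = 7) (else = .), generate(seg2egp)
label define egp_demo 1 "I" 2 "II+IVa" 3 "III" 4 "IVb+c" 5 "V" 6 "VI" 7 "VII"
label values seg2egp egp_demo

* Codes outside the mapping (here -9 and 160) stay missing
assert missing(seg2egp) if inlist(seg, -9, 160)
assert seg2egp == 7 if seg == 100
list, clean
```

Writing the mapping with `(else = .)` makes it explicit that any SEG code not listed, such as 160, is left without a class.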

In [40]:
*Age 10 parental Social Class
use $path1\ARCHIVE\BCS\S3\sn3723.dta, clear
keep bcsid c3_4 c3_11 back10p back20p c4_1a c4_2a

numlabel, add

*father's RGSC Age 10
tab c3_4
capture drop bcs10_olddadrgsc
    gen bcs10_olddadrgsc = c3_4
    recode bcs10_olddadrgsc (8=.) (9=.)
    label variable bcs10_olddadrgsc "BCS Age 10 Dad RGSC Old Coding"
    label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"
    label values bcs10_olddadrgsc rgsc
    tab bcs10_olddadrgsc c3_4

*mother's RGSC Age 10
*(stored in its own variable so that the father's measure is not overwritten)
tab c3_11
capture drop bcs10_oldmumrgsc
    gen bcs10_oldmumrgsc = c3_11
    recode bcs10_oldmumrgsc (8=.) (9=.)
    label variable bcs10_oldmumrgsc "BCS Age 10 Mum RGSC Old Coding"
    label values bcs10_oldmumrgsc rgsc
    tab bcs10_oldmumrgsc c3_11

*Father's SEG Age 10
tab back10p
capture drop bcs10_dadseg2egp
    gen bcs10_dadseg2egp = .
    replace bcs10_dadseg2egp = 1 if (back10p==11)|(back10p==12)|(back10p==30)|(back10p==40)
    replace bcs10_dadseg2egp = 2 if (back10p==21)|(back10p==22)|(back10p==51)|(back10p==52)
    replace bcs10_dadseg2egp = 3 if (back10p==60)|(back10p==70)
    replace bcs10_dadseg2egp = 4 if (back10p==120)|(back10p==130)|(back10p==140)
    replace bcs10_dadseg2egp = 5 if (back10p==80)
    replace bcs10_dadseg2egp = 6 if (back10p==90)
    replace bcs10_dadseg2egp = 7 if (back10p==100)|(back10p==110)|(back10p==150)
    replace bcs10_dadseg2egp = . if (back10p==-9)
    label define egp 1 "I" 2 "II+IVa" 3 "III" 4 "IVb+c" 5 "V" 6 "VI" 7 "VII"
    label values bcs10_dadseg2egp egp
    label variable bcs10_dadseg2egp "BCS Age 10 Dad's EGP from SEG"
    tab back10p bcs10_dadseg2egp

*Mother's SEG Age 10
tab back20p
capture drop bcs10_mumseg2egp
    gen bcs10_mumseg2egp = .
    replace bcs10_mumseg2egp = 1 if (back20p==11)|(back20p==12)|(back20p==30)|(back20p==40)
    replace bcs10_mumseg2egp = 2 if (back20p==21)|(back20p==22)|(back20p==51)|(back20p==52)
    replace bcs10_mumseg2egp = 3 if (back20p==60)|(back20p==70)
    replace bcs10_mumseg2egp = 4 if (back20p==120)|(back20p==130)|(back20p==140)
    replace bcs10_mumseg2egp = 5 if (back20p==80)
    replace bcs10_mumseg2egp = 6 if (back20p==90)
    replace bcs10_mumseg2egp = 7 if (back20p==100)|(back20p==110)|(back20p==150)
    replace bcs10_mumseg2egp = . if (back20p==-9)
    label values bcs10_mumseg2egp egp
    label variable bcs10_mumseg2egp "BCS Age 10 Mum's EGP from SEG"
    tab back20p bcs10_mumseg2egp

drop c3_4 c3_11 back10p back20p c4_1a c4_2a

sort bcsid

save $path3\temp6.dta, replace

*return to jupyter
. *Age 10 parental Social Class

. use $path1\ARCHIVE\BCS\S3\sn3723.dta, clear

. keep bcsid c3_4 c3_11 back10p back20p c4_1a c4_2a

. 
. numlabel, add

. 
. *father's RGSC Age 10

. tab c3_4

  FATHER'S CORRECTED |
   SOCIAL CLASS 1980 |      Freq.     Percent        Cum.
---------------------+-----------------------------------
                1. I |        767        5.53        5.53
               2. II |      2,922       21.07       26.60
            3. IIINM |      1,123        8.10       34.70
             4. IIIM |      5,418       39.07       73.77
               5. IV |      1,513       10.91       84.68
                6. V |        489        3.53       88.20
8. Insufficient data |        267        1.93       90.13
          9. No data |      1,369        9.87      100.00
---------------------+-----------------------------------
               Total |     13,868      100.00

. capture drop bcs10_olddadrgsc

.     gen bcs10_olddadrgsc = c3_4
(1,002 missing values generated)

.     recode bcs10_olddadrgsc (8=.) (9=.)
(bcs10_olddadrgsc: 1636 changes made)

.     label variable bcs10_olddadrgsc "BCS Age 10 Dad RGSC Old Coding"

.     label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"

.     label values bcs10_olddadrgsc rgsc

.     tab bcs10_olddadrgsc c3_4

BCS Age 10 |
  Dad RGSC |               FATHER'S CORRECTED SOCIAL CLASS 1980
Old Coding |      1. I      2. II   3. IIINM    4. IIIM      5. IV       6. V |     Total
-----------+------------------------------------------------------------------+----------
         I |       767          0          0          0          0          0 |       767 
        II |         0      2,922          0          0          0          0 |     2,922 
    III NM |         0          0      1,123          0          0          0 |     1,123 
     III M |         0          0          0      5,418          0          0 |     5,418 
        IV |         0          0          0          0      1,513          0 |     1,513 
         V |         0          0          0          0          0        489 |       489 
-----------+------------------------------------------------------------------+----------
     Total |       767      2,922      1,123      5,418      1,513        489 |    12,232 


. 
. *mother's RGSC Age 10

. tab c3_11

  MOTHER'S CORRECTED |
   SOCIAL CLASS 1980 |      Freq.     Percent        Cum.
---------------------+-----------------------------------
                1. I |         55        0.40        0.40
               2. II |      1,714       12.36       12.76
            3. IIINM |      3,192       23.02       35.77
             4. IIIM |        908        6.55       42.32
               5. IV |      2,759       19.89       62.22
                6. V |        908        6.55       68.76
8. Insufficient data |        165        1.19       69.95
          9. No data |      4,167       30.05      100.00
---------------------+-----------------------------------
               Total |     13,868      100.00

. capture drop bcs10_olddadrgsc

.     gen bcs10_olddadrgsc = c3_11
(1,002 missing values generated)

.     recode bcs10_olddadrgsc (8=.) (9=.)
(bcs10_olddadrgsc: 4332 changes made)

.     label variable bcs10_olddadrgsc "BCS Age 10 Dad RGSC Old Coding"

.     label values bcs10_olddadrgsc rgsc

.     tab bcs10_olddadrgsc c3_11

BCS Age 10 |
  Dad RGSC |               MOTHER'S CORRECTED SOCIAL CLASS 1980
Old Coding |      1. I      2. II   3. IIINM    4. IIIM      5. IV       6. V |     Total
-----------+------------------------------------------------------------------+----------
         I |        55          0          0          0          0          0 |        55 
        II |         0      1,714          0          0          0          0 |     1,714 
    III NM |         0          0      3,192          0          0          0 |     3,192 
     III M |         0          0          0        908          0          0 |       908 
        IV |         0          0          0          0      2,759          0 |     2,759 
         V |         0          0          0          0          0        908 |       908 
-----------+------------------------------------------------------------------+----------
     Total |        55      1,714      3,192        908      2,759        908 |     9,536 


. 
. *Father's SEG Age 10

. tab back10p

   FATHER'S CORRECTED |
SOCIAL VARS SEG 1980  |      Freq.     Percent        Cum.
----------------------+-----------------------------------
-9. No code available |      2,755       18.53       18.53
                   11 |         58        0.39       18.92
                   12 |        853        5.74       24.65
                   21 |        443        2.98       27.63
                   22 |        864        5.81       33.44
                   30 |        164        1.10       34.55
                   40 |        583        3.92       38.47
                   51 |        614        4.13       42.60
                   52 |        273        1.84       44.43
                   60 |        616        4.14       48.57
                   70 |         61        0.41       48.98
                   80 |      1,194        8.03       57.01
                   90 |      3,216       21.63       78.64
                  100 |      1,226        8.24       86.89
                  110 |        443        2.98       89.87
                  120 |      1,014        6.82       96.68
                  130 |         84        0.56       97.25
                  140 |        129        0.87       98.12
                  150 |        119        0.80       98.92
                  160 |        161        1.08      100.00
----------------------+-----------------------------------
                Total |     14,870      100.00

. capture drop bcs10_dadseg2egp

.     gen bcs10_dadseg2egp = .
(14,870 missing values generated)

.     replace bcs10_dadseg2egp = 1 if (back10p==11)|(back10p==12)|(back10p==30)|(back10p==40)
(1,658 real changes made)

.     replace bcs10_dadseg2egp = 2 if (back10p==21)|(back10p==22)|(back10p==51)|(back10p==52)
(2,194 real changes made)

.     replace bcs10_dadseg2egp = 3 if (back10p==60)|(back10p==70)
(677 real changes made)

.     replace bcs10_dadseg2egp = 4 if (back10p==120)|(back10p==130)|(back10p==140)
(1,227 real changes made)

.     replace bcs10_dadseg2egp = 5 if (back10p==80)
(1,194 real changes made)

.     replace bcs10_dadseg2egp = 6 if (back10p==90)
(3,216 real changes made)

.     replace bcs10_dadseg2egp = 7 if (back10p==100)|(back10p==110)|(back10p==150)
(1,788 real changes made)

.     replace bcs10_dadseg2egp = . if (back10p==-9)
(0 real changes made)

.     label define egp 1 "I" 2 "II+IVa" 3 "III" 4 "IVb+c" 5 "V" 6 "VI" 7 "VII"

.     label values bcs10_dadseg2egp egp

.     label variable bcs10_dadseg2egp "BCS Age 10 Dad's EGP from SEG"

.     tab back10p bcs10_dadseg2egp

   FATHER'S CORRECTED |                        BCS Age 10 Dad's EGP from SEG
SOCIAL VARS SEG 1980  |         I     II+IVa        III      IVb+c          V         VI        VII |     Total
----------------------+-----------------------------------------------------------------------------+----------
                   11 |        58          0          0          0          0          0          0 |        58 
                   12 |       853          0          0          0          0          0          0 |       853 
                   21 |         0        443          0          0          0          0          0 |       443 
                   22 |         0        864          0          0          0          0          0 |       864 
                   30 |       164          0          0          0          0          0          0 |       164 
                   40 |       583          0          0          0          0          0          0 |       583 
                   51 |         0        614          0          0          0          0          0 |       614 
                   52 |         0        273          0          0          0          0          0 |       273 
                   60 |         0          0        616          0          0          0          0 |       616 
                   70 |         0          0         61          0          0          0          0 |        61 
                   80 |         0          0          0          0      1,194          0          0 |     1,194 
                   90 |         0          0          0          0          0      3,216          0 |     3,216 
                  100 |         0          0          0          0          0          0      1,226 |     1,226 
                  110 |         0          0          0          0          0          0        443 |       443 
                  120 |         0          0          0      1,014          0          0          0 |     1,014 
                  130 |         0          0          0         84          0          0          0 |        84 
                  140 |         0          0          0        129          0          0          0 |       129 
                  150 |         0          0          0          0          0          0        119 |       119 
----------------------+-----------------------------------------------------------------------------+----------
                Total |     1,658      2,194        677      1,227      1,194      3,216      1,788 |    11,954 


. 
. *Mother's SEG Age 10

. tab back20p

   MOTHER'S CORRECTED |
SOCIAL VARS SEG 1980  |      Freq.     Percent        Cum.
----------------------+-----------------------------------
-9. No code available |      5,359       36.04       36.04
                   11 |         26        0.17       36.21
                   12 |         51        0.34       36.56
                   21 |        152        1.02       37.58
                   22 |        242        1.63       39.21
                   30 |         16        0.11       39.31
                   40 |         39        0.26       39.58
                   51 |      1,155        7.77       47.34
                   52 |        216        1.45       48.80
                   60 |      2,842       19.11       67.91
                   70 |      1,725       11.60       79.51
                   80 |        115        0.77       80.28
                   90 |        370        2.49       82.77
                  100 |      1,139        7.66       90.43
                  110 |        901        6.06       96.49
                  120 |        356        2.39       98.88
                  130 |         10        0.07       98.95
                  140 |         21        0.14       99.09
                  150 |        133        0.89       99.99
                  160 |          2        0.01      100.00
----------------------+-----------------------------------
                Total |     14,870      100.00

. capture drop bcs10_mumseg2egp

.     gen bcs10_mumseg2egp = .
(14,870 missing values generated)

.     replace bcs10_mumseg2egp = 1 if (back20p==11)|(back20p==12)|(back20p==30)|(back20p==40)
(132 real changes made)

.     replace bcs10_mumseg2egp = 2 if (back20p==21)|(back20p==22)|(back20p==51)|(back20p==52)
(1,765 real changes made)

.     replace bcs10_mumseg2egp = 3 if (back20p==60)|(back20p==70)
(4,567 real changes made)

.     replace bcs10_mumseg2egp = 4 if (back20p==120)|(back20p==130)|(back20p==140)
(387 real changes made)

.     replace bcs10_mumseg2egp = 5 if (back20p==80)
(115 real changes made)

.     replace bcs10_mumseg2egp = 6 if (back20p==90)
(370 real changes made)

.     replace bcs10_mumseg2egp = 7 if (back20p==100)|(back20p==110)|(back20p==150)
(2,173 real changes made)

.     replace bcs10_mumseg2egp = . if (back20p==-9)
(0 real changes made)

.     label values bcs10_mumseg2egp egp

.     label variable bcs10_mumseg2egp "BCS Age 10 Mum's EGP from SEG"

.     tab back20p bcs10_mumseg2egp

   MOTHER'S CORRECTED |                        BCS Age 10 Mum's EGP from SEG
SOCIAL VARS SEG 1980  |         I     II+IVa        III      IVb+c          V         VI        VII |     Total
----------------------+-----------------------------------------------------------------------------+----------
                   11 |        26          0          0          0          0          0          0 |        26 
                   12 |        51          0          0          0          0          0          0 |        51 
                   21 |         0        152          0          0          0          0          0 |       152 
                   22 |         0        242          0          0          0          0          0 |       242 
                   30 |        16          0          0          0          0          0          0 |        16 
                   40 |        39          0          0          0          0          0          0 |        39 
                   51 |         0      1,155          0          0          0          0          0 |     1,155 
                   52 |         0        216          0          0          0          0          0 |       216 
                   60 |         0          0      2,842          0          0          0          0 |     2,842 
                   70 |         0          0      1,725          0          0          0          0 |     1,725 
                   80 |         0          0          0          0        115          0          0 |       115 
                   90 |         0          0          0          0          0        370          0 |       370 
                  100 |         0          0          0          0          0          0      1,139 |     1,139 
                  110 |         0          0          0          0          0          0        901 |       901 
                  120 |         0          0          0        356          0          0          0 |       356 
                  130 |         0          0          0         10          0          0          0 |        10 
                  140 |         0          0          0         21          0          0          0 |        21 
                  150 |         0          0          0          0          0          0        133 |       133 
----------------------+-----------------------------------------------------------------------------+----------
                Total |       132      1,765      4,567        387        115        370      2,173 |     9,509 


. 
. drop c3_4 c3_11 back10p back20p c4_1a c4_2a

. 
. sort bcsid

. 
. save $path3\temp6.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp6.dta not found)
file F:\Data\MYDATA\TEMP\temp6.dta saved

. 
. *return to jupyter

Here we code further parental occupational information from the age 16 survey. These variables are not used in the main analysis, but they may be used in producing the weights and in the multiple imputation.

In [41]:
*Age 16 Parental Social Class

use $path1\ARCHIVE\BCS\S4\bcs7016x.dta, clear


*Father's RGSC Age 16
tab t11_2 
capture drop bcs16_olddadrgsc
    gen bcs16_olddadrgsc = t11_2 
    recode bcs16_olddadrgsc (-2=.) (-1=.) (7=.) (8=.)
    label variable bcs16_olddadrgsc "BCS Age 16 Dad RGSC Old Coding"
    label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"
    label values bcs16_olddadrgsc rgsc
    tab bcs16_olddadrgsc t11_2 , mi

*Mother's RGSC Age 16
tab t11_9
capture drop bcs16_oldmumrgsc
    gen bcs16_oldmumrgsc = t11_9 
    recode bcs16_oldmumrgsc (-2=.) (-1=.) (7=.) (8=.)
    label variable bcs16_oldmumrgsc "BCS Age 16 Mum RGSC Old Coding"
    label values bcs16_oldmumrgsc rgsc
    tab bcs16_oldmumrgsc t11_9 , mi

keep bcsid bcs16_olddadrgsc bcs16_oldmumrgsc

save $path3\temp7.dta, replace

*return to jupyter
. *Age 16 Parental Social Class

. 
. use $path1\ARCHIVE\BCS\S4\bcs7016x.dta, clear

. 
. 
. *Father's SEG Age 16

. tab t11_2 

 Father's social |
           class |      Freq.     Percent        Cum.
-----------------+-----------------------------------
      Not stated |        601        5.17        5.17
No questionnaire |      4,279       36.84       42.01
               I |        511        4.40       46.41
              II |      1,888       16.25       62.67
  III non-manual |        653        5.62       68.29
      III manual |      2,567       22.10       90.39
              IV |        594        5.11       95.51
               V |        153        1.32       96.82
         Student |        133        1.15       97.97
            Dead |        236        2.03      100.00
-----------------+-----------------------------------
           Total |     11,615      100.00

. capture drop bcs16_olddadrgsc

.     gen bcs16_olddadrgsc = t11_2 

.     recode bcs16_olddadrgsc (-2=.) (-1=.) (7=.) (8=.)
(bcs16_olddadrgsc: 5249 changes made)

.     label variable bcs16_olddadrgsc "BCS Age 16 Dad RGSC Old Coding"

.     label define rgsc 1 "I" 2 "II" 3 "III NM" 4 "III M" 5 "IV" 6 "V"

.     label values bcs16_olddadrgsc rgsc

.     tab bcs16_olddadrgsc t11_2 , mi

BCS Age 16 |
  Dad RGSC |                                             Father's social class
Old Coding | Not state  No questi          I         II  III non-m  III manua         IV          V    Student       Dead |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
         I |         0          0        511          0          0          0          0          0          0          0 |       511 
        II |         0          0          0      1,888          0          0          0          0          0          0 |     1,888 
    III NM |         0          0          0          0        653          0          0          0          0          0 |       653 
     III M |         0          0          0          0          0      2,567          0          0          0          0 |     2,567 
        IV |         0          0          0          0          0          0        594          0          0          0 |       594 
         V |         0          0          0          0          0          0          0        153          0          0 |       153 
         . |       601      4,279          0          0          0          0          0          0        133        236 |     5,249 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |       601      4,279        511      1,888        653      2,567        594        153        133        236 |    11,615 


. 
. *Mother's RGSC Age 16

. tab t11_9

 Mother's social |
           class |      Freq.     Percent        Cum.
-----------------+-----------------------------------
      Not stated |        424        3.65        3.65
No questionnaire |      4,279       36.84       40.49
               I |         49        0.42       40.91
              II |      1,144        9.85       50.76
  III non-manual |      2,058       17.72       68.48
      III manual |        399        3.44       71.92
              IV |      1,095        9.43       81.34
               V |        433        3.73       85.07
         Student |      1,681       14.47       99.54
            Dead |         53        0.46      100.00
-----------------+-----------------------------------
           Total |     11,615      100.00

. capture drop bcs16_oldmumrgsc

.     gen bcs16_oldmumrgsc = t11_9 

.     recode bcs16_oldmumrgsc (-2=.) (-1=.) (7=.) (8=.)
(bcs16_oldmumrgsc: 6437 changes made)

.     label variable bcs16_oldmumrgsc "BCS Age 16 Mum RGSC Old Coding"

.     label values bcs16_oldmumrgsc rgsc

.     tab bcs16_oldmumrgsc t11_9 , mi

BCS Age 16 |
  Mum RGSC |                                             Mother's social class
Old Coding | Not state  No questi          I         II  III non-m  III manua         IV          V    Student       Dead |     Total
-----------+--------------------------------------------------------------------------------------------------------------+----------
         I |         0          0         49          0          0          0          0          0          0          0 |        49 
        II |         0          0          0      1,144          0          0          0          0          0          0 |     1,144 
    III NM |         0          0          0          0      2,058          0          0          0          0          0 |     2,058 
     III M |         0          0          0          0          0        399          0          0          0          0 |       399 
        IV |         0          0          0          0          0          0      1,095          0          0          0 |     1,095 
         V |         0          0          0          0          0          0          0        433          0          0 |       433 
         . |       424      4,279          0          0          0          0          0          0      1,681         53 |     6,437 
-----------+--------------------------------------------------------------------------------------------------------------+----------
     Total |       424      4,279         49      1,144      2,058        399      1,095        433      1,681         53 |    11,615 


. 
. keep bcsid bcs16_olddadrgsc bcs16_oldmumrgsc

. 
. save $path3\temp7.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp7.dta not found)
file F:\Data\MYDATA\TEMP\temp7.dta saved

. 
. *return to jupyter

Here we prepare the cognitive ability test measure.

We would like to use the variable score20 (general ability test score) which has been used in previous social stratification studies such as Breen and Goldthorpe (2001).

Breen, R., & Goldthorpe, J. H. (2001). Class, mobility and merit: The experience of two British birth cohorts. European Sociological Review, 17(2), 81-101.

This variable is no longer deposited in the BCS datasets. However, the BCS documentation provides SPSS code to produce it. The procedure involves computing a total score for each individual test, then combining these into an overall score.

A data note on the cognitive test scores in the BCS is also available.
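
The first step of this procedure, computing a total score for one subtest, can be sketched on toy data. The example below is illustrative only: the item names `q1`-`q3`, the four invented children, and the helper variables `nmiss`, `notest` and `score` are assumptions, and the item coding (1 = acceptable, 2 = unacceptable, with -6, -3 and 9 as non-response codes) follows the coding described for the BAS items in the cells that follow.

```stata
* Toy illustration only: three invented items (q1-q3) for four children
clear
input id q1 q2 q3
1  1  2  1
2  2  9  1
3 -6 -6 -6
4  1  1 -3
end

* Set the non-response codes to missing
mvdecode q1-q3, mv(-6=. \ -3=. \ 9=.)

* Flag children with no valid response on any item (subtest not taken)
egen nmiss = rowmiss(q1-q3)
generate byte notest = (nmiss == 3)

* Recode unacceptable responses to 0, then count the correct answers
recode q1-q3 (2=0)
egen score = rowtotal(q1-q3) if notest == 0

list, clean
```

Because `rowtotal()` treats a missing item as zero, a child who attempted the subtest but skipped some items is scored as if those items were incorrect, while a child with no valid responses gets a missing score.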

In [42]:
clear

*return to jupyter
. clear

. 
. *return to jupyter

In [43]:
use $path1\ARCHIVE\BCS\S3\sn3723.dta, clear

*return to jupyter
. use $path1\ARCHIVE\BCS\S3\sn3723.dta, clear

. 
. *return to jupyter

In [44]:
*BAS Word Definitions Sub-Test

* In the test items 
* -6 means no questionnaire
* -3 means not stated
* 9 means no response
* 1 means acceptable response (i.e. correct)
* 2 means unacceptable response (i.e. not correct)

quietly mvdecode i3504-i3540, mv(-6=. \ -3=. \ 9=.)

*return to jupyter
. *BAS Word Definitions Sub-Test

. 
. * In the test items 

. * -6 means no questionnaire

. * -3 means not stated

. * 9 means no response

. * 1 means acceptable response (i.e. correct)

. * 2 means unacceptable response (i.e. not correct)

. 
. quietly mvdecode i3504-i3540, mv(-6=. \ -3=. \ 9=.)

. 
. *return to jupyter

In [45]:
*Here we identify cohort members who have no responses
* to any of the test items, which indicates that they
* did not take the subtest.

tab i3504

*No correct or incorrect responses
capture drop miss
    egen miss = rmiss(i3504-i3540)
    tab miss

*This variable identifies those who did not complete 
* this element of the test.
capture drop bcs10_baswd_notest
    gen bcs10_baswd_notest = 0
    replace bcs10_baswd_notest = 1 if (miss==37)
    tab bcs10_baswd_notest
    label variable bcs10_baswd_notest "BCS10 No Test for BAS Word Defin"
    label values bcs10_baswd_notest yesno
    drop miss

*In these test items 2 means unacceptable response
* (i.e. wrong answer). We recode this to 0.
* Now the test items are coded 1 (correct) 0 (incorrect)
recode i3504-i3540 (2=0)

*We create a new variable which indicates the number of 
* correct answers in this subtest
capture drop bcs10_worddefin
    egen bcs10_worddefin = rowtotal(i3504-i3540) if (bcs10_baswd_notest==0)

*return to jupyter
. *Here we identify cohort members who have no responses

. * to any of the test items, which indicates that they

. * did not take the subtest.

. 
. tab i3504

             BAS-WORD |
  DEFINITIONS-SPORT   |      Freq.     Percent        Cum.
----------------------+-----------------------------------
  Acceptable response |      6,880       59.98       59.98
Unacceptable response |      4,591       40.02      100.00
----------------------+-----------------------------------
                Total |     11,471      100.00

. 
. *No correct or incorrect responses

. capture drop miss

.     egen miss = rmiss(i3504-i3540)

.     tab miss

       miss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         16        0.11        0.11
          1 |         11        0.07        0.18
          2 |         24        0.16        0.34
          3 |         16        0.11        0.45
          4 |         26        0.17        0.63
          5 |         47        0.32        0.94
          6 |         71        0.48        1.42
          7 |        104        0.70        2.12
          8 |        125        0.84        2.96
          9 |        162        1.09        4.05
         10 |        196        1.32        5.37
         11 |        233        1.57        6.93
         12 |        317        2.13        9.07
         13 |        382        2.57       11.63
         14 |        510        3.43       15.06
         15 |        552        3.71       18.78
         16 |        756        5.08       23.86
         17 |        852        5.73       29.59
         18 |        914        6.15       35.74
         19 |        855        5.75       41.49
         20 |        905        6.09       47.57
         21 |        811        5.45       53.03
         22 |        735        4.94       57.97
         23 |        750        5.04       63.01
         24 |        632        4.25       67.26
         25 |        506        3.40       70.67
         26 |        409        2.75       73.42
         27 |        254        1.71       75.12
         28 |        122        0.82       75.94
         29 |         77        0.52       76.46
         30 |         44        0.30       76.76
         31 |         46        0.31       77.07
         32 |         23        0.15       77.22
         33 |         15        0.10       77.32
         34 |         12        0.08       77.40
         35 |          9        0.06       77.46
         36 |          6        0.04       77.51
         37 |      3,345       22.49      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. 
. *This variable identifies those who did not complete 

. * this element of the test.

. capture drop bcs10_baswd_notest

.     gen bcs10_baswd_notest = 0

.     replace bcs10_baswd_notest = 1 if (miss==37)
(3,345 real changes made)

.     tab bcs10_baswd_notest

bcs10_baswd |
    _notest |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,525       77.51       77.51
          1 |      3,345       22.49      100.00
------------+-----------------------------------
      Total |     14,870      100.00

.     label variable bcs10_baswd_notest "BCS10 No Test for BAS Word Defin"

.     label values bcs10_baswd_notest yesno

.     drop miss

. 
. *In these test items 2 means unacceptable response

. * (i.e. wrong answer). We recode this to 0.

. * Now the test items are coded 1 (correct) 0 (incorrect)

. recode i3504-i3540 (2=0)
(i3504: 4591 changes made)
(i3505: 1925 changes made)
(i3506: 2536 changes made)
(i3507: 5147 changes made)
(i3508: 2975 changes made)
(i3509: 3907 changes made)
(i3510: 3719 changes made)
(i3511: 3169 changes made)
(i3512: 3808 changes made)
(i3513: 8212 changes made)
(i3514: 4225 changes made)
(i3515: 4227 changes made)
(i3516: 4421 changes made)
(i3517: 4487 changes made)
(i3518: 3498 changes made)
(i3519: 2321 changes made)
(i3520: 6467 changes made)
(i3521: 2340 changes made)
(i3522: 3403 changes made)
(i3523: 2907 changes made)
(i3524: 1773 changes made)
(i3525: 1255 changes made)
(i3526: 4041 changes made)
(i3527: 1324 changes made)
(i3528: 413 changes made)
(i3529: 1616 changes made)
(i3530: 1350 changes made)
(i3531: 493 changes made)
(i3532: 367 changes made)
(i3533: 769 changes made)
(i3534: 525 changes made)
(i3535: 313 changes made)
(i3536: 232 changes made)
(i3537: 185 changes made)
(i3538: 137 changes made)
(i3539: 89 changes made)
(i3540: 69 changes made)

. 
. *We create a new variable which indicates the number of 

. * correct answers in this subtest

. capture drop bcs10_worddefin

.     egen bcs10_worddefin = rowtotal(i3504-i3540) if (bcs10_baswd_notest==0)
(3345 missing values generated)

. 
. *return to jupyter

In [46]:
*standardise to mean 0 sd 1
    capture drop sbcs10_worddefin
    egen sbcs10_worddefin = std(bcs10_worddefin)

    summ sbcs10_worddefin

*standardise to mean 100 sd 15
    capture drop bcs10_stdworddefin
    gen bcs10_stdworddefin = (sbcs10_worddefin*15)+100

    summ bcs10_stdworddefin   

*return to jupyter
. *standardise to mean 0 sd 1

.     capture drop sbcs10_worddefin

.     egen sbcs10_worddefin = std(bcs10_worddefin)
(3345 missing values generated)

. 
.     summ sbcs10_worddefin

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
sbcs10_wor~n |     11,525    4.93e-09           1  -2.023293   4.369977

. 
. *standardise to mean 100 sd 15

.     capture drop bcs10_stdworddefin

.     gen bcs10_stdworddefin = (sbcs10_worddefin*15)+100
(3,345 missing values generated)

. 
.     summ bcs10_stdworddefin   

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdw~n |     11,525         100          15    69.6506   165.5497

. 
. *return to jupyter
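
The two-step standardisation used for each sub-test can be checked on a small example. The sketch below is illustrative only: the toy dataset and the variable names (score, z_manual, z_egen, scaled) are assumptions and are not part of the BCS data. It shows that egen's std() function matches subtracting the mean and dividing by the standard deviation by hand, and that multiplying by 15 and adding 100 gives a variable with mean 100 and standard deviation 15.

```stata
* Toy data (made-up values, not BCS data)
clear
input score
 4
 8
10
12
16
end

* Standardise by hand: (x - mean) / sd
summarize score
gen z_manual = (score - r(mean)) / r(sd)

* Standardise with egen, as in the cell above
egen z_egen = std(score)

* Rescale to mean 100, sd 15
gen scaled = (z_egen*15) + 100

* The two z-scores agree up to floating point rounding
assert abs(z_manual - z_egen) < 1e-5

* The rescaled variable has mean 100 and sd 15
summarize scaled
assert abs(r(mean) - 100) < 0.001
assert abs(r(sd) - 15) < 0.001
```

Because the rescaling is linear, the ordering of cohort members and the shape of the distribution are unchanged. Only the units differ.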

In [47]:
*BAS Recall of Digits Sub-test

quietly mvdecode i3541-i3574, mv(-6=. \ -3=. \ 9=.)

*We identify the cases with no responses to any of the
* test items
capture drop miss
    egen miss = rmiss(i3541-i3574)

*Create a variable to indicate whether cohort member
* took the test
capture drop bcs10_basrd_notest
    gen bcs10_basrd_notest = 0
    replace bcs10_basrd_notest = 1 if (miss==34)
    label variable bcs10_basrd_notest "BCS10 No Test for BAS Recall Digits"
    label values bcs10_basrd_notest yesno
    drop miss

*Recode the items to indicate (1) correct response
* (0) incorrect response.
recode i3541-i3574 (2=0) 

*Create a variable that indicates the number of correct 
* responses
capture drop bcs10_digits
    egen bcs10_digits = rowtotal(i3541-i3574) if (bcs10_basrd_notest==0)

*standardise to mean 0 sd 1
    capture drop sbcs10_digits
    egen sbcs10_digits = std(bcs10_digits)
    summ sbcs10_digits

*standardise to mean 100 sd 15
capture drop bcs10_stddigits
    gen bcs10_stddigits = (sbcs10_digits*15)+100
    summ bcs10_stddigits
    
*return to jupyter
. *BAS Recall of Digits Sub-test

. 
. quietly mvdecode i3541-i3574, mv(-6=. \ -3=. \ 9=.)

. 
. *We identify the cases with no responses to any of the

. * test items

. capture drop miss

.     egen miss = rmiss(i3541-i3574)

. 
. *Create a variable to indicate whether cohort member

. * took the test

. capture drop bcs10_basrd_notest

.     gen bcs10_basrd_notest = 0

.     replace bcs10_basrd_notest = 1 if (miss==34)
(3,358 real changes made)

.     label variable bcs10_basrd_notest "BCS10 No Test for BAS Recall Digits"

.     label values bcs10_basrd_notest yesno

.     drop miss

. 
. *Recode the items to indicate (1) correct response

. * (0) incorrect response.

. recode i3541-i3574 (2=0) 
(i3541: 3 changes made)
(i3542: 7 changes made)
(i3543: 7 changes made)
(i3544: 4 changes made)
(i3545: 5 changes made)
(i3546: 15 changes made)
(i3547: 27 changes made)
(i3548: 21 changes made)
(i3549: 25 changes made)
(i3550: 41 changes made)
(i3551: 215 changes made)
(i3552: 445 changes made)
(i3553: 360 changes made)
(i3554: 619 changes made)
(i3555: 1069 changes made)
(i3556: 2353 changes made)
(i3557: 1919 changes made)
(i3558: 4473 changes made)
(i3559: 1825 changes made)
(i3560: 1749 changes made)
(i3561: 6262 changes made)
(i3562: 3585 changes made)
(i3563: 4319 changes made)
(i3564: 4586 changes made)
(i3565: 6070 changes made)
(i3566: 7580 changes made)
(i3567: 6062 changes made)
(i3568: 6464 changes made)
(i3569: 4480 changes made)
(i3570: 2931 changes made)
(i3571: 4254 changes made)
(i3572: 4181 changes made)
(i3573: 3698 changes made)
(i3574: 2885 changes made)

. 
. *Create a variable that indicates the number of correct 

. * responses

. capture drop bcs10_digits

.     egen bcs10_digits = rowtotal(i3541-i3574) if (bcs10_basrd_notest==0)
(3358 missing values generated)

. 
. *standardise to mean 0 sd 1

.     capture drop sbcs10_digits

.     egen sbcs10_digits = std(bcs10_digits)
(3358 missing values generated)

.     summ sbcs10_digits

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
sbcs10_dig~s |     11,512    2.71e-09           1  -5.002222   2.712919

. 
. *standardise to mean 100 sd 15

. capture drop bcs10_stddigits

.     gen bcs10_stddigits = (sbcs10_digits*15)+100
(3,358 missing values generated)

.     summ bcs10_stddigits

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdd~s |     11,512         100          15   24.96668   140.6938

.     
. *return to jupyter
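
The reason for the "no test" indicator can be seen on a small example. egen's rowtotal() function treats missing values as zero, so without the if qualifier a cohort member who did not sit the sub-test would receive a score of 0 rather than a missing value. The sketch below uses made-up data with three hypothetical items (item1 to item3), and uses rowmiss(), the current documented name for the function written as rmiss() in the code above.

```stata
* Toy data (made-up values): 1 = correct, 2 = incorrect, . = no response
clear
input item1 item2 item3
1 2 1
1 . 2
. . .
2 2 2
end

* Count the items with no response for each case
egen nmiss = rowmiss(item1-item3)

* Flag cases with no response to any of the three items
gen byte notest = (nmiss == 3)

* Recode incorrect responses to 0
recode item1-item3 (2=0)

* Total correct, left missing for cases that did not take the test
egen total = rowtotal(item1-item3) if (notest == 0)

list

* Row 3 has no responses, so its total is missing rather than 0
assert total == 2 in 1
assert total == 1 in 2
assert missing(total) in 3
assert total == 0 in 4
```

Row 4 shows why the distinction matters: a child who attempted every item and got none correct has a genuine score of 0, which should not be confused with a child who was never tested.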

In [48]:
*BAS Matrices Sub-test

*Identify missing values
quietly mvdecode i3617-i3644, mv(-6=. \ -3=. \ 9=.)

*Identify the cases with no response to the test items
capture drop miss
    egen miss = rmiss(i3617-i3644)

*Create a new variable that indicates that no test items
* were responded to
capture drop bcs10_basmat_notest
    gen bcs10_basmat_notest = 0
    replace bcs10_basmat_notest = 1 if (miss==28)
    label variable bcs10_basmat_notest "BCS10 No Test for BAS Matrices"
    label values bcs10_basmat_notest yesno
    drop miss

*Recode incorrect responses to 0
recode i3617-i3644 (2=0) 

*Create a variable that indicates the total number of correct
* responses to this subtest
capture drop bcs10_mat
    egen bcs10_mat = rowtotal(i3617-i3644) if (bcs10_basmat_notest==0)


*standardise to mean 0 sd 1
    capture drop sbcs10_mat
    egen sbcs10_mat = std(bcs10_mat)
    summ sbcs10_mat

*standardise to mean 100 sd 15
capture drop bcs10_stdmat
    gen bcs10_stdmat = (sbcs10_mat*15)+100
    summ bcs10_stdmat
    
*return to jupyter
. *BAS Matrices Sub-test

. 
. *Identify missing values

. quietly mvdecode i3617-i3644, mv(-6=. \ -3=. \ 9=.)

. 
. *Identify the cases with no response to the test items

. capture drop miss

.     egen miss = rmiss(i3617-i3644)

. 
. *Create a new variable that indicates that no test items

. * were responded to

. capture drop bcs10_basmat_notest

.     gen bcs10_basmat_notest = 0

.     replace bcs10_basmat_notest = 1 if (miss==28)
(3,374 real changes made)

.     label variable bcs10_basmat_notest "BCS10 No Test for BAS Matrices"

.     label values bcs10_basmat_notest yesno

.     drop miss

. 
. *Recode incorrect responses to 0

. recode i3617-i3644 (2=0) 
(i3617: 57 changes made)
(i3618: 45 changes made)
(i3619: 582 changes made)
(i3620: 794 changes made)
(i3621: 574 changes made)
(i3622: 1156 changes made)
(i3623: 2613 changes made)
(i3624: 1478 changes made)
(i3625: 2797 changes made)
(i3626: 2611 changes made)
(i3627: 4705 changes made)
(i3628: 4415 changes made)
(i3629: 4311 changes made)
(i3630: 6446 changes made)
(i3631: 4149 changes made)
(i3632: 4403 changes made)
(i3633: 4960 changes made)
(i3634: 6104 changes made)
(i3635: 5395 changes made)
(i3636: 5707 changes made)
(i3637: 5172 changes made)
(i3638: 5646 changes made)
(i3639: 5047 changes made)
(i3640: 6641 changes made)
(i3641: 6048 changes made)
(i3642: 5089 changes made)
(i3643: 7176 changes made)
(i3644: 6567 changes made)

. 
. *Create a variable that indicates the total number of correct

. * responses to this subtest

. capture drop bcs10_mat

.     egen bcs10_mat = rowtotal(i3617-i3644) if (bcs10_basmat_notest==0)
(3374 missing values generated)

. 
. 
. *standardise to mean 0 sd 1

.     capture drop sbcs10_mat

.     egen sbcs10_mat = std(bcs10_mat)
(3374 missing values generated)

.     summ sbcs10_mat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  sbcs10_mat |     11,496   -8.16e-09           1  -2.842147   2.344447

. 
. *standardise to mean 100 sd 15

. capture drop bcs10_stdmat

.     gen bcs10_stdmat = (sbcs10_mat*15)+100
(3,374 missing values generated)

.     summ bcs10_stdmat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdmat |     11,496         100          15    57.3678   135.1667

.     
. *return to jupyter

In [49]:
*BAS Verbal Similarities Sub-test

*Identify the missing values
quietly mvdecode i3575 i3577 i3579 i3581 i3583 i3585 i3587 i3589 i3591 i3593 i3595 i3597 i3599 i3601 i3603 i3605 i3607 i3609 i3611 i3613 i3615 i3576 i3578 i3580 i3582 i3584 i3586 i3588 i3590 i3592 i3594 i3596 i3598 i3600 i3602 i3604 i3606 i3608 i3610 i3612 i3614 i3616, mv(-6=. \ -3=. \ 9=.)

*This variable indicates the number of items with missing values
capture drop miss
egen miss = rmiss(i3575 i3577 i3579 i3581 i3583 i3585 i3587 i3589 i3591 i3593 i3595 i3597 i3599 i3601 i3603 i3605 i3607 i3609 i3611 i3613 i3615 i3576 i3578 i3580 i3582 i3584 i3586 i3588 i3590 i3592 i3594 i3596 i3598 i3600 i3602 i3604 i3606 i3608 i3610 i3612 i3614 i3616)

* return to jupyter
. *BAS Verbal Similarities Sub-test

. 
. *Identify the missing values

. quietly mvdecode i3575 i3577 i3579 i3581 i3583 i3585 i3587 i3589 i3591 i3593 i3595 i3597 i3599 i3601 i3603 i3605 i3607 i3609 i3611 i3613 i
> 3615 i3576 i3578 i3580 i3582 i3584 i3586 i3588 i3590 i3592 i3594 i3596 i3598 i3600 i3602 i3604 i3606 i3608 i3610 i3612 i3614 i3616, mv(-6=
> . \ -3=. \ 9=.)

. 
. *This variable indicates the number of items with missing values

. capture drop miss

. egen miss = rmiss(i3575 i3577 i3579 i3581 i3583 i3585 i3587 i3589 i3591 i3593 i3595 i3597 i3599 i3601 i3603 i3605 i3607 i3609 i3611 i3613 
> i3615 i3576 i3578 i3580 i3582 i3584 i3586 i3588 i3590 i3592 i3594 i3596 i3598 i3600 i3602 i3604 i3606 i3608 i3610 i3612 i3614 i3616)

. 
. * return to jupyter

In [50]:
*To get a point on this test the child has to successfully state what the
* items have in common and also successfully provide a further congruent
* example.

*These variables indicate whether the child got both elements correct for
* each item pair in the test
capture drop score1
    gen score1 = .
    replace score1 = 1 if (i3575==1)&(i3576==1)

capture drop score2
    gen score2 = .
    replace score2 = 1 if (i3577==1)&(i3578==1)

capture drop score3
    gen score3 = .
    replace score3 = 1 if (i3579==1)&(i3580==1)

capture drop score4
    gen score4 = .
    replace score4 = 1 if (i3581==1)&(i3582==1)

capture drop score5
    gen score5 = .
    replace score5 = 1 if (i3583==1)&(i3584==1)

capture drop score6
    gen score6 = .
    replace score6 = 1 if (i3585==1)&(i3586==1)

capture drop score7
    gen score7 = .
    replace score7 = 1 if (i3587==1)&(i3588==1)

capture drop score8
    gen score8 = .
    replace score8 = 1 if (i3589==1)&(i3590==1)

capture drop score9
    gen score9 = .
    replace score9 = 1 if (i3591==1)&(i3592==1)

capture drop score10
    gen score10 = .
    replace score10 = 1 if (i3593==1)&(i3594==1)

capture drop score11
    gen score11 = .
    replace score11 = 1 if (i3595==1)&(i3596==1)

capture drop score12
    gen score12 = .
    replace score12 = 1 if (i3597==1)&(i3598==1)

capture drop score13
    gen score13 = .
    replace score13 = 1 if (i3599==1)&(i3600==1)

capture drop score14
    gen score14 = .
    replace score14 = 1 if (i3601==1)&(i3602==1)

capture drop score15
    gen score15 = .
    replace score15 = 1 if (i3603==1)&(i3604==1)

capture drop score16
    gen score16 = .
    replace score16 = 1 if (i3605==1)&(i3606==1)

capture drop score17
    gen score17 = .
    replace score17 = 1 if (i3607==1)&(i3608==1)

capture drop score18
    gen score18 = .
    replace score18 = 1 if (i3609==1)&(i3610==1)

capture drop score19
    gen score19 = .
    replace score19 = 1 if (i3611==1)&(i3612==1)

capture drop score20
    gen score20 = .
    replace score20 = 1 if (i3613==1)&(i3614==1)

capture drop score21
    gen score21 = .
    replace score21 = 1 if (i3615==1)&(i3616==1)

* return to jupyter
. *To get a point on this test the child has to successfully state what the

. * items have in common and also successfully provide a further congruent

. * example.

. 
. *These variables indicate whether the child got both elements correct for

. * each item pair in the test

. capture drop score1

.     gen score1 = .
(14,870 missing values generated)

.     replace score1 = 1 if (i3575==1)&(i3576==1)
(11,342 real changes made)

. 
. capture drop score2

.     gen score2 = .
(14,870 missing values generated)

.     replace score2 = 1 if (i3577==1)&(i3578==1)
(11,257 real changes made)

. 
. capture drop score3

.     gen score3 = .
(14,870 missing values generated)

.     replace score3 = 1 if (i3579==1)&(i3580==1)
(11,343 real changes made)

. 
. capture drop score4

.     gen score4 = .
(14,870 missing values generated)

.     replace score4 = 1 if (i3581==1)&(i3582==1)
(11,289 real changes made)

. 
. capture drop score5

.     gen score5 = .
(14,870 missing values generated)

.     replace score5 = 1 if (i3583==1)&(i3584==1)
(11,178 real changes made)

. 
. capture drop score6

.     gen score6 = .
(14,870 missing values generated)

.     replace score6 = 1 if (i3585==1)&(i3586==1)
(10,954 real changes made)

. 
. capture drop score7

.     gen score7 = .
(14,870 missing values generated)

.     replace score7 = 1 if (i3587==1)&(i3588==1)
(9,999 real changes made)

. 
. capture drop score8

.     gen score8 = .
(14,870 missing values generated)

.     replace score8 = 1 if (i3589==1)&(i3590==1)
(10,300 real changes made)

. 
. capture drop score9

.     gen score9 = .
(14,870 missing values generated)

.     replace score9 = 1 if (i3591==1)&(i3592==1)
(9,448 real changes made)

. 
. capture drop score10

.     gen score10 = .
(14,870 missing values generated)

.     replace score10 = 1 if (i3593==1)&(i3594==1)
(5,773 real changes made)

. 
. capture drop score11

.     gen score11 = .
(14,870 missing values generated)

.     replace score11 = 1 if (i3595==1)&(i3596==1)
(7,961 real changes made)

. 
. capture drop score12

.     gen score12 = .
(14,870 missing values generated)

.     replace score12 = 1 if (i3597==1)&(i3598==1)
(8,092 real changes made)

. 
. capture drop score13

.     gen score13 = .
(14,870 missing values generated)

.     replace score13 = 1 if (i3599==1)&(i3600==1)
(4,601 real changes made)

. 
. capture drop score14

.     gen score14 = .
(14,870 missing values generated)

.     replace score14 = 1 if (i3601==1)&(i3602==1)
(5,709 real changes made)

. 
. capture drop score15

.     gen score15 = .
(14,870 missing values generated)

.     replace score15 = 1 if (i3603==1)&(i3604==1)
(3,205 real changes made)

. 
. capture drop score16

.     gen score16 = .
(14,870 missing values generated)

.     replace score16 = 1 if (i3605==1)&(i3606==1)
(2,412 real changes made)

. 
. capture drop score17

.     gen score17 = .
(14,870 missing values generated)

.     replace score17 = 1 if (i3607==1)&(i3608==1)
(1,754 real changes made)

. 
. capture drop score18

.     gen score18 = .
(14,870 missing values generated)

.     replace score18 = 1 if (i3609==1)&(i3610==1)
(1,066 real changes made)

. 
. capture drop score19

.     gen score19 = .
(14,870 missing values generated)

.     replace score19 = 1 if (i3611==1)&(i3612==1)
(237 real changes made)

. 
. capture drop score20

.     gen score20 = .
(14,870 missing values generated)

.     replace score20 = 1 if (i3613==1)&(i3614==1)
(495 real changes made)

. 
. capture drop score21

.     gen score21 = .
(14,870 missing values generated)

.     replace score21 = 1 if (i3615==1)&(i3616==1)
(26 real changes made)

. 
. * return to jupyter
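
The 21 near-identical blocks above could also be written as a loop. The sketch below is one possible way to do this, not the code used for the analysis. It assumes, as in the code above, that the item pairs are consecutively numbered variables (i3575 and i3576, i3577 and i3578, up to i3615 and i3616). The toy item responses and the variable vs_total are made up for illustration.

```stata
* Toy item responses (made-up values): 1 = acceptable, 2 = unacceptable
clear
set obs 6
forvalues v = 3575/3616 {
    gen byte i`v' = cond(mod(_n + `v', 3) == 0, 2, 1)
}

* One score per item pair: (i3575, i3576), (i3577, i3578), ..., (i3615, i3616)
forvalues k = 1/21 {
    local a = 3573 + 2*`k'
    local b = `a' + 1
    capture drop score`k'
    * A point is given only if both elements of the pair are correct
    gen byte score`k' = 1 if (i`a' == 1) & (i`b' == 1)
}

* rowtotal() counts missing as 0, so pairs without a point add nothing
egen vs_total = rowtotal(score1-score21)

* Spot check against the first and last pairs written out by hand
assert score1 == 1 if (i3575 == 1) & (i3576 == 1)
assert missing(score1) if !((i3575 == 1) & (i3576 == 1))
assert score21 == 1 if (i3615 == 1) & (i3616 == 1)
```

The loop produces the same score1 to score21 variables as the long form, with less scope for a mistyped variable name in one of the pairs.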

In [51]:
*This variable indicates cases which have no responses to any item
* on this subtest.

capture drop bcs10_basvs_notest
    gen bcs10_basvs_notest = 0
    replace bcs10_basvs_notest = 1 if (miss==42)
    label variable bcs10_basvs_notest "BCS10 No Test for BAS Verbal Sim both"
    label values bcs10_basvs_notest yesno
    drop miss

*This variable provides the total score for the test
capture drop bcs10_vs
egen bcs10_vs = rowtotal(score1 score2 score3 score4 score5 score6 score7 score8 score9 score10 score11 score12 score13 score14 score15 score16 score17 score18 score19 score20 score21) if (bcs10_basvs_notest==0)
sum bcs10_vs

*standardise to mean 0 sd 1
capture drop sbcs10_vs
    egen sbcs10_vs = std(bcs10_vs)
    summ sbcs10_vs

*standardise to mean 100 sd 15
capture drop bcs10_stdvs
    gen bcs10_stdvs = (sbcs10_vs*15)+100
    
summ bcs10_stdvs

* return to jupyter
. *This variable indicates cases which have no responses to any item

. * on this subtest.

. 
. capture drop bcs10_basvs_notest

.     gen bcs10_basvs_notest = 0

.     replace bcs10_basvs_notest = 1 if (miss==42)
(3,386 real changes made)

.     label variable bcs10_basvs_notest "BCS10 No Test for BAS Verbal Sim both"

.     label values bcs10_basvs_notest yesno

.     drop miss

. 
. *This variable provides the total score for the test

. capture drop bcs10_vs

. egen bcs10_vs = rowtotal(score1 score2 score3 score4 score5 score6 score7 score8 score9 score10 score11 score12 score13 score14 score15 sc
> ore16 score17 score18 score19 score20 score21) if (bcs10_basvs_notest==0)
(3386 missing values generated)

. sum bcs10_vs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    bcs10_vs |     11,484    12.05512    2.610513          0         20

. 
. *standardise to mean 0 sd 1

. capture drop sbcs10_vs

.     egen sbcs10_vs = std(bcs10_vs)
(3386 missing values generated)

.     summ sbcs10_vs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   sbcs10_vs |     11,484    1.53e-09           1  -4.617912   3.043417

. 
. *standardise to mean 100 sd 15

. capture drop bcs10_stdvs

.     gen bcs10_stdvs = (sbcs10_vs*15)+100
(3,386 missing values generated)

.     
. summ bcs10_stdvs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 bcs10_stdvs |     11,484         100          15   30.73132   145.6512

. 
. * return to jupyter

In [52]:
*Word Definitions
sum bcs10_stdworddefin
label variable bcs10_stdworddefin "BCS 10 BAS Word Definitions std"

*Verbal Similarities
sum bcs10_stdvs
label variable bcs10_stdvs "BCS 10 BAS Verbal Similarities std"

* return to jupyter
. *Word Definitions

. sum bcs10_stdworddefin

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdw~n |     11,525         100          15    69.6506   165.5497

. label variable bcs10_stdworddefin "BCS 10 BAS Word Definitions std"

. 
. *Verbal Similarities

. sum bcs10_stdvs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 bcs10_stdvs |     11,484         100          15   30.73132   145.6512

. label variable bcs10_stdvs "BCS 10 BAS Verbal Similarities std"

. 
. * return to jupyter

In [53]:
*Digit Recall
sum bcs10_stddigits
label variable bcs10_stddigits "BCS 10 BAS Digit Recall std"

*Matrices
sum bcs10_stdmat
label variable bcs10_stdmat "BCS 10 BAS Matrices std"

* return to jupyter
. *Digit Recall

. sum bcs10_stddigits

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdd~s |     11,512         100          15   24.96668   140.6938

. label variable bcs10_stddigits "BCS 10 BAS Digit Recall std"

. 
. *Matrices

. sum bcs10_stdmat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdmat |     11,496         100          15    57.3678   135.1667

. label variable bcs10_stdmat "BCS 10 BAS Matrices std"

. 
. * return to jupyter

In [54]:
*Here we compute the total non-verbal score

*We identify cases where the cohort members didn't complete the two non-verbal
* tests
egen rmiss = rmiss(bcs10_stddigits bcs10_stdmat)
tab rmiss
*11,454 completed both tests

*Create a variable that indicates the total score across these tests
* only if the cohort member completed both tests
capture drop bcs10_nonverbscore
egen bcs10_nonverbscore = rowtotal(bcs10_stddigits bcs10_stdmat) if (rmiss==0)

*standardise to mean 0 sd 1
capture drop sbcs10_nonverbscore
egen sbcs10_nonverbscore = std(bcs10_nonverbscore)
summ sbcs10_nonverbscore

*standardise to mean 100 sd 15
capture drop bcs10_stdnonverbscore
gen bcs10_stdnonverbscore = (sbcs10_nonverbscore*15)+100

summ bcs10_stdnonverbscore

*return to jupyter
. *Here we compute the total non-verbal score

. 
. *We identify cases where the cohort members didn't complete the two non-verbal

. * tests

. egen rmiss = rmiss(bcs10_stddigits bcs10_stdmat)

. tab rmiss

      rmiss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,454       77.03       77.03
          1 |        100        0.67       77.70
          2 |      3,316       22.30      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. *11,454 completed both tests

. 
. *Create a variable that indicates the total score across these tests

. * only if the cohort member completed both tests

. capture drop bcs10_nonverbscore

. egen bcs10_nonverbscore = rowtotal(bcs10_stddigits bcs10_stdmat) if (rmiss==0)
(3416 missing values generated)

. 
. *standardise to mean 0 sd 1

. capture drop sbcs10_nonverbscore

. egen sbcs10_nonverbscore = std(bcs10_nonverbscore)
(3416 missing values generated)

. summ sbcs10_nonverbscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
sbcs10_non~e |     11,454    9.65e-11           1  -3.523276   2.895324

. 
. *standardise to mean 100 sd 15

. capture drop bcs10_stdnonverbscore

. gen bcs10_stdnonverbscore = (sbcs10_nonverbscore*15)+100
(3,416 missing values generated)

. 
. summ bcs10_stdnonverbscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdn~e |     11,454         100          15   47.15086   143.4299

. 
. *return to jupyter
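
The same standardise and rescale steps are repeated for every sub-test and composite score. A small helper program is one way to reduce this repetition. The sketch below is illustrative only: the program name std100, its generate() option, and the toy variables test1 and test2 are all hypothetical and are not part of the analysis code. It also shows why the composite is standardised again: the sum of two scores that each have mean 100 is centred on 200, not 100.

```stata
capture program drop std100
program define std100
    * Hypothetical helper: rescale a variable to mean 100, sd 15
    syntax varname(numeric), GENerate(name)
    tempvar z
    egen `z' = std(`varlist')
    gen `generate' = (`z'*15) + 100
end

* Toy data (made-up values): two sub-test scores with some missing
clear
input test1 test2
 90  95
100 100
110 104
120   .
  . 101
105 120
end

* Total score only for cases with both sub-tests
egen nmiss = rowmiss(test1 test2)
egen total = rowtotal(test1 test2) if (nmiss == 0)

* Rescale the composite to mean 100, sd 15
std100 total, generate(total_std)

summarize total total_std

* Four complete cases, rescaled to mean 100 and sd 15
summarize total_std
assert r(N) == 4
assert abs(r(mean) - 100) < 0.001
assert abs(r(sd) - 15) < 0.001
```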

In [55]:
summ bcs10_stdnonverbscore

label variable bcs10_stdnonverbscore "BCS Age 10 Total Non-Verbal Score std"

* return to jupyter
. summ bcs10_stdnonverbscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdn~e |     11,454         100          15   47.15086   143.4299

. 
. label variable bcs10_stdnonverbscore "BCS Age 10 Total Non-Verbal Score std"

. 
. * return to jupyter

In [56]:
*Total Verbal Score

*Identify cases which did not complete both tests that make up the verbal score
capture drop rmiss
    egen rmiss = rmiss(bcs10_stdvs bcs10_stdworddefin)
    tab rmiss
*11,463 completed both tests

*Create a variable that indicates the total verbal score 
* if the cohort member completed both tests
capture drop bcs10_verbscore
egen bcs10_verbscore = rowtotal(bcs10_stdvs bcs10_stdworddefin) if (rmiss==0)

drop rmiss

*standardise to mean 0 sd 1
capture drop sbcs10_verbscore
egen sbcs10_verbscore = std(bcs10_verbscore)
summ sbcs10_verbscore

*standardise to mean 100 sd 15
capture drop bcs10_stdverbscore
gen bcs10_stdverbscore = (sbcs10_verbscore*15)+100
summ bcs10_stdverbscore

label variable bcs10_stdverbscore "BCS Age 10 Total Verbal Score std"

* return to jupyter
. *Total Verbal Score

. 
. *Identify cases which did not complete both tests that make up the verbal score

. capture drop rmiss

.     egen rmiss = rmiss(bcs10_stdvs bcs10_stdworddefin)

.     tab rmiss

      rmiss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,463       77.09       77.09
          1 |         83        0.56       77.65
          2 |      3,324       22.35      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. *11,463 completed both tests

. 
. *Create a variable that indicates the total verbal score 

. * if the cohort member completed both tests

. capture drop bcs10_verbscore

. egen bcs10_verbscore = rowtotal(bcs10_stdvs bcs10_stdworddefin) if (rmiss==0)
(3407 missing values generated)

. 
. drop rmiss

. 
. *standardise to mean 0 sd 1

. capture drop sbcs10_verbscore

. egen sbcs10_verbscore = std(bcs10_verbscore)
(3407 missing values generated)

. summ sbcs10_verbscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
sbcs10_ver~e |     11,463    1.57e-09           1  -3.656487    3.86983

. 
. *standardise to mean 100 sd 15

. capture drop bcs10_stdverbscore

. gen bcs10_stdverbscore = (sbcs10_verbscore*15)+100
(3,407 missing values generated)

. summ bcs10_stdverbscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdv~e |     11,463         100          15    45.1527   158.0475

. 
. label variable bcs10_stdverbscore "BCS Age 10 Total Verbal Score std"

. 
In [57]:
*This bit of code computes the total overall score on the 
* cognitive ability test

*First we create a variable to indicate whether the cohort members
* completed all of the tests that make up the overall test score
egen rmiss = rmiss(bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat)
tab rmiss

*We create a variable that indicates the overall test score
* only for those cohort members who completed all of the required tests
capture drop bcs10_abilityscore
egen bcs10_abilityscore = rowtotal(bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat) if (rmiss==0)

*standardise to mean 0 sd 1
capture drop sbcs10_abilityscore
egen sbcs10_abilityscore = std(bcs10_abilityscore)
summ sbcs10_abilityscore

*standardise to mean 100 sd 15
capture drop bcs10_stdabilityscore
gen bcs10_stdabilityscore = (sbcs10_abilityscore*15)+100
summ bcs10_stdabilityscore

label variable bcs10_stdabilityscore "BCS Age 10 Total Ability Test Score std"

* return to jupyter
. *This bit of code computes the total overall score on the 

. * cognitive ability test

. 
. *First we create a variable to indicate whether the cohort members

. * completed all of the tests that make up the overall test score

. egen rmiss = rmiss(bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat)

. tab rmiss

      rmiss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,397       76.64       76.64
          1 |        121        0.81       77.46
          2 |         21        0.14       77.60
          3 |         24        0.16       77.76
          4 |      3,307       22.24      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. 
. *We create a variable that indicates the overall test score

. * only for those cohort members who completed all of the required tests

. capture drop bcs10_abilityscore

. egen bcs10_abilityscore = rowtotal(bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat) if (rmiss==0)
(3473 missing values generated)

. 
. *standardise to mean 0 sd 1

. capture drop sbcs10_abilityscore

. egen sbcs10_abilityscore = std(bcs10_abilityscore)
(3473 missing values generated)

. summ sbcs10_abilityscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
sbcs10_abi~e |     11,397   -1.67e-10           1  -3.760751   3.412836

. 
. *standardise to mean 100 sd 15

. capture drop bcs10_stdabilityscore

. gen bcs10_stdabilityscore = (sbcs10_abilityscore*15)+100
(3,473 missing values generated)

. summ bcs10_stdabilityscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stda~e |     11,397         100          15   43.58873   151.1925

. 
. label variable bcs10_stdabilityscore "BCS Age 10 Total Ability Test Score std"

. 
. * return to jupyter

We could also produce an overall cognitive ability test score using principal components analysis (see, for example, Schoon 2010). Here we compute a general ability test score using the method described in Schoon (2010).

In [58]:
* We undertake principal components analysis of the four sub-tests 
* that make up the general ability test. Here are these items:

tab bcs10_stdworddefin, mi 
tab bcs10_stddigits, mi
tab bcs10_stdvs, mi
tab bcs10_stdmat, mi


* We only want to include those who completed all four tests. This variable
* was created above.

tab rmiss

* We examine the correlation between these tests:

pwcorr bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat if (rmiss==0), sig

* Principal components analysis of the four tests that make up the 
* general ability test:

pca bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat if (rmiss==0)

* Only the first component has an eigenvalue greater than 1.


* Here we predict the score for each individual on the first principal
* component. This score is obtained by applying the elements of the 
* corresponding eigenvector to the standardised values of the original
* observations for an individual.

predict bcs10_pc1 if (rmiss==0), score
label variable bcs10_pc1 "BCS Age 10 PCA Score"


* We standardise this variable:

capture drop bcs10_stdpc1
egen bcs10_stdpc1 = std(bcs10_pc1)
summ bcs10_stdpc1
label variable bcs10_stdpc1 "BCS Age 10 standardised PCA Score"

summ bcs10_stdpc1

* return to jupyter
. * We undertake principal components analysis of the four sub-tests 

. * that make up the general ability test. Here are these items:

. 
. tab bcs10_stdworddefin, mi 

 BCS 10 BAS |
       Word |
Definitions |
        std |      Freq.     Percent        Cum.
------------+-----------------------------------
    69.6506 |         77        0.52        0.52
   72.64745 |        160        1.08        1.59
   75.64429 |        268        1.80        3.40
   78.64114 |        404        2.72        6.11
   81.63799 |        554        3.73        9.84
   84.63483 |        685        4.61       14.45
   87.63168 |        776        5.22       19.66
   90.62852 |        869        5.84       25.51
   93.62537 |        900        6.05       31.56
   96.62221 |        902        6.07       37.63
   99.61906 |        873        5.87       43.50
   102.6159 |        807        5.43       48.92
   105.6127 |        792        5.33       54.25
   108.6096 |        697        4.69       58.94
   111.6064 |        587        3.95       62.89
   114.6033 |        497        3.34       66.23
   117.6001 |        380        2.56       68.78
    120.597 |        347        2.33       71.12
   123.5938 |        270        1.82       72.93
   126.5907 |        196        1.32       74.25
   129.5875 |        153        1.03       75.28
   132.5844 |         97        0.65       75.93
   135.5812 |         68        0.46       76.39
    138.578 |         61        0.41       76.80
   141.5749 |         35        0.24       77.03
   144.5717 |         23        0.15       77.19
   147.5686 |         15        0.10       77.29
   150.5654 |         13        0.09       77.38
   153.5623 |          8        0.05       77.43
   156.5591 |          7        0.05       77.48
    159.556 |          3        0.02       77.50
   165.5497 |          1        0.01       77.51
          . |      3,345       22.49      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. tab bcs10_stddigits, mi

 BCS 10 BAS |
      Digit |
 Recall std |      Freq.     Percent        Cum.
------------+-----------------------------------
   24.96668 |          1        0.01        0.01
   28.47356 |          1        0.01        0.01
   46.00797 |          3        0.02        0.03
   49.51485 |          3        0.02        0.05
   53.02174 |          4        0.03        0.08
   56.52862 |         14        0.09        0.17
    60.0355 |         22        0.15        0.32
   63.54238 |         53        0.36        0.68
   67.04926 |         90        0.61        1.28
   70.55614 |        161        1.08        2.37
   74.06303 |        241        1.62        3.99
   77.56991 |        301        2.02        6.01
   81.07679 |        453        3.05        9.06
   84.58367 |        593        3.99       13.05
   88.09055 |        882        5.93       18.98
   91.59743 |      1,043        7.01       25.99
   95.10432 |      1,117        7.51       33.50
    98.6112 |      1,151        7.74       41.24
   102.1181 |      1,047        7.04       48.29
    105.625 |        913        6.14       54.43
   109.1318 |        736        4.95       59.37
   112.6387 |        618        4.16       63.53
   116.1456 |        561        3.77       67.30
   119.6525 |        489        3.29       70.59
   123.1594 |        372        2.50       73.09
   126.6663 |        257        1.73       74.82
   130.1731 |        178        1.20       76.02
     133.68 |        121        0.81       76.83
   137.1869 |         52        0.35       77.18
   140.6938 |         35        0.24       77.42
          . |      3,358       22.58      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. tab bcs10_stdvs, mi

 BCS 10 BAS |
     Verbal |
Similaritie |
      s std |      Freq.     Percent        Cum.
------------+-----------------------------------
   30.73132 |         20        0.13        0.13
   36.47731 |          2        0.01        0.15
   42.22332 |          6        0.04        0.19
   47.96931 |         10        0.07        0.26
   53.71531 |         26        0.17        0.43
    59.4613 |         51        0.34        0.77
    65.2073 |        120        0.81        1.58
   70.95329 |        229        1.54        3.12
    76.6993 |        481        3.23        6.36
   82.44529 |        851        5.72       12.08
   88.19128 |      1,252        8.42       20.50
   93.93729 |      1,571       10.56       31.06
   99.68328 |      1,730       11.63       42.70
   105.4293 |      1,781       11.98       54.67
   111.1753 |      1,378        9.27       63.94
   116.9213 |        991        6.66       70.61
   122.6673 |        580        3.90       74.51
   128.4133 |        278        1.87       76.38
   134.1593 |         90        0.61       76.98
   139.9053 |         29        0.20       77.18
   145.6512 |          8        0.05       77.23
          . |      3,386       22.77      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. tab bcs10_stdmat, mi

 BCS 10 BAS |
   Matrices |
        std |      Freq.     Percent        Cum.
------------+-----------------------------------
    57.3678 |          4        0.03        0.03
   60.14633 |          4        0.03        0.05
   62.92487 |         73        0.49        0.54
    65.7034 |         84        0.56        1.11
   68.48193 |        112        0.75        1.86
   71.26046 |        155        1.04        2.91
   74.03899 |        255        1.71        4.62
   76.81753 |        301        2.02        6.64
   79.59606 |        397        2.67        9.31
   82.37459 |        440        2.96       12.27
   85.15312 |        513        3.45       15.72
   87.93166 |        559        3.76       19.48
   90.71019 |        618        4.16       23.64
   93.48872 |        661        4.45       28.08
   96.26725 |        708        4.76       32.84
   99.04578 |        791        5.32       38.16
   101.8243 |        734        4.94       43.10
   104.6029 |        739        4.97       48.07
   107.3814 |        756        5.08       53.15
   110.1599 |        780        5.25       58.40
   112.9384 |        684        4.60       63.00
    115.717 |        605        4.07       67.07
   118.4955 |        477        3.21       70.28
    121.274 |        380        2.56       72.83
   124.0526 |        314        2.11       74.94
   126.8311 |        195        1.31       76.25
   129.6096 |         95        0.64       76.89
   132.3882 |         46        0.31       77.20
   135.1667 |         16        0.11       77.31
          . |      3,374       22.69      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. 
. 
. * We only want to include those who completed all four tests. This variable

. * was created above.

. 
. tab rmiss

      rmiss |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,397       76.64       76.64
          1 |        121        0.81       77.46
          2 |         21        0.14       77.60
          3 |         24        0.16       77.76
          4 |      3,307       22.24      100.00
------------+-----------------------------------
      Total |     14,870      100.00

. 
. * We examine the correlation between these tests:

. 
. pwcorr bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat if (rmiss==0), sig

             | bcs10_.. ~ddigits bcs1~dvs bcs~dmat
-------------+------------------------------------
bcs10_stdw~n |   1.0000 
             |
             |
bcs10_stdd~s |   0.3257   1.0000 
             |   0.0000
             |
 bcs10_stdvs |   0.6521   0.3254   1.0000 
             |   0.0000   0.0000
             |
bcs10_stdmat |   0.4757   0.3093   0.4838   1.0000 
             |   0.0000   0.0000   0.0000
             |

. 
. * Principal components analysis of the four tests that make up the 

. * general ability test:

. 
. pca bcs10_stdworddefin bcs10_stddigits bcs10_stdvs bcs10_stdmat if (rmiss==0)

Principal components/correlation                 Number of obs    =     11,397
                                                 Number of comp.  =          4
                                                 Trace            =          4
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      2.31293      1.54579             0.5782       0.5782
           Comp2 |      .767141      .195023             0.1918       0.7700
           Comp3 |      .572119       .22431             0.1430       0.9130
           Comp4 |      .347808            .             0.0870       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    --------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained 
    -------------+----------------------------------------+-------------
    bcs10_stdw~n |   0.5490   -0.2604   -0.3745    0.7004 |           0 
    bcs10_stdd~s |   0.3890    0.9182   -0.0743   -0.0034 |           0 
     bcs10_stdvs |   0.5510   -0.2638   -0.3432   -0.7135 |           0 
    bcs10_stdmat |   0.4936   -0.1396    0.8582    0.0200 |           0 
    --------------------------------------------------------------------

. 
. * Only the first component has an eigenvalue greater than 1.

. 
. 
. * Here we predict the score for each individual on the first principal

. * component. This score is obtained by applying the elements of the 

. * corresponding eigenvector to the standardised values of the original

. * observations for an individual.

. 
. predict bcs10_pc1 if (rmiss==0), score
(3 components skipped)

Scoring coefficients 
    sum of squares(column-loading) = 1

    ------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 
    -------------+----------------------------------------
    bcs10_stdw~n |   0.5490   -0.2604   -0.3745    0.7004 
    bcs10_stdd~s |   0.3890    0.9182   -0.0743   -0.0034 
     bcs10_stdvs |   0.5510   -0.2638   -0.3432   -0.7135 
    bcs10_stdmat |   0.4936   -0.1396    0.8582    0.0200 
    ------------------------------------------------------

. label variable bcs10_pc1 "BCS Age 10 PCA Score"

. 
. 
. * We standardise this variable:

. 
. capture drop bcs10_stdpc1

. egen bcs10_stdpc1 = std(bcs10_pc1)
(3473 missing values generated)

. summ bcs10_stdpc1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdpc1 |     11,397   -4.73e-10           1  -3.701819   3.371969

. label variable bcs10_stdpc1 "BCS Age 10 standardised PCA Score"

. 
. summ bcs10_stdpc1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdpc1 |     11,397   -4.73e-10           1  -3.701819   3.371969

. 
. * return to jupyter
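
The comment above describes the first-component score as the eigenvector elements applied to the standardised values of the items. The sketch below illustrates that description on Stata's shipped `auto` dataset rather than on the BCS data. The variable names `pc1`, `pc1_manual` and `z_*` are hypothetical, and the sketch assumes that `predict, score` after a correlation-based `pca` forms scores in this way.

```stata
* Illustration only: uses Stata's shipped auto dataset, not the BCS data.
sysuse auto, clear
pca price mpg weight length

* First-component score from -predict-
predict double pc1, score

* Eigenvectors (variables in rows, components in columns)
matrix L = e(L)

* Standardise each item over the estimation sample
foreach v in price mpg weight length {
    egen double z_`v' = std(`v') if e(sample)
}

* Weighted sum of the standardised items, weights from the first eigenvector
gen double pc1_manual = L[1,1]*z_price + L[2,1]*z_mpg + L[3,1]*z_weight + L[4,1]*z_length

* Case-by-case comparison of the two versions of the score
compare pc1 pc1_manual
```

If the assumption holds, `compare` should report the two variables as equal up to rounding.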

In [59]:
keep bcsid bcs10_stdworddefin bcs10_stdvs bcs10_stddigits bcs10_stdmat bcs10_stdabilityscore bcs10_stdverbscore bcs10_stdnonverbscore bcs10_abilityscore bcs10_vs bcs10_verbscore bcs10_stdpc1

sort bcsid

save $path3\temp8.dta, replace

* return to jupyter
. keep bcsid bcs10_stdworddefin bcs10_stdvs bcs10_stddigits bcs10_stdmat bcs10_stdabilityscore bcs10_stdverbscore bcs10_stdnonverbscore bcs1
> 0_abilityscore bcs10_vs bcs10_verbscore bcs10_stdpc1

. 
. sort bcsid

. 
. save $path3\temp8.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp8.dta not found)
file F:\Data\MYDATA\TEMP\temp8.dta saved

. 
. * return to jupyter

Using the SPSS code noted above, we have also coded the ability test scores in SPSS. We now compare the scores produced by the SPSS coding and the Stata coding to check that the procedures we have carried out in Stata are equivalent to those used to compute the variable in previously published studies.

In [60]:
use $path3\temp8.dta, clear

* return to jupyter
. use $path3\temp8.dta, clear

. 
In [61]:
merge 1:1 bcsid using $path2\SPSSBASTOTALSCORE.dta
drop _merge

* return to jupyter
. merge 1:1 bcsid using $path2\SPSSBASTOTALSCORE.dta
(note: variable bcsid was str7, now str21 to accommodate using data's values)

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                            14,870  (_merge==3)
    -----------------------------------------

. drop _merge

. 
. * return to jupyter

In [62]:
summ score14
summ bcs10_stdworddefin
pwcorr score14 bcs10_stdworddefin, sig

* return to jupyter
. summ score14

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     score14 |     11,525         100          15    69.6506   165.5497

. summ bcs10_stdworddefin

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdw~n |     11,525         100          15    69.6506   165.5497

. pwcorr score14 bcs10_stdworddefin, sig

             |  score14 bcs10_~n
-------------+------------------
     score14 |   1.0000 
             |
             |
bcs10_stdw~n |   1.0000   1.0000 
             |   0.0000
             |

. 
. * return to jupyter

In [63]:
summ score15
summ bcs10_stddigits
pwcorr score15 bcs10_stddigits, sig

* return to jupyter
. summ score15

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     score15 |     11,512         100          15   24.96668   140.6938

. summ bcs10_stddigits

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdd~s |     11,512         100          15   24.96668   140.6938

. pwcorr score15 bcs10_stddigits, sig

             |  score15 bcs10~ts
-------------+------------------
     score15 |   1.0000 
             |
             |
bcs10_stdd~s |   1.0000   1.0000 
             |   0.0000
             |

. 
. * return to jupyter

In [64]:
summ score16
summ bcs10_stdvs
pwcorr score16 bcs10_stdvs, sig

*There is a one-case difference here between the SPSS coding and the coding 
*in Stata: the SPSS coding has one fewer case.
*Examining the data shows that the SPSS coding treats this cohort member as
*not having completed the test at all, because they have no correct or
*incorrect answers on the two items required to gain a score. However, this
*cohort member scored many correct answers on the naming element of the
*test, although most of their answers on the example element were 'not
*stated'. We therefore conclude that the cohort member did take the test,
*and we have chosen not to exclude this case.

* return to jupyter
. summ score16

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     score16 |     11,483         100          15   30.66389   145.6857

. summ bcs10_stdvs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 bcs10_stdvs |     11,484         100          15   30.73132   145.6512

. pwcorr score16 bcs10_stdvs, sig

             |  score16 bcs1~dvs
-------------+------------------
     score16 |   1.0000 
             |
             |
 bcs10_stdvs |   1.0000   1.0000 
             |   0.0000
             |

. 
. *There is a one-case difference here between the SPSS coding and the coding 

. *in Stata: the SPSS coding has one fewer case.

. *Examining the data shows that the SPSS coding treats this cohort member as

. *not having completed the test at all, because they have no correct or

. *incorrect answers on the two items required to gain a score. However, this

. *cohort member scored many correct answers on the naming element of the

. *test, although most of their answers on the example element were 'not

. *stated'. We therefore conclude that the cohort member did take the test,

. *and we have chosen not to exclude this case.

. 
. * return to jupyter

In [65]:
summ score17
summ bcs10_stdmat
pwcorr score17 bcs10_stdmat, sig

* return to jupyter
. summ score17

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     score17 |     11,496         100          15    57.3678   135.1667

. summ bcs10_stdmat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stdmat |     11,496         100          15    57.3678   135.1667

. pwcorr score17 bcs10_stdmat, sig

             |  score17 bcs10_~t
-------------+------------------
     score17 |   1.0000 
             |
             |
bcs10_stdmat |   1.0000   1.0000 
             |   0.0000
             |

. 
. * return to jupyter

In [66]:
summ score20
summ bcs10_stdabilityscore
pwcorr score20 bcs10_stdabilityscore, sig

*This score varies very slightly because of the 1 additional case in our
*coding which is excluded in the SPSS coding.

* return to jupyter
. summ score20

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     score20 |     11,396         100          15   43.57055    151.197

. summ bcs10_stdabilityscore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bcs10_stda~e |     11,397         100          15   43.58873   151.1925

. pwcorr score20 bcs10_stdabilityscore, sig

             |  score20 bcs10_..
-------------+------------------
     score20 |   1.0000 
             |
             |
bcs10_stda~e |   1.0000   1.0000 
             |   0.0000
             |

. 
. *This score varies very slightly because of the 1 additional case in our

. *coding which is excluded in the SPSS coding.

. 
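
The comparisons above rely on matching summary statistics and correlations of 1.0000. A case-level check is a possible complement, sketched below on toy data. The variables `stata_score`, `spss_score` and `absdiff` and the tolerance of 1e-6 are hypothetical choices for illustration; they are not taken from the original coding.

```stata
* Illustration only: toy scores, not the BCS or SPSS-derived variables.
clear
input str3 id stata_score spss_score
"A01" 100.5 100.5
"A02" 92.25 92.25
"A03" 110 .
end

* Built-in case-by-case comparison of two variables
compare stata_score spss_score

* Cases scored in one coding but not in the other
count if missing(spss_score) != missing(stata_score)

* Largest absolute difference among cases scored in both codings
gen double absdiff = abs(stata_score - spss_score)
summarize absdiff
assert r(max) < 1e-6
```

A check of this kind would flag the one additional verbal similarities case directly. Because that extra case shifts the mean and standard deviation used for standardisation, small non-zero differences would be expected for the affected scores, so any tolerance would need to be chosen with that in mind.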

Here we code father's NS-SEC, which will be our main parental social class measure in the analysis.

The occupational information we are going to use for our parental social class measure comes from the new occupational coding files (SN7023).

Gregg, P. (2012). Occupational Coding for the National Child Development Study (1969, 1991-2008) and the 1970 British Cohort Study (1980, 2000-2008). [data collection]. University of London. Institute of Education. Centre for Longitudinal Studies, [original data producer(s)]. UK Data Service. SN: 7023.

"Researchers from the Avon Longitudinal Study of Parents and Children (ALSPAC), based at the University of Bristol, worked on data from selected waves of the NCDS and BCS70. To create occupational code classifications, the computerised questionnaire response text strings were converted into comma separated value (CSV) files and processed using the CASCOT (Computer Assisted Structured COding Tool) software programme, which used automatic and semi-automatic processing to assign Standard Occupational Classification 2000 (SOC2000) codes (SOC2000) to entries."

In [67]:
****Parent's Occupations
use $path1\ARCHIVE\NCDSBCS_OCCS\bcs3_occupation_coding_father.dta, clear
keep BCSID B3FSNSSEC B3FSSOCC B3FSSOC90
rename BCSID bcsid


*Father's NSSEC
tab B3FSNSSEC
capture drop bcs_panssec
    gen bcs_panssec = .
    replace bcs_panssec = 1 if (B3FSNSSEC>=1)&(B3FSNSSEC<=2) 
    *1.1 Large Employers and Higher Managerial
    replace bcs_panssec = 2 if (B3FSNSSEC>=3.1)&(B3FSNSSEC<=3.4) 
    *1.2 Higher Professional
    replace bcs_panssec = 3 if (B3FSNSSEC>=4.1)&(B3FSNSSEC<=6) 
    *lower managerial and professional
    replace bcs_panssec = 4 if (B3FSNSSEC>=7.1)&(B3FSNSSEC<=7.4) 
    *intermediate
    replace bcs_panssec = 5 if (B3FSNSSEC>=8.1)&(B3FSNSSEC<=9.2) 
    *small employers and own account
    replace bcs_panssec = 6 if (B3FSNSSEC>=10)&(B3FSNSSEC<=11.2) 
    *lower supervisory and technical
    replace bcs_panssec = 7 if (B3FSNSSEC>=12.1)&(B3FSNSSEC<=12.7) 
    *semiroutine
    replace bcs_panssec = 8 if (B3FSNSSEC>=13.1)&(B3FSNSSEC<=13.5) 
    *routine
    tab bcs_panssec
    label variable bcs_panssec "BCS Age 10 Father's NSSEC"
    label define nssec 1 "Large Employers and Higher Managerial" 2 "Higher Professional" 3 "Lower managerial and professional" 4 "Intermediate" 5 "Small employers and own account" 6 "Lower Supervisory and Technical" 7 "Semi-Routine" 8 "Routine" 
    label values bcs_panssec nssec
    numlabel, add
    tab bcs_panssec, mi

* return to jupyter
. ****Parent's Occupations

. use $path1\ARCHIVE\NCDSBCS_OCCS\bcs3_occupation_coding_father.dta, clear

. keep BCSID B3FSNSSEC B3FSSOCC B3FSSOC90

. rename BCSID bcsid

. 
. 
. *Father's NSSEC

. tab B3FSNSSEC

   BCS 1980 |
    Father: |
     NS-SEC |
     social |
 class code |
       SEMI |
 processing |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |        570        4.73        4.73
        3.1 |        624        5.18        9.91
        3.2 |        120        1.00       10.91
        3.3 |         16        0.13       11.04
        4.1 |        905        7.51       18.56
        4.2 |        176        1.46       20.02
        4.3 |         30        0.25       20.27
          5 |        695        5.77       26.04
        7.1 |        329        2.73       28.77
        7.2 |        471        3.91       32.68
        7.3 |        137        1.14       33.81
        7.4 |        153        1.27       35.09
        8.1 |        224        1.86       36.94
        9.1 |      1,137        9.44       46.38
        9.2 |        221        1.83       48.22
         10 |        182        1.51       49.73
       11.1 |      1,574       13.07       62.80
       11.2 |        276        2.29       65.09
       12.1 |        106        0.88       65.97
       12.2 |        290        2.41       68.38
       12.3 |        603        5.01       73.38
       12.4 |        566        4.70       78.08
       12.5 |         88        0.73       78.81
       12.6 |         74        0.61       79.43
       12.7 |         12        0.10       79.53
       13.1 |         52        0.43       79.96
       13.2 |        132        1.10       81.05
       13.3 |      1,461       12.13       93.18
       13.4 |        800        6.64       99.83
       13.5 |         21        0.17      100.00
------------+-----------------------------------
      Total |     12,045      100.00

. capture drop bcs_panssec

.     gen bcs_panssec = .
(14,874 missing values generated)

.     replace bcs_panssec = 1 if (B3FSNSSEC>=1)&(B3FSNSSEC<=2) 
(570 real changes made)

.     *1.1 Large Employers and Higher Managerial

.     replace bcs_panssec = 2 if (B3FSNSSEC>=3.1)&(B3FSNSSEC<=3.4) 
(760 real changes made)

.     *1.2 Higher Professional

.     replace bcs_panssec = 3 if (B3FSNSSEC>=4.1)&(B3FSNSSEC<=6) 
(1,806 real changes made)

.     *lower managerial and professional

.     replace bcs_panssec = 4 if (B3FSNSSEC>=7.1)&(B3FSNSSEC<=7.4) 
(1,090 real changes made)

.     *intermediate

.     replace bcs_panssec = 5 if (B3FSNSSEC>=8.1)&(B3FSNSSEC<=9.2) 
(1,582 real changes made)

.     *small employers and own account

.     replace bcs_panssec = 6 if (B3FSNSSEC>=10)&(B3FSNSSEC<=11.2) 
(2,032 real changes made)

.     *lower supervisory and technical

.     replace bcs_panssec = 7 if (B3FSNSSEC>=12.1)&(B3FSNSSEC<=12.7) 
(1,739 real changes made)

.     *semiroutine

.     replace bcs_panssec = 8 if (B3FSNSSEC>=13.1)&(B3FSNSSEC<=13.5) 
(2,466 real changes made)

.     *routine

.     tab bcs_panssec

bcs_panssec |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        570        4.73        4.73
          2 |        760        6.31       11.04
          3 |      1,806       14.99       26.04
          4 |      1,090        9.05       35.09
          5 |      1,582       13.13       48.22
          6 |      2,032       16.87       65.09
          7 |      1,739       14.44       79.53
          8 |      2,466       20.47      100.00
------------+-----------------------------------
      Total |     12,045      100.00

.     label variable bcs_panssec "BCS Age 10 Father's NSSEC"

.     label define nssec 1 "Large Employers and Higher Managerial" 2 "Higher Professional" 3 "Lower managerial and professional" 4 "Intermed
> iate" 5 "Small employers and own account" 6 "Lower Supervisory and Technical" 7 "Semi-Routine" 8 "Routine" 

.     label values bcs_panssec nssec

.     numlabel, add

.     tab bcs_panssec, mi

              BCS Age 10 Father's NSSEC |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        570        3.83        3.83
                 2. Higher Professional |        760        5.11        8.94
   3. Lower managerial and professional |      1,806       12.14       21.08
                        4. Intermediate |      1,090        7.33       28.41
     5. Small employers and own account |      1,582       10.64       39.05
     6. Lower Supervisory and Technical |      2,032       13.66       52.71
                        7. Semi-Routine |      1,739       11.69       64.40
                             8. Routine |      2,466       16.58       80.98
                                      . |      2,829       19.02      100.00
----------------------------------------+-----------------------------------
                                  Total |     14,874      100.00

. 
. * return to jupyter
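
The range-based recode above can be written equivalently with `inrange()`. The sketch below uses a handful of toy operational codes rather than the SN7023 file, and the variable names `code` and `cls` are hypothetical. The toy codes are entered as `double` so that comparisons against literals such as 3.1 behave as intended; with `float` storage the bounds would need wrapping in `float()`.

```stata
* Illustration only: toy NS-SEC operational codes, not the SN7023 file.
clear
input double code
2
3.1
4.3
7.4
13.5
.
end

* Collapse operational categories into the eight analytic classes
gen byte cls = .
replace cls = 1 if inrange(code, 1, 2)
replace cls = 2 if inrange(code, 3.1, 3.4)
replace cls = 3 if inrange(code, 4.1, 6)
replace cls = 4 if inrange(code, 7.1, 7.4)
replace cls = 5 if inrange(code, 8.1, 9.2)
replace cls = 6 if inrange(code, 10, 11.2)
replace cls = 7 if inrange(code, 12.1, 12.7)
replace cls = 8 if inrange(code, 13.1, 13.5)

* Missing codes stay missing because inrange() returns 0 for them
list code cls, noobs
```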

In [68]:
****I am going to recode NSSEC from above just to double check
capture drop ukempst
gen ukempst = 0

describe
capture drop soc2000
    gen soc2000 = real(B3FSSOCC)

sort soc2000 ukempst
merge m:m soc2000 ukempst using $path1\OTHER\SOC2000_to_NSSEC_20160527_RC_V1.dta

tab nssec

sort soc2000
drop if _merge==2

drop ukempst

tab nssec bcs_panssec
kap nssec bcs_panssec
*Perfect match

drop _merge bcs_panssec

rename nssec bcs_panssecsimp
rename soc2000 bcs_dadsoc2000simp

sort bcsid
save $path3\temp9.dta, replace

* return to jupyter
. ****I am going to recode NSSEC from above just to double check

. capture drop ukempst

. gen ukempst = 0

. 
. describe

Contains data from F:\Data\RAWDATA\ARCHIVE\NCDSBCS_OCCS\bcs3_occupation_coding_father.dta
  obs:        14,874                          
 vars:             6                          
 size:       431,346                          
--------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
bcsid           str7    %7s                   bcsid
B3FSSOCC        str4    %4s                   BCS 1980 Father: SEMI auto SOC2000
B3FSSOC90       int     %8.0g                 BCS 1980 Father: SEMI automatic SOC90
B3FSNSSEC       double  %10.0g                BCS 1980 Father: NS-SEC social class code SEMI processing
bcs_panssec     float   %40.0g     nssec      BCS Age 10 Father's NSSEC
ukempst         float   %9.0g                 
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

. capture drop soc2000

.     gen soc2000 = real(B3FSSOCC)
(2,829 missing values generated)

. 
. sort soc2000 ukempst

. merge m:m soc2000 ukempst using $path1\OTHER\SOC2000_to_NSSEC_20160527_RC_V1.dta
(label nssec already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         5,330
        from master                     2,829  (_merge==1)
        from using                      2,501  (_merge==2)

    matched                            12,045  (_merge==3)
    -----------------------------------------

. 
. tab nssec

                                  nssec |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        916        6.30        6.30
                 2. Higher Professional |      1,052        7.23       13.53
   3. Lower managerial and professional |      2,390       16.43       29.96
                        4. Intermediate |      1,248        8.58       38.54
     5. Small employers and own account |      2,050       14.09       52.63
     6. Lower Supervisory and Technical |      2,292       15.76       68.39
                        7. Semi-Routine |      1,940       13.34       81.73
                             8. Routine |      2,658       18.27      100.00
----------------------------------------+-----------------------------------
                                  Total |     14,546      100.00

. 
. sort soc2000

. drop if _merge==2
(2,501 observations deleted)

. 
. drop ukempst

. 
. tab nssec bcs_panssec

                      |                                BCS Age 10 Father's NSSEC
                nssec | 1. Large   2. Higher  3. Lower   4. Interm  5. Small   6. Lower   7. Semi-R  8. Routin |     Total
----------------------+----------------------------------------------------------------------------------------+----------
1. Large Employers an |       570          0          0          0          0          0          0          0 |       570 
2. Higher Professiona |         0        760          0          0          0          0          0          0 |       760 
3. Lower managerial a |         0          0      1,806          0          0          0          0          0 |     1,806 
      4. Intermediate |         0          0          0      1,090          0          0          0          0 |     1,090 
5. Small employers an |         0          0          0          0      1,582          0          0          0 |     1,582 
6. Lower Supervisory  |         0          0          0          0          0      2,032          0          0 |     2,032 
      7. Semi-Routine |         0          0          0          0          0          0      1,739          0 |     1,739 
           8. Routine |         0          0          0          0          0          0          0      2,466 |     2,466 
----------------------+----------------------------------------------------------------------------------------+----------
                Total |       570        760      1,806      1,090      1,582      2,032      1,739      2,466 |    12,045 


. kap nssec bcs_panssec

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
 100.00%      14.54%     1.0000     0.0037     270.50      0.0000

. *Perfect match

. 
. drop _merge bcs_panssec

. 
. rename nssec bcs_panssecsimp

. rename soc2000 bcs_dadsoc2000simp

. 
. sort bcsid

. save $path3\temp9.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp9.dta not found)
file F:\Data\MYDATA\TEMP\temp9.dta saved

. 
. * return to jupyter

We also code mother's NS-SEC here. This is not used in the main analysis as mother's NS-SEC is not available in the NCDS.

In [69]:
use $path1\ARCHIVE\NCDSBCS_OCCS\bcs3_occupation_coding_mother.dta,  clear
keep BCSID B3MSNSSEC B3MSSOCC
rename BCSID bcsid

*Mother's NSSEC
capture drop bcs_manssec
    gen bcs_manssec = .
    replace bcs_manssec = 1 if (B3MSNSSEC>=1)&(B3MSNSSEC<=2) 
    *1.1 Large Employers and Higher Managerial
    replace bcs_manssec = 2 if (B3MSNSSEC>=3.1)&(B3MSNSSEC<=3.4) 
    *1.2 Higher Professional
    replace bcs_manssec = 3 if (B3MSNSSEC>=4.1)&(B3MSNSSEC<=6) 
    *lower managerial and professional
    replace bcs_manssec = 4 if (B3MSNSSEC>=7.1)&(B3MSNSSEC<=7.4) 
    *intermediate
    replace bcs_manssec = 5 if (B3MSNSSEC>=8.1)&(B3MSNSSEC<=9.2) 
    *small employers and own account
    replace bcs_manssec = 6 if (B3MSNSSEC>=10)&(B3MSNSSEC<=11.2) 
    *lower supervisory and technical
    replace bcs_manssec = 7 if (B3MSNSSEC>=12.1)&(B3MSNSSEC<=12.7) 
    *semiroutine
    replace bcs_manssec = 8 if (B3MSNSSEC>=13.1)&(B3MSNSSEC<=13.5) 
    *routine
    tab bcs_manssec
    label variable bcs_manssec "BCS Age 10 Mother's NSSEC"
    label define nssec 1 "Large Employers and Higher Managerial" 2 "Higher Professional" 3 "Lower managerial and professional" 4 "Intermediate" 5 "Small employers and own account" 6 "Lower Supervisory and Technical" 7 "Semi-Routine" 8 "Routine" , replace
    label values bcs_manssec nssec
    numlabel, add
    tab bcs_manssec, mi
    
* return to jupyter
. use $path1\ARCHIVE\NCDSBCS_OCCS\bcs3_occupation_coding_mother.dta,  clear

. keep BCSID B3MSNSSEC B3MSSOCC

. rename BCSID bcsid

. 
. *Mother's NSSEC

. capture drop bcs_manssec

.     gen bcs_manssec = .
(14,874 missing values generated)

.     replace bcs_manssec = 1 if (B3MSNSSEC>=1)&(B3MSNSSEC<=2) 
(51 real changes made)

.     *1.1 Large Employers and Higher Managerial

.     replace bcs_manssec = 2 if (B3MSNSSEC>=3.1)&(B3MSNSSEC<=3.4) 
(95 real changes made)

.     *1.2 Higher Professional

.     replace bcs_manssec = 3 if (B3MSNSSEC>=4.1)&(B3MSNSSEC<=6) 
(1,134 real changes made)

.     *lower managerial and professional

.     replace bcs_manssec = 4 if (B3MSNSSEC>=7.1)&(B3MSNSSEC<=7.4) 
(2,154 real changes made)

.     *intermediate

.     replace bcs_manssec = 5 if (B3MSNSSEC>=8.1)&(B3MSNSSEC<=9.2) 
(365 real changes made)

.     *small employers and own account

.     replace bcs_manssec = 6 if (B3MSNSSEC>=10)&(B3MSNSSEC<=11.2) 
(174 real changes made)

.     *lower supervisory and technical

.     replace bcs_manssec = 7 if (B3MSNSSEC>=12.1)&(B3MSNSSEC<=12.7) 
(2,616 real changes made)

.     *semiroutine

.     replace bcs_manssec = 8 if (B3MSNSSEC>=13.1)&(B3MSNSSEC<=13.5) 
(2,936 real changes made)

.     *routine

.     tab bcs_manssec

bcs_manssec |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         51        0.54        0.54
          2 |         95        1.00        1.53
          3 |      1,134       11.91       13.44
          4 |      2,154       22.61       36.05
          5 |        365        3.83       39.88
          6 |        174        1.83       41.71
          7 |      2,616       27.46       69.18
          8 |      2,936       30.82      100.00
------------+-----------------------------------
      Total |      9,525      100.00

.     label variable bcs_manssec "BCS Age 10 Mother's NSSEC"

.     label define nssec 1 "Large Employers and Higher Managerial" 2 "Higher Professional" 3 "Lower managerial and professional" 4 "Intermed
> iate" 5 "Small employers and own account" 6 "Lower Supervisory and Technical" 7 "Semi-Routine" 8 "Routine" , replace

.     label values bcs_manssec nssec

.     numlabel, add

.     tab bcs_manssec, mi

              BCS Age 10 Mother's NSSEC |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |         51        0.34        0.34
                 2. Higher Professional |         95        0.64        0.98
   3. Lower managerial and professional |      1,134        7.62        8.61
                        4. Intermediate |      2,154       14.48       23.09
     5. Small employers and own account |        365        2.45       25.54
     6. Lower Supervisory and Technical |        174        1.17       26.71
                        7. Semi-Routine |      2,616       17.59       44.30
                             8. Routine |      2,936       19.74       64.04
                                      . |      5,349       35.96      100.00
----------------------------------------+-----------------------------------
                                  Total |     14,874      100.00

.     
. * return to jupyter
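
The banded recode above can also be written with `inrange()`. The following is a minimal sketch on toy data rather than the code used in the analysis: the variable names `code` and `cls` and the input values are hypothetical, and only four of the eight bands are shown. It assumes the source variable is stored as `double` (as `B3MSNSSEC` is in this file), so that comparisons against literals such as 3.1 behave as expected; with a `float` variable the literals would need wrapping in `float()`.

```stata
* Toy data (hypothetical values) standing in for the NS-SEC operational codes
clear
input double code
1
3.1
3.4
7.2
13.5
end

* inrange(x, a, b) is true when a <= x <= b, matching the >= and <= pairs above
gen byte cls = .
replace cls = 1 if inrange(code, 1, 2)
replace cls = 2 if inrange(code, 3.1, 3.4)
replace cls = 4 if inrange(code, 7.1, 7.4)
replace cls = 8 if inrange(code, 13.1, 13.5)
list
```

`inrange()` returns false when its first argument is missing, so missing codes stay missing, as they do in the original recode.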

In [70]:
****I am going to recode NSSEC from above just to double check
capture drop ukempst
    gen ukempst = 0
    describe
    capture drop soc2000
    gen soc2000 = real(B3MSSOCC)
    
sort soc2000 ukempst
merge m:m soc2000 ukempst using $path1\OTHER\SOC2000_to_NSSEC_20160527_RC_V1.dta

sort soc2000
drop if _merge==2

tab nssec bcs_manssec
kap nssec bcs_manssec
*The two NS-SEC codings agree

drop _merge bcs_manssec ukempst

rename nssec bcs_manssecsimp
rename soc2000 bcs_mumsoc2000

label variable bcs_manssecsimp "BCS Age 10 Mother's NSSEC Simplified"

sort bcsid
save $path3\temp10.dta, replace

* return to jupyter
. ****I am going to recode NSSEC from above just to double check

. capture drop ukempst

.     gen ukempst = 0

.     describe

Contains data from F:\Data\RAWDATA\ARCHIVE\NCDSBCS_OCCS\bcs3_occupation_coding_mother.dta
  obs:        14,874                          
 vars:             5                          
 size:       401,598                          
--------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------
bcsid           str7    %7s                   bcsid
B3MSSOCC        str4    %4s                   (BCS 1980 Mother) SEMI auto SOC2000
B3MSNSSEC       double  %10.0g                (BCS 1980 Mother) NS-SEC social class code SEMI processing
bcs_manssec     float   %40.0g     nssec      BCS Age 10 Mother's NSSEC
ukempst         float   %9.0g                 
--------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

.     capture drop soc2000

.     gen soc2000 = real(B3MSSOCC)
(5,349 missing values generated)

.     
. sort soc2000 ukempst

. merge m:m soc2000 ukempst using $path1\OTHER\SOC2000_to_NSSEC_20160527_RC_V1.dta
(label nssec already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         7,907
        from master                     5,349  (_merge==1)
        from using                      2,558  (_merge==2)

    matched                             9,525  (_merge==3)
    -----------------------------------------

. 
. sort soc2000

. drop if _merge==2
(2,558 observations deleted)

. 
. tab nssec bcs_manssec

                      |                                BCS Age 10 Mother's NSSEC
                nssec | 1. Large   2. Higher  3. Lower   4. Interm  5. Small   6. Lower   7. Semi-R  8. Routin |     Total
----------------------+----------------------------------------------------------------------------------------+----------
1. Large Employers an |        51          0          0          0          0          0          0          0 |        51 
2. Higher Professiona |         0         95          0          0          0          0          0          0 |        95 
3. Lower managerial a |         0          0      1,134          0          0          0          0          0 |     1,134 
      4. Intermediate |         0          0          0      2,154          0          0          0          0 |     2,154 
5. Small employers an |         0          0          0          0        365          0          0          0 |       365 
6. Lower Supervisory  |         0          0          0          0          0        174          0          0 |       174 
      7. Semi-Routine |         0          0          0          0          0          0      2,616          0 |     2,616 
           8. Routine |         0          0          0          0          0          0          0      2,936 |     2,936 
----------------------+----------------------------------------------------------------------------------------+----------
                Total |        51         95      1,134      2,154        365        174      2,616      2,936 |     9,525 


. kap nssec bcs_manssec

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
 100.00%      23.77%     1.0000     0.0055     181.76      0.0000

. *The two NS-SEC codings agree

. 
. drop _merge bcs_manssec ukempst

. 
. rename nssec bcs_manssecsimp

. rename soc2000 bcs_mumsoc2000

. 
. label variable bcs_manssecsimp "BCS Age 10 Mother's NSSEC Simplified"

. 
. sort bcsid

. save $path3\temp10.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp10.dta not found)
file F:\Data\MYDATA\TEMP\temp10.dta saved

. 
. * return to jupyter
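
`merge m:m` pairs rows within each key group by position, so it gives the intended result here only if the lookup file holds one row per `soc2000` and `ukempst` combination. The following is a hedged sketch, on toy data, of how that could be checked and made explicit. The lookup values are hypothetical, and whether the real lookup file is unique on these keys is an assumption that `isid` would test.

```stata
* Toy lookup (hypothetical values): one row per soc2000 x ukempst pair
clear
input soc2000 ukempst nssec
1111 0 1
1111 1 2
2211 0 2
end
* isid exits with an error if the key does not uniquely identify rows
isid soc2000 ukempst
tempfile lookup
save `lookup'

* Toy master file (hypothetical): several people can share one occupation
clear
input str2 id soc2000
"a" 1111
"b" 1111
"c" 2211
"d" .
end
gen ukempst = 0

* m:1 states the intended relationship; keep() drops unmatched lookup rows
merge m:1 soc2000 ukempst using `lookup', keep(master match)
tab _merge
```

If the lookup were not unique on the key, `merge m:1` would stop with an error rather than silently pairing rows, which is the main reason to prefer it where it applies.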

Here we use the response files to produce variables indicating the outcome at each sweep of the survey (e.g. productive, not productive).

We also code gender using the response files, as this variable has less missing data than the gender variables available in the individual sweeps.

In [71]:
****Information on response 

use $path1\ARCHIVE\BCS\response\bcs_response.dta, clear

keep BCSID OUTCME01 OUTCME02 OUTCME03 SEX
numlabel, add

capture drop bcs_male
    gen bcs_male = .
    replace bcs_male = 1 if (SEX==1)
    replace bcs_male = 0 if (SEX==2)
    label variable bcs_male "BCS Cohort member Male"
    label define yesno 1 "Yes" 0 "No", replace
    label values bcs_male yesno
    tab bcs_male, mi
    tab bcs_male SEX
    drop SEX

rename BCSID bcsid

*Outcome of the first survey
tab OUTCME01
rename OUTCME01 bcs_0outcome
    label variable bcs_0outcome "BCS response outcome 1970 (age 0)"
    
*Outcome of the age 5 survey
tab OUTCME02
rename OUTCME02 bcs_5outcome
    label variable bcs_5outcome "BCS response outcome 1975 (age 5)"
    
*Outcome of the age 10 survey
tab OUTCME03
rename OUTCME03 bcs_10outcome
    label variable bcs_10outcome "BCS response outcome 1980 (age 10)"

*Here we create a simple dummy variable to indicate whether the cohort
*member had a productive interview at the age 10 survey
tab bcs_10outcome
    gen sweeptestoutcome = 0
    replace sweeptestoutcome = 1 if (bcs_10outcome==1)
    tab bcs_10outcome sweeptestoutcome
    label variable sweeptestoutcome "Productive at age 10 survey"

sort bcsid
save $path3\temp11.dta, replace

* return to jupyter
. ****Information on response 

. 
. use $path1\ARCHIVE\BCS\response\bcs_response.dta, clear

. 
. keep BCSID OUTCME01 OUTCME02 OUTCME03 SEX

. numlabel, add

. 
. capture drop bcs_male

.     gen bcs_male = .
(19,006 missing values generated)

.     replace bcs_male = 1 if (SEX==1)
(9,686 real changes made)

.     replace bcs_male = 0 if (SEX==2)
(8,943 real changes made)

.     label variable bcs_male "BCS Cohort member Male"

.     label define yesno 1 "Yes" 0 "No", replace

.     label values bcs_male yesno

.     tab bcs_male, mi

 BCS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |      8,943       47.05       47.05
        Yes |      9,686       50.96       98.02
          . |        377        1.98      100.00
------------+-----------------------------------
      Total |     19,006      100.00

.     tab bcs_male SEX

BCS Cohort |
    member | Sex of cohort member
      Male |   1. Male  2. Female |     Total
-----------+----------------------+----------
        No |         0      8,943 |     8,943 
       Yes |     9,686          0 |     9,686 
-----------+----------------------+----------
     Total |     9,686      8,943 |    18,629 


.     drop SEX

. 
. rename BCSID bcsid

. 
. *Outcome of the first survey

. tab OUTCME01

Outcome to BCS1 (1970)   |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     17,196       90.48       90.48
   4. Other unproductive |         18        0.09       90.57
           6. Not Issued |      1,792        9.43      100.00
-------------------------+-----------------------------------
                   Total |     19,006      100.00

. rename OUTCME01 bcs_0outcome

.     label variable bcs_0outcome "BCS response outcome 1970 (age 0)"

.     
. *Outcome of the age 5 survey

. tab OUTCME02

Outcome to BCS2 (1975)   |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     13,135       69.11       69.11
   4. Other unproductive |      3,256       17.13       86.24
           6. Not Issued |      2,016       10.61       96.85
                 8. Dead |        599        3.15      100.00
-------------------------+-----------------------------------
                   Total |     19,006      100.00

. rename OUTCME02 bcs_5outcome

.     label variable bcs_5outcome "BCS response outcome 1975 (age 5)"

.     
. *Outcome of the age 10 survey

. tab OUTCME03

Outcome to BCS3 (1980)   |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     14,869       78.23       78.23
   4. Other unproductive |      2,381       12.53       90.76
           6. Not Issued |      1,146        6.03       96.79
                 8. Dead |        610        3.21      100.00
-------------------------+-----------------------------------
                   Total |     19,006      100.00

. rename OUTCME03 bcs_10outcome

.     label variable bcs_10outcome "BCS response outcome 1980 (age 10)"

. 
. *Here we create a simple dummy variable to indicate whether the cohort

. *member had a productive interview at the age 10 survey

. tab bcs_10outcome

    BCS response outcome |
           1980 (age 10) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     14,869       78.23       78.23
   4. Other unproductive |      2,381       12.53       90.76
           6. Not Issued |      1,146        6.03       96.79
                 8. Dead |        610        3.21      100.00
-------------------------+-----------------------------------
                   Total |     19,006      100.00

.     gen sweeptestoutcome = 0

.     replace sweeptestoutcome = 1 if (bcs_10outcome==1)
(14,869 real changes made)

.     tab bcs_10outcome sweeptestoutcome

 BCS response outcome |   sweeptestoutcome
        1980 (age 10) |         0          1 |     Total
----------------------+----------------------+----------
        1. Productive |         0     14,869 |    14,869 
4. Other unproductive |     2,381          0 |     2,381 
        6. Not Issued |     1,146          0 |     1,146 
              8. Dead |       610          0 |       610 
----------------------+----------------------+----------
                Total |     4,137     14,869 |    19,006 


.     label variable sweeptestoutcome "Productive at age 10 survey"

. 
. sort bcsid

. save $path3\temp11.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp11.dta not found)
file F:\Data\MYDATA\TEMP\temp11.dta saved

. 
. * return to jupyter
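
The productive-interview indicator can also be built in one step from a logical expression. This is a minimal sketch on toy data, and the variable names `outcome` and `productive` are hypothetical. The `if !missing()` qualifier is an addition that is not in the code above: it leaves the indicator missing where the outcome code is missing. That makes no difference in the response file, since all 19,006 cases have an outcome code.

```stata
* Toy outcome codes (hypothetical data) using the same coding as OUTCME03
clear
input outcome
1
4
6
8
.
end

* The logical expression evaluates to 1 (true) or 0 (false)
gen byte productive = (outcome == 1) if !missing(outcome)
tab outcome productive, missing
```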

We also clean some additional variables that may be used when producing the weights and in the multiple imputation.

In [72]:
use $path1\ARCHIVE\BCS\S1\bcs1derived.dta, clear

keep BCSID BD1MAGE BD1REGN BD1AGEFB BD1FAGE BD1MAGM

numlabel, add

*Mother's age at cohort member's birth
tab BD1MAGE
    recode BD1MAGE (-8=.)
    rename BD1MAGE bcs_mumagebirth
    label variable bcs_mumagebirth "BCS Mother's Age at Cohort Member's Birth"
    tab bcs_mumagebirth

*Father's age at cohort member's birth
tab BD1FAGE
    recode BD1FAGE (-8=.) (-1=.)
    rename BD1FAGE bcs_dadagebirth
    label variable bcs_dadagebirth "BCS Father's Age at Cohort Member's Birth"
    tab bcs_dadagebirth

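*Mother was married at cohort member's birth
*BD1MAGM records the mother's age at her present marriage, so any valid age
*(5 or above in these data) indicates married, -1 indicates not married,
*and -8 (no information available) is left as missing
tab BD1MAGM
    capture drop bcs_mummarried
    gen bcs_mummarried = .
    *The (BD1MAGM<.) condition stops system-missing values being counted as married
    replace bcs_mummarried = 1 if (BD1MAGM>=5)&(BD1MAGM<.)
    replace bcs_mummarried = 0 if (BD1MAGM==-1)
    label variable bcs_mummarried "BCS Mother married at Cohort Member's Birth"
    label define yesno 1 "Yes" 0 "No", replace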
    label values bcs_mummarried yesno
    tab bcs_mummarried, mi
    drop BD1MAGM

*Mother's age at first birth
tab BD1AGEFB
    recode BD1AGEFB (-8=.)
    rename BD1AGEFB bcs_mumagefirstbirth
    label variable bcs_mumagefirstbirth "BCS Mother's Age at First Birth"
    tab bcs_mumagefirstbirth

*Region at the cohort member's birth
tab BD1REGN
    rename BD1REGN bcs_region
    label variable bcs_region "BCS Region at Birth"

rename BCSID bcsid

sort bcsid

save $path3\temp12.dta, replace

* return to jupyter
. use $path1\ARCHIVE\BCS\S1\bcs1derived.dta, clear

. 
. keep BCSID BD1MAGE BD1REGN BD1AGEFB BD1FAGE BD1MAGM

. 
. numlabel, add

. 
. *Mother's age at cohort member's birth

. tab BD1MAGE

 1970: Age of mother at CM's |
birth (from s1 var a0005a/s2 |
                 var e008)   |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
-8. No information available |         33        0.19        0.19
                          14 |          2        0.01        0.20
                          15 |         26        0.15        0.35
                          16 |        130        0.76        1.11
                          17 |        304        1.77        2.88
                          18 |        517        3.01        5.89
                          19 |        689        4.01        9.89
                          20 |        926        5.38       15.28
                          21 |      1,104        6.42       21.70
                          22 |      1,349        7.84       29.54
                          23 |      1,489        8.66       38.20
                          24 |      1,226        7.13       45.33
                          25 |      1,308        7.61       52.94
                          26 |      1,200        6.98       59.92
                          27 |      1,124        6.54       66.45
                          28 |        848        4.93       71.38
                          29 |        820        4.77       76.15
                          30 |        728        4.23       80.38
                          31 |        618        3.59       83.98
                          32 |        499        2.90       86.88
                          33 |        409        2.38       89.26
                          34 |        364        2.12       91.38
                          35 |        330        1.92       93.29
                          36 |        236        1.37       94.67
                          37 |        218        1.27       95.94
                          38 |        186        1.08       97.02
                          39 |        161        0.94       97.95
                          40 |        128        0.74       98.70
                          41 |         85        0.49       99.19
                          42 |         61        0.35       99.55
                          43 |         36        0.21       99.76
                          44 |         21        0.12       99.88
                          45 |          7        0.04       99.92
                          46 |          7        0.04       99.96
                          47 |          2        0.01       99.97
                          49 |          1        0.01       99.98
                          50 |          1        0.01       99.98
                          51 |          1        0.01       99.99
                          52 |          1        0.01       99.99
                          53 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     17,196      100.00

.     recode BD1MAGE (-8=.)
(BD1MAGE: 33 changes made)

.     rename BD1MAGE bcs_mumagebirth

.     label variable bcs_mumagebirth "BCS Mother's Age at Cohort Member's Birth"

.     tab bcs_mumagebirth

  BCS Mother's Age at Cohort |
              Member's Birth |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
                          14 |          2        0.01        0.01
                          15 |         26        0.15        0.16
                          16 |        130        0.76        0.92
                          17 |        304        1.77        2.69
                          18 |        517        3.01        5.70
                          19 |        689        4.01        9.72
                          20 |        926        5.40       15.11
                          21 |      1,104        6.43       21.55
                          22 |      1,349        7.86       29.41
                          23 |      1,489        8.68       38.08
                          24 |      1,226        7.14       45.23
                          25 |      1,308        7.62       52.85
                          26 |      1,200        6.99       59.84
                          27 |      1,124        6.55       66.39
                          28 |        848        4.94       71.33
                          29 |        820        4.78       76.11
                          30 |        728        4.24       80.35
                          31 |        618        3.60       83.95
                          32 |        499        2.91       86.86
                          33 |        409        2.38       89.24
                          34 |        364        2.12       91.36
                          35 |        330        1.92       93.28
                          36 |        236        1.38       94.66
                          37 |        218        1.27       95.93
                          38 |        186        1.08       97.01
                          39 |        161        0.94       97.95
                          40 |        128        0.75       98.69
                          41 |         85        0.50       99.19
                          42 |         61        0.36       99.55
                          43 |         36        0.21       99.76
                          44 |         21        0.12       99.88
                          45 |          7        0.04       99.92
                          46 |          7        0.04       99.96
                          47 |          2        0.01       99.97
                          49 |          1        0.01       99.98
                          50 |          1        0.01       99.98
                          51 |          1        0.01       99.99
                          52 |          1        0.01       99.99
                          53 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     17,163      100.00

. 
. *Father's age at cohort member's birth

. tab BD1FAGE

 1970: Age of father at CM's |
    birth (from s2 var e009) |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
-8. No information available |      4,729       27.50       27.50
    -1. N/A No father figure |        529        3.08       30.58
                          14 |          1        0.01       30.58
                          15 |          4        0.02       30.61
                          16 |         11        0.06       30.67
                          17 |         40        0.23       30.90
                          18 |         89        0.52       31.42
                          19 |        190        1.10       32.53
                          20 |        252        1.47       33.99
                          21 |        426        2.48       36.47
                          22 |        686        3.99       40.46
                          23 |        721        4.19       44.65
                          24 |        741        4.31       48.96
                          25 |        831        4.83       53.79
                          26 |        897        5.22       59.01
                          27 |        842        4.90       63.90
                          28 |        700        4.07       67.98
                          29 |        685        3.98       71.96
                          30 |        736        4.28       76.24
                          31 |        563        3.27       79.51
                          32 |        502        2.92       82.43
                          33 |        500        2.91       85.34
                          34 |        417        2.42       87.76
                          35 |        304        1.77       89.53
                          36 |        316        1.84       91.37
                          37 |        228        1.33       92.70
                          38 |        245        1.42       94.12
                          39 |        160        0.93       95.05
                          40 |        178        1.04       96.09
                          41 |        122        0.71       96.80
                          42 |        102        0.59       97.39
                          43 |         91        0.53       97.92
                          44 |         69        0.40       98.32
                          45 |         50        0.29       98.61
                          46 |         26        0.15       98.76
                          47 |         40        0.23       98.99
                          48 |         39        0.23       99.22
                          49 |         27        0.16       99.38
                          50 |         26        0.15       99.53
                          51 |         11        0.06       99.59
                          52 |         13        0.08       99.67
                          53 |          5        0.03       99.70
                          54 |          8        0.05       99.74
                          55 |          8        0.05       99.79
                          56 |          6        0.03       99.83
                          57 |          5        0.03       99.85
                          58 |          4        0.02       99.88
                          59 |          5        0.03       99.91
                          60 |          1        0.01       99.91
                          61 |          1        0.01       99.92
                          62 |          2        0.01       99.93
                          63 |          3        0.02       99.95
                          64 |          2        0.01       99.96
                          65 |          3        0.02       99.98
                          67 |          1        0.01       99.98
                          68 |          1        0.01       99.99
                          70 |          1        0.01       99.99
                          72 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     17,196      100.00

.     recode BD1FAGE (-8=.) (-1=.)
(BD1FAGE: 5258 changes made)

.     rename BD1FAGE bcs_dadagebirth

.     label variable bcs_dadagebirth "BCS Father's Age at Cohort Member's Birth"

.     tab bcs_dadagebirth

  BCS Father's Age at Cohort |
              Member's Birth |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
                          14 |          1        0.01        0.01
                          15 |          4        0.03        0.04
                          16 |         11        0.09        0.13
                          17 |         40        0.34        0.47
                          18 |         89        0.75        1.21
                          19 |        190        1.59        2.81
                          20 |        252        2.11        4.92
                          21 |        426        3.57        8.49
                          22 |        686        5.75       14.23
                          23 |        721        6.04       20.27
                          24 |        741        6.21       26.48
                          25 |        831        6.96       33.44
                          26 |        897        7.51       40.95
                          27 |        842        7.05       48.01
                          28 |        700        5.86       53.87
                          29 |        685        5.74       59.61
                          30 |        736        6.17       65.77
                          31 |        563        4.72       70.49
                          32 |        502        4.21       74.69
                          33 |        500        4.19       78.88
                          34 |        417        3.49       82.38
                          35 |        304        2.55       84.92
                          36 |        316        2.65       87.57
                          37 |        228        1.91       89.48
                          38 |        245        2.05       91.53
                          39 |        160        1.34       92.87
                          40 |        178        1.49       94.36
                          41 |        122        1.02       95.38
                          42 |        102        0.85       96.24
                          43 |         91        0.76       97.00
                          44 |         69        0.58       97.58
                          45 |         50        0.42       98.00
                          46 |         26        0.22       98.22
                          47 |         40        0.34       98.55
                          48 |         39        0.33       98.88
                          49 |         27        0.23       99.10
                          50 |         26        0.22       99.32
                          51 |         11        0.09       99.41
                          52 |         13        0.11       99.52
                          53 |          5        0.04       99.56
                          54 |          8        0.07       99.63
                          55 |          8        0.07       99.70
                          56 |          6        0.05       99.75
                          57 |          5        0.04       99.79
                          58 |          4        0.03       99.82
                          59 |          5        0.04       99.87
                          60 |          1        0.01       99.87
                          61 |          1        0.01       99.88
                          62 |          2        0.02       99.90
                          63 |          3        0.03       99.92
                          64 |          2        0.02       99.94
                          65 |          3        0.03       99.97
                          67 |          1        0.01       99.97
                          68 |          1        0.01       99.98
                          70 |          1        0.01       99.99
                          72 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     11,938      100.00

. 
. *Mother was married at cohort member's birth

. tab BD1MAGM

      1970: Age of mother at |
           present marriage  |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
-8. No information available |        185        1.08        1.08
         -1. N/A Not married |      1,000        5.82        6.89
                           5 |          1        0.01        6.90
                           9 |          1        0.01        6.90
                          11 |          1        0.01        6.91
                          12 |          4        0.02        6.93
                          13 |         17        0.10        7.03
                          14 |         15        0.09        7.12
                          15 |        182        1.06        8.18
                          16 |        590        3.43       11.61
                          17 |      1,123        6.53       18.14
                          18 |      1,891       11.00       29.13
                          19 |      2,334       13.57       42.71
                          20 |      2,521       14.66       57.37
                          21 |      2,053       11.94       69.31
                          22 |      1,488        8.65       77.96
                          23 |      1,091        6.34       84.30
                          24 |        716        4.16       88.47
                          25 |        524        3.05       91.52
                          26 |        317        1.84       93.36
                          27 |        281        1.63       94.99
                          28 |        198        1.15       96.14
                          29 |        173        1.01       97.15
                          30 |        101        0.59       97.74
                          31 |         94        0.55       98.28
                          32 |         78        0.45       98.74
                          33 |         53        0.31       99.05
                          34 |         39        0.23       99.27
                          35 |         30        0.17       99.45
                          36 |         32        0.19       99.63
                          37 |         21        0.12       99.76
                          38 |         16        0.09       99.85
                          39 |         13        0.08       99.92
                          40 |          9        0.05       99.98
                          41 |          1        0.01       99.98
                          42 |          1        0.01       99.99
                          44 |          1        0.01       99.99
                          51 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     17,196      100.00

.     capture drop bcs_mummarried

.     gen bcs_mummarried = .
(17,196 missing values generated)

.     replace bcs_mummarried = 1 if (BD1MAGM>=5)
(16,011 real changes made)

.     replace bcs_mummarried = 0 if (BD1MAGM==-1)
(1,000 real changes made)

.     label variable bcs_mummarried "BCS Mother married at Cohort Member's Birth"

.     label define yesno 1 "Yes" 0 "No"

.     label values bcs_mummarried yesno

.     tab bcs_mummarried, mi

 BCS Mother |
 married at |
     Cohort |
   Member's |
      Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |      1,000        5.82        5.82
        Yes |     16,011       93.11       98.92
          . |        185        1.08      100.00
------------+-----------------------------------
      Total |     17,196      100.00

.     drop BD1MAGM

. 
. *Mother's age at first birth

. tab BD1AGEFB

1970: Age of mother at first |
                     birth   |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
-8. No information available |        118        0.69        0.69
                          12 |          1        0.01        0.69
                          13 |          5        0.03        0.72
                          14 |         34        0.20        0.92
                          15 |        127        0.74        1.66
                          16 |        450        2.62        4.27
                          17 |        947        5.51        9.78
                          18 |      1,411        8.21       17.99
                          19 |      1,662        9.67       27.65
                          20 |      1,855       10.79       38.44
                          21 |      1,887       10.97       49.41
                          22 |      1,848       10.75       60.16
                          23 |      1,525        8.87       69.03
                          24 |      1,287        7.48       76.51
                          25 |      1,080        6.28       82.79
                          26 |        782        4.55       87.34
                          27 |        581        3.38       90.72
                          28 |        415        2.41       93.13
                          29 |        314        1.83       94.96
                          30 |        235        1.37       96.32
                          31 |        147        0.85       97.18
                          32 |        130        0.76       97.94
                          33 |         87        0.51       98.44
                          34 |         72        0.42       98.86
                          35 |         55        0.32       99.18
                          36 |         33        0.19       99.37
                          37 |         25        0.15       99.52
                          38 |         25        0.15       99.66
                          39 |         20        0.12       99.78
                          40 |         19        0.11       99.89
                          41 |          4        0.02       99.91
                          42 |          7        0.04       99.95
                          43 |          5        0.03       99.98
                          45 |          1        0.01       99.99
                          46 |          1        0.01       99.99
                          47 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     17,196      100.00

.     recode BD1AGEFB (-8=.)
(BD1AGEFB: 118 changes made)

.     rename BD1AGEFB bcs_mumagefirstbirth

.     label variable bcs_mumagefirstbirth "BCS Mother's Age at First Birth"

.     tab bcs_mumagefirstbirth

   BCS Mother's Age at First |
                       Birth |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
                          12 |          1        0.01        0.01
                          13 |          5        0.03        0.04
                          14 |         34        0.20        0.23
                          15 |        127        0.74        0.98
                          16 |        450        2.63        3.61
                          17 |        947        5.55        9.16
                          18 |      1,411        8.26       17.42
                          19 |      1,662        9.73       27.15
                          20 |      1,855       10.86       38.01
                          21 |      1,887       11.05       49.06
                          22 |      1,848       10.82       59.88
                          23 |      1,525        8.93       68.81
                          24 |      1,287        7.54       76.35
                          25 |      1,080        6.32       82.67
                          26 |        782        4.58       87.25
                          27 |        581        3.40       90.65
                          28 |        415        2.43       93.08
                          29 |        314        1.84       94.92
                          30 |        235        1.38       96.30
                          31 |        147        0.86       97.16
                          32 |        130        0.76       97.92
                          33 |         87        0.51       98.43
                          34 |         72        0.42       98.85
                          35 |         55        0.32       99.17
                          36 |         33        0.19       99.37
                          37 |         25        0.15       99.51
                          38 |         25        0.15       99.66
                          39 |         20        0.12       99.78
                          40 |         19        0.11       99.89
                          41 |          4        0.02       99.91
                          42 |          7        0.04       99.95
                          43 |          5        0.03       99.98
                          45 |          1        0.01       99.99
                          46 |          1        0.01       99.99
                          47 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     17,078      100.00

. 
. *Region at the cohort member's birth

. tab BD1REGN

  1970: Standard Region |
         of residence   |      Freq.     Percent        Cum.
------------------------+-----------------------------------
               1. North |      1,023        5.95        5.95
2. Yorks and Humberside |      1,486        8.64       14.59
       3. East Midlands |      1,036        6.02       20.62
         4. East Anglia |        539        3.13       23.75
          5. South East |      5,022       29.20       52.95
          6. South West |      1,051        6.11       59.07
       7. West Midlands |      1,745       10.15       69.21
          8. North West |      2,170       12.62       81.83
               9. Wales |        879        5.11       86.94
           10. Scotland |      1,617        9.40       96.35
   11. Northern Ireland |        628        3.65      100.00
------------------------+-----------------------------------
                  Total |     17,196      100.00

.     rename BD1REGN bcs_region

.     label variable bcs_region "BCS Region at Birth"

. 
. rename BCSID bcsid

. 
. sort bcsid

. 
. save $path3\temp12.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp12.dta not found)
file F:\Data\MYDATA\TEMP\temp12.dta saved

. 
. * return to jupyter

In [73]:
use $path1\ARCHIVE\BCS\S1\bcs7072a.dta, clear

keep bcsid a0166 a0037 a0038 a0297 a0255 
numlabel, add

tab a0166
recode a0166 (-2=.)
rename a0166 bcs_parity
label variable bcs_parity "BCS Parity at Birth"
tab bcs_parity

*Mother attended mothercraft classes
tab a0037
capture drop bcs_mothercraft
    gen bcs_mothercraft = .
    replace bcs_mothercraft = 1 if (a0037>=2)&(a0037<=8)
    replace bcs_mothercraft = 0 if (a0037==1)
    label variable bcs_mothercraft "BCS Mother Attended Mothercraft Classes"
    label define yesno 1 "Yes" 0 "No", replace
    label values bcs_mothercraft yesno
    numlabel, add
    tab bcs_mothercraft, mi

*Mother attended labour classes
tab a0038
capture drop bcs_labourclass
    gen bcs_labourclass = .
    replace bcs_labourclass = 1 if (a0038>=2)&(a0038<=8)
    replace bcs_labourclass = 0 if (a0038==1)
    label variable bcs_labourclass "BCS Mother Attended Labour Classes"
    label values bcs_labourclass yesno
    numlabel, add
    tab bcs_labourclass, mi

drop a0037 a0038

*Mother attempted breast feeding
tab a0297
    capture drop bcs_breast
    gen bcs_breast = .
    replace bcs_breast = 1 if (a0297==1)
    replace bcs_breast = 0 if (a0297==2)
    label variable bcs_breast "BCS Mother Attempted Breast Feeding"
    label values bcs_breast yesno
    tab bcs_breast, mi
    drop a0297

sort bcsid

save $path3\temp13.dta, replace

* return to jupyter
. use $path1\ARCHIVE\BCS\S1\bcs7072a.dta, clear

. 
. keep bcsid a0166 a0037 a0038 a0297 a0255 

. numlabel, add

. 
. tab a0166

             Parity |      Freq.     Percent        Cum.
--------------------+-----------------------------------
      -2. Not Known |         32        0.19        0.19
                  0 |      6,389       37.15       37.34
                  1 |      5,520       32.10       69.44
                  2 |      2,787       16.21       85.65
                  3 |      1,266        7.36       93.01
                  4 |        609        3.54       96.55
                  5 |        297        1.73       98.28
                  6 |        136        0.79       99.07
                  7 |         74        0.43       99.50
                  8 |         36        0.21       99.71
                  9 |         23        0.13       99.84
                 10 |         11        0.06       99.91
                 11 |          9        0.05       99.96
                 12 |          3        0.02       99.98
                 13 |          2        0.01       99.99
                 14 |          1        0.01       99.99
                 17 |          1        0.01      100.00
--------------------+-----------------------------------
              Total |     17,196      100.00

. recode a0166 (-2=.)
(a0166: 32 changes made)

. rename a0166 bcs_parity

. label variable bcs_parity "BCS Parity at Birth"

. tab bcs_parity

BCS Parity at Birth |      Freq.     Percent        Cum.
--------------------+-----------------------------------
                  0 |      6,389       37.22       37.22
                  1 |      5,520       32.16       69.38
                  2 |      2,787       16.24       85.62
                  3 |      1,266        7.38       93.00
                  4 |        609        3.55       96.55
                  5 |        297        1.73       98.28
                  6 |        136        0.79       99.07
                  7 |         74        0.43       99.50
                  8 |         36        0.21       99.71
                  9 |         23        0.13       99.84
                 10 |         11        0.06       99.91
                 11 |          9        0.05       99.96
                 12 |          3        0.02       99.98
                 13 |          2        0.01       99.99
                 14 |          1        0.01       99.99
                 17 |          1        0.01      100.00
--------------------+-----------------------------------
              Total |     17,164      100.00

. 
. *Mother attended mothercraft classes

. tab a0037

   Mothercraft classes |      Freq.     Percent        Cum.
-----------------------+-----------------------------------
        -3. Not Stated |         90        0.52        0.52
         -2. Not Known |         33        0.19        0.72
               1. None |     12,372       71.95       72.66
2. Individual Instruct |        940        5.47       78.13
   3. LHA Clinic Class |      1,923       11.18       89.31
     4. Hospital Class |      1,417        8.24       97.55
              5. Other |        201        1.17       98.72
              6. 2 & 3 |         98        0.57       99.29
              7. 3 & 4 |         83        0.48       99.77
              8. 2 & 4 |         39        0.23      100.00
-----------------------+-----------------------------------
                 Total |     17,196      100.00

. capture drop bcs_mothercraft

.     gen bcs_mothercraft = .
(17,196 missing values generated)

.     replace bcs_mothercraft = 1 if (a0037>=2)&(a0037<=8)
(4,701 real changes made)

.     replace bcs_mothercraft = 0 if (a0037==1)
(12,372 real changes made)

.     label variable bcs_mothercraft "BCS Mother Attended Mothercraft Classes"

.     label define yesno 1 "Yes" 0 "No", replace

.     label values bcs_mothercraft yesno

.     numlabel, add

.     tab bcs_mothercraft, mi

 BCS Mother |
   Attended |
Mothercraft |
    Classes |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     12,372       71.95       71.95
     1. Yes |      4,701       27.34       99.28
          . |        123        0.72      100.00
------------+-----------------------------------
      Total |     17,196      100.00

. 
. *Mother attended labour classes

. tab a0038

    Labour-preparation |
               classes |      Freq.     Percent        Cum.
-----------------------+-----------------------------------
        -3. Not Stated |        105        0.61        0.61
         -2. Not Known |         34        0.20        0.81
               1. None |     12,550       72.98       73.79
2. Individual Instruct |        641        3.73       77.52
   3. LHA Clinic Class |      2,192       12.75       90.27
     4. Hospital Class |      1,326        7.71       97.98
              5. Other |        251        1.46       99.44
              6. 2 & 3 |         42        0.24       99.68
              7. 3 & 4 |         44        0.26       99.94
              8. 2 & 4 |         11        0.06      100.00
-----------------------+-----------------------------------
                 Total |     17,196      100.00

. capture drop bcs_labourclass

.     gen bcs_labourclass = .
(17,196 missing values generated)

.     replace bcs_labourclass = 1 if (a0038>=2)&(a0038<=8)
(4,507 real changes made)

.     replace bcs_labourclass = 0 if (a0038==1)
(12,550 real changes made)

.     label variable bcs_labourclass "BCS Mother Attended Labour Classes"

.     label values bcs_labourclass yesno

.     numlabel, add
(no value label to be modified)

.     tab bcs_labourclass, mi

 BCS Mother |
   Attended |
     Labour |
    Classes |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     12,550       72.98       72.98
     1. Yes |      4,507       26.21       99.19
          . |        139        0.81      100.00
------------+-----------------------------------
      Total |     17,196      100.00

. 
. drop a0037 a0038

. 
. *Mother attempted breast feeding

. tab a0297

   Was Lactation |
       Attempted |      Freq.     Percent        Cum.
-----------------+-----------------------------------
  -3. Not Stated |        228        1.33        1.33
   -2. Not Known |          3        0.02        1.34
    1. Attempted |      6,311       36.70       38.04
2. Not Attempted |     10,654       61.96      100.00
-----------------+-----------------------------------
           Total |     17,196      100.00

.     capture drop bcs_breast

.     gen bcs_breast = .
(17,196 missing values generated)

.     replace bcs_breast = 1 if (a0297==1)
(6,311 real changes made)

.     replace bcs_breast = 0 if (a0297==2)
(10,654 real changes made)

.     label variable bcs_breast "BCS Mother Attempted Breast Feeding"

.     label values bcs_breast yesno

.     tab bcs_breast, mi

 BCS Mother |
  Attempted |
     Breast |
    Feeding |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     10,654       61.96       61.96
     1. Yes |      6,311       36.70       98.66
          . |        231        1.34      100.00
------------+-----------------------------------
      Total |     17,196      100.00

.     drop a0297

. 
. sort bcsid

. 
. save $path3\temp13.dta, replace
(note: file F:\Data\MYDATA\TEMP\temp13.dta not found)
file F:\Data\MYDATA\TEMP\temp13.dta saved

. 
. * return to jupyter
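
The mothercraft and labour-preparation indicators above follow the same pattern: code 1 ("None") becomes 0, codes 2 to 8 become 1, and the negative codes become missing. As a minimal sketch, assuming a0037 and a0038 are still in memory (that is, run before the drop command), the recode can be written once in a loop. The variable names chk_a0037 and chk_a0038 are hypothetical and serve only to cross-check the hand-coded versions.

```stata
* Sketch only: the chk_* names are illustrative, not part of the original workflow
label define yesno 0 "No" 1 "Yes", replace
foreach v in a0037 a0038 {
    capture drop chk_`v'
    * inrange(x, 2, 8) is 1 for any class attended and 0 for code 1 (None);
    * the if qualifier leaves the negative codes (-3, -2) as missing
    gen byte chk_`v' = inrange(`v', 2, 8) if inrange(`v', 1, 8)
    label values chk_`v' yesno
    * cross-tabulate source and derived variable to verify the recode
    tab `v' chk_`v', mi
}
```

If the logic matches the cell above, the "Yes" counts should agree with the 4,701 and 4,507 changes reported in the log.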

Merge all of these temporary files (temp1 to temp13) one-to-one on bcsid to create a working BCS data file.

In [74]:
use $path3\temp1.dta, clear
    sort bcsid
    merge 1:1 bcsid using $path3\temp2.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp3.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp4.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp5.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp6.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp7.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp8.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp9.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp10.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp11.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp12.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    merge 1:1 bcsid using $path3\temp13.dta
    drop _merge
    sort bcsid
    duplicates report bcsid
    
capture drop cohort
    gen cohort=2
    label variable cohort "Cohort"
    label define cohort 1 "NCDS" 2 "BCS"
    label values cohort cohort
    tab cohort, mi

sort bcsid
save $path2\BCS_MAIN.dta, replace

* return to jupyter
. use $path3\temp1.dta, clear

.     sort bcsid

.     merge 1:1 bcsid using $path3\temp2.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                         4,835
        from master                     4,448  (_merge==1)
        from using                        387  (_merge==2)

    matched                            12,748  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        17583             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp3.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                           387
        from master                       387  (_merge==1)
        from using                          0  (_merge==2)

    matched                            17,196  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        17583             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp4.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                         4,448
        from master                     4,448  (_merge==1)
        from using                          0  (_merge==2)

    matched                            13,135  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        17583             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp5.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                         4,252
        from master                     3,480  (_merge==1)
        from using                        772  (_merge==2)

    matched                            14,103  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18355             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp6.dta
(label rgsc already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         3,485
        from master                     3,485  (_merge==1)
        from using                          0  (_merge==2)

    matched                            14,870  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18355             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp7.dta
(label rgsc already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         7,074
        from master                     6,907  (_merge==1)
        from using                        167  (_merge==2)

    matched                            11,448  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18522             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp8.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                         3,652
        from master                     3,652  (_merge==1)
        from using                          0  (_merge==2)

    matched                            14,870  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18522             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp9.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                         3,808
        from master                     3,728  (_merge==1)
        from using                         80  (_merge==2)

    matched                            14,794  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18602             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp10.dta
(label nssec already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         3,728
        from master                     3,728  (_merge==1)
        from using                          0  (_merge==2)

    matched                            14,874  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        18602             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp11.dta

    Result                           # of obs.
    -----------------------------------------
    not matched                           570
        from master                        83  (_merge==1)
        from using                        487  (_merge==2)

    matched                            18,519  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        19089             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp12.dta
(label yesno already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         1,893
        from master                     1,893  (_merge==1)
        from using                          0  (_merge==2)

    matched                            17,196  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        19089             0
--------------------------------------

.     merge 1:1 bcsid using $path3\temp13.dta
(label A0255 already defined)
(label yesno already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                         1,893
        from master                     1,893  (_merge==1)
        from using                          0  (_merge==2)

    matched                            17,196  (_merge==3)
    -----------------------------------------

.     drop _merge

.     sort bcsid

.     duplicates report bcsid

Duplicates in terms of bcsid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        19089             0
--------------------------------------

.     
. capture drop cohort

.     gen cohort=2

.     label variable cohort "Cohort"

.     label define cohort 1 "NCDS" 2 "BCS"

.     label values cohort cohort

.     tab cohort, mi

     Cohort |      Freq.     Percent        Cum.
------------+-----------------------------------
        BCS |     19,089      100.00      100.00
------------+-----------------------------------
      Total |     19,089      100.00

. 
. sort bcsid

. save $path2\BCS_MAIN.dta, replace
file F:\Data\MYDATA\WORK\BCS_MAIN.dta saved

. 
. * return to jupyter
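
The twelve merge steps above repeat the same three commands, so they could also be expressed as a loop. This is a minimal sketch, not the authors' code: it assumes $path3 still holds temp1.dta to temp13.dta, uses the nogenerate option in place of drop _merge, and uses isid in place of duplicates report (isid stops with an error if bcsid is not unique, rather than printing a table).

```stata
* Sketch only: assumes temp1.dta to temp13.dta exist in $path3, each keyed on bcsid
use "$path3\temp1.dta", clear
forvalues i = 2/13 {
    * keep matched and unmatched cases from both files (the merge default)
    merge 1:1 bcsid using "$path3\temp`i'.dta", nogenerate
    * confirm that bcsid still uniquely identifies each row
    isid bcsid
}
sort bcsid
count
```

Because the merges are the same, the final count should equal the 19,089 observations reported in the log above. The loop is shorter, but it hides the per-file match tables less conveniently than the step-by-step version, which may be why the explicit form is preferable for an audit trail.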

Delete the temporary data files.

In [75]:
erase $path3\temp1.dta
erase $path3\temp2.dta
erase $path3\temp3.dta
erase $path3\temp4.dta
erase $path3\temp5.dta
erase $path3\temp6.dta
erase $path3\temp7.dta
erase $path3\temp8.dta
erase $path3\temp9.dta
erase $path3\temp10.dta
erase $path3\temp11.dta
erase $path3\temp12.dta
erase $path3\temp13.dta

* return to jupyter
. erase $path3\temp1.dta

. erase $path3\temp2.dta

. erase $path3\temp3.dta

. erase $path3\temp4.dta

. erase $path3\temp5.dta

. erase $path3\temp6.dta

. erase $path3\temp7.dta

. erase $path3\temp8.dta

. erase $path3\temp9.dta

. erase $path3\temp10.dta

. erase $path3\temp11.dta

. erase $path3\temp12.dta

. erase $path3\temp13.dta

. 
. * return to jupyter
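
The same clean-up can be sketched as a loop. The capture prefix is an addition not used above: it is assumed here so that the loop does not stop if a file has already been deleted.

```stata
* Sketch only: removes temp1.dta to temp13.dta from $path3
forvalues i = 1/13 {
    * capture suppresses the error raised when a file is not found
    capture erase "$path3\temp`i'.dta"
}
```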


Append the NCDS and BCS data files, stacking the two cohorts into one pooled file, and create a new id variable (poolid).

In [76]:
use $path2\NCDS_MAIN.dta, clear
    append using $path2\BCS_MAIN.dta
    tab cohort, mi

*Here we create a new id number for all cases in our dataset
* We probably won't need this, but we create it just in case
capture drop poolid
    gen poolid = _n
    sum poolid
    duplicates report poolid
    label variable poolid "New ID for Pooled Data"
    
    sort poolid
    
save $path3\pooledNCDSBCS_v1.dta, replace

* return to jupyter
. use $path2\NCDS_MAIN.dta, clear

.     append using $path2\BCS_MAIN.dta
(label cohort already defined)
(label OUTCME01 already defined)
(label OUTCME02 already defined)
(label yesno already defined)
(label nssec already defined)
(label rgsc already defined)
(label ed_cat already defined)
(label egp already defined)

.     tab cohort, mi

     Cohort |      Freq.     Percent        Cum.
------------+-----------------------------------
       NCDS |     18,558       49.29       49.29
        BCS |     19,089       50.71      100.00
------------+-----------------------------------
      Total |     37,647      100.00

. 
. *Here we create a new id number for all cases in our dataset

. * We probably won't need this, but we create it just in case

. capture drop poolid

.     gen poolid = _n

.     sum poolid

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      poolid |     37,647       18824     10867.9          1      37647

.     duplicates report poolid

Duplicates in terms of poolid

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        37647             0
--------------------------------------

.     label variable poolid "New ID for Pooled Data"

.     
.     sort poolid

.     
. save $path3\pooledNCDSBCS_v1.dta, replace
file F:\Data\MYDATA\TEMP\pooledNCDSBCS_v1.dta saved

. 
. * return to jupyter
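
The checks that were read off the output above can also be written as assertions, so that the do-file stops if the pooled file is not as expected. This is a minimal sketch rather than part of the original workflow. It assumes the pooled data are still in memory, and the two counts are the cohort sizes reported in the tabulation above.

```stata
* Sketch only: stop with an error if poolid is not a unique identifier.
isid poolid

* Every case should belong to exactly one of the two cohorts.
assert inlist(cohort, 1, 2)

* Cohort sizes as reported in the tabulation above.
count if cohort == 1
assert r(N) == 18558
count if cohort == 2
assert r(N) == 19089
```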

We now create joint variables from the information available in the two cohorts.

In [77]:
use $path3\pooledNCDSBCS_v1.dta, clear
numlabel, add

*Cohort member's standardised ability test scores age 10/11
capture drop ability
    gen ability = .
    replace ability = ncds11_stdbastotalscore if (cohort==1)
    replace ability = bcs10_stdabilityscore if (cohort==2)
    label variable ability "Ability Test Score"
    summ ability
    summ ability if (cohort==1)
    summ ability if (cohort==2)
    

*Create one single variable for the ability PCA score in both cohorts

summ ncds11_stdpc1 bcs10_stdpc1

capture drop pcascore
    gen pcascore = .
    replace pcascore = ncds11_stdpc1 if (cohort==1)
    replace pcascore = bcs10_stdpc1 if (cohort==2)
    label variable pcascore "PCA Ability Test Score"
    summ pcascore
    summ pcascore if (cohort==1)
    summ pcascore if (cohort==2)

*Cohort member's gender
tab ncds_male, mi
tab ncds_male
tab bcs_male, mi
tab bcs_male

capture drop male
    gen male = .
    replace male = ncds_male  if (cohort==1)
    replace male = bcs_male if (cohort==2)
    label variable male "male"
    label values male yesno
    tab male

*Father's NS-SEC
tab ncds_panssec 
tab bcs_panssec

capture drop dadnssec
    gen dadnssec = .
    replace dadnssec = ncds_panssec if (cohort==1)
    replace dadnssec = bcs_panssec if (cohort==2)
    label variable dadnssec "Father's NSSEC"
    label values dadnssec nssec 
    tab dadnssec

*Here we create an interaction between NS-SEC and Cohort
*Of course we can undertake interactions using the interactions code in Stata
*However, creating an interaction here allows a little more clarity.

*Coefficients and standard errors of a model with interaction
*terms cannot be readily interpreted independently of each other, 
*since any given coefficient refers to the combined influence of all 
*of the other contributing variables. We specify the interaction as a 
*discrete categorical variable that has a distinct value
*for each combination of circumstances. This allows the independent effect
*of each discrete category to be more easily interpreted.

*See: Jaccard, J. and R. Turrisi (2003) Interaction Effects in Multiple 
* Regression. London: Sage. 
 
*NSSEC * Cohort Interaction
capture drop nsinteraction
    gen nsinteraction = .
    replace nsinteraction = 1 if ((dadnssec==1)&(cohort==1))
    replace nsinteraction = 2 if ((dadnssec==1)&(cohort==2))
    replace nsinteraction = 3 if ((dadnssec==2)&(cohort==1))
    replace nsinteraction = 4 if ((dadnssec==2)&(cohort==2))
    replace nsinteraction = 5 if ((dadnssec==3)&(cohort==1))
    replace nsinteraction = 6 if ((dadnssec==3)&(cohort==2))
    replace nsinteraction = 7 if ((dadnssec==4)&(cohort==1))
    replace nsinteraction = 8 if ((dadnssec==4)&(cohort==2))
    replace nsinteraction = 9 if ((dadnssec==5)&(cohort==1))
    replace nsinteraction = 10 if ((dadnssec==5)&(cohort==2))
    replace nsinteraction = 11 if ((dadnssec==6)&(cohort==1))
    replace nsinteraction = 12 if ((dadnssec==6)&(cohort==2))
    replace nsinteraction = 13 if ((dadnssec==7)&(cohort==1))
    replace nsinteraction = 14 if ((dadnssec==7)&(cohort==2))
    replace nsinteraction = 15 if ((dadnssec==8)&(cohort==1))
    replace nsinteraction = 16 if ((dadnssec==8)&(cohort==2))
    tab nsinteraction
    label variable nsinteraction "NSSEC Interaction"
    label define nsint 1 "NCDS 1.1" 2 "BCS 1.1" 3 "NCDS 1.2" 4 "BCS 1.2" 5 "NCDS 2" 6 "BCS 2" 7 "NCDS 3" 8 "BCS 3" 9 "NCDS 4" 10 "BCS 4" 11 "NCDS 5" 12 "BCS 5" 13 "NCDS 6" 14 "BCS 6" 15 "NCDS 7" 16 "BCS 7"
    label values nsinteraction nsint

*Parents Education

capture drop parented
    gen parented = .
    replace parented = ncds_parented if (cohort==1)
    replace parented = bcs_parented if (cohort==2)
    label values parented ed
    label variable parented "Parent's Highest Education"
    tab parented
    tab parented cohort, col

*Additional variables that will potentially be used to produce weights and in multiple imputation
*Mother's Age at the Birth of the Cohort member
capture drop mumage
    gen mumage = .
    replace mumage = ncds_mumagebirth if (cohort==1)
    replace mumage = bcs_mumagebirth if (cohort==2)
    label variable mumage "Mother's Age at CM Birth"

*Cohort Member Parity at Birth
capture drop parity
    gen parity = .
    replace parity = ncds_parity if (cohort==1)
    replace parity = bcs_parity if (cohort==2)
    label variable parity "Parity at Birth"

*Whether the cohort member's mother is married at cohort member's birth
tab ncds_married   
tab bcs_mummarried
capture drop married
    gen married = .
    replace married = ncds_married if (cohort==1)
    replace married = bcs_mummarried if (cohort==2)
    label variable married "Mother married at CM birth"
    label values married yesno

* return to jupyter
. use $path3\pooledNCDSBCS_v1.dta, clear

. numlabel, add

. 
. *Cohort member's standardised ability test scores age 10/11

. capture drop ability

.     gen ability = .
(37,647 missing values generated)

.     replace ability = ncds11_stdbastotalscore if (cohort==1)
(14,131 real changes made)

.     replace ability = bcs10_stdabilityscore if (cohort==2)
(11,397 real changes made)

.     label variable ability "Ability Test Score"

.     summ ability

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |     25,528         100    14.99971   43.58873   151.1925

.     summ ability if (cohort==1)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |     14,131         100          15   60.10213   134.4337

.     summ ability if (cohort==2)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |     11,397         100          15   43.58873   151.1925

.     
. 
. *Create one single variable for the ability PCA score in both cohorts

. 
. summ ncds11_stdpc1 bcs10_stdpc1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds11_std~1 |     14,131   -1.45e-10           1   -2.68232   2.329129
bcs10_stdpc1 |     11,397   -4.73e-10           1  -3.701819   3.371969

. 
. capture drop pcascore

.     gen pcascore = .
(37,647 missing values generated)

.     replace pcascore = ncds11_stdpc1 if (cohort==1)
(14,131 real changes made)

.     replace pcascore = bcs10_stdpc1 if (cohort==2)
(11,397 real changes made)

.     label variable pcascore "PCA Ability Test Score"

.     summ pcascore

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    pcascore |     25,528   -2.91e-10    .9999804  -3.701819   3.371969

.     summ pcascore if (cohort==1)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    pcascore |     14,131   -1.45e-10           1   -2.68232   2.329129

.     summ pcascore if (cohort==2)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    pcascore |     11,397   -4.73e-10           1  -3.701819   3.371969

. 
. *Cohort member's gender

. tab ncds_male, mi

NCDS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      8,959       23.80       23.80
     1. Yes |      9,595       25.49       49.28
          . |     19,093       50.72      100.00
------------+-----------------------------------
      Total |     37,647      100.00

. tab ncds_male

NCDS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      8,959       48.29       48.29
     1. Yes |      9,595       51.71      100.00
------------+-----------------------------------
      Total |     18,554      100.00

. tab bcs_male, mi

 BCS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      8,943       23.75       23.75
     1. Yes |      9,686       25.73       49.48
          . |     19,018       50.52      100.00
------------+-----------------------------------
      Total |     37,647      100.00

. tab bcs_male

 BCS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      8,943       48.01       48.01
     1. Yes |      9,686       51.99      100.00
------------+-----------------------------------
      Total |     18,629      100.00

. 
. capture drop male

.     gen male = .
(37,647 missing values generated)

.     replace male = ncds_male  if (cohort==1)
(18,554 real changes made)

.     replace male = bcs_male if (cohort==2)
(18,629 real changes made)

.     label variable male "male"

.     label values male yesno

.     tab male

       male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     17,902       48.15       48.15
     1. Yes |     19,281       51.85      100.00
------------+-----------------------------------
      Total |     37,183      100.00

. 
. *Father's NS-SEC

. tab ncds_panssec 

             NCDS Age 11 Father's NSSEC |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        367        3.29        3.29
                 2. Higher Professional |        536        4.80        8.09
   3. Lower managerial and professional |      1,323       11.86       19.95
                        4. Intermediate |      1,058        9.48       29.44
     5. Small employers and own account |      1,374       12.32       41.75
     6. Lower Supervisory and Technical |      1,817       16.29       58.04
                        7. Semi-Routine |      1,972       17.68       75.72
                             8. Routine |      2,709       24.28      100.00
----------------------------------------+-----------------------------------
                                  Total |     11,156      100.00

. tab bcs_panssec

                                  nssec |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        570        4.73        4.73
                 2. Higher Professional |        760        6.31       11.04
   3. Lower managerial and professional |      1,806       14.99       26.04
                        4. Intermediate |      1,090        9.05       35.09
     5. Small employers and own account |      1,582       13.13       48.22
     6. Lower Supervisory and Technical |      2,032       16.87       65.09
                        7. Semi-Routine |      1,739       14.44       79.53
                             8. Routine |      2,466       20.47      100.00
----------------------------------------+-----------------------------------
                                  Total |     12,045      100.00

. 
. capture drop dadnssec

.     gen dadnssec = .
(37,647 missing values generated)

.     replace dadnssec = ncds_panssec if (cohort==1)
(11,156 real changes made)

.     replace dadnssec = bcs_panssec if (cohort==2)
(12,045 real changes made)

.     label variable dadnssec "Father's NSSEC"

.     label values dadnssec nssec 

.     tab dadnssec

                         Father's NSSEC |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        937        4.04        4.04
                 2. Higher Professional |      1,296        5.59        9.62
   3. Lower managerial and professional |      3,129       13.49       23.11
                        4. Intermediate |      2,148        9.26       32.37
     5. Small employers and own account |      2,956       12.74       45.11
     6. Lower Supervisory and Technical |      3,849       16.59       61.70
                        7. Semi-Routine |      3,711       16.00       77.69
                             8. Routine |      5,175       22.31      100.00
----------------------------------------+-----------------------------------
                                  Total |     23,201      100.00

. 
. *Here we create an interaction between NS-SEC and Cohort

. *Of course we can undertake interactions using the interactions code in Stata

. *However, creating an interaction here allows a little more clarity.

. 
. *Coefficients and standard errors of a model with interaction

. *terms cannot be readily interpreted independently of each other, 

. *since any given coefficient refers to the combined influence of all 

. *of the other contributing variables. We specify the interaction as a 

. *discrete categorical variable that has a distinct value

. *for each combination of circumstances. This allows the independent effect

. *of each discrete category to be more easily interpreted.

. 
. *See: Jaccard, J. and R. Turrisi (2003) Interaction Effects in Multiple 

. * Regression. London: Sage. 

.  
. *NSSEC * Cohort Interaction

. capture drop nsinteraction

.     gen nsinteraction = .
(37,647 missing values generated)

.     replace nsinteraction = 1 if ((dadnssec==1)&(cohort==1))
(367 real changes made)

.     replace nsinteraction = 2 if ((dadnssec==1)&(cohort==2))
(570 real changes made)

.     replace nsinteraction = 3 if ((dadnssec==2)&(cohort==1))
(536 real changes made)

.     replace nsinteraction = 4 if ((dadnssec==2)&(cohort==2))
(760 real changes made)

.     replace nsinteraction = 5 if ((dadnssec==3)&(cohort==1))
(1,323 real changes made)

.     replace nsinteraction = 6 if ((dadnssec==3)&(cohort==2))
(1,806 real changes made)

.     replace nsinteraction = 7 if ((dadnssec==4)&(cohort==1))
(1,058 real changes made)

.     replace nsinteraction = 8 if ((dadnssec==4)&(cohort==2))
(1,090 real changes made)

.     replace nsinteraction = 9 if ((dadnssec==5)&(cohort==1))
(1,374 real changes made)

.     replace nsinteraction = 10 if ((dadnssec==5)&(cohort==2))
(1,582 real changes made)

.     replace nsinteraction = 11 if ((dadnssec==6)&(cohort==1))
(1,817 real changes made)

.     replace nsinteraction = 12 if ((dadnssec==6)&(cohort==2))
(2,032 real changes made)

.     replace nsinteraction = 13 if ((dadnssec==7)&(cohort==1))
(1,972 real changes made)

.     replace nsinteraction = 14 if ((dadnssec==7)&(cohort==2))
(1,739 real changes made)

.     replace nsinteraction = 15 if ((dadnssec==8)&(cohort==1))
(2,709 real changes made)

.     replace nsinteraction = 16 if ((dadnssec==8)&(cohort==2))
(2,466 real changes made)

.     tab nsinteraction

nsinteracti |
         on |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        367        1.58        1.58
          2 |        570        2.46        4.04
          3 |        536        2.31        6.35
          4 |        760        3.28        9.62
          5 |      1,323        5.70       15.33
          6 |      1,806        7.78       23.11
          7 |      1,058        4.56       27.67
          8 |      1,090        4.70       32.37
          9 |      1,374        5.92       38.29
         10 |      1,582        6.82       45.11
         11 |      1,817        7.83       52.94
         12 |      2,032        8.76       61.70
         13 |      1,972        8.50       70.20
         14 |      1,739        7.50       77.69
         15 |      2,709       11.68       89.37
         16 |      2,466       10.63      100.00
------------+-----------------------------------
      Total |     23,201      100.00

.     label variable nsinteraction "NSSEC Interaction"

.     label define nsint 1 "NCDS 1.1" 2 "BCS 1.1" 3 "NCDS 1.2" 4 "BCS 1.2" 5 "NCDS 2" 6 "BCS 2" 7 "NCDS 3" 8 "BCS 3" 9 "NCDS 4" 10 "BCS 4" 1
> 1 "NCDS 5" 12 "BCS 5" 13 "NCDS 6" 14 "BCS 6" 15 "NCDS 7" 16 "BCS 7"

.     label values nsinteraction nsint

. 
. *Parents Education

. 
. capture drop parented

.     gen parented = .
(37,647 missing values generated)

.     replace parented = ncds_parented if (cohort==1)
(15,927 real changes made)

.     replace parented = bcs_parented if (cohort==2)
(13,088 real changes made)

.     label values parented ed

.     label variable parented "Parent's Highest Education"

.     tab parented

   Parent's |
    Highest |
  Education |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     18,499       63.76       63.76
          2 |      7,797       26.87       90.63
          3 |      1,021        3.52       94.15
          4 |      1,698        5.85      100.00
------------+-----------------------------------
      Total |     29,015      100.00

.     tab parented cohort, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

  Parent's |
   Highest |        Cohort
 Education |   1. NCDS     2. BCS |     Total
-----------+----------------------+----------
         1 |    11,659      6,840 |    18,499 
           |     73.20      52.26 |     63.76 
-----------+----------------------+----------
         2 |     3,384      4,413 |     7,797 
           |     21.25      33.72 |     26.87 
-----------+----------------------+----------
         3 |       246        775 |     1,021 
           |      1.54       5.92 |      3.52 
-----------+----------------------+----------
         4 |       638      1,060 |     1,698 
           |      4.01       8.10 |      5.85 
-----------+----------------------+----------
     Total |    15,927     13,088 |    29,015 
           |    100.00     100.00 |    100.00 


. 
. *Additional variables that will potentially be used to produce weights and in multiple imputation

. *Mother's Age at the Birth of the Cohort member

. capture drop mumage

.     gen mumage = .
(37,647 missing values generated)

.     replace mumage = ncds_mumagebirth if (cohort==1)
(17,402 real changes made)

.     replace mumage = bcs_mumagebirth if (cohort==2)
(17,163 real changes made)

.     label variable mumage "Mother's Age at CM Birth"

. 
. *Cohort Member Parity at Birth

. capture drop parity

.     gen parity = .
(37,647 missing values generated)

.     replace parity = ncds_parity if (cohort==1)
(17,412 real changes made)

.     replace parity = bcs_parity if (cohort==2)
(17,164 real changes made)

.     label variable parity "Parity at Birth"

. 
. *Whether the cohort member's mother is married at cohort member's birth

. tab ncds_married   

NCDS Mother |
 married at |
     Cohort |
   Member's |
      Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |        743        4.27        4.27
     1. Yes |     16,662       95.73      100.00
------------+-----------------------------------
      Total |     17,405      100.00

. tab bcs_mummarried

 BCS Mother |
 married at |
     Cohort |
   Member's |
      Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      1,000        5.88        5.88
     1. Yes |     16,011       94.12      100.00
------------+-----------------------------------
      Total |     17,011      100.00

. capture drop married

.     gen married = .
(37,647 missing values generated)

.     replace married = ncds_married if (cohort==1)
(17,405 real changes made)

.     replace married = bcs_mummarried if (cohort==2)
(17,011 real changes made)

.     label variable married "Mother married at CM birth"

.     label values married yesno

. 
. * return to jupyter
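
The sixteen categories of nsinteraction follow a regular pattern: each NS-SEC category contributes two consecutive codes, the first for the NCDS and the second for the BCS. The manual coding above can therefore be cross-checked arithmetically. This is a minimal sketch rather than part of the original workflow, and nscheck is a hypothetical variable name introduced only for this check.

```stata
* Sketch only: rebuild the 16-category interaction arithmetically.
* nscheck is a hypothetical name used only for this check.
capture drop nscheck
gen nscheck = 2*(dadnssec - 1) + cohort if !missing(dadnssec)

* The two versions should agree for every case, including cases where
* both are missing because father's NS-SEC is not recorded.
assert nscheck == nsinteraction
drop nscheck
```

With dadnssec coded 1 to 8 and cohort coded 1 or 2, the expression yields 1 for NCDS 1.1, 2 for BCS 1.1, and so on up to 16 for BCS 7, which is the coding defined in the label nsint.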

We now define the analytic sample.

The BCS includes cohort members born in Northern Ireland. These cohort members are included in the first survey but not in subsequent sweeps. We exclude these Northern Irish cohort members for comparability with the NCDS dataset.

The cross-sectional sample sizes in the cohort studies vary, because some cohort members (e.g. immigrants to the UK) were included after the first sweep. For consistency and clarity we include only the original birth sample of both cohorts in our analytical sample (i.e. we keep only cohort members who were present at the first survey).

More details on the samples of the NCDS and BCS are available here.

In [78]:
*Exclude cohort members from Northern Ireland

tab ncds0_country, mi

tab bcs0_country
drop if bcs0_country==4
*628 babies from NI are deleted

tab ncds_11outcome 
tab bcs_10outcome

*Keep only the original birth sample

tab ncds_0outcome 
tab ncds_11outcome
tab ncds_0outcome ncds_11outcome
drop if (ncds_0outcome !=1)&(cohort==1)

tab bcs_0outcome 
tab bcs_10outcome
tab bcs_0outcome bcs_10outcome
drop if (bcs_0outcome !=1)&(cohort==2)

* return to jupyter
. *Exclude cohort members from Northern Ireland

. 
. tab ncds0_country, mi

 NCDS Age 0 |
    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
 1. England |     14,517       38.56       38.56
   2. Wales |        914        2.43       40.99
3. Scotland |      1,985        5.27       46.26
          . |     20,231       53.74      100.00
------------+-----------------------------------
      Total |     37,647      100.00

. 
. tab bcs0_country

   1970: Country of |
        Interview   |      Freq.     Percent        Cum.
--------------------+-----------------------------------
         1. England |     14,072       81.83       81.83
           2. Wales |        879        5.11       86.94
        3. Scotland |      1,617        9.40       96.35
4. Northern Ireland |        628        3.65      100.00
--------------------+-----------------------------------
              Total |     17,196      100.00

. drop if bcs0_country==4
(628 observations deleted)

. *628 babies from NI are deleted

. 
. tab ncds_11outcome 

   NCDS response outcome |
           1969 (age 11) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,337       82.64       82.64
              2. Refusal |        797        4.29       86.94
          3. Non-contact |        406        2.19       89.13
   4. Other unproductive |        202        1.09       90.21
           6. Not Issued |        275        1.48       91.70
7. Not Issued - Emigrant |        701        3.78       95.47
    8. Not Issued - Dead |        840        4.53      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

. tab bcs_10outcome

    BCS response outcome |
           1980 (age 10) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     14,851       80.81       80.81
   4. Other unproductive |      2,381       12.96       93.76
           6. Not Issued |        549        2.99       96.75
                 8. Dead |        597        3.25      100.00
-------------------------+-----------------------------------
                   Total |     18,378      100.00

. 
. *Keep only the original birth sample

. 
. tab ncds_0outcome 

   NCDS response outcome |
            1958 (age 0) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     17,415       93.84       93.84
          3. Non-contact |        218        1.17       95.02
           6. Not Issued |        925        4.98      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

. tab ncds_11outcome

   NCDS response outcome |
           1969 (age 11) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     15,337       82.64       82.64
              2. Refusal |        797        4.29       86.94
          3. Non-contact |        406        2.19       89.13
   4. Other unproductive |        202        1.09       90.21
           6. Not Issued |        275        1.48       91.70
7. Not Issued - Emigrant |        701        3.78       95.47
    8. Not Issued - Dead |        840        4.53      100.00
-------------------------+-----------------------------------
                   Total |     18,558      100.00

. tab ncds_0outcome ncds_11outcome

NCDS response outcome |                     NCDS response outcome 1969 (age 11)
         1958 (age 0) | 1. Produc  2. Refusa  3. Non-co  4. Other   6. Not Is  7. Not Is  8. Not Is |     Total
----------------------+-----------------------------------------------------------------------------+----------
        1. Productive |    14,574        781        358        195          0        667        840 |    17,415 
       3. Non-contact |       182          2         33          0          0          1          0 |       218 
        6. Not Issued |       581         14         15          7        275         33          0 |       925 
----------------------+-----------------------------------------------------------------------------+----------
                Total |    15,337        797        406        202        275        701        840 |    18,558 


. drop if (ncds_0outcome !=1)&(cohort==1)
(1,143 observations deleted)

. 
. tab bcs_0outcome 

    BCS response outcome |
            1970 (age 0) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     16,568       90.15       90.15
   4. Other unproductive |         18        0.10       90.25
           6. Not Issued |      1,792        9.75      100.00
-------------------------+-----------------------------------
                   Total |     18,378      100.00

. tab bcs_10outcome

    BCS response outcome |
           1980 (age 10) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     14,851       80.81       80.81
   4. Other unproductive |      2,381       12.96       93.76
           6. Not Issued |        549        2.99       96.75
                 8. Dead |        597        3.25      100.00
-------------------------+-----------------------------------
                   Total |     18,378      100.00

. tab bcs_0outcome bcs_10outcome

 BCS response outcome |     BCS response outcome 1980 (age 10)
         1970 (age 0) | 1. Produc  4. Other   6. Not Is    8. Dead |     Total
----------------------+--------------------------------------------+----------
        1. Productive |    13,757      2,212          4        595 |    16,568 
4. Other unproductive |        15          2          0          1 |        18 
        6. Not Issued |     1,079        167        545          1 |     1,792 
----------------------+--------------------------------------------+----------
                Total |    14,851      2,381        549        597 |    18,378 


. drop if (bcs_0outcome !=1)&(cohort==2)
(1,893 observations deleted)

. 
. * return to jupyter
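
The three exclusions above can be confirmed with assertions instead of by reading the tabulations. This is a minimal sketch rather than part of the original workflow. The expected total of 33,983 is the pooled 37,647 cases minus the 628, 1,143 and 1,893 deletions reported in the log above.

```stata
* Sketch only: confirm that the exclusions behaved as described.
* No remaining case should have been interviewed in Northern Ireland
* (bcs0_country is missing for NCDS cases, which passes this check).
assert bcs0_country != 4

* Every remaining case should be productive at its cohort's birth sweep.
assert ncds_0outcome == 1 if cohort == 1
assert bcs_0outcome == 1 if cohort == 2

* 37,647 pooled cases minus 628, 1,143 and 1,893 deletions.
assert _N == 33983
```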

Here we create indicator variables that record whether each cohort member responded to the first survey and to the age 10/11 survey (the survey that contains the outcome of interest), and whether they had died by the age 10/11 survey.

In [79]:
*sweep0outcome indicates that the cohort member was included in the first survey
capture drop sweep0outcome
    gen sweep0outcome = 0
    replace sweep0outcome = 1 if ((ncds_0outcome==1)&(cohort==1))
    replace sweep0outcome = 1 if ((bcs_0outcome==1)&(cohort==2))
    label values sweep0outcome yesno
    tab sweep0outcome cohort
    label variable sweep0outcome "Productive at first survey"

tab ncds_11outcome
tab bcs_10outcome

*sweeptestoutcome indicates that they were included in the age 10/11 survey
capture drop sweeptestoutcome
    gen sweeptestoutcome = 0
    replace sweeptestoutcome = 1 if ((ncds_11outcome==1)&(cohort==1))
    replace sweeptestoutcome = 1 if ((bcs_10outcome==1)&(cohort==2))
    label values sweeptestoutcome yesno
    tab sweeptestoutcome cohort
    label variable sweeptestoutcome "Productive at age 10/11 survey"

*Also create a variable to indicate if the cohort members had died by the age
* 10/11 surveys. This will be used to delete cases after multiple imputation.
capture drop deadtestoutcome
    gen deadtestoutcome = 0
    replace deadtestoutcome = 1 if ((ncds_11outcome==8)&(cohort==1))
    replace deadtestoutcome = 1 if ((bcs_10outcome==8)&(cohort==2))
    label values deadtestoutcome yesno
    tab deadtestoutcome cohort
    label variable deadtestoutcome "Dead at age 10/11 survey"
    tab deadtestoutcome

tab cohort

* return to jupyter
. *sweep0outcome indicates that the cohort member was included in the first survey

. capture drop sweep0outcome

.     gen sweep0outcome = 0

.     replace sweep0outcome = 1 if ((ncds_0outcome==1)&(cohort==1))
(17,415 real changes made)

.     replace sweep0outcome = 1 if ((bcs_0outcome==1)&(cohort==2))
(16,568 real changes made)

.     label values sweep0outcome yesno

.     tab sweep0outcome cohort

sweep0outc |        Cohort
       ome |   1. NCDS     2. BCS |     Total
-----------+----------------------+----------
    1. Yes |    17,415     16,568 |    33,983 
-----------+----------------------+----------
     Total |    17,415     16,568 |    33,983 


.     label variable sweep0outcome "Productive at first survey"

. 
. tab ncds_11outcome

   NCDS response outcome |
           1969 (age 11) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     14,574       83.69       83.69
              2. Refusal |        781        4.48       88.17
          3. Non-contact |        358        2.06       90.23
   4. Other unproductive |        195        1.12       91.35
7. Not Issued - Emigrant |        667        3.83       95.18
    8. Not Issued - Dead |        840        4.82      100.00
-------------------------+-----------------------------------
                   Total |     17,415      100.00

. tab bcs_10outcome

    BCS response outcome |
           1980 (age 10) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     13,757       83.03       83.03
   4. Other unproductive |      2,212       13.35       96.38
           6. Not Issued |          4        0.02       96.41
                 8. Dead |        595        3.59      100.00
-------------------------+-----------------------------------
                   Total |     16,568      100.00

. 
. *sweeptestoutcome indicates that they were included in the age 10/11 survey

. capture drop sweeptestoutcome

.     gen sweeptestoutcome = 0

.     replace sweeptestoutcome = 1 if ((ncds_11outcome==1)&(cohort==1))
(14,574 real changes made)

.     replace sweeptestoutcome = 1 if ((bcs_10outcome==1)&(cohort==2))
(13,757 real changes made)

.     label values sweeptestoutcome yesno

.     tab sweeptestoutcome cohort

sweeptesto |        Cohort
    utcome |   1. NCDS     2. BCS |     Total
-----------+----------------------+----------
     0. No |     2,841      2,811 |     5,652 
    1. Yes |    14,574     13,757 |    28,331 
-----------+----------------------+----------
     Total |    17,415     16,568 |    33,983 


.     label variable sweeptestoutcome "Productive at age 10/11 survey"

. 
. *Also create a variable to indicate if the cohort members had died by the age

. * 10/11 surveys. This will be used to delete cases after multiple imputation.

. capture drop deadtestoutcome

.     gen deadtestoutcome = 0

.     replace deadtestoutcome = 1 if ((ncds_11outcome==8)&(cohort==1))
(840 real changes made)

.     replace deadtestoutcome = 1 if ((bcs_10outcome==8)&(cohort==2))
(595 real changes made)

.     label values deadtestoutcome yesno

.     tab deadtestoutcome cohort

deadtestou |        Cohort
     tcome |   1. NCDS     2. BCS |     Total
-----------+----------------------+----------
     0. No |    16,575     15,973 |    32,548 
    1. Yes |       840        595 |     1,435 
-----------+----------------------+----------
     Total |    17,415     16,568 |    33,983 


.     label variable deadtestoutcome "Dead at age 10/11 survey"

.     tab deadtestoutcome

Dead at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     32,548       95.78       95.78
     1. Yes |      1,435        4.22      100.00
------------+-----------------------------------
      Total |     33,983      100.00

. 
. tab cohort

     Cohort |      Freq.     Percent        Cum.
------------+-----------------------------------
    1. NCDS |     17,415       51.25       51.25
     2. BCS |     16,568       48.75      100.00
------------+-----------------------------------
      Total |     33,983      100.00

. 
. * return to jupyter

We create a variable that indicates which cases have complete information on all the variables required for our main analysis (i.e. a value of 0 on this variable identifies the complete records sample).
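The logic of the complete records indicator can be illustrated on a small toy dataset. This is a minimal sketch rather than the notebook's own code: the values are invented, the names nmiss_demo and complete_demo are hypothetical, and it uses rowmiss(), which to our understanding is the documented name in recent Stata releases for the function called as rmiss() in the cell that follows.

```stata
* Hypothetical toy data: variable names mirror the notebook, values are invented.
clear
input ability male parented dadnssec
101 1 2 3
 95 0 . 8
  . 1 1 .
110 0 4 1
end

* Count the missing values across the four analysis variables.
egen nmiss_demo = rowmiss(ability male parented dadnssec)

* Flag cases with no missing values (the complete records sample).
gen complete_demo = (nmiss_demo == 0)

list
```

In the notebook the count itself (samplenssec) is kept, and the condition samplenssec == 0 is used to select the complete records.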

In [80]:
capture drop samplenssec 
    egen samplenssec  = rmiss(ability male parented dadnssec)
    tab samplenssec 
    label variable samplenssec  "Sample non-missing ns-sec measure"
    tab samplenssec  if (cohort==1)
    tab samplenssec  if (cohort==2)
    
* return to jupyter
. capture drop samplenssec 

.     egen samplenssec  = rmiss(ability male parented dadnssec)

.     tab samplenssec 

samplenssec |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     17,716       52.13       52.13
          1 |      8,681       25.55       77.68
          2 |      3,887       11.44       89.12
          3 |      3,690       10.86       99.97
          4 |          9        0.03      100.00
------------+-----------------------------------
      Total |     33,983      100.00

.     label variable samplenssec  "Sample non-missing ns-sec measure"

.     tab samplenssec  if (cohort==1)

     Sample |
non-missing |
     ns-sec |
    measure |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      9,617       55.22       55.22
          1 |      4,185       24.03       79.25
          2 |      1,900       10.91       90.16
          3 |      1,710        9.82       99.98
          4 |          3        0.02      100.00
------------+-----------------------------------
      Total |     17,415      100.00

.     tab samplenssec  if (cohort==2)

     Sample |
non-missing |
     ns-sec |
    measure |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      8,099       48.88       48.88
          1 |      4,496       27.14       76.02
          2 |      1,987       11.99       88.01
          3 |      1,980       11.95       99.96
          4 |          6        0.04      100.00
------------+-----------------------------------
      Total |     16,568      100.00

.     
. * return to jupyter

General Ability Test Scores

In the NCDS and the BCS, cohort members completed general ability tests at age 11 and age 10 respectively. The general ability test in the NCDS comprised 40 verbal and 40 non-verbal items (see Shepherd, 2012). The general ability test in the BCS comprised four sub-scales from the British Ability Scales: word definition, word similarities, recall of digits and matrices (see Parsons, 2014).

We computed an overall cognitive ability test score by summing the subtest scores. This is the method used in previous studies which examine the role of cognitive ability in educational and occupational attainment (e.g. Breen and Goldthorpe, 2001). Alternatively, principal components analysis (PCA) can be used to summarise the relationship between the cognitive ability subtests in order to produce an estimate of general ability ‘g’. This method has also been deployed in previous studies using the cognitive ability test scores in the NCDS and BCS (e.g. Schoon, 2010). We have computed scores using the two alternative methods, and we find that the total scores and the PCA scores are almost perfectly correlated (NCDS: r = 0.999, p < 0.001; BCS: r = 0.997, p < 0.001). Therefore, we conclude that either approach would be suitable for this analysis, but we have chosen the total score measure because of its direct comparability with previous studies.
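The comparison between the two scoring methods can be sketched on simulated data. The example below is illustrative only: the subtest variables (verbal, nonverbal), the latent factor g and all numeric values are assumptions made for the demonstration, and the derivation of pcascore in the real data may differ in detail.

```stata
* Hypothetical sketch on simulated subtest scores, not the NCDS/BCS data.
clear
set seed 12345
set obs 1000

* A shared latent factor drives both simulated subtests.
gen g = rnormal()
gen verbal    = round(20 + 6*g + 4*rnormal())
gen nonverbal = round(20 + 6*g + 4*rnormal())

* Method 1: summated (total) score.
gen total_demo = verbal + nonverbal

* Method 2: score on the first principal component of the subtests.
pca verbal nonverbal, components(1)
predict pc1_demo, score

* Compare the two measures.
pwcorr total_demo pc1_demo, sig
```

When the subtests have similar variances and are positively correlated, the first component weights them almost equally, which is one reason the two measures tend to be so strongly related.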

In [81]:
* Correlation between PCA and Total Score Methods

pwcorr ability pcascore if (cohort==1)&(samplenssec==0), sig
pwcorr ability pcascore if (cohort==2)&(samplenssec==0), sig

* return to jupyter
. * Correlation between PCA and Total Score Methods

. 
. pwcorr ability pcascore if (cohort==1)&(samplenssec==0), sig

             |  ability pcascore
-------------+------------------
     ability |   1.0000 
             |
             |
    pcascore |   0.9994   1.0000 
             |   0.0000
             |

. pwcorr ability pcascore if (cohort==2)&(samplenssec==0), sig

             |  ability pcascore
-------------+------------------
     ability |   1.0000 
             |
             |
    pcascore |   0.9970   1.0000 
             |   0.0000
             |

. 
. * return to jupyter

The general ability test in the NCDS is comparable with the test in the BCS (see Elliott et al., 1978; Shepherd, 2012). However, it is not possible to directly assess the Flynn Effect using the general ability test measures in the NCDS and the BCS. This is because the tests include a different number of items and have different total scores. The two measures are suitable for the current analysis because our focus is on relative social class inequalities within each of the two cohorts. In order to operationalise the analyses we construct a cross-cohort measure using arithmetic standardisation, which has been used in previous studies (see Schoon, 2010). The summary statistics for the cognitive ability tests are provided in table 1.
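One common form of arithmetic standardisation rescales the raw totals within each cohort to a shared mean and standard deviation. The sketch below assumes a target mean of 100 and standard deviation of 15, which is broadly in line with the values reported in table 1 but is not confirmed by the text; the simulated raw scores and the names rawscore and ability_demo are hypothetical.

```stata
* Hypothetical sketch on simulated raw totals, not the NCDS/BCS data.
clear
set seed 2024
set obs 2000
gen cohort = cond(_n <= 1000, 1, 2)

* Raw totals measured on different scales in the two cohorts.
gen rawscore = cond(cohort == 1, rnormal(43, 16), rnormal(70, 12))

* Standardise within cohort, then rescale (assumed target: mean 100, SD 15).
bysort cohort: egen rawmean = mean(rawscore)
bysort cohort: egen rawsd   = sd(rawscore)
gen ability_demo = 100 + 15 * (rawscore - rawmean) / rawsd

* Check that both cohorts now share the same mean and SD.
tabstat ability_demo, by(cohort) statistics(mean sd min max)
```

Because the rescaling is done separately in each cohort, any difference in average performance between the cohorts is removed by construction, which is why the measure supports within-cohort comparisons but not an assessment of the Flynn Effect.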

In [82]:
*****TABLE 1: 
*DESCRIPTIVE STATISTICS FOR GENERAL ABILITY TEST SCORES IN THE NCDS AND BCS.
summ ability if (cohort==1) & (samplenssec == 0)
summ ability if (cohort==2) & (samplenssec == 0)

* return to jupyter
. *****TABLE 1: 

. *DESCRIPTIVE STATISTICS FOR GENERAL ABILITY TEST SCORES IN THE NCDS AND BCS.

. summ ability if (cohort==1) & (samplenssec == 0)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |      9,617    100.8653     14.7091   60.10213   133.5046

. summ ability if (cohort==2) & (samplenssec == 0)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |      8,099    100.8371    14.76514   45.37629   151.1925

. 
. * return to jupyter

Table 1

Parental Social Class

The central analytical focus of this article is an investigation of the effects of parental social class on filial general cognitive ability test scores. Social class schemes are widely used in sociological research and are regarded as socio-economic measures that divide the population into unequally rewarded categories (Crompton, 2008). We employ an occupation-based socio-economic measure because it provides a robust and parsimonious indicator of parental social positions (see Connelly et al., 2016b). Occupation-based socio-economic measures do not simply act as a proxy where income data are unavailable. They are sociological measures designed to better understand fundamental forms of social relations and inequalities to which income is merely epiphenomenal (Rose and Pevalin, 2003). In this analysis we employ the United Kingdom National Statistics Socio-Economic Classification (NS-SEC) (see Rose and Pevalin, 2005), which is widely used in sociological analyses and in official statistics.

Gregg (2012) coded and deposited UK standard occupational classification codes (SOC2000) for the job titles of NCDS fathers collected in the age 11 survey, and BCS mothers and fathers collected in the age 10 survey (SN7023, Gregg, 2012). These detailed occupational codes are an invaluable resource, and we use them to compute NS-SEC in both cohorts. As detailed occupational information (i.e. SOC codes) is only available for fathers in the NCDS, we only use father’s information in the BCS (see table 2).

Further Explanatory Variables

In previous research gender differences in childhood cognitive ability test scores have been observed (see Van der Sluis et al., 2006; Strand et al., 2006; Sullivan et al., 2013). Parental education is measured using mother’s and father’s years of education completed after the compulsory school leaving age. We categorise these variables in a similar manner to previous research using these data (see Cheung and Egerton, 2007). We are cautious not to attribute titles to these categories because in British samples years of education do not neatly map on to an individual’s educational experiences and attainments (see Connelly et al., 2016a). We use the highest level of education of the cohort member’s parents to represent the parental level of education (see table 2). Parental education is included as a control variable which may measure an additional dimension of a family’s socio-economic position.
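Taking the highest level of education across the two parents can be sketched as a row-wise maximum. This is an assumption about the mechanics rather than the notebook's confirmed derivation: the toy values and the name parented_demo are hypothetical, and the treatment of cases where only one parent's education is observed is a design choice that the text does not specify.

```stata
* Hypothetical toy data: categories 1-4 stand in for the education bands.
clear
input moed_cat paed_cat
1 2
3 1
. 4
. .
end

* rowmax() ignores missing values and returns missing only when
* every argument is missing.
egen parented_demo = rowmax(moed_cat paed_cat)

list
```

Under this rule a case with one observed parent takes that parent's category, so the measure is missing only when neither parent's education is recorded.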

In [83]:
*****TABLE 2: 
*DESCRIPTIVE STATISTICS OF THE GENDER AND PARENTAL EDUCATION VARIABLES IN 
*THE NCDS AND BCS.
tab male if (cohort==1) & (samplenssec == 0)
tab male if (cohort==2) & (samplenssec == 0)

tab parented if (cohort==1) & (samplenssec == 0)
tab parented if (cohort==2) & (samplenssec == 0)

tab dadnssec if (cohort==1) & (samplenssec == 0)
tab dadnssec if (cohort==2) & (samplenssec == 0)

* return to jupyter
. *****TABLE 2: 

. *DESCRIPTIVE STATISTICS OF THE GENDER AND PARENTAL EDUCATION VARIABLES IN 

. *THE NCDS AND BCS.

. tab male if (cohort==1) & (samplenssec == 0)

       male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      4,700       48.87       48.87
     1. Yes |      4,917       51.13      100.00
------------+-----------------------------------
      Total |      9,617      100.00

. tab male if (cohort==2) & (samplenssec == 0)

       male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      3,937       48.61       48.61
     1. Yes |      4,162       51.39      100.00
------------+-----------------------------------
      Total |      8,099      100.00

. 
. tab parented if (cohort==1) & (samplenssec == 0)

   Parent's |
    Highest |
  Education |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      6,964       72.41       72.41
          2 |      2,123       22.08       94.49
          3 |        136        1.41       95.90
          4 |        394        4.10      100.00
------------+-----------------------------------
      Total |      9,617      100.00

. tab parented if (cohort==2) & (samplenssec == 0)

   Parent's |
    Highest |
  Education |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      4,162       51.39       51.39
          2 |      2,773       34.24       85.63
          3 |        477        5.89       91.52
          4 |        687        8.48      100.00
------------+-----------------------------------
      Total |      8,099      100.00

. 
. tab dadnssec if (cohort==1) & (samplenssec == 0)

                         Father's NSSEC |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        296        3.08        3.08
                 2. Higher Professional |        447        4.65        7.73
   3. Lower managerial and professional |      1,125       11.70       19.42
                        4. Intermediate |        898        9.34       28.76
     5. Small employers and own account |      1,193       12.41       41.17
     6. Lower Supervisory and Technical |      1,589       16.52       57.69
                        7. Semi-Routine |      1,714       17.82       75.51
                             8. Routine |      2,355       24.49      100.00
----------------------------------------+-----------------------------------
                                  Total |      9,617      100.00

. tab dadnssec if (cohort==2) & (samplenssec == 0)

                         Father's NSSEC |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
1. Large Employers and Higher Manageria |        371        4.58        4.58
                 2. Higher Professional |        483        5.96       10.54
   3. Lower managerial and professional |      1,215       15.00       25.55
                        4. Intermediate |        737        9.10       34.65
     5. Small employers and own account |      1,037       12.80       47.45
     6. Lower Supervisory and Technical |      1,449       17.89       65.34
                        7. Semi-Routine |      1,188       14.67       80.01
                             8. Routine |      1,619       19.99      100.00
----------------------------------------+-----------------------------------
                                  Total |      8,099      100.00

. 
. * return to jupyter

Table 2

We save this dataset, which contains our analytical sample for the complete records analysis.

In [84]:
keep cohort ability ncds_male bcs_male male pcascore ncds_panssec bcs_panssec dadnssec nsinteraction ncds_paed_cat ncds_moed_cat ncds_parented ncds_parented bcs_parented parented  mumage parity married samplenssec sweep0outcome sweeptestoutcome deadtestoutcome cohort ncdsid bcsid poolid ncds_region ncds0_olddadrgsc ncds0_country ncds_mumagebirth ncds_parity ncds_married ncds_male ncds_paed_cat ncds0_olddadrgsc ncds_moed_cat ncds_region bcs_male bcs0_country bcs_paed bcs_moed bcs_region bcs_mumagefirstbirth bcs_mumagebirth bcs_mummarried bcs_parity bcs_mothercraft bcs_labourclass bcs_breast ncds_0outcome bcs_0outcome ncds_11outcome bcs_10outcome n539 n1225 n2393 n2394

save $path3\pooledNCDSBCS_v2.dta, replace

* return to jupyter
. keep cohort ability ncds_male bcs_male male pcascore ncds_panssec bcs_panssec dadnssec nsinteraction ncds_paed_cat ncds_moed_cat ncds_pare
> nted ncds_parented bcs_parented parented  mumage parity married samplenssec sweep0outcome sweeptestoutcome deadtestoutcome cohort ncdsid b
> csid poolid ncds_region ncds0_olddadrgsc ncds0_country ncds_mumagebirth ncds_parity ncds_married ncds_male ncds_paed_cat ncds0_olddadrgsc 
> ncds_moed_cat ncds_region bcs_male bcs0_country bcs_paed bcs_moed bcs_region bcs_mumagefirstbirth bcs_mumagebirth bcs_mummarried bcs_parit
> y bcs_mothercraft bcs_labourclass bcs_breast ncds_0outcome bcs_0outcome ncds_11outcome bcs_10outcome n539 n1225 n2393 n2394

. 
. save $path3\pooledNCDSBCS_v2.dta, replace
file F:\Data\MYDATA\TEMP\pooledNCDSBCS_v2.dta saved

. 
. * return to jupyter

Descriptive Results

The relationship between father’s social class (NS-SEC) and children’s cognitive ability test scores is reported in table 3. There is very clear evidence of a social class effect and, on average, children with more occupationally advantaged fathers have higher cognitive ability test scores in both cohorts. The difference between the children with the most advantaged fathers (NS-SEC 1.1, category 1 in table 3, e.g. a chief executive officer) and the least advantaged fathers (NS-SEC 7, category 8 in table 3, e.g. a construction labourer) is on average 13 points for those in the NCDS cohort, and 11 points for those in the BCS cohort. The greatest differences are observed between children with fathers in NS-SEC 1.2 (category 2 in table 3, e.g. university professors) and children with fathers in NS-SEC 7. These differences are on average 14 points in the NCDS and 15 points in the BCS, which is approximately one standard deviation for both cohorts.

In [85]:
use $path3\pooledNCDSBCS_v2.dta, clear

* return to jupyter
. use $path3\pooledNCDSBCS_v2.dta, clear

. 
. * return to jupyter

In [86]:
* TABLE 3: MEAN AND STANDARD DEVIATION OF ABILITY TEST SCORES BY FATHERS NS-SEC.
tab dadnssec if (cohort==1)&(samplenssec==0), summarize(ability)
tab dadnssec if (cohort==2)&(samplenssec==0), summarize(ability)

* return to jupyter
. * TABLE 3: MEAN AND STANDARD DEVIATION OF ABILITY TEST SCORES BY FATHERS NS-SEC.

. tab dadnssec if (cohort==1)&(samplenssec==0), summarize(ability)

   Father's |    Summary of Ability Test Score
      NSSEC |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  1. Large  |   108.54637    12.81598         296
  2. Higher |   109.77293   12.447527         447
  3. Lower  |   107.58185   13.092885       1,125
  4. Interm |   104.93596   13.638081         898
  5. Small  |   100.19477   14.394405       1,193
  6. Lower  |   100.44714   14.542212       1,589
  7. Semi-R |   98.630747   14.160227       1,714
  8. Routin |   95.696471   14.372789       2,355
------------+------------------------------------
      Total |   100.86529   14.709098       9,617

. tab dadnssec if (cohort==2)&(samplenssec==0), summarize(ability)

   Father's |    Summary of Ability Test Score
      NSSEC |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  1. Large  |   106.46699   13.788343         371
  2. Higher |   110.30348   12.838866         483
  3. Lower  |   106.39991    14.26658       1,215
  4. Interm |   104.66767   13.445025         737
  5. Small  |    99.48955   14.141566       1,037
  6. Lower  |   99.546757   13.969552       1,449
  7. Semi-R |   97.741444   14.447238       1,188
  8. Routin |    95.09387   14.182694       1,619
------------+------------------------------------
      Total |   100.83708   14.765141       8,099

. 
. * return to jupyter

Table 3



Missing Data

Missing data in the cohort studies have the potential to bias estimates in some analyses. As Carpenter and Kenward (2012) strongly advise, we first conduct a complete records analysis, followed by a series of principled approaches to handling missing data. The NCDS and BCS do not include non-response weights in the deposited datasets. We construct inverse probability weights (IPW) in an attempt to reduce bias in the complete records analysis due to attrition (Höfler et al., 2005). We also undertake multiple imputation by chained equations (see Little and Rubin, 2014), and we use multiple imputation and inverse probability weights in combination (see Seaman et al., 2012). The substantive conclusions of the models using these different missing data strategies are largely consistent, but this could not have been known a priori. We focus our discussion on the more sophisticated models, which use multiple imputation and inverse probability weights to provide improved adjustments in the presence of missing data.

An important innovation in the present work is that details of the complete modelling process and outputs are provided within the Jupyter notebook. There is no single agreed upon approach for handling missing data in large-scale surveys. There are alternative ways of specifying how datasets are multiply imputed, and therefore for the work to be reproducible it is essential to have clear documentation that facilitates the precise duplication of the datasets that are created.

Missing data techniques are at the cutting edge of statistical methods. It is highly likely that as statistical theory develops, the techniques and approaches that are currently prescribed may be modified. We also envisage that facilities within data analysis software will inevitably change. Therefore, we argue that there are obvious benefits to providing clearly documented information about the processes relating to handling missing data in order to enable the work to be reproducible at some point in the future.

In [87]:
* TABLE S1: PATTERNS OF UNIT MISSINGNESS FOR THE NCDS AND BCS.

*Present at the birth survey
tab ncds_0outcome
tab bcs_0outcome
*Outcome at the age 10/11 survey
tab ncds_11outcome
tab bcs_10outcome
*Deceased at age 10/11 Survey
tab deadtestoutcome
tab deadtestoutcome if (cohort==1)
tab deadtestoutcome if (cohort==2)

* return to jupyter
. * TABLE S1: PATTERNS OF UNIT MISSINGNESS FOR THE NCDS AND BCS.

. 
. *Present at the birth survey

. tab ncds_0outcome

   NCDS response outcome |
            1958 (age 0) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     17,415      100.00      100.00
-------------------------+-----------------------------------
                   Total |     17,415      100.00

. tab bcs_0outcome

    BCS response outcome |
            1970 (age 0) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     16,568      100.00      100.00
-------------------------+-----------------------------------
                   Total |     16,568      100.00

. *Outcome at the age 10/11 survey

. tab ncds_11outcome

   NCDS response outcome |
           1969 (age 11) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     14,574       83.69       83.69
              2. Refusal |        781        4.48       88.17
          3. Non-contact |        358        2.06       90.23
   4. Other unproductive |        195        1.12       91.35
7. Not Issued - Emigrant |        667        3.83       95.18
    8. Not Issued - Dead |        840        4.82      100.00
-------------------------+-----------------------------------
                   Total |     17,415      100.00

. tab bcs_10outcome

    BCS response outcome |
           1980 (age 10) |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
           1. Productive |     13,757       83.03       83.03
   4. Other unproductive |      2,212       13.35       96.38
           6. Not Issued |          4        0.02       96.41
                 8. Dead |        595        3.59      100.00
-------------------------+-----------------------------------
                   Total |     16,568      100.00

. *Deceased at age 10/11 Survey

. tab deadtestoutcome

Dead at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     32,548       95.78       95.78
     1. Yes |      1,435        4.22      100.00
------------+-----------------------------------
      Total |     33,983      100.00

. tab deadtestoutcome if (cohort==1)

Dead at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     16,575       95.18       95.18
     1. Yes |        840        4.82      100.00
------------+-----------------------------------
      Total |     17,415      100.00

. tab deadtestoutcome if (cohort==2)

Dead at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     15,973       96.41       96.41
     1. Yes |        595        3.59      100.00
------------+-----------------------------------
      Total |     16,568      100.00

. 
. * return to jupyter

Table S1

In [88]:
* TABLE S2: PATTERNS OF ITEM MISSINGNESS FOR THE NCDS AND BCS DATA POOLED.
mvpatterns ability male parented dadnssec

* return to jupyter
. * TABLE S2: PATTERNS OF ITEM MISSINGNESS FOR THE NCDS AND BCS DATA POOLED.

. mvpatterns ability male parented dadnssec
Variable     | type     obs   mv   variable label
-------------+-----------------------------------------------
ability      | float  24828 9155   Ability Test Score
male         | float  33974    9   male
parented     | float  27778 6205   Parent's Highest Education
dadnssec     | float  21791 12192  Father's NSSEC
-------------------------------------------------------------

Patterns of missing values

  +------------------------+
  | _pattern   _mv   _freq |
  |------------------------|
  |     ++++     0   17716 |
  |     +++.     1    4908 |
  |     .+..     3    3690 |
  |     .++.     2    2814 |
  |     .+++     1    2340 |
  |------------------------|
  |     ++.+     1    1433 |
  |     ++..     2     771 |
  |     .+.+     2     302 |
  |     ....     4       9 |
  +------------------------+

. 
. * return to jupyter

Table S2

As Carpenter and Kenward (2012) advise, we first conduct a complete records analysis (see table S4), followed by a series of principled approaches to handling missing data.

  1. We construct inverse probability weights (IPW) in an attempt to reduce bias in the complete records analysis due to attrition (see table S5, model 1).

  2. We also undertake multiple imputation by chained equations (see table S5, model 2).

  3. We use multiple imputation and inverse probability weights in combination (results shown in main paper).
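The second and third strategies listed above can be sketched with Stata's mi suite on simulated data. This is a deliberately small illustration, not the imputation model used in the paper: the variables y, x and male, the placeholder weight ipw_demo, the number of imputations and the seed are all assumptions, and supplying a probability weight to the analysis model is only the simplest way of combining weights with multiple imputation.

```stata
* Hypothetical sketch on simulated data, not the NCDS/BCS data.
clear
set seed 99
set obs 1000
gen male = runiform() < 0.5
gen x = rnormal()
gen y = 100 + 5*x + 2*male + rnormal(0, 10)

* Placeholder weight standing in for an inverse probability weight.
gen ipw_demo = 1 + runiform()

* Induce item missingness in x and y.
replace x = . if runiform() < 0.2
replace y = . if runiform() < 0.1

* Declare the mi structure and the roles of the variables.
mi set wide
mi register imputed x y
mi register regular male ipw_demo

* Multiple imputation by chained equations.
mi impute chained (regress) x y = male, add(5) rseed(99)

* Strategy 2: analysis of the imputed datasets, combined by Rubin's rules.
mi estimate: regress y x male

* Strategy 3: the same analysis with the weight applied.
mi estimate: regress y x male [pweight = ipw_demo]
```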

The substantive conclusions of the models using these different missing data strategies are largely consistent, but this could not have been known a priori. We focus our discussion in the main article on the more sophisticated models, which use multiple imputation and inverse probability weights to provide improved adjustments in the presence of missing data.


Inverse Probability Weights

We constructed inverse probability weights (IPW) in an attempt to reduce bias in the complete records analysis due to attrition (see Höfler et al., 2005). To produce the inverse probability weights we first model whether a cohort member is present at the age 11 (NCDS) or the age 10 (BCS) sweep of the survey. We selected variables to predict this outcome based on their use in previous models of missingness in the cohort studies (see Mostafa and Wiggins, 2015; Plewis et al., 2004), and also on the degree of missingness on these variables themselves. The variables used in these models are shown in table S3.

Our models account for only a very small percentage of the variance in missingness at age 10/11 (less than 1 per cent in the NCDS, and 3 per cent in the BCS). This indicates that the predictive power of our models is weak and that our attrition weights are unlikely to have a major impact on the results. We have, however, made a best attempt with the available information to construct suitable inverse probability weights. Including additional variables in the models of missingness at age 10/11 did not lead to large increases in the pseudo R² and led to a reduction in the number of observations included in the model due to item missingness. Mostafa and Wiggins (2015) argue that the use of metadata, such as interviewer characteristics and conditions surrounding the collection of the data, could account for more of the variance in unit missingness in the cohort studies. These metadata variables are currently not available in the deposited NCDS or BCS datasets available to researchers.

Following the models of missingness at age 10/11, we calculated predicted probabilities of observing the cohort member at age 10/11. The weight is the inverse of these predicted probabilities (Höfler et al., 2005). For 235 cases there was missing information that prevented the calculation of the probability of inclusion. In these cases a weight of 1 was allocated to ensure that these cases were included in the models and that the overall sample sizes would remain consistent.
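The steps described above (a model of presence at the age 10/11 sweep, predicted probabilities, inversion, and a weight of 1 where no probability can be calculated) can be sketched as follows. The example uses simulated data and a logit model; the predictor set, the coefficient values and the names present_demo, p_present and ipw_demo are hypothetical, and the models reported in table S3 use the variables listed there.

```stata
* Hypothetical sketch on simulated data, not the NCDS/BCS data.
clear
set seed 42
set obs 5000
gen male   = runiform() < 0.51
gen mumage = round(rnormal(27, 5))
gen parity = rpoisson(1.3)

* Simulated indicator of being present at the age 10/11 sweep.
gen present_demo = runiform() < invlogit(1.5 + 0.02*(mumage - 27) - 0.1*parity)

* Induce item missingness on one predictor.
replace mumage = . in 1/50

* Step 1: model the probability of being present.
logit present_demo male mumage parity

* Step 2: predicted probability of inclusion (missing where a predictor is missing).
predict p_present if e(sample), pr

* Step 3: the weight is the inverse of the predicted probability.
gen ipw_demo = 1 / p_present

* Step 4: allocate a weight of 1 where no probability could be calculated.
replace ipw_demo = 1 if missing(ipw_demo)

summarize ipw_demo
```

A weight constructed in this way would then typically be supplied to the analysis model as a probability weight, for example with [pweight = ipw_demo].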

Table S3

In [89]:
use $path3\pooledNCDSBCS_v2.dta, clear

* return to jupyter
. use $path3\pooledNCDSBCS_v2.dta, clear

. 
In [90]:
tab sweeptestoutcome if (cohort==1)
tab sweeptestoutcome if (cohort==2)

* return to jupyter
. tab sweeptestoutcome if (cohort==1)

 Productive |
     at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      2,841       16.31       16.31
     1. Yes |     14,574       83.69      100.00
------------+-----------------------------------
      Total |     17,415      100.00

. tab sweeptestoutcome if (cohort==2)

 Productive |
     at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      2,811       16.97       16.97
     1. Yes |     13,757       83.03      100.00
------------+-----------------------------------
      Total |     16,568      100.00

. 
In [91]:
*Potential explanatory variables in the missingness model for the NCDS (1958 Cohort).
tab ncds_region
tab ncds0_olddadrgsc
tab ncds0_country, mi
summ ncds_mumagebirth
summ ncds_parity
summ ncds_married
summ ncds_male
summ ncds_paed_cat
summ ncds0_olddadrgsc
summ ncds_moed_cat
summ ncds_region

* return to jupyter
. *Potential explanatory variables in the missingness model for the NCDS (1958 Cohort).

. tab ncds_region

     Region at PMS |
    (1958) - Birth |      Freq.     Percent        Cum.
-------------------+-----------------------------------
          1. North |      1,234        7.09        7.09
     2. North West |      2,295       13.18       20.26
   3. E & W.Riding |      1,433        8.23       28.49
 4. North Midlands |      1,299        7.46       35.95
       5. Midlands |      1,648        9.46       45.41
           6. East |      1,242        7.13       52.55
     7. South East |      3,444       19.78       72.32
          8. South |        955        5.48       77.81
     9. South West |        966        5.55       83.35
         10. Wales |        914        5.25       88.60
      11. Scotland |      1,985       11.40      100.00
-------------------+-----------------------------------
             Total |     17,415      100.00

. tab ncds0_olddadrgsc

 NCDS Birth |
   Dad RGSC |
 Old Coding |      Freq.     Percent        Cum.
------------+-----------------------------------
       1. I |        746        4.53        4.53
      2. II |      2,133       12.96       17.49
  3. III NM |      1,592        9.67       27.17
   4. III M |      8,376       50.89       78.06
      5. IV |      1,995       12.12       90.18
       6. V |      1,616        9.82      100.00
------------+-----------------------------------
      Total |     16,458      100.00

. tab ncds0_country, mi

 NCDS Age 0 |
    Country |      Freq.     Percent        Cum.
------------+-----------------------------------
 1. England |     14,516       42.72       42.72
   2. Wales |        914        2.69       45.41
3. Scotland |      1,985        5.84       51.25
          . |     16,568       48.75      100.00
------------+-----------------------------------
      Total |     33,983      100.00

. summ ncds_mumagebirth

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds_mumag~h |     17,402    27.45702     5.72552          8         48

. summ ncds_parity

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 ncds_parity |     17,412    1.316219    1.560322          0          9

. summ ncds_married

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds_married |     17,405    .9573111    .2021606          0          1

. summ ncds_male

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   ncds_male |     17,412    .5169423    .4997272          0          1

. summ ncds_paed_cat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds_paed_~t |     13,950    1.272688    .6484122          1          4

. summ ncds0_olddadrgsc

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds0_oldd~c |     16,458    3.825677    1.227504          1          6

. summ ncds_moed_cat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
ncds_moed_~t |     10,798    1.261993     .578698          1          4

. summ ncds_region

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 ncds_region |     17,415    5.881596    3.099308          1         11

. 
. * return to jupyter

In [92]:
*Final missingness model selected
logit sweeptestoutcome ncds_mumagebirth ncds_parity ncds_married ncds_male ib7.ncds_region if (cohort==1), allbaselevels

fitstat

* return to jupyter
. *Final missingness model selected

. logit sweeptestoutcome ncds_mumagebirth ncds_parity ncds_married ncds_male ib7.ncds_region if (cohort==1), allbaselevels

Iteration 0:   log likelihood = -7733.3828  
Iteration 1:   log likelihood = -7700.5492  
Iteration 2:   log likelihood =  -7700.044  
Iteration 3:   log likelihood = -7700.0439  

Logistic regression                             Number of obs     =     17,395
                                                LR chi2(14)       =      66.68
                                                Prob > chi2       =     0.0000
Log likelihood = -7700.0439                     Pseudo R2         =     0.0043

------------------------------------------------------------------------------------
  sweeptestoutcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
  ncds_mumagebirth |   -.001769   .0041896    -0.42   0.673    -.0099805    .0064424
       ncds_parity |   .0185246     .01558     1.19   0.234    -.0120116    .0490608
      ncds_married |   .4654806    .089686     5.19   0.000     .2896992    .6412619
         ncds_male |  -.0631524   .0412148    -1.53   0.125     -.143932    .0176273
                   |
       ncds_region |
         1. North  |   .4368775   .0971786     4.50   0.000     .2464111     .627344
    2. North West  |   .0769387   .0712456     1.08   0.280    -.0627002    .2165776
  3. E & W.Riding  |   .2697409   .0874209     3.09   0.002      .098399    .4410828
4. North Midlands  |   .0849654   .0864225     0.98   0.326    -.0844197    .2543504
      5. Midlands  |   .1899264   .0814421     2.33   0.020     .0303029    .3495499
          6. East  |   .1562596   .0894535     1.75   0.081     -.019066    .3315852
    7. South East  |          0  (base)
         8. South  |   .0703747    .096803     0.73   0.467    -.1193557    .2601051
    9. South West  |   .1696321   .0989112     1.71   0.086    -.0242303    .3634945
        10. Wales  |   .3239034   .1060707     3.05   0.002     .1160087    .5317982
     11. Scotland  |   .0284434   .0738455     0.39   0.700     -.116291    .1731779
                   |
             _cons |   1.124436   .1379711     8.15   0.000     .8540176    1.394854
------------------------------------------------------------------------------------

. 
. fitstat

Measures of Fit for logit of sweeptestoutcome

Log-Lik Intercept Only:    -7733.383     Log-Lik Full Model:        -7700.044
D(17379):                  15400.088     LR(14):                       66.678
                                         Prob > LR:                     0.000
McFadden's R2:                 0.004     McFadden's Adj R2:             0.002
Maximum Likelihood R2:         0.004     Cragg & Uhler's R2:            0.006
McKelvey and Zavoina's R2:     0.008     Efron's R2:                    0.004
Variance of y*:                3.317     Variance of error:             3.290
Count R2:                      0.837     Adj Count R2:                  0.000
AIC:                           0.887     AIC*n:                     15432.088
BIC:                     -154287.392     BIC':                         70.017

. 
. * return to jupyter
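To illustrate how a predicted probability follows from the coefficients above, the sketch below works through one covariate profile by hand. The profile (mother aged 27, parity 1, married, female cohort member, South East region) and the scalar names are our own hypothetical choices for illustration; they are not taken from the data.

```stata
* Hypothetical profile, chosen for illustration only:
* mother aged 27, parity 1, married, female cohort member, South East (base region).
* Coefficients are copied from the NCDS logit output above.
scalar xb_demo = 1.124436 + (-.001769 * 27) + (.0185246 * 1) + (.4654806 * 1) + (-.0631524 * 0)
* Convert the linear predictor to a probability, then invert it to get the weight.
scalar p_demo  = invlogit(xb_demo)
scalar w_demo  = 1 / p_demo
display "Linear predictor:         " %6.4f scalar(xb_demo)
display "Predicted Pr(productive): " %6.4f scalar(p_demo)
display "Implied weight:           " %6.4f scalar(w_demo)
```

For this profile the predicted probability is about 0.83, which implies a weight of about 1.21.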

In [93]:
*Create a variable for the predicted probability of being observed (productive at age 10/11) from this model.
capture drop pp_ncds
predict pp_ncds
summ pp_ncds

* return to jupyter
. *Create a variable for the predicted probability of being observed (productive at age 10/11) from this model.

. capture drop pp_ncds

. predict pp_ncds
(option pr assumed; Pr(sweeptestoutcome))
(16588 missing values generated)

. summ pp_ncds

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     pp_ncds |     17,395    .8370221    .0231933   .7305862   .8932639

. 
. * return to jupyter

In [94]:
*Potential explanatory variables in the missingness model for the BCS (1970 Cohort).
tab bcs_male
tab bcs0_country
tab bcs_paed
tab bcs_moed
tab bcs_region
tab bcs_mumagefirstbirth
tab bcs_mumagebirth
tab bcs_mummarried
tab bcs_parity
tab bcs_mothercraft
tab bcs_labourclass
tab bcs_breast

* return to jupyter
. *Potential explanatory variables in the missingness model for the BCS (1970 Cohort).

. tab bcs_male

 BCS Cohort |
member Male |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |      7,975       48.15       48.15
     1. Yes |      8,587       51.85      100.00
------------+-----------------------------------
      Total |     16,562      100.00

. tab bcs0_country

   1970: Country of |
        Interview   |      Freq.     Percent        Cum.
--------------------+-----------------------------------
         1. England |     14,072       84.93       84.93
           2. Wales |        879        5.31       90.24
        3. Scotland |      1,617        9.76      100.00
--------------------+-----------------------------------
              Total |     16,568      100.00

. tab bcs_paed

        BCS |
   Father's |
  Education |
 Categories |      Freq.     Percent        Cum.
------------+-----------------------------------
    1. Comp |      7,806       65.23       65.23
2. Comp+1-3 |      2,843       23.76       88.99
3. Comp+4-5 |        525        4.39       93.38
 4. Comp+6+ |        792        6.62      100.00
------------+-----------------------------------
      Total |     11,966      100.00

. tab bcs_moed

        BCS |
   Mother's |
  Education |
 Categories |      Freq.     Percent        Cum.
------------+-----------------------------------
    1. Comp |      8,279       65.50       65.50
2. Comp+1-3 |      3,466       27.42       92.93
3. Comp+4-5 |        461        3.65       96.57
 4. Comp+6+ |        433        3.43      100.00
------------+-----------------------------------
      Total |     12,639      100.00

. tab bcs_region

    BCS Region at Birth |      Freq.     Percent        Cum.
------------------------+-----------------------------------
               1. North |      1,023        6.17        6.17
2. Yorks and Humberside |      1,486        8.97       15.14
       3. East Midlands |      1,036        6.25       21.40
         4. East Anglia |        539        3.25       24.65
          5. South East |      5,022       30.31       54.96
          6. South West |      1,051        6.34       61.30
       7. West Midlands |      1,745       10.53       71.84
          8. North West |      2,170       13.10       84.93
               9. Wales |        879        5.31       90.24
           10. Scotland |      1,617        9.76      100.00
------------------------+-----------------------------------
                  Total |     16,568      100.00

. tab bcs_mumagefirstbirth

   BCS Mother's Age at First |
                       Birth |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
                          12 |          1        0.01        0.01
                          13 |          5        0.03        0.04
                          14 |         33        0.20        0.24
                          15 |        126        0.77        1.00
                          16 |        444        2.70        3.70
                          17 |        929        5.65        9.35
                          18 |      1,387        8.43       17.78
                          19 |      1,608        9.77       27.55
                          20 |      1,793       10.90       38.45
                          21 |      1,796       10.92       49.37
                          22 |      1,775       10.79       60.16
                          23 |      1,469        8.93       69.09
                          24 |      1,232        7.49       76.58
                          25 |      1,041        6.33       82.91
                          26 |        755        4.59       87.50
                          27 |        553        3.36       90.86
                          28 |        393        2.39       93.25
                          29 |        300        1.82       95.07
                          30 |        214        1.30       96.37
                          31 |        141        0.86       97.23
                          32 |        126        0.77       97.99
                          33 |         81        0.49       98.49
                          34 |         68        0.41       98.90
                          35 |         50        0.30       99.20
                          36 |         30        0.18       99.39
                          37 |         24        0.15       99.53
                          38 |         24        0.15       99.68
                          39 |         17        0.10       99.78
                          40 |         19        0.12       99.90
                          41 |          4        0.02       99.92
                          42 |          7        0.04       99.96
                          43 |          3        0.02       99.98
                          45 |          1        0.01       99.99
                          46 |          1        0.01       99.99
                          47 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     16,451      100.00

. tab bcs_mumagebirth

  BCS Mother's Age at Cohort |
              Member's Birth |      Freq.     Percent        Cum.
-----------------------------+-----------------------------------
                          14 |          2        0.01        0.01
                          15 |         26        0.16        0.17
                          16 |        130        0.79        0.96
                          17 |        300        1.81        2.77
                          18 |        509        3.08        5.85
                          19 |        670        4.05        9.90
                          20 |        902        5.45       15.35
                          21 |      1,059        6.40       21.76
                          22 |      1,315        7.95       29.71
                          23 |      1,446        8.74       38.46
                          24 |      1,183        7.15       45.61
                          25 |      1,265        7.65       53.26
                          26 |      1,156        6.99       60.25
                          27 |      1,071        6.48       66.73
                          28 |        823        4.98       71.70
                          29 |        790        4.78       76.48
                          30 |        695        4.20       80.68
                          31 |        591        3.57       84.26
                          32 |        476        2.88       87.14
                          33 |        392        2.37       89.51
                          34 |        344        2.08       91.59
                          35 |        302        1.83       93.41
                          36 |        220        1.33       94.74
                          37 |        208        1.26       96.00
                          38 |        175        1.06       97.06
                          39 |        150        0.91       97.97
                          40 |        125        0.76       98.72
                          41 |         80        0.48       99.21
                          42 |         59        0.36       99.56
                          43 |         33        0.20       99.76
                          44 |         20        0.12       99.89
                          45 |          6        0.04       99.92
                          46 |          6        0.04       99.96
                          47 |          2        0.01       99.97
                          49 |          1        0.01       99.98
                          50 |          1        0.01       99.98
                          51 |          1        0.01       99.99
                          52 |          1        0.01       99.99
                          53 |          1        0.01      100.00
-----------------------------+-----------------------------------
                       Total |     16,536      100.00

. tab bcs_mummarried

 BCS Mother |
 married at |
     Cohort |
   Member's |
      Birth |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |        978        5.97        5.97
     1. Yes |     15,408       94.03      100.00
------------+-----------------------------------
      Total |     16,386      100.00

. tab bcs_parity

BCS Parity at Birth |      Freq.     Percent        Cum.
--------------------+-----------------------------------
                  0 |      6,187       37.41       37.41
                  1 |      5,356       32.38       69.79
                  2 |      2,689       16.26       86.05
                  3 |      1,204        7.28       93.33
                  4 |        568        3.43       96.77
                  5 |        267        1.61       98.38
                  6 |        130        0.79       99.17
                  7 |         65        0.39       99.56
                  8 |         30        0.18       99.74
                  9 |         23        0.14       99.88
                 10 |          8        0.05       99.93
                 11 |          6        0.04       99.96
                 12 |          3        0.02       99.98
                 13 |          2        0.01       99.99
                 14 |          1        0.01      100.00
--------------------+-----------------------------------
              Total |     16,539      100.00

. tab bcs_mothercraft

 BCS Mother |
   Attended |
Mothercraft |
    Classes |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     12,038       73.18       73.18
     1. Yes |      4,412       26.82      100.00
------------+-----------------------------------
      Total |     16,450      100.00

. tab bcs_labourclass

 BCS Mother |
   Attended |
     Labour |
    Classes |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     12,089       73.57       73.57
     1. Yes |      4,344       26.43      100.00
------------+-----------------------------------
      Total |     16,433      100.00

. tab bcs_breast

 BCS Mother |
  Attempted |
     Breast |
    Feeding |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |     10,123       61.91       61.91
     1. Yes |      6,227       38.09      100.00
------------+-----------------------------------
      Total |     16,350      100.00

. 
. * return to jupyter

In [95]:
*Final missingness model selected
logit sweeptestoutcome bcs_male ib5.bcs_region bcs_mumagebirth bcs_mummarried bcs_parity if (cohort==2)

fitstat

* return to jupyter
. *Final missingness model selected

. logit sweeptestoutcome bcs_male ib5.bcs_region bcs_mumagebirth bcs_mummarried bcs_parity if (cohort==2)

Iteration 0:   log likelihood = -7391.3181  
Iteration 1:   log likelihood =  -7219.078  
Iteration 2:   log likelihood = -7199.8066  
Iteration 3:   log likelihood = -7199.7939  
Iteration 4:   log likelihood = -7199.7939  

Logistic regression                             Number of obs     =     16,353
                                                LR chi2(13)       =     383.05
                                                Prob > chi2       =     0.0000
Log likelihood = -7199.7939                     Pseudo R2         =     0.0259

------------------------------------------------------------------------------------------
        sweeptestoutcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
                bcs_male |  -.0560849   .0425418    -1.32   0.187    -.1394653    .0272954
                         |
              bcs_region |
               1. North  |   .6300561   .1022219     6.16   0.000      .429705    .8304073
2. Yorks and Humberside  |   .4836959   .0833184     5.81   0.000     .3203948     .646997
       3. East Midlands  |   .3774229   .0941076     4.01   0.000     .1929754    .5618705
         4. East Anglia  |   .3941494   .1272544     3.10   0.002     .1447354    .6435635
          6. South West  |   .4135188   .0948548     4.36   0.000     .2276067    .5994309
       7. West Midlands  |   .3147517   .0752246     4.18   0.000     .1673142    .4621891
          8. North West  |   .3327414   .0690646     4.82   0.000     .1973772    .4681056
               9. Wales  |   .7710295   .1142771     6.75   0.000     .5470505    .9950086
           10. Scotland  |   .4202723   .0784915     5.35   0.000     .2664319    .5741128
                         |
         bcs_mumagebirth |   .0027454   .0046279     0.59   0.553    -.0063251    .0118159
          bcs_mummarried |   1.216936   .0730184    16.67   0.000     1.073823     1.36005
              bcs_parity |  -.0547461     .01739    -3.15   0.002      -.08883   -.0206623
                   _cons |    .223148   .1224326     1.82   0.068    -.0168154    .4631115
------------------------------------------------------------------------------------------

. 
. fitstat

Measures of Fit for logit of sweeptestoutcome

Log-Lik Intercept Only:    -7391.318     Log-Lik Full Model:        -7199.794
D(16338):                  14399.588     LR(13):                      383.048
                                         Prob > LR:                     0.000
McFadden's R2:                 0.026     McFadden's Adj R2:             0.024
Maximum Likelihood R2:         0.023     Cragg & Uhler's R2:            0.039
McKelvey and Zavoina's R2:     0.040     Efron's R2:                    0.027
Variance of y*:                3.427     Variance of error:             3.290
Count R2:                      0.832     Adj Count R2:                 -0.000
AIC:                           0.882     AIC*n:                     14429.588
BIC:                     -144114.411     BIC':                       -256.920

. 
. * return to jupyter

In [96]:
*Create a variable for the predicted probability of being observed (productive at age 10/11) from this model.
capture drop pp_bcs
predict pp_bcs
summ pp_bcs

* return to jupyter
. *Create a variable for the predicted probability of being observed (productive at age 10/11) from this model.

. capture drop pp_bcs

. predict pp_bcs
(option pr assumed; Pr(sweeptestoutcome))
(17630 missing values generated)

. summ pp_bcs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      pp_bcs |     16,353    .8324466    .0616245   .4754927   .9099228

. 
. * return to jupyter

In [97]:
*Creating the Inverse Probability Weights from the Predicted probabilities
* created above (pp_ncds and pp_bcs).
capture drop ipw
gen ipw = .
replace ipw = 1/pp_ncds if (cohort==1)
replace ipw = 1/pp_bcs if (cohort==2)
label variable ipw "Inverse Probability Weight"

* return to jupyter
. *Creating the Inverse Probability Weights from the Predicted probabilities

. * created above (pp_ncds and pp_bcs).

. capture drop ipw

. gen ipw = .
(33,983 missing values generated)

. replace ipw = 1/pp_ncds if (cohort==1)
(17,395 real changes made)

. replace ipw = 1/pp_bcs if (cohort==2)
(16,353 real changes made)

. label variable ipw "Inverse Probability Weight"

. 
. * return to jupyter

In [98]:
summ ipw

* return to jupyter
. summ ipw

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         ipw |     33,748    1.202474    .0843147   1.098994   2.103082

. 
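As a quick arithmetic check (a sketch using the rounded values printed in the summaries of pp_ncds and pp_bcs above), the extremes of ipw should be the reciprocals of the extreme predicted probabilities. Both extremes come from the BCS model. The scalar names are ours.

```stata
* Largest predicted probability (BCS) gives the smallest weight.
scalar w_min = 1 / .9099228
* Smallest predicted probability (BCS) gives the largest weight.
scalar w_max = 1 / .4754927
display %9.6f scalar(w_min)
display %9.6f scalar(w_max)
```

To the displayed precision these agree with the minimum and maximum of ipw reported above (1.098994 and 2.103082).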
In [99]:
*Those who have missing predicted probabilities (i.e. have missing
*  data in the model) are given a weight of 1.
recode ipw (.=1)

* return to jupyter
. *Those who have missing predicted probabilities (i.e. have missing

. *  data in the model) are given a weight of 1.

. recode ipw (.=1)
(ipw: 235 changes made)

. 
In [100]:
*Drop the predicted values, these were only needed in creating the IPWs.
drop pp_ncds pp_bcs

* return to jupyter
. *Drop the predicted values, these were only needed in creating the IPWs.

. drop pp_ncds pp_bcs

. 
. * return to jupyter

In [101]:
save $path3\pooledNCDSBCS_v3.dta, replace

* return to jupyter
. save $path3\pooledNCDSBCS_v3.dta, replace
file F:\Data\MYDATA\TEMP\pooledNCDSBCS_v3.dta saved

. 
. * return to jupyter


Multiple Imputation

We implemented multiple imputation using the mi commands in Stata SE 14.1. In multiple imputation, several plausible values are computed for each missing value, which better represents the uncertainty around that value. The values are imputed from a cycle of regression models, producing multiple completed datasets. The estimates from these datasets are then pooled using ‘Rubin’s rules’ when analyses are undertaken (see Little & Rubin, 2014). We imputed 60 datasets. Our missing data model included all of the variables in the explanatory model of interest as well as parity and mother’s marital status (see Table S3 for details of these additional variables). We imputed missing values on both our explanatory and dependent variables. There has been some debate concerning the imputation of missing values for dependent variables (see Von Hippel, 2007). Recent methodological research recommends that imputed values on the dependent variable should not be deleted prior to analysis, and this is the approach we have taken in this article (see Sullivan, Salter, Ryan, & Lee, 2015). We carried out multiple imputation on the missing data from all cohort members present at the first sweep of the studies, and we deleted deceased cohort members before the substantive analysis.
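For reference, Rubin's rules for pooling a single estimate across M imputed datasets can be written as below. The symbols are ours: \hat{Q}_m is the point estimate and U_m its squared standard error in imputed dataset m, \bar{U} is the within-imputation variance, B the between-imputation variance and T the total variance of the pooled estimate \bar{Q}.

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m - \bar{Q}\bigr)^{2}, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{M}\Bigr)B
```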

When computing the interaction term in our models (father’s NS-SEC x cohort) we have a scenario where there is missing data on the father’s NS-SEC variable but no missing data on the cohort variable. Because cohort is not missing and has only two levels (NCDS or BCS) we carried out multiple imputation separately for each cohort (see Carpenter & Kenward, 2012, pp. 148-150; Enders, 2010, pp. 267-268). We then created the interaction term following the multiple imputation.

We also use multiple imputation and inverse probability weights in combination (see Seaman, White, Copas, & Li, 2012). This is carried out by imputing the data as described above, then deleting the cases that were not present in the age 10/11 surveys. In the analytical model we combine the imputed datasets, as described above, and adjust the analyses using the inverse probability weights. Seaman et al. (2012) note that whilst combining multiple imputation and inverse probability weights will generally have no advantages if the imputation models are correctly specified, the combination can act as a robustness check. This strategy has been used previously in the analysis of the National Child Development Study (Caldwell et al., 2008; Stansfeld, Clark, Caldwell, Rodgers, & Power, 2008); however, we do not have access to the inverse probability weights used in these previous studies.
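The sequence described above (impute, delete cases not present at the later sweep, then pool weighted estimates) can be sketched on simulated data as follows. This is a minimal illustration under our own assumptions, not the analysis code: the variables x, y, observed and w are hypothetical stand-ins (observed for sweeptestoutcome, w for ipw), the weights are made-up numbers, and a single linear imputation model replaces the chained equations used in this notebook.

```stata
clear
set seed 2024
set obs 500
generate x = rnormal()
generate y = 2 + 0.5*x + rnormal()
* Stand-in for sweeptestoutcome: 1 if present at the later sweep.
generate byte observed = runiform() < 0.85
* Stand-in for ipw (made-up values between 1 and 1.2).
generate w = 1 + runiform()/5
* Make roughly 15% of x missing.
replace x = . if runiform() < 0.15

mi set mlong
mi register imputed x
mi register regular y observed w
mi impute regress x y, add(5) rseed(2024)

* Delete cases not present at the later sweep, then let mi re-check the data.
drop if observed == 0
mi update

* Pool the weighted estimates across the imputed datasets (Rubin's rules).
mi estimate: regress y x [pweight = w]
```

An alternative with the same intent is to keep all cases and restrict the estimation sample with an if qualifier on a fully observed variable.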

Additional Notes:

We impute using variables that are available in comparable form in both datasets and that were also used in the missingness models from which the inverse probability weights described above were derived.

There are many approaches that can be taken to imputing an interaction.

In our case the interaction involved one variable with no missingness (cohort) and one variable with missingness (the father's NS-SEC variable). Due to this special circumstance we have chosen to split the data by the values of the fully observed variable (cohort) and separately impute the subsets of the data. The interaction is then created after the imputation.

see:

Multiple Imputation in Stata: Creating Imputation Models

We could consider cohort as equivalent to race in this description:

"For example, suppose you're regressing income on education, experience, and black (an indicator for "subject is black"), but think the returns to education vary by race and thus include black##c.education in the regression. The just another variable approach would create a variable edblack=black*race and impute it, but it's possible for the model to impute a zero for black and a non-zero value for edblack. There's no indication this would cause problems in the analysis model, however.

An alternative would be to add the by(black) option to the imputation command, so that whites and blacks are imputed separately. This would allow you to use black##c.education in your analysis model without bias (and it would always correspond to the actual values of black and education). However, running two separate imputation models allows the returns to experience to vary by race in the imputation model, not just education. If you had strong theoretical reasons to believe that was not the case (which is unlikely) that would be a specification problem. A far more common problem is small sample size: make sure each of your by() groups is big enough for reasonable regressions."

See also:

Multiple imputation with interactions and non-linear terms

See also:

Carpenter, James, and Michael Kenward. Multiple imputation and its application. John Wiley & Sons, 2012. chapter 7.

"If Y3 is fully observed, but Y1, Y2 are partially observed then since equation 7.1 fits a straight line for each group defined by Y3 we can impute as follows:

1) Divide the data into two groups by values of binary Y3.
2) Separately in each group, impute (Y1,Y2) using a bivariate normal model or FCS equivalent, creating K imputed datasets.
3) For k = 1, ..., K append the imputed datasets for the two groups, to give K imputed datasets.

This approach, of imputing separately in the groups defined by categorical variables in the interaction, is by far the simplest approach; clearly we can have more than the two groups in the above discussion. The imputation groups may be defined by levels of a single categorical variable, or the interaction of categorical variables. The only requirement is that these variables be fully observed on each unit."
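The approach quoted above can be sketched on simulated data as follows. This is a toy illustration under our own assumptions: grp is a hypothetical fully observed two-level variable standing in for cohort, x is a continuous partially observed covariate (the father's NS-SEC variable in our models is categorical), and a single linear imputation model is used in place of chained equations. The by() option imputes within each group, and the interaction appears only in the analysis model.

```stata
clear
set seed 12345
set obs 400
* Fully observed two-level grouping variable (stands in for cohort).
generate byte grp = (_n > 200)
* Covariate that will be partially observed.
generate x = rnormal()
* Outcome with a group-specific slope on x.
generate y = 1 + 0.5*x + 0.3*grp + 0.4*grp*x + rnormal()
* Make roughly 20% of x missing.
replace x = . if runiform() < 0.2

mi set mlong
mi register imputed x
mi register regular y grp
* Impute separately within each level of grp.
mi impute regress x y, add(5) rseed(12345) by(grp)

* The interaction is formed only in the analysis model, after imputation.
mi estimate: regress y c.x##i.grp
```

Because the imputation model is fitted separately in each group, the relationship between x and y is free to differ by group, which is what the interaction in the analysis model requires.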

After running the multiple imputations we will delete those cohort members known to be dead at the age 10/11 sweeps (deadtestoutcome).

In [102]:
use $path3\pooledNCDSBCS_v3.dta, clear

* return to jupyter
. use $path3\pooledNCDSBCS_v3.dta, clear

. 
. * return to jupyter

In [103]:
set seed 1485

keep ability male parented dadnssec cohort ipw  parity married poolid deadtestoutcome sweeptestoutcome samplenssec

* return to jupyter
. set seed 1485

. 
. keep ability male parented dadnssec cohort ipw  parity married poolid deadtestoutcome sweeptestoutcome samplenssec

. 
. * return to jupyter

Running the multiple imputation produces an "invalid name" error.

We think this is because the value label for NS-SEC is too long.

Dropping the NS-SEC value label solves the problem, so we drop it here.

In [104]:
label drop nssec

* return to jupyter
. label drop nssec

. 
. * return to jupyter

We declare the dataset to be an mi dataset using the mlong style, which is described as "memory efficient".

see: Multiple-imputation analysis using Stata’s mi command.

In [105]:
mi set mlong

* return to jupyter
. mi set mlong

. 
. * return to jupyter

mi register imputed identifies the variables in the imputation model that have missing information and will be imputed.

In [106]:
mi register imputed ability male parented dadnssec parity married

* return to jupyter
. mi register imputed ability male parented dadnssec parity married
(16325 m=0 obs. now marked as incomplete)

. 

mi register regular identifies variables whose values are the same in the imputed data and the original data (i.e. variables that do not have missing values).

In [107]:
mi register regular cohort ipw poolid deadtestoutcome sweeptestoutcome samplenssec

* return to jupyter
. mi register regular cohort ipw poolid deadtestoutcome sweeptestoutcome samplenssec

. 
. * return to jupyter
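One way to decide which variables to register as imputed and which as regular is to tabulate missingness first. The sketch below does this on a toy dataset; the variables a and b are hypothetical and are not part of the pooled NCDS and BCS file.

```stata
clear
set obs 12
* a is fully observed; every third value of b is missing.
generate a = _n
generate b = cond(mod(_n, 3) == 0, ., _n)

* Report the variables that contain missing values and how many.
misstable summarize a b

mi set mlong
* Variables with missing values to be filled in are registered as imputed.
mi register imputed b
* Fully observed variables that are not imputed are registered as regular.
mi register regular a

* Summarise the mi settings and the registered variables.
mi describe
```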

We create 60 imputed datasets.

In [108]:
mi impute chained (reg) ability parity (logit) male married (mlogit) parented dadnssec, add(60) rseed(1485) by(cohort)

* return to jupyter
. mi impute chained (reg) ability parity (logit) male married (mlogit) parented dadnssec, add(60) rseed(1485) by(cohort)

Performing setup for each by() group:

-> cohort = 1. NCDS
Conditional models:
            parity: regress parity i.male i.married i.parented ability i.dadnssec
              male: logit male parity i.married i.parented ability i.dadnssec
           married: logit married parity i.male i.parented ability i.dadnssec
          parented: mlogit parented parity i.male i.married ability i.dadnssec
           ability: regress ability parity i.male i.married i.parented i.dadnssec
          dadnssec: mlogit dadnssec parity i.male i.married i.parented ability

-> cohort = 2. BCS
Conditional models:
              male: logit male parity i.married i.parented ability i.dadnssec
            parity: regress parity i.male i.married i.parented ability i.dadnssec
           married: logit married i.male parity i.parented ability i.dadnssec
          parented: mlogit parented i.male parity i.married ability i.dadnssec
           ability: regress ability i.male parity i.married i.parented i.dadnssec
          dadnssec: mlogit dadnssec i.male parity i.married i.parented ability

Performing imputation for each by() group:

-> cohort = 1. NCDS
Performing chained iterations ...

-> cohort = 2. BCS
Performing chained iterations ...

Multivariate imputation                     Imputations =       60
Chained equations                                 added =       60
Imputed: m=1 through m=60                       updated =        0

Initialization: monotone                     Iterations =      600
                                                burn-in =       10

           ability: linear regression
            parity: linear regression
              male: logistic regression
           married: logistic regression
          parented: multinomial logistic regression
          dadnssec: multinomial logistic regression

------------------------------------------------------------------
                   |               Observations per m             
by()               |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
cohort = 1. NCDS   |                                   |
           ability |      13440         3975      3975 |     17415
            parity |      17412            3         3 |     17415
              male |      17412            3         3 |     17415
           married |      17405           10        10 |     17415
          parented |      15083         2332      2332 |     17415
          dadnssec |      10598         6817      6817 |     17415
                   |                                   |
cohort = 2. BCS    |                                   |
           ability |      11388         5180      5180 |     16568
            parity |      16539           29        29 |     16568
              male |      16562            6         6 |     16568
           married |      16386          182       182 |     16568
          parented |      12695         3873      3873 |     16568
          dadnssec |      11193         5375      5375 |     16568
                   |                                   |
-------------------+-----------------------------------+----------
Overall            |                                   |
           ability |      24828         9155      9155 |     33983
            parity |      33951           32        32 |     33983
              male |      33974            9         9 |     33983
           married |      33791          192       192 |     33983
          parented |      27778         6205      6205 |     33983
          dadnssec |      21791        12192     12192 |     33983
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)

. 
. * return to jupyter
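As a quick check on the imputed data, something like the following could be run at this point. This is only a sketch and is not part of the original workflow; it assumes the mi data created above are still in memory, and uses the standard mi describe and mi xeq commands.

```stata
* Sketch only: assumes the imputed mi data from the step above are in memory.

* Report the mi style, the number of imputations, and the registered variables
mi describe

* Compare the observed data (m=0) with the first and last imputations (m=1, m=60)
mi xeq 0 1 60: summarize ability
```

If the imputation model is behaving sensibly, the summary statistics for ability in the imputed datasets should be broadly similar to those in the observed data.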

In [109]:
save $path3\pooledNCDSBCS_v3_imputed.dta, replace

* return to jupyter
. save $path3\pooledNCDSBCS_v3_imputed.dta, replace
file F:\Data\MYDATA\TEMP\pooledNCDSBCS_v3_imputed.dta saved

. 
. * return to jupyter


Modelling Results

The models reported in table 4 are ordinary least squares (OLS) linear regression analyses of the cognitive ability test scores. The data from the two cohorts have been pooled, and the models include a dummy variable indicating cohort membership. Table 4, model 1 shows that boys have marginally lower cognitive ability test scores than girls. Children whose parents spent longer in education have, on average, higher cognitive ability test scores. There is also a large and statistically significant social class effect, net of parental education and gender 4.

Children from the least advantaged social class, NS-SEC 7 (e.g. the daughter of a construction labourer), on average score 7 points lower than children from social class NS-SEC 3 (e.g. the daughter of a police officer). By contrast, children from social class NS-SEC 1.2 (e.g. the daughter of a university lecturer) on average score 2 points higher than their counterparts in NS-SEC 3. Similar socio-economic inequalities in cognitive test scores have previously been reported (see Shenkin et al., 2001; Lawlor et al., 2005; Feinstein, 2003; Sullivan et al., 2013).

In model 2 (table 4) we include an interaction term between father’s NS-SEC and cohort, to investigate changes between the cohorts. Including the interaction term does not improve the overall proportion of variance explained. We do not find systematic changes in NS-SEC inequalities between the two cohorts.

Despite the overall lack of improvement in model fit when the interaction is included, we observe some small statistically significant differences between the cohorts (table 4, model 2). To aid interpretation of the change in effect sizes between cohorts, figure 1 plots the regression coefficients for father’s NS-SEC with 95 per cent quasi-variance comparison intervals 5. Overall, in figure 1 there is no clear pattern of either increasing or decreasing social class inequalities between the two cohorts.

BCS members have marginally lower test scores across all social class groups. We emphasise that the cognitive ability tests in the NCDS and the BCS are not identical; however, the two measures are suitable for the current analysis because our focus is on relative social class inequalities within the two cohorts. The outcome variable in this model is constructed using arithmetic standardisation. The difference between the scores in the NCDS and the BCS in this analysis should therefore not be understood as a direct assessment of the Flynn Effect. We conclude that the more parsimonious model, which does not include the interaction, is more appropriate.

We open the multiply imputed dataset. The models using complete cases and the weights (only) are shown in the supplementary materials.

In [27]:
use $path3\pooledNCDSBCS_v3_imputed.dta, clear

set seed 1485

We included the deceased cohort members in the multiple imputation above, to provide additional information for the imputation model. We now delete the cases who were deceased by the time of the age 10/11 surveys, because we do not want to model outcomes for cohort members who were deceased at the time the ability test was taken.

In [28]:
tab deadtestoutcome, mi

* return to jupyter
. tab deadtestoutcome, mi

Dead at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |    925,948       91.36       91.36
     1. Yes |     87,535        8.64      100.00
------------+-----------------------------------
      Total |  1,013,483      100.00

. 
. * return to jupyter
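The total of 1,013,483 rows follows from the mlong style, which stores the original data (m=0) once and, for each imputation, only the rows that were marked as incomplete. A small check of this arithmetic, using the counts reported above (33,983 original observations, 16,325 incomplete observations, 60 imputations):

```stata
* Rows in an mlong dataset = original rows + (imputations x incomplete rows)
display %12.0fc 33983 + 60*16325
```

This reproduces the 1,013,483 total in the tabulation, so the frequencies shown here count stored rows rather than cohort members.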

In [29]:
summ ability if (deadtestoutcome==1)

* return to jupyter
. summ ability if (deadtestoutcome==1)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |     86,100    99.38687    15.00202   28.07528   160.2226

. 
. * return to jupyter

In [30]:
drop if deadtestoutcome==1

* return to jupyter
. drop if deadtestoutcome==1
(87,535 observations deleted)

. 
. * return to jupyter

We create an interaction variable. We created this above for the complete records analysis, but we recreate it here after the multiple imputation.

In [31]:
capture drop nsinteraction
gen nsinteraction = .
replace nsinteraction = 1 if ((dadnssec==1)&(cohort==1))
replace nsinteraction = 2 if ((dadnssec==1)&(cohort==2))
replace nsinteraction = 3 if ((dadnssec==2)&(cohort==1))
replace nsinteraction = 4 if ((dadnssec==2)&(cohort==2))
replace nsinteraction = 5 if ((dadnssec==3)&(cohort==1))
replace nsinteraction = 6 if ((dadnssec==3)&(cohort==2))
replace nsinteraction = 7 if ((dadnssec==4)&(cohort==1))
replace nsinteraction = 8 if ((dadnssec==4)&(cohort==2))
replace nsinteraction = 9 if ((dadnssec==5)&(cohort==1))
replace nsinteraction = 10 if ((dadnssec==5)&(cohort==2))
replace nsinteraction = 11 if ((dadnssec==6)&(cohort==1))
replace nsinteraction = 12 if ((dadnssec==6)&(cohort==2))
replace nsinteraction = 13 if ((dadnssec==7)&(cohort==1))
replace nsinteraction = 14 if ((dadnssec==7)&(cohort==2))
replace nsinteraction = 15 if ((dadnssec==8)&(cohort==1))
replace nsinteraction = 16 if ((dadnssec==8)&(cohort==2))
tab nsinteraction
label variable nsinteraction "NSSEC Interaction"
label define nsint 1 "NCDS 1.1" 2 "BCS 1.1" 3 "NCDS 1.2" 4 "BCS 1.2" 5 "NCDS 2" 6 "BCS 2" 7 "NCDS 3" 8 "BCS 3" 9 "NCDS 4" 10 "BCS 4" 11 "NCDS 5" 12 "BCS 5" 13 "NCDS 6" 14 "BCS 6" 15 "NCDS 7" 16 "BCS 7", replace
label values nsinteraction nsint

mi register passive nsinteraction

* return to jupyter
. capture drop nsinteraction

. gen nsinteraction = .
(925,948 missing values generated)

. replace nsinteraction = 1 if ((dadnssec==1)&(cohort==1))
(13,535 real changes made)

. replace nsinteraction = 2 if ((dadnssec==1)&(cohort==2))
(22,498 real changes made)

. replace nsinteraction = 3 if ((dadnssec==2)&(cohort==1))
(18,391 real changes made)

. replace nsinteraction = 4 if ((dadnssec==2)&(cohort==2))
(27,622 real changes made)

. replace nsinteraction = 5 if ((dadnssec==3)&(cohort==1))
(47,752 real changes made)

. replace nsinteraction = 6 if ((dadnssec==3)&(cohort==2))
(69,934 real changes made)

. replace nsinteraction = 7 if ((dadnssec==4)&(cohort==1))
(39,097 real changes made)

. replace nsinteraction = 8 if ((dadnssec==4)&(cohort==2))
(39,944 real changes made)

. replace nsinteraction = 9 if ((dadnssec==5)&(cohort==1))
(51,654 real changes made)

. replace nsinteraction = 10 if ((dadnssec==5)&(cohort==2))
(66,476 real changes made)

. replace nsinteraction = 11 if ((dadnssec==6)&(cohort==1))
(70,532 real changes made)

. replace nsinteraction = 12 if ((dadnssec==6)&(cohort==2))
(79,389 real changes made)

. replace nsinteraction = 13 if ((dadnssec==7)&(cohort==1))
(77,046 real changes made)

. replace nsinteraction = 14 if ((dadnssec==7)&(cohort==2))
(73,037 real changes made)

. replace nsinteraction = 15 if ((dadnssec==8)&(cohort==1))
(110,371 real changes made)

. replace nsinteraction = 16 if ((dadnssec==8)&(cohort==2))
(107,913 real changes made)

. tab nsinteraction

nsinteracti |
         on |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     13,535        1.48        1.48
          2 |     22,498        2.46        3.94
          3 |     18,391        2.01        5.95
          4 |     27,622        3.02        8.96
          5 |     47,752        5.22       14.18
          6 |     69,934        7.64       21.82
          7 |     39,097        4.27       26.10
          8 |     39,944        4.36       30.46
          9 |     51,654        5.64       36.10
         10 |     66,476        7.26       43.37
         11 |     70,532        7.71       51.08
         12 |     79,389        8.67       59.75
         13 |     77,046        8.42       68.17
         14 |     73,037        7.98       76.15
         15 |    110,371       12.06       88.21
         16 |    107,913       11.79      100.00
------------+-----------------------------------
      Total |    915,191      100.00

. label variable nsinteraction "NSSEC Interaction"

. label define nsint 1 "NCDS 1.1" 2 "BCS 1.1" 3 "NCDS 1.2" 4 "BCS 1.2" 5 "NCDS 2" 6 "BCS 2" 7 "NCDS 3" 8 "BCS 3" 9 "NCDS 4" 10 "BCS 4" 11 "N
> CDS 5" 12 "BCS 5" 13 "NCDS 6" 14 "BCS 6" 15 "NCDS 7" 16 "BCS 7", replace

. label values nsinteraction nsint

. 
. mi register passive nsinteraction
(system variable _mi_id updated due to changed number of obs.)

. 

The models in table 4 include multiple imputation and the inverse probability weights.

For these models we also keep only those present at age 10/11.

See: Seaman, S. R., White, I. R., Copas, A. J., & Li, L. (2012). Combining multiple imputation and inverse probability weighting. Biometrics, 68(1), 129-137.

"Some researchers may prefer to use straightforward MI (what we called MI/MI). Provided that the imputation models are correctly specified, this will be more efficient than IPW/MI. However, our (admittedly contrived) simulations and (not contrived) real data example have shown that those who prefer IPW/MI have some justification for their caution. A possible use for IPW/MI is as a check, or diagnostic, for MI/MI. If the results of IPW/MI and MI/MI are very different, further exploration would be warranted, possibly leading to refinement of the imputation model."

These two papers also use this method:

Caldwell, T. M., Rodgers, B., Clark, C., Jefferis, B. J. M. H., Stansfeld, S. A., & Power, C. (2008). Lifecourse socioeconomic predictors of midlife drinking patterns, problems and abstention: findings from the 1958 British Birth Cohort Study. Drug and alcohol dependence, 95(3), 269-278.

Stansfeld, S. A., Clark, C., Caldwell, T., Rodgers, B., & Power, C. (2008). Psychosocial work characteristics and anxiety and depressive disorders in midlife: the effects of prior psychological distress. Occupational and Environmental Medicine, 65(9), 634-642.

"Multiple imputation was used to address missing data in the analyses, using the ICE programme in STATA. All psychological health, sociodemographic and work variables reported in this paper were included in the imputation equations; employment status at 33 and father’s social class at 7 and own social class at 42 were also included as they were significantly associated with attrition....

All living participants were included in the imputation, but analyses were conducted only for those who participated in the study at age 45 and were in paid employment (n = 8243).....

In order to address attrition, inverse probability weights were then estimated from a logistic regression model predicting participation in the study at age 45. Sex and all of the independent variables used in the imputation equation, except those measured at 45, and all significant two-way interactions were used as predictors in this logistic regression. The weight was applied to all analyses in this paper."
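The weighting step described in the quotation can be sketched on toy data. This is a minimal illustration of the general idea (a logistic model for participation, then the reciprocal of the predicted probability), not the construction of the ipw variable used in this notebook. The variables female, present, p_present and ipw_sketch are hypothetical.

```stata
* Hypothetical toy data: none of these variables come from the NCDS or BCS.
clear
set seed 1485
set obs 1000
generate byte female = runiform() < 0.5
* Participation is made more likely for one group, so the weights differ by group
generate byte present = runiform() < cond(female == 1, 0.8, 0.6)

* Model the probability of participating
logit present i.female

* Predicted probability of participation for every case
predict double p_present, pr

* Inverse probability weight, defined only for participants
generate double ipw_sketch = 1/p_present if present == 1

summarize ipw_sketch
```

Participants from the group that is less likely to take part receive larger weights, so the weighted sample of participants resembles the full sample on the predictors used in the logistic model.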

Keep only those present at the age 10/11 surveys.

In [32]:
estimates clear

In [33]:
tab sweeptestoutcome

* return to jupyter
. tab sweeptestoutcome

 Productive |
     at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |    257,237       27.78       27.78
     1. Yes |    668,711       72.22      100.00
------------+-----------------------------------
      Total |    925,948      100.00

. 
. * return to jupyter

In [34]:
*keep only those productive at age 10/11
keep if sweeptestoutcome ==1

* return to jupyter
. *keep only those productive at age 10/11

. keep if sweeptestoutcome ==1
(257,237 observations deleted)

. 
. * return to jupyter

In [35]:
*TABLE 4 MODEL 1

estimates clear

mi estimate, post: regress ability male i.parented ib4.dadnssec cohort [pweight=ipw], allbaselevels

* return to jupyter
. *TABLE 4 MODEL 1

. 
. estimates clear

. 
. mi estimate, post: regress ability male i.parented ib4.dadnssec cohort [pweight=ipw], allbaselevels
(system variable _mi_id updated due to changed number of obs.)

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.3613
                                                Largest FMI       =     0.4366
                                                Complete DF       =      28318
DF adjustment:   Small sample                   DF:     min       =     308.70
                                                        avg       =     874.12
                                                        max       =   2,278.12
Model F test:       Equal FMI                   F(  12, 7041.2)   =     297.81
Within VCE type:       Robust                   Prob > F          =     0.0000

------------------------------------------------------------------------------
     ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.5529273    .179601    -3.08   0.002    -.9051258   -.2007288
             |
    parented |
          2  |   5.865727   .2354561    24.91   0.000     5.403645    6.327808
          3  |   8.298234   .5356372    15.49   0.000     7.246654    9.349814
          4  |   10.62638   .4562337    23.29   0.000     9.730598    11.52217
             |
    dadnssec |
          1  |   1.787366   .5802228     3.08   0.002     .6480052    2.926727
          2  |   2.279596   .5910755     3.86   0.000     1.116549    3.442643
          3  |   1.190626   .4294348     2.77   0.006     .3469172    2.034335
          5  |  -3.526372   .4348423    -8.11   0.000    -4.380374    -2.67237
          6  |  -3.306138   .4132835    -8.00   0.000    -4.117849   -2.494427
          7  |  -4.797451   .4256146   -11.27   0.000    -5.633624   -3.961277
          8  |  -7.168137   .4124364   -17.38   0.000    -7.978642   -6.357632
             |
      cohort |  -2.087461   .1840391   -11.34   0.000    -2.448375   -1.726548
       _cons |   104.0589   .4338118   239.87   0.000     103.2077    104.9102
------------------------------------------------------------------------------

. 
. * return to jupyter
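Because every dadnssec coefficient is a contrast against NS-SEC 3 (the base category, coded 4), the gap between any two classes is the difference between their coefficients. As an illustration using the estimates above, the gap between NS-SEC 1.2 (coded 2) and NS-SEC 7 (coded 8) is roughly 9.4 points:

```stata
* Coefficients copied from the Table 4 Model 1 output above
display 2.279596 - (-7.168137)
```

This simple subtraction gives the point difference only; it does not provide a standard error for the gap.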

In [36]:
mibeta ability male i.parented ib4.dadnssec cohort [pweight=ipw], allbaselevels

* return to jupyter
. mibeta ability male i.parented ib4.dadnssec cohort [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.3613
                                                Largest FMI       =     0.4365
                                                Complete DF       =      28318
DF adjustment:   Small sample                   DF:     min       =     308.70
                                                        avg       =     874.12
                                                        max       =   2,278.12
Model F test:       Equal FMI                   F(  12, 7041.2)   =     297.81
Within VCE type:       Robust                   Prob > F          =     0.0000

------------------------------------------------------------------------------
     ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.5529273    .179601    -3.08   0.002    -.9051258   -.2007288
             |
    parented |
          2  |   5.865727   .2354561    24.91   0.000     5.403645    6.327808
          3  |   8.298234   .5356372    15.49   0.000     7.246654    9.349814
          4  |   10.62638   .4562337    23.29   0.000     9.730598    11.52217
             |
    dadnssec |
          1  |   1.787366   .5802228     3.08   0.002     .6480052    2.926727
          2  |   2.279596   .5910755     3.86   0.000     1.116549    3.442643
          3  |   1.190626   .4294348     2.77   0.006     .3469172    2.034335
          5  |  -3.526372   .4348423    -8.11   0.000    -4.380374    -2.67237
          6  |  -3.306138   .4132835    -8.00   0.000    -4.117849   -2.494427
          7  |  -4.797451   .4256146   -11.27   0.000    -5.633624   -3.961277
          8  |  -7.168137   .4124364   -17.38   0.000    -7.978642   -6.357632
             |
      cohort |  -2.087461   .1840391   -11.34   0.000    -2.448375   -1.726548
       _cons |   104.0589   .4338118   239.87   0.000     103.2077    104.9102
------------------------------------------------------------------------------

Standardized coefficients and R-squared
Summary statistics over 60 imputations

             |       mean       min        p25     median        p75       max
-------------+----------------------------------------------------------------
        male |  -.0184715    -.0232  -.0200423  -.0184812  -.0170057    -.0117
             |
    parented |
          2  |   .1745555      .168   .1718451     .17444   .1770959      .185
          3  |   .1019757     .0924   .1000508   .1016537   .1045144      .109
          4  |   .1657808      .157   .1625834   .1663338   .1687355      .176
             |
    dadnssec |
          1  |   .0229434     .0125   .0205665   .0230113   .0258286     .0314
          2  |   .0337025     .0219   .0294873   .0337864   .0377892     .0497
          3  |   .0267778    .00998   .0239177   .0274362   .0305021     .0414
          5  |  -.0785171    -.0945  -.0812713   -.077357  -.0740187    -.0697
          6  |  -.0826347    -.0963  -.0866849  -.0822976  -.0787418    -.0698
          7  |  -.1187835      -.13  -.1226171  -.1182057  -.1147945     -.101
          8  |   -.201945     -.215  -.2071063  -.2027168  -.1972093     -.186
             |
      cohort |  -.0697507    -.0759  -.0717631  -.0691932  -.0681856    -.0654
-------------+----------------------------------------------------------------
    R-square |   .1389302      .134   .1376231   .1393436   .1400777      .144
Adj R-square |   .1385653      .134   .1372576   .1389789   .1397133      .143
------------------------------------------------------------------------------

. 
. * return to jupyter

In [37]:
*TABLE 4 MODEL 2

estimates clear

In [38]:
mi estimate, post: regress ability male i.parented ib7.nsinteraction [pweight=ipw], allbaselevels

* return to jupyter
. mi estimate, post: regress ability male i.parented ib7.nsinteraction [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3949
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     376.16
                                                        avg       =     611.14
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   2.823372    .864195     3.27   0.001     1.126202    4.520541
     BCS 1.1  |  -.6376888   .7817899    -0.82   0.415     -2.17297    .8975921
    NCDS 1.2  |   1.849502   .7733177     2.39   0.017     .3296352    3.369368
     BCS 1.2  |   .9621695   .7714264     1.25   0.213    -.5543167    2.478656
      NCDS 2  |   1.708454   .6153074     2.78   0.006     .4989125    2.917995
       BCS 2  |  -.9002437   .5898865    -1.53   0.128    -2.058989    .2585018
       BCS 3  |  -1.585695    .683324    -2.32   0.021    -2.928931   -.2424602
      NCDS 4  |  -3.241009   .6280868    -5.16   0.000    -4.475416   -2.006603
       BCS 4  |  -5.423505    .624567    -8.68   0.000    -6.651111     -4.1959
      NCDS 5  |  -2.937072    .592384    -4.96   0.000    -4.101296   -1.772848
       BCS 5  |  -5.289967   .5726627    -9.24   0.000    -6.414886   -4.165048
      NCDS 6  |   -4.47695   .5983433    -7.48   0.000    -5.653467   -3.300433
       BCS 6  |  -6.752555   .5875267   -11.49   0.000    -7.906433   -5.598676
      NCDS 7  |  -7.119842   .5645244   -12.61   0.000    -8.229579   -6.010104
       BCS 7  |   -8.78016   .5601276   -15.68   0.000     -9.88061    -7.67971
              |
        _cons |   101.7356   .4928228   206.43   0.000     100.7669    102.7043
-------------------------------------------------------------------------------

. 
. * return to jupyter
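The nsinteraction coefficients above are all contrasts against NCDS 3, so a between-cohort difference within one class is the difference between two coefficients. One way to obtain such a difference with a standard error, without re-estimating the model under a new base category, is to pass a transformation to mi estimate. This is a sketch that assumes the data are as they stand at this point in the notebook; the name diff11 is arbitrary.

```stata
* Sketch: BCS 1.1 (level 2) minus NCDS 1.1 (level 1), reported as a transformation
mi estimate (diff11: _b[2.nsinteraction] - _b[1.nsinteraction]): regress ability male i.parented ib7.nsinteraction [pweight=ipw]

* The point estimate should agree with the difference of the two coefficients above
display -.6376888 - 2.823372
```

The arithmetic gives a difference of about -3.46 points for NS-SEC 1.1, with BCS members scoring lower.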

In [39]:
mibeta ability male i.parented ib7.nsinteraction [pweight=ipw], allbaselevels

* return to jupyter
. mibeta ability male i.parented ib7.nsinteraction [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3948
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     376.16
                                                        avg       =     611.14
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   2.823372    .864195     3.27   0.001     1.126202    4.520541
     BCS 1.1  |  -.6376888   .7817899    -0.82   0.415     -2.17297    .8975921
    NCDS 1.2  |   1.849502   .7733177     2.39   0.017     .3296352    3.369368
     BCS 1.2  |   .9621695   .7714264     1.25   0.213    -.5543167    2.478656
      NCDS 2  |   1.708454   .6153074     2.78   0.006     .4989125    2.917995
       BCS 2  |  -.9002437   .5898865    -1.53   0.128    -2.058989    .2585018
       BCS 3  |  -1.585695    .683324    -2.32   0.021    -2.928931   -.2424602
      NCDS 4  |  -3.241009   .6280868    -5.16   0.000    -4.475416   -2.006603
       BCS 4  |  -5.423505    .624567    -8.68   0.000    -6.651111     -4.1959
      NCDS 5  |  -2.937072    .592384    -4.96   0.000    -4.101296   -1.772848
       BCS 5  |  -5.289967   .5726627    -9.24   0.000    -6.414886   -4.165048
      NCDS 6  |   -4.47695   .5983433    -7.48   0.000    -5.653467   -3.300433
       BCS 6  |  -6.752555   .5875267   -11.49   0.000    -7.906433   -5.598676
      NCDS 7  |  -7.119842   .5645244   -12.61   0.000    -8.229579   -6.010104
       BCS 7  |   -8.78016   .5601276   -15.68   0.000     -9.88061    -7.67971
              |
        _cons |   101.7356   .4928228   206.43   0.000     100.7669    102.7043
-------------------------------------------------------------------------------

Standardized coefficients and R-squared
Summary statistics over 60 imputations

             |       mean       min        p25     median        p75       max
-------------+----------------------------------------------------------------
        male |  -.0186955    -.0234  -.0202557  -.0187198  -.0171167     -.012
             |
    parented |
          2  |   .1747311      .168   .1718273   .1747024   .1774466      .186
          3  |   .1020976     .0932   .1003903   .1019859   .1046074      .109
          4  |   .1660812      .157   .1629075    .166622   .1689558      .176
             |
nsinteract~n |
   NCDS 1.1  |   .0236479     .0145   .0214039   .0242233   .0263084     .0326
    BCS 1.1  |  -.0063036    -.0159  -.0092196   -.005852   -.003795    .00758
   NCDS 1.2  |   .0186201    .00818   .0153741   .0191881   .0207888     .0315
    BCS 1.2  |   .0106749   -.00121   .0069641   .0108788   .0138816     .0218
     NCDS 2  |   .0269205     .0077    .023743   .0274129   .0303461     .0459
      BCS 2  |  -.0154647    -.0312  -.0189422  -.0141758  -.0118529   -.00146
      BCS 3  |  -.0212993    -.0328  -.0244832  -.0221359  -.0179404   -.00803
     NCDS 4  |  -.0525572    -.0678  -.0564814  -.0527218  -.0476009    -.0407
      BCS 4  |  -.0889345     -.106  -.0934298  -.0887321  -.0842438    -.0748
     NCDS 5  |  -.0546596    -.0712  -.0589729  -.0541065   -.050261    -.0381
      BCS 5  |  -.0977622     -.115  -.1010194   -.096559  -.0935974    -.0884
     NCDS 6  |  -.0864516     -.104  -.0906362  -.0861367  -.0814556    -.0721
      BCS 6  |  -.1168589      -.13  -.1203538  -.1155812  -.1128864     -.106
     NCDS 7  |  -.1591148     -.178  -.1644767  -.1588776  -.1534594     -.145
      BCS 7  |  -.1782031     -.192   -.182083   -.178447  -.1732933     -.164
-------------+----------------------------------------------------------------
    R-square |   .1393231      .134   .1379374   .1397493   .1404945      .144
Adj R-square |   .1387454      .134   .1373589   .1391719   .1399177      .143
------------------------------------------------------------------------------

. 
. * return to jupyter
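The mean R-squared values reported by mibeta for the two models can be compared directly. Using the figures above, adding the interaction raises the mean R-squared by about 0.0004 and the mean adjusted R-squared by about 0.0002:

```stata
* Mean R-squared: Table 4 Model 2 minus Table 4 Model 1 (values copied from the mibeta output)
display .1393231 - .1389302

* Mean adjusted R-squared: Model 2 minus Model 1
display .1387454 - .1385653
```

These differences are very small relative to the spread of R-squared across the 60 imputations (roughly 0.134 to 0.144 in both models).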

Table 4

In [40]:
* Here I am repeating table 4 model 2 whilst changing the reference category
* This allows us to compare between categories in the interaction.

mi estimate: regress ability male i.parented ib1.nsinteraction [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 1.1
. * Here I am repeating table 4 model 2 whilst changing the reference category

. * This allows us to compare between categories in the interaction.

. 
. mi estimate: regress ability male i.parented ib1.nsinteraction [pweight=ipw], allbaselevels 

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3263
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     546.37
                                                        avg       =     917.99
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
     BCS 1.1  |  -3.461061   .9929515    -3.49   0.001     -5.41153   -1.510591
    NCDS 1.2  |    -.97387   .9429399    -1.03   0.302    -2.825543    .8778034
     BCS 1.2  |  -1.861202   .9158719    -2.03   0.042     -3.65911   -.0632942
      NCDS 2  |  -1.114918     .80775    -1.38   0.168    -2.700447    .4706112
       BCS 2  |  -3.723615   .7835263    -4.75   0.000    -5.260958   -2.186273
      NCDS 3  |  -2.823372    .864195    -3.27   0.001    -4.520541   -1.126202
       BCS 3  |  -4.409067    .859921    -5.13   0.000    -6.097227   -2.720907
      NCDS 4  |  -6.064381   .8117831    -7.47   0.000    -7.657461   -4.471301
       BCS 4  |  -8.246877   .8179949   -10.08   0.000     -9.85251   -6.641245
      NCDS 5  |  -5.760444   .7813254    -7.37   0.000    -7.293525   -4.227363
       BCS 5  |  -8.113339   .7773216   -10.44   0.000    -9.638575   -6.588103
      NCDS 6  |  -7.300322   .7878075    -9.27   0.000    -8.846467   -5.754177
       BCS 6  |  -9.575926   .7966527   -12.02   0.000    -11.13922   -8.012632
      NCDS 7  |  -9.943213   .7866582   -12.64   0.000    -11.48757   -8.398859
       BCS 7  |  -11.60353   .7729728   -15.01   0.000    -13.12038   -10.08668
              |
        _cons |    104.559   .7301608   143.20   0.000      103.126    105.9919
-------------------------------------------------------------------------------

. *There is a significant difference between BCS and NCDS for 1.1

In [41]:
mi estimate: regress ability male i.parented ib3.nsinteraction  [pweight=ipw], allbaselevels
*There is not a significant difference between BCS and NCDS for 1.2
. mi estimate: regress ability male i.parented ib3.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.4151
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     340.89
                                                        avg       =     644.64
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |     .97387   .9429399     1.03   0.302    -.8778034    2.825543
     BCS 1.1  |  -2.487191   .8802418    -2.83   0.005    -4.216437   -.7579438
     BCS 1.2  |  -.8873322   .8054189    -1.10   0.271    -2.469263    .6945985
      NCDS 2  |  -.1410478   .7311071    -0.19   0.847    -1.578167    1.296071
       BCS 2  |  -2.749745   .6848951    -4.01   0.000    -4.094722   -1.404769
      NCDS 3  |  -1.849502   .7733177    -2.39   0.017    -3.369368   -.3296352
       BCS 3  |  -3.435197   .8188311    -4.20   0.000    -5.045795   -1.824599
      NCDS 4  |  -5.090511   .7340165    -6.94   0.000      -6.5323   -3.648723
       BCS 4  |  -7.273007   .7331959    -9.92   0.000     -8.71345   -5.832564
      NCDS 5  |  -4.786574   .7140294    -6.70   0.000    -6.189363   -3.383785
       BCS 5  |  -7.139469     .70503   -10.13   0.000    -8.524578   -5.754359
      NCDS 6  |  -6.326452   .7030693    -9.00   0.000    -7.707528   -4.945375
       BCS 6  |  -8.602056   .7171048   -12.00   0.000    -10.01055   -7.193567
      NCDS 7  |  -8.969343   .6840177   -13.11   0.000    -10.31299   -7.625696
       BCS 7  |  -10.62966   .7022876   -15.14   0.000     -12.0096   -9.249726
              |
        _cons |   103.5851    .641653   161.43   0.000     102.3244    104.8458
-------------------------------------------------------------------------------

. *There is not a significant difference between BCS and NCDS for 1.2
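
The two fits above are the same model with a different base category, so each coefficient under the new base should equal the difference of two coefficients under the old base. The short Python check below is an illustrative sketch outside the Stata workflow: the numbers are copied from the printed tables, and the helper name `rebase` is hypothetical.

```python
# Coefficients copied from the printed table with NCDS 1.1 as the base (ib1).
# Only the first few categories are included to keep the check short.
base_ncds11 = {
    "NCDS 1.1": 0.0,        # base category, coefficient fixed at zero
    "BCS 1.1": -3.461061,
    "NCDS 1.2": -0.97387,
    "BCS 1.2": -1.861202,
    "NCDS 2": -1.114918,
}
cons_ncds11 = 104.559


def rebase(coefs, cons, new_base):
    """Re-express coefficients relative to `new_base` (hypothetical helper)."""
    shift = coefs[new_base]
    rebased = {name: value - shift for name, value in coefs.items()}
    return rebased, cons + shift


rebased, cons = rebase(base_ncds11, cons_ncds11, "NCDS 1.2")
for name, value in rebased.items():
    print(f"{name:>9}  {value: .6f}")
print(f"{'_cons':>9}  {cons: .4f}")
```

The rebased values agree with the coefficients printed for the ib3 fit up to rounding. The standard errors cannot be recovered this way, because the variance of a difference also depends on the covariance between the two estimates. This is why the model is re-estimated with each base category in turn.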

In [42]:
mi estimate: regress ability male i.parented ib5.nsinteraction  [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 2
. mi estimate: regress ability male i.parented ib5.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3774
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     411.08
                                                        avg       =     661.86
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   1.114918     .80775     1.38   0.168    -.4706112    2.700447
     BCS 1.1  |  -2.346143   .7565414    -3.10   0.002    -3.832342   -.8599437
    NCDS 1.2  |   .1410478   .7311071     0.19   0.847    -1.296071    1.578167
     BCS 1.2  |  -.7462844   .7104072    -1.05   0.294    -2.142445    .6498759
       BCS 2  |  -2.608698   .5483809    -4.76   0.000    -3.686405    -1.53099
      NCDS 3  |  -1.708454   .6153074    -2.78   0.006    -2.917995   -.4989125
       BCS 3  |  -3.294149   .6291214    -5.24   0.000    -4.530563   -2.057736
      NCDS 4  |  -4.949463   .5637005    -8.78   0.000    -6.056628   -3.842298
       BCS 4  |  -7.131959    .554731   -12.86   0.000    -8.221485   -6.042433
      NCDS 5  |  -4.645526   .5202506    -8.93   0.000    -5.667113   -3.623939
       BCS 5  |  -6.998421   .5229782   -13.38   0.000    -8.025714   -5.971127
      NCDS 6  |  -6.185404   .5216915   -11.86   0.000    -7.210182   -5.160626
       BCS 6  |  -8.461009   .5648557   -14.98   0.000    -9.571182   -7.350835
      NCDS 7  |  -8.828296   .5072247   -17.41   0.000    -9.825113   -7.831479
       BCS 7  |  -10.48861   .4902061   -21.40   0.000    -11.45097   -9.526259
              |
        _cons |   103.4441    .416877   248.14   0.000     102.6253    104.2628
-------------------------------------------------------------------------------

. *There is a significant difference between BCS and NCDS for 2

In [43]:
mi estimate: regress ability male i.parented ib7.nsinteraction  [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 3
. mi estimate: regress ability male i.parented ib7.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3949
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     376.16
                                                        avg       =     611.14
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   2.823372    .864195     3.27   0.001     1.126202    4.520541
     BCS 1.1  |  -.6376888   .7817899    -0.82   0.415     -2.17297    .8975921
    NCDS 1.2  |   1.849502   .7733177     2.39   0.017     .3296352    3.369368
     BCS 1.2  |   .9621695   .7714264     1.25   0.213    -.5543167    2.478656
      NCDS 2  |   1.708454   .6153074     2.78   0.006     .4989125    2.917995
       BCS 2  |  -.9002437   .5898865    -1.53   0.128    -2.058989    .2585018
       BCS 3  |  -1.585695    .683324    -2.32   0.021    -2.928931   -.2424602
      NCDS 4  |  -3.241009   .6280868    -5.16   0.000    -4.475416   -2.006603
       BCS 4  |  -5.423505    .624567    -8.68   0.000    -6.651111     -4.1959
      NCDS 5  |  -2.937072    .592384    -4.96   0.000    -4.101296   -1.772848
       BCS 5  |  -5.289967   .5726627    -9.24   0.000    -6.414886   -4.165048
      NCDS 6  |   -4.47695   .5983433    -7.48   0.000    -5.653467   -3.300433
       BCS 6  |  -6.752555   .5875267   -11.49   0.000    -7.906433   -5.598676
      NCDS 7  |  -7.119842   .5645244   -12.61   0.000    -8.229579   -6.010104
       BCS 7  |   -8.78016   .5601276   -15.68   0.000     -9.88061    -7.67971
              |
        _cons |   101.7356   .4928228   206.43   0.000     100.7669    102.7043
-------------------------------------------------------------------------------

. *There is a significant difference between BCS and NCDS for 3

In [44]:
mi estimate: regress ability male i.parented ib9.nsinteraction  [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 4
. mi estimate: regress ability male i.parented ib9.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3636
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     442.12
                                                        avg       =     775.49
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   6.064381   .8117831     7.47   0.000     4.471301    7.657461
     BCS 1.1  |   2.603321   .7804266     3.34   0.001     1.070235    4.136406
    NCDS 1.2  |   5.090511   .7340165     6.94   0.000     3.648723      6.5323
     BCS 1.2  |   4.203179   .7387809     5.69   0.000     2.751532    5.654826
      NCDS 2  |   4.949463   .5637005     8.78   0.000     3.842298    6.056628
       BCS 2  |   2.340766   .5572555     4.20   0.000      1.24648    3.435051
      NCDS 3  |   3.241009   .6280868     5.16   0.000     2.006603    4.475416
       BCS 3  |   1.655314   .6106865     2.71   0.007     .4565879     2.85404
       BCS 4  |  -2.182496    .566279    -3.85   0.000    -3.294549   -1.070443
      NCDS 5  |   .3039371     .51749     0.59   0.557    -.7118248    1.319699
       BCS 5  |  -2.048958   .5256657    -3.90   0.000    -3.081013   -1.016902
      NCDS 6  |  -1.235941   .5165811    -2.39   0.017    -2.250243   -.2216388
       BCS 6  |  -3.511545   .5481737    -6.41   0.000    -4.587886   -2.435205
      NCDS 7  |  -3.878832   .4973326    -7.80   0.000    -4.855647   -2.902017
       BCS 7  |   -5.53915   .4898767   -11.31   0.000    -6.500522   -4.577779
              |
        _cons |   98.49461   .4022243   244.87   0.000     97.70519    99.28403
-------------------------------------------------------------------------------

In [45]:
mi estimate: regress ability male i.parented ib11.nsinteraction  [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 5
. mi estimate: regress ability male i.parented ib11.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3628
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     444.14
                                                        avg       =     790.78
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   5.760444   .7813254     7.37   0.000     4.227363    7.293525
     BCS 1.1  |   2.299383   .7361005     3.12   0.002     .8539926    3.744774
    NCDS 1.2  |   4.786574   .7140294     6.70   0.000     3.383785    6.189363
     BCS 1.2  |   3.899242   .6908882     5.64   0.000     2.542406    5.256078
      NCDS 2  |   4.645526   .5202506     8.93   0.000     3.623939    5.667113
       BCS 2  |   2.036829   .5130722     3.97   0.000     1.029581    3.044076
      NCDS 3  |   2.937072    .592384     4.96   0.000     1.772848    4.101296
       BCS 3  |   1.351377   .5907958     2.29   0.022     .1913112    2.511443
      NCDS 4  |  -.3039371     .51749    -0.59   0.557    -1.319699    .7118248
       BCS 4  |  -2.486433   .5347522    -4.65   0.000    -3.536741   -1.436125
       BCS 5  |  -2.352895   .4796666    -4.91   0.000    -3.294389   -1.411401
      NCDS 6  |  -1.539878   .4734201    -3.25   0.001    -2.469313   -.6104425
       BCS 6  |  -3.815482   .5068211    -7.53   0.000    -4.810484   -2.820481
      NCDS 7  |  -4.182769   .4621682    -9.05   0.000    -5.090794   -3.274745
       BCS 7  |  -5.843088   .4597781   -12.71   0.000    -6.745676   -4.940499
              |
        _cons |   98.79855   .3534348   279.54   0.000     98.10489     99.4922
-------------------------------------------------------------------------------

. *There is a significant difference between BCS and NCDS for 5

In [46]:
mi estimate: regress ability male i.parented ib13.nsinteraction  [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 6
. mi estimate: regress ability male i.parented ib13.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3949
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     376.16
                                                        avg       =     770.95
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   7.300322   .7878075     9.27   0.000     5.754177    8.846467
     BCS 1.1  |   3.839261    .740482     5.18   0.000     2.384931    5.293591
    NCDS 1.2  |   6.326452   .7030693     9.00   0.000     4.945375    7.707528
     BCS 1.2  |    5.43912   .7243258     7.51   0.000     4.015152    6.863087
      NCDS 2  |   6.185404   .5216915    11.86   0.000     5.160626    7.210182
       BCS 2  |   3.576706   .5187153     6.90   0.000     2.557914    4.595498
      NCDS 3  |    4.47695   .5983433     7.48   0.000     3.300433    5.653467
       BCS 3  |   2.891255   .5824098     4.96   0.000     1.747697    4.034812
      NCDS 4  |   1.235941   .5165811     2.39   0.017     .2216388    2.250243
       BCS 4  |  -.9465552   .5074465    -1.87   0.063    -1.942659    .0495483
      NCDS 5  |   1.539878   .4734201     3.25   0.001     .6104425    2.469313
       BCS 5  |  -.8130169   .4617774    -1.76   0.079    -1.719167    .0931337
       BCS 6  |  -2.275604   .5006608    -4.55   0.000    -3.258657   -1.292552
      NCDS 7  |  -2.642892   .4281881    -6.17   0.000    -3.483497   -1.802286
       BCS 7  |   -4.30321   .4464721    -9.64   0.000    -5.179643   -3.426777
              |
        _cons |   97.25867   .3372155   288.42   0.000     96.59674     97.9206
-------------------------------------------------------------------------------

In [47]:
mi estimate: regress ability male i.parented ib15.nsinteraction  [pweight=ipw], allbaselevels
*There is a significant difference between BCS and NCDS for 7
. mi estimate: regress ability male i.parented ib15.nsinteraction  [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3787
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     408.24
                                                        avg       =     707.24
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
              |
     parented |
           2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
           3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
           4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
              |
nsinteraction |
    NCDS 1.1  |   9.943213   .7866582    12.64   0.000     8.398859    11.48757
     BCS 1.1  |   6.482153    .699118     9.27   0.000     5.109842    7.854464
    NCDS 1.2  |   8.969343   .6840177    13.11   0.000     7.625696    10.31299
     BCS 1.2  |   8.082011   .6867457    11.77   0.000     6.732589    9.431434
      NCDS 2  |   8.828296   .5072247    17.41   0.000     7.831479    9.825113
       BCS 2  |   6.219598   .4891422    12.72   0.000     5.259014    7.180182
      NCDS 3  |   7.119842   .5645244    12.61   0.000     6.010104    8.229579
       BCS 3  |   5.534146    .586579     9.43   0.000     4.381427    6.686865
      NCDS 4  |   3.878832   .4973326     7.80   0.000     2.902017    4.855647
       BCS 4  |   1.696336   .4954524     3.42   0.001      .723276    2.669397
      NCDS 5  |   4.182769   .4621682     9.05   0.000     3.274745    5.090794
       BCS 5  |   1.829875    .447286     4.09   0.000      .951729     2.70802
      NCDS 6  |   2.642892   .4281881     6.17   0.000     1.802286    3.483497
       BCS 6  |   .3672871   .4787048     0.77   0.443    -.5728835    1.307458
       BCS 7  |  -1.660318   .4242854    -3.91   0.000    -2.493524   -.8271119
              |
        _cons |   94.61578   .2989141   316.53   0.000     94.02888    95.20267
-------------------------------------------------------------------------------

. *There is a significant difference between BCS and NCDS for 7

We now produce a graph of coefficients and 95% quasi-variance comparison intervals based on table 6 model 2.

Ideally we would use the -qv- command for this, but it does not work with multiple imputation estimates.

We therefore use kuvee, the online quasi-variance calculator.

To make it easier to enter the estimates in the kuvee calculator, I recode the interaction variable so that the reference category (NCDS 3, nsinteraction==7) becomes the first category.
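
For readers unfamiliar with quasi-variances, the idea behind the comparison intervals can be stated briefly. This is a sketch of the usual formulation rather than a description of kuvee's internals; the symbols $\hat\beta_j$ (coefficient for category $j$, with the reference category set to zero) and $q_j$ (its quasi-variance) are notation introduced here.

```latex
% Quasi-variances q_j are chosen so that, for every pair of categories j != k,
% the variance of a contrast is approximately the sum of two quasi-variances:
\operatorname{Var}\!\left(\hat\beta_j - \hat\beta_k\right) \approx q_j + q_k .
% The 95% comparison interval for category j is then
\hat\beta_j \pm 1.96\,\sqrt{q_j} ,
% which is defined for every category, including the reference category.
```

Because every category, including the reference, receives its own interval, any two categories can be compared directly on the graph without re-estimating the model under a different base.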

In [48]:
numlabel, add
tab nsinteraction

capture drop newint
    gen newint = .
    replace newint = 1 if (nsinteraction==7)
    replace newint = 2 if (nsinteraction==1)
    replace newint = 3 if (nsinteraction==2)
    replace newint = 4 if (nsinteraction==3)
    replace newint = 5 if (nsinteraction==4)
    replace newint = 6 if (nsinteraction==5)
    replace newint = 7 if (nsinteraction==6)
    replace newint = 8 if (nsinteraction==8)
    replace newint = 9 if (nsinteraction==9)
    replace newint = 10 if (nsinteraction==10)
    replace newint = 11 if (nsinteraction==11)
    replace newint = 12 if (nsinteraction==12)
    replace newint = 13 if (nsinteraction==13)
    replace newint = 14 if (nsinteraction==14)
    replace newint = 15 if (nsinteraction==15)
    replace newint = 16 if (nsinteraction==16)
    tab newint
    
mi register passive newint
    
tab nsinteraction newint

* return to jupyter
. numlabel, add

. tab nsinteraction

      NSSEC |
Interaction |      Freq.     Percent        Cum.
------------+-----------------------------------
1. NCDS 1.1 |      9,690        1.46        1.46
 2. BCS 1.1 |     16,558        2.50        3.96
3. NCDS 1.2 |     13,174        1.99        5.95
 4. BCS 1.2 |     19,825        2.99        8.95
  5. NCDS 2 |     34,201        5.17       14.11
   6. BCS 2 |     50,360        7.61       21.72
  7. NCDS 3 |     28,093        4.24       25.96
   8. BCS 3 |     28,461        4.30       30.26
  9. NCDS 4 |     36,728        5.55       35.81
  10. BCS 4 |     49,205        7.43       43.24
 11. NCDS 5 |     50,592        7.64       50.88
  12. BCS 5 |     56,808        8.58       59.46
 13. NCDS 6 |     55,606        8.40       67.85
  14. BCS 6 |     53,038        8.01       75.86
 15. NCDS 7 |     80,234       12.12       87.98
  16. BCS 7 |     79,595       12.02      100.00
------------+-----------------------------------
      Total |    662,168      100.00

. 
. capture drop newint

.     gen newint = .
(668,711 missing values generated)

.     replace newint = 1 if (nsinteraction==7)
(28,093 real changes made)

.     replace newint = 2 if (nsinteraction==1)
(9,690 real changes made)

.     replace newint = 3 if (nsinteraction==2)
(16,558 real changes made)

.     replace newint = 4 if (nsinteraction==3)
(13,174 real changes made)

.     replace newint = 5 if (nsinteraction==4)
(19,825 real changes made)

.     replace newint = 6 if (nsinteraction==5)
(34,201 real changes made)

.     replace newint = 7 if (nsinteraction==6)
(50,360 real changes made)

.     replace newint = 8 if (nsinteraction==8)
(28,461 real changes made)

.     replace newint = 9 if (nsinteraction==9)
(36,728 real changes made)

.     replace newint = 10 if (nsinteraction==10)
(49,205 real changes made)

.     replace newint = 11 if (nsinteraction==11)
(50,592 real changes made)

.     replace newint = 12 if (nsinteraction==12)
(56,808 real changes made)

.     replace newint = 13 if (nsinteraction==13)
(55,606 real changes made)

.     replace newint = 14 if (nsinteraction==14)
(53,038 real changes made)

.     replace newint = 15 if (nsinteraction==15)
(80,234 real changes made)

.     replace newint = 16 if (nsinteraction==16)
(79,595 real changes made)

.     tab newint

     newint |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     28,093        4.24        4.24
          2 |      9,690        1.46        5.71
          3 |     16,558        2.50        8.21
          4 |     13,174        1.99       10.20
          5 |     19,825        2.99       13.19
          6 |     34,201        5.17       18.36
          7 |     50,360        7.61       25.96
          8 |     28,461        4.30       30.26
          9 |     36,728        5.55       35.81
         10 |     49,205        7.43       43.24
         11 |     50,592        7.64       50.88
         12 |     56,808        8.58       59.46
         13 |     55,606        8.40       67.85
         14 |     53,038        8.01       75.86
         15 |     80,234       12.12       87.98
         16 |     79,595       12.02      100.00
------------+-----------------------------------
      Total |    662,168      100.00

.     
. mi register passive newint

.     
. tab nsinteraction newint

      NSSEC |                                                    newint
Interaction |         1          2          3          4          5          6          7          8          9         10 |     Total
------------+--------------------------------------------------------------------------------------------------------------+----------
1. NCDS 1.1 |         0      9,690          0          0          0          0          0          0          0          0 |     9,690 
 2. BCS 1.1 |         0          0     16,558          0          0          0          0          0          0          0 |    16,558 
3. NCDS 1.2 |         0          0          0     13,174          0          0          0          0          0          0 |    13,174 
 4. BCS 1.2 |         0          0          0          0     19,825          0          0          0          0          0 |    19,825 
  5. NCDS 2 |         0          0          0          0          0     34,201          0          0          0          0 |    34,201 
   6. BCS 2 |         0          0          0          0          0          0     50,360          0          0          0 |    50,360 
  7. NCDS 3 |    28,093          0          0          0          0          0          0          0          0          0 |    28,093 
   8. BCS 3 |         0          0          0          0          0          0          0     28,461          0          0 |    28,461 
  9. NCDS 4 |         0          0          0          0          0          0          0          0     36,728          0 |    36,728 
  10. BCS 4 |         0          0          0          0          0          0          0          0          0     49,205 |    49,205 
 11. NCDS 5 |         0          0          0          0          0          0          0          0          0          0 |    50,592 
  12. BCS 5 |         0          0          0          0          0          0          0          0          0          0 |    56,808 
 13. NCDS 6 |         0          0          0          0          0          0          0          0          0          0 |    55,606 
  14. BCS 6 |         0          0          0          0          0          0          0          0          0          0 |    53,038 
 15. NCDS 7 |         0          0          0          0          0          0          0          0          0          0 |    80,234 
  16. BCS 7 |         0          0          0          0          0          0          0          0          0          0 |    79,595 
------------+--------------------------------------------------------------------------------------------------------------+----------
      Total |    28,093      9,690     16,558     13,174     19,825     34,201     50,360     28,461     36,728     49,205 |   662,168 


      NSSEC |                              newint
Interaction |        11         12         13         14         15         16 |     Total
------------+------------------------------------------------------------------+----------
1. NCDS 1.1 |         0          0          0          0          0          0 |     9,690 
 2. BCS 1.1 |         0          0          0          0          0          0 |    16,558 
3. NCDS 1.2 |         0          0          0          0          0          0 |    13,174 
 4. BCS 1.2 |         0          0          0          0          0          0 |    19,825 
  5. NCDS 2 |         0          0          0          0          0          0 |    34,201 
   6. BCS 2 |         0          0          0          0          0          0 |    50,360 
  7. NCDS 3 |         0          0          0          0          0          0 |    28,093 
   8. BCS 3 |         0          0          0          0          0          0 |    28,461 
  9. NCDS 4 |         0          0          0          0          0          0 |    36,728 
  10. BCS 4 |         0          0          0          0          0          0 |    49,205 
 11. NCDS 5 |    50,592          0          0          0          0          0 |    50,592 
  12. BCS 5 |         0     56,808          0          0          0          0 |    56,808 
 13. NCDS 6 |         0          0     55,606          0          0          0 |    55,606 
  14. BCS 6 |         0          0          0     53,038          0          0 |    53,038 
 15. NCDS 7 |         0          0          0          0     80,234          0 |    80,234 
  16. BCS 7 |         0          0          0          0          0     79,595 |    79,595 
------------+------------------------------------------------------------------+----------
      Total |    50,592     56,808     55,606     53,038     80,234     79,595 |   662,168 


. 
. * return to jupyter

In [49]:
mi estimate, post: regress ability ib1.newint male i.parented [pweight=ipw], allbaselevels

* return to jupyter
. mi estimate, post: regress ability ib1.newint male i.parented [pweight=ipw], allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     28,331
                                                Average RVI       =     0.4115
                                                Largest FMI       =     0.3949
                                                Complete DF       =      28311
DF adjustment:   Small sample                   DF:     min       =     376.16
                                                        avg       =     611.14
                                                        max       =   2,252.49
Model F test:       Equal FMI                   F(  19, 8704.8)   =     182.60
Within VCE type:       Robust                   Prob > F          =     0.0000

------------------------------------------------------------------------------
     ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      newint |
          2  |   2.823372    .864195     3.27   0.001     1.126202    4.520541
          3  |  -.6376888   .7817899    -0.82   0.415     -2.17297    .8975921
          4  |   1.849502   .7733177     2.39   0.017     .3296352    3.369368
          5  |   .9621695   .7714264     1.25   0.213    -.5543167    2.478656
          6  |   1.708454   .6153074     2.78   0.006     .4989125    2.917995
          7  |  -.9002437   .5898865    -1.53   0.128    -2.058989    .2585018
          8  |  -1.585695    .683324    -2.32   0.021    -2.928931   -.2424602
          9  |  -3.241009   .6280868    -5.16   0.000    -4.475416   -2.006603
         10  |  -5.423505    .624567    -8.68   0.000    -6.651111     -4.1959
         11  |  -2.937072    .592384    -4.96   0.000    -4.101296   -1.772848
         12  |  -5.289967   .5726627    -9.24   0.000    -6.414886   -4.165048
         13  |   -4.47695   .5983433    -7.48   0.000    -5.653467   -3.300433
         14  |  -6.752555   .5875267   -11.49   0.000    -7.906433   -5.598676
         15  |  -7.119842   .5645244   -12.61   0.000    -8.229579   -6.010104
         16  |   -8.78016   .5601276   -15.68   0.000     -9.88061    -7.67971
             |
        male |  -.5596313   .1796991    -3.11   0.002    -.9120244   -.2072382
             |
    parented |
          2  |   5.871626   .2362326    24.86   0.000     5.408002     6.33525
          3  |    8.30813   .5331239    15.58   0.000     7.261597    9.354663
          4  |   10.64564   .4568768    23.30   0.000     9.748591    11.54268
             |
       _cons |   101.7356   .4928228   206.43   0.000     100.7669    102.7043
------------------------------------------------------------------------------

. 
. * return to jupyter

In [50]:
estat vce

* return to jupyter
. estat vce

Covariance matrix of coefficients of regress model

             |          2.          3.          4.          5.          6.          7.          8.          9.         10.         11.
        e(V) |     newint      newint      newint      newint      newint      newint      newint      newint      newint      newint 
-------------+------------------------------------------------------------------------------------------------------------------------
    2.newint |  .74683305                                                                                                             
    3.newint |  .18603788   .61119543                                                                                                 
    4.newint |  .22785882   .21719497   .59802021                                                                                     
    5.newint |  .25155519   .22151191   .27220963   .59509866                                                                         
    6.newint |  .23648811   .20872186    .2210529   .23451172   .37860321                                                             
    7.newint |  .24044282   .21109345   .23845249   .24178883   .21292383   .34796605                                                 
    8.newint |  .23715027   .18746127   .19723372   .20705179    .2248706   .21560595   .46693166                                     
    9.newint |  .24116712   .19831142   .22686655   .22189722     .227669    .2159627   .24424337   .39449306                         
   10.newint |  .23390068   .19426161   .22526396   .22989888   .23048032   .21112124   .23795957   .23195252   .39008389             
   11.newint |  .24364124   .21013518   .21955054   .23434548   .22943067   .21782092   .23440539   .23880801   .22752142   .35091884 
   12.newint |   .2352734   .20438136   .21444775   .19188841    .2165198    .2081815   .22195449   .22305559   .20770804   .22439068 
   13.newint |  .24210358    .2104483   .23086428   .21423275   .23222796   .21845759   .24287263   .24282589   .24529834   .24240349 
   14.newint |   .2286826   .19901053   .21448431   .20325265   .20236443   .20196978   .20485222   .21959314    .2171903   .21961942 
   15.newint |  .22334486   .22055867    .2244139   .22108345    .2200071   .21369691   .22077229   .23292057   .23164934   .22800361 
   16.newint |   .2315445      .19481   .20927759   .20746533   .22602204   .19893182   .21390876    .2341284   .22183976   .22663294 
        male | -.00091878   .00106025  -.00242461  -.00480137   .00295409  -.00143565  -.00168798  -.00133548  -.00337208  -.00044733 
  2.parented | -.00377538  -.00211464  -.00729941  -.00521114  -.00010989  -.00760007   .00114305   .02348996   .01137725   .02130149 
  3.parented |  .01285024  -.02102008  -.04211511  -.04587194  -.00814881  -.01365593  -.00273648   .02067818  -.01161175   .01990581 
  4.parented | -.01107807  -.03566959  -.07243865  -.09386182  -.01440704  -.03461003   .01043986   .02380378   .01807404   .01534046 
       _cons | -.22828625  -.20331192  -.21458798  -.20842671  -.22384554  -.20638479   -.2216094  -.23779147   -.2269583  -.23443849 

             |         12.         13.         14.         15.         16.                      2.          3.          4.            
        e(V) |     newint      newint      newint      newint      newint        male    parented    parented    parented       _cons 
-------------+------------------------------------------------------------------------------------------------------------------------
   12.newint |  .32794261                                                                                                             
   13.newint |  .23635951   .35801475                                                                                                 
   14.newint |  .21284362   .22627058   .34518763                                                                                     
   15.newint |  .22328283   .24667879    .2173586   .31868785                                                                         
   16.newint |  .21598215   .23621015   .20279678   .22620632   .31374291                                                             
        male |  .00226928  -.00177822  -.00327582  -.00041014  -.00052549   .03229177                                                 
  2.parented |  .01083125   .02631137   .01317801    .0240857   .02554511   .00105094   .05580584                                     
  3.parented |  .01305168   .02546057  -.00046406   .01498719   .01164087   .00348899   .02291568   .28422109                         
  4.parented |  .01772126   .03475864   .02510464   .02376739   .02835881   .00202651   .03276072    .0403896   .20873638             
       _cons | -.22146877  -.24358735  -.21362248  -.23610623  -.22731873  -.01625676  -.03184747  -.02453975  -.03064344   .24287427 

. 

We enter the variance-covariance matrix into the quasi-variance calculator. Here are the results:

Kuvee Results

95% quasi-variance comparison intervals are calculated as the coefficient +/- 1.96 x QSE, where QSE is the quasi-variance standard error.
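
The interval arithmetic can be checked by hand. The short sketch below is written in Python rather than Stata and is not part of the original workflow; the helper name qv_interval is a hypothetical choice made for illustration. It reproduces the bounds for NCDS NS-SEC 1.1 using the coefficient (2.82) and quasi-variance standard error (0.709) that are entered into Stata in the next cell.

```python
# Illustrative check only; not part of the original Stata workflow.
def qv_interval(coef, qse, z=1.96):
    """Return (lower, upper) bounds of a quasi-variance comparison interval."""
    half_width = z * qse
    return coef - half_width, coef + half_width

# Coefficient and quasi-variance standard error for NCDS NS-SEC 1.1.
lower, upper = qv_interval(2.82, 0.709)
print(round(lower, 5), round(upper, 5))  # 1.43036 4.20964
```

The printed bounds match the lb and ub values recorded for that category.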

We input these results into Stata along with the coefficients.

N.B. We have moved the reference category back into its appropriate position.

In [51]:
clear
input cohort class  coef se qv qvse lb ub
1 1 2.82 0.864 0.502 0.709 1.43036 4.20964
2 1.1 -0.64 0.782 0.423 0.65 -1.914 0.634
1 2 1.85 0.773 0.375 0.612 0.65048 3.04952
2 2.2 0.96 0.771 0.374 0.611 -0.23756 2.15756
1 3 1.71 0.615 0.157 0.396 0.93384 2.48616
2 3.2 -0.9 0.59 0.146 0.382 -1.64872 -0.15128
1 4 0 0 0.223 0.472 -0.92512 0.92512
2 4.2 -1.59 0.683 0.243 0.493 -2.55628 -0.62372
1 5 -3.24 0.628 0.157 0.397 -4.01812 -2.46188
2 5.2 -5.42 0.625 0.162 0.403 -6.20988 -4.63012
1 6 -2.94 0.592 0.117 0.342 -3.61032 -2.26968
2 6.2 -5.29 0.573 0.114 0.338 -5.95248 -4.62752
1 7 -4.48 0.598 0.107 0.328 -5.12288 -3.83712
2 7.2 -6.75 0.588 0.144 0.379 -7.49284 -6.00716
1 8 -7.12 0.565 0.089 0.298 -7.70408 -6.53592
2 8.2 -8.78 0.56 0.097 0.312 -9.39152 -8.16848
end

summarize

* return to jupyter
. clear

. input cohort class  coef se qv qvse lb ub

        cohort      class       coef         se         qv       qvse         lb         ub
  1. 1 1 2.82 0.864 0.502 0.709 1.43036 4.20964
  2. 2 1.1 -0.64 0.782 0.423 0.65 -1.914 0.634
  3. 1 2 1.85 0.773 0.375 0.612 0.65048 3.04952
  4. 2 2.2 0.96 0.771 0.374 0.611 -0.23756 2.15756
  5. 1 3 1.71 0.615 0.157 0.396 0.93384 2.48616
  6. 2 3.2 -0.9 0.59 0.146 0.382 -1.64872 -0.15128
  7. 1 4 0 0 0.223 0.472 -0.92512 0.92512
  8. 2 4.2 -1.59 0.683 0.243 0.493 -2.55628 -0.62372
  9. 1 5 -3.24 0.628 0.157 0.397 -4.01812 -2.46188
 10. 2 5.2 -5.42 0.625 0.162 0.403 -6.20988 -4.63012
 11. 1 6 -2.94 0.592 0.117 0.342 -3.61032 -2.26968
 12. 2 6.2 -5.29 0.573 0.114 0.338 -5.95248 -4.62752
 13. 1 7 -4.48 0.598 0.107 0.328 -5.12288 -3.83712
 14. 2 7.2 -6.75 0.588 0.144 0.379 -7.49284 -6.00716
 15. 1 8 -7.12 0.565 0.089 0.298 -7.70408 -6.53592
 16. 2 8.2 -8.78 0.56 0.097 0.312 -9.39152 -8.16848
 17. end

. 
. summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      cohort |         16         1.5    .5163978          1          2
       class |         16     4.59375    2.378366          1        8.2
        coef |         16   -2.488125    3.557466      -8.78       2.82
          se |         16    .6129375     .188273          0       .864
          qv |         16     .214375    .1311396       .089       .502
-------------+---------------------------------------------------------
        qvse |         16     .445125    .1316657       .298       .709
          lb |         16    -3.36057    3.362431   -9.39152    1.43036
          ub |         16    -1.61568    3.760104   -8.16848    4.20964

. 
. * return to jupyter
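
Quasi-variances are constructed so that the variance of the difference between any two category coefficients is approximately the sum of their quasi-variances. As a rough check of this property, the sketch below (Python, not part of the original Stata workflow) compares the exact variance of the contrast between NCDS 1.1 (newint 2) and BCS 1.1 (newint 3), taken from the estat vce output above, with the sum of the two quasi-variances entered in the previous cell. The variable names are illustrative assumptions.

```python
import math

# Entries copied from the estat vce output (newint 2 = NCDS 1.1, newint 3 = BCS 1.1).
var_2 = 0.74683305
var_3 = 0.61119543
cov_23 = 0.18603788

# Quasi-variances for the same two categories, as entered in the input cell.
qv_2 = 0.502
qv_3 = 0.423

# Exact variance of the difference between the two coefficients.
exact_var = var_2 + var_3 - 2 * cov_23
# Quasi-variance approximation to the same quantity.
approx_var = qv_2 + qv_3

exact_se = math.sqrt(exact_var)
approx_se = math.sqrt(approx_var)
print(round(exact_se, 3), round(approx_se, 3))
```

The two standard errors are close (roughly 0.99 against 0.96), which is what allows the comparison intervals to be read against one another without reference to the full covariance matrix.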

Code to make figure 1:

In [52]:
label variable class "Father's NSSEC"
label variable coef "OLS Coefficient"
label variable lb "Lower bound"
label variable ub "Upper bound"
summarize
*return to jupyter
. label variable class "Father's NSSEC"

. label variable coef "OLS Coefficient"

. label variable lb "Lower bound"

. label variable ub "Upper bound"

. summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      cohort |         16         1.5    .5163978          1          2
       class |         16     4.59375    2.378366          1        8.2
        coef |         16   -2.488125    3.557466      -8.78       2.82
          se |         16    .6129375     .188273          0       .864
          qv |         16     .214375    .1311396       .089       .502
-------------+---------------------------------------------------------
        qvse |         16     .445125    .1316657       .298       .709
          lb |         16    -3.36057    3.362431   -9.39152    1.43036
          ub |         16    -1.61568    3.760104   -8.16848    4.20964

. *return to jupyter

In [53]:
graph set window fontface "Times New Roman"

graph twoway (scatter coef class if (cohort==1), msymbol(circle))(scatter coef class if (cohort==2), msymbol(diamond))|| rspike ub lb class, xlabel(1 "1.1" 2 "1.2" 3 "2" 4 "3" 5 "4" 6 "5" 7 "6" 8 "7")xtitle( , size(small))xline(3.5, lp(dash))xline(5.6, lp(dash))text(-7.5 2 "Managerial and" "Professional")text(-7.5 5 "Intermediate")text(1 7.1  "Routine and" "Manual")scheme(s1mono)legend(order(1 "NCDS 58" 2 "BCS 70") row(1) region(lwidth(none)))title("Predictions of General Ability Test Score by Father's Social Class", size(medsmall))subtitle("OLS Regression Coefficients and 95% Quasi-Variance Comparison Intervals ", size(small))note("Data: 1958 National Child Development Study and 1970 British Cohort Study." "Note: Estimates are taken from table 6, model 2. Model also contains Gender and Parent's Highest Education.", size(vsmall))
graph export F:\DATA\MYDATA\TEMP\figure1.png, replace

*return to jupyter
. graph set window fontface "Times New Roman"

. 
. graph twoway (scatter coef class if (cohort==1), msymbol(circle))(scatter coef class if (cohort==2), msymbol(diamond))|| rspike ub lb clas
> s, xlabel(1 "1.1" 2 "1.2" 3 "2" 4 "3" 5 "4" 6 "5" 7 "6" 8 "7")xtitle( , size(small))xline(3.5, lp(dash))xline(5.6, lp(dash))text(-7.5 2 "M
> anagerial and" "Professional")text(-7.5 5 "Intermediate")text(1 7.1  "Routine and" "Manual")scheme(s1mono)legend(order(1 "NCDS 58" 2 "BCS 
> 70") row(1) region(lwidth(none)))title("Predictions of General Ability Test Score by Father's Social Class", size(medsmall))subtitle("OLS 
> Regression Coefficients and 95% Quasi-Variance Comparison Intervals ", size(small))note("Data: 1958 National Child Development Study and 1
> 970 British Cohort Study." "Note: Estimates are taken from table 6, model 2. Model also contains Gender and Parent's Highest Education.", 
> size(vsmall))

. graph export F:\DATA\MYDATA\TEMP\figure1.png, replace
(file F:\DATA\MYDATA\TEMP\figure1.png written in PNG format)

. 
. *return to jupyter

!! Now change your kernel to the Python kernel

Go to: Kernel -> Change Kernel -> Python

In [1]:
from IPython.display import Image
Image(filename='f:/DATA/MYDATA/TEMP/figure1.png')
Out[1]:

figure 1

!! Now return to the Stata kernel

Go to: Kernel -> Change Kernel -> Stata

We need to reset the paths in Stata. Stata forgets this when the kernel is changed.

In [1]:
global path1 "F:\Data\RAWDATA"
global path2 "F:\Data\MYDATA\WORK"
global path3 "F:\Data\MYDATA\TEMP"
global path4 "F:\Data\MYDATA\FINAL"

clear

*return to jupyter
. global path1 "F:\Data\RAWDATA"

. global path2 "F:\Data\MYDATA\WORK"

. global path3 "F:\Data\MYDATA\TEMP"

. global path4 "F:\Data\MYDATA\FINAL"

. 
. clear

. 
. *return to jupyter


Discussion of Social Class Effect

There is a clear and observable negative social class gradient that is net of gender and parental education. Overall, children from more occupationally advantaged social classes perform better on the general cognitive ability test. The negative social class gradient, and the differences between social class categories, may reflect the instability, and the economic and social strain, that result from belonging to the more disadvantaged social class groups (Layte, 2017; Elder, 1994; Conger and Conger, 2002). These differences may also reflect other characteristics of parents’ jobs, such as complexity (see Parcel and Menaghan, 1994).

At the apex of the social class hierarchy are fathers in the ‘managerial and professional’ class. These include fathers in NS-SEC 1.1 (Large employers, higher managerial and administrative occupations), along with fathers in NS-SEC 1.2 (Higher professional occupations) and fathers in NS-SEC 2 (Lower managerial, administrative and professional occupations). The ‘managerial and professional’ class comprises more complex and higher skilled occupations, and employees usually enjoy a high degree of job security, and have a regular, and known, monthly income (Goldthorpe and McKnight, 2006). Fathers in the ‘managerial and professional’ class can have realistic expectations of salary increases, for example via incremental pay scales, and they can realistically expect to be promoted within their occupations up to the age of 50 and even beyond. These advantages are likely to make substantial economic, social, and cultural contributions to the households in which children grow up.

At the base of the social class hierarchy are the ‘routine and manual occupations’ (NS-SEC 5, 6 and 7). In both cohorts children born into families in ‘routine and manual occupations’ have markedly lower cognitive ability test scores than children from ‘managerial and professional occupations’ families (NS-SEC 1.1, 1.2 and 2). The fathers in NS-SEC 6 and NS-SEC 7 comprise a group of wage-workers in lower skilled jobs that are usually of a routine nature. The economic lives of the fathers in these classes are characterised by a relatively high risk of job loss, recurrent and often long-term unemployment, and lower earnings. Occupations in NS-SEC 6 and NS-SEC 7 are often rewarded on a weekly rather than an annual basis, and pay can vary as a result of the availability of overtime, piece-rates or shift work premia (Goldthorpe, 2016). Negative events such as job loss and unemployment are likely to have immediate impacts on a household’s economic and social circumstances. We speculate that the precarious nature of the employment conditions experienced by employees in routine and manual occupations hangs like a sword of Damocles over these families. The lack of economic security and the lower material rewards associated with jobs in NS-SEC 6 and NS-SEC 7 may contribute to the impoverished cognitive ability of children with fathers in these classes.

Occupations in NS-SEC 5 (Lower Supervisory and Technical) usually require specific skills and organisational knowledge. Occupations in this class generally provide more stable employment and include some of the conditions of employment, for example an annual salary, typical in the managerial and professional class. The additional occupational complexity, along with the improved economic security and benefits associated with occupations in NS-SEC 5, may contribute to the improved cognitive ability of children with fathers in this class.

Between the ‘managerial and professional occupations’ (NS-SEC 1.1, 1.2 and 2) and the ‘routine and manual occupations’ (NS-SEC 5, 6 and 7) rest the ‘intermediate occupations’ (NS-SEC 3 and 4). Despite being distinctive, the ‘intermediate occupations’ are not organised into a hierarchical order. NS-SEC 4 (Small Employers and Own Account Workers) theoretically stands apart from NS-SEC 3 (Intermediate) because it is composed of self-employed workers and small employers. NS-SEC 4 comprises both those who are engaged in largely manual work and others who are engaged in non-manual work. In contrast to fathers in NS-SEC 1.1 (Large Employers and Higher Managerial Occupations), the fathers in NS-SEC 4 carry out the majority of the entrepreneurial and managerial functions within their enterprise. The children with fathers in NS-SEC 4 have cognitive test scores that are more similar to counterparts in NS-SEC 5 than to other children with fathers in NS-SEC 3. The better performance of children with fathers in NS-SEC 3 may be a reflection of their fathers being engaged in intermediate occupations that can reasonably be described as being ‘white collar’. Being engaged in white collar occupations generally leads to better employment conditions and economic rewards.

In the discussion above we have focussed on the employment characteristics and conditions associated with the NS-SEC categories. The observable negative gradient leads to the plausible conclusion that class differences are associated with substantial differences in the economic, social and cultural milieus within households. We speculate that social class differences in cultural values, parenting styles and family activities may also play a role in reproducing inequalities (see Bourdieu and Passeron, 1977; Ermisch, 2008; Kiernan and Mensah, 2011; Lareau, 2011; Washbrook, 2011; Vincent and Ball, 2007; Sullivan et al., 2013). Researchers in fields such as psychology have pointed to the heritability of general cognitive ability (see for example Tucker-Drob et al., 2013; Hill et al., 2014; Deary et al., 2006), which might be another plausible dimension contributing to the social class gradient.


Conclusions

Overall, this article provides persuasive evidence that whilst there are sociologically important and informative differences between social classes, there has not been a notable change in the relative ordering of social class inequalities in childhood general cognitive ability test scores between these two birth cohorts. These analyses detect that gender, parental education and social class have structuring effects on general cognitive ability in childhood. This underlines the benefits of moving beyond psychology’s standard disciplinary boundaries in order to develop a more comprehensive understanding of social influences on cognitive inequalities (Flynn, 2012).

In Britain since the end of the Second World War there have been ongoing concerns about social inequality in education. Despite numerous new educational policies and initiatives the structure and organisation of primary schools remained relatively unchanged in the second half of the twentieth century. Primary schools in the post-war period might reasonably be described as being in a state of ‘constant flux’. The children of the NCDS began primary school in the early 1960s, and the children of the BCS entered primary school twelve years later. Nevertheless, the evidence from analysing the two British birth cohorts is that social class inequalities in childhood cognitive ability test scores were notable and persistent. The extent to which parental social class inequalities in general cognitive ability test scores have changed in more recent cohorts is a question for further empirical investigation. Unfortunately, at the current time we are not aware of any nationally representative UK datasets that contain suitable general cognitive ability test measures to effectively examine more recent cohorts.

Children’s cognitive ability test scores summarize their capability to understand complex ideas, to engage in various forms of reasoning, to learn from experience and to effectively adapt to their environment. The overall finding, that social class divisions in cognitive ability can be observed when children are still at primary school, and that these inequalities are persistent, is a disturbing result. Pupils with fathers in ‘routine and manual occupations’ are at a distinct disadvantage. These pupils arrived at secondary school already weighed down with stones in their satchels. This is an important finding to emphasise because cognitive ability is known to influence individuals throughout their lives (see Deary et al., 2007; Nettle, 2003; Vanhanen, 2011; Schoon, 2010).

There is an increasing desire and requirement to make sociological research more transparent, and to actively render it reproducible. In addition to the substantive findings, this article makes a ground-breaking methodological contribution by using Jupyter notebooks, which are an internationally recognised open-source research platform. Publishing the Jupyter notebook allows third parties to fully reproduce the complete workflow behind the production of the article, and to duplicate the empirical results. In addition to increasing transparency, this approach enables other researchers to extend the work, for example with different measures, additional data or alternative techniques. Improving transparency is an attractive feature and is highly likely to make a major contribution to quantitative sociology.

In developing an open and published workflow we have drawn upon ideas advanced in computer science, especially the concept of ‘literate computing’, which is the weaving of a narrative directly into live computation, interleaving text with code and results in order to construct a complete piece that achieves the goals of communicating results [6] (Knuth, 1992). A further innovation within this work has been the adoption of ‘pair programming’, which is a technique from software development in which two programmers work together in the development of code. We have also used ‘code peer review’, and each author has run the complete workflow independently using a different computer and software set-up. This has enabled us to undertake an in-depth test of the reproducibility of the work. These practices are rarely utilised in sociological research but bring great benefits to the discipline.


Notes

[1] Although there is some evidence that these increases may have slowed, or even stopped, in recent years (Teasdale & Owen, 2008).

[2] The 1970 British Cohort Study included babies born in Northern Ireland in the first interview (at birth) but these babies were dropped from all subsequent sweeps of data collection within the study.

[3] A measure of mother’s occupation before pregnancy was collected in the NCDS birth survey, variable n539. However, more than half of mothers in our sample have no occupational information and the information available is only provided as non-standard occupational categories (e.g. ‘bank clerks etc.’, ‘Textile-labourer’, ‘Clerks, typists’). Information on mother’s occupation is also provided in the age 11 NCDS survey dataset, variable n1225. This variable indicates that over half of mothers in our sample have no occupational information and also uses a non-standard categorisation of occupations. Mother’s Registrar General Social Class (n2393) and Socio-Economic Group (n2394) are available from the age 16 NCDS survey; however, more than half of mothers in our sample have no occupational information. We chose not to use the available mother’s occupational information because of the large number of mothers with no occupational information. The non-standard classification of occupations would not enable us to produce comparable socio-economic measures in a suitably standardised manner. We do not use the mother’s occupational information from the age 16 sweep of the survey because it was collected 5 years after the outcome of interest and it would not allow us to produce NS-SEC in a standardised manner. We therefore consider that it is not an appropriate measure for the present analysis.

In [2]:
use $path3\pooledNCDSBCS_v2.dta, clear

tab n539 if (cohort==1), mi
tab n1225 if (cohort==1), mi
tab n2393 if (cohort==1), mi
tab n2394 if (cohort==1), mi

* return to jupyter
. use $path3\pooledNCDSBCS_v2.dta, clear

. 
. tab n539 if (cohort==1), mi

0 Mums paid job when |
  starting this baby |
          (GRO 1951) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
         1. Teachers |        269        1.54        1.54
 2. Nurses qualified |         92        0.53        2.07
  3. Bank clerks etc |        246        1.41        3.49
  4. Shopkeepers etc |         60        0.34        3.83
 5. Others in SCI,II |        101        0.58        4.41
 6. Nurses- not qual |        109        0.63        5.04
   7. Clerks,typists |      1,559        8.95       13.99
 8. Shop asst,hairdr |        799        4.59       18.58
  9. Garment workers |        152        0.87       19.45
10. Textile wkr skld |        281        1.61       21.06
11. Personal service |        224        1.29       22.35
12. Others in SC III |        553        3.18       25.52
      13. Machinists |        287        1.65       27.17
14. Textile wkr SCIV |        104        0.60       27.77
   15. Personal-SCIV |        379        2.18       29.95
 16. Others in SC IV |        988        5.67       35.62
17. Textile-labourer |        356        2.04       37.66
   18. Personal-SC V |        122        0.70       38.36
                   . |     10,734       61.64      100.00
---------------------+-----------------------------------
               Total |     17,415      100.00

. tab n1225 if (cohort==1), mi

   2P Mothers's most |
 recent work and SEG |
          (GRO 1966) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
       1. Prof,manag |        255        1.46        1.46
 2. Intermed non-man |        751        4.31        5.78
  3. Typist,clerical |      1,257        7.22       12.99
   4. Shop assistant |        805        4.62       17.62
 5. Telephonists etc |        178        1.02       18.64
 6. Personal service |      1,717        9.86       28.50
 7. Forewomen,manual |        120        0.69       29.19
   8. Manual workers |      2,705       15.53       44.72
      9. Own account |         67        0.38       45.10
    10. Farm workers |        140        0.80       45.91
 11. Inadequate info |         42        0.24       46.15
                   . |      9,378       53.85      100.00
---------------------+-----------------------------------
               Total |     17,415      100.00

. tab n2393 if (cohort==1), mi

  3P Mother-s social |
 class,if works (GRO |
               1970) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
                1. I |         35        0.20        0.20
               2. II |      1,134        6.51        6.71
      3. III non-man |      2,230       12.81       19.52
       4. III manual |        522        3.00       22.52
       5. IV non-man |      1,290        7.41       29.92
        6. IV manual |      1,107        6.36       36.28
                7. V |        742        4.26       40.54
     8. Unclassified |        104        0.60       41.14
                   . |     10,251       58.86      100.00
---------------------+-----------------------------------
               Total |     17,415      100.00

. tab n2394 if (cohort==1), mi

          3P Mothers |
      Socio-economic |
 group,if works (GRO |
               1970) |      Freq.     Percent        Cum.
---------------------+-----------------------------------
  1. Emp,manag large |         43        0.25        0.25
  2. Emp,manag small |        212        1.22        1.46
    3. Prof-self-emp |          7        0.04        1.50
   4. Prof-employees |         39        0.22        1.73
 5. Intermed non-man |        935        5.37        7.10
   6. Junior non-man |      2,239       12.86       19.95
 7. Personal service |      1,289        7.40       27.36
     8. Foremen-man. |         55        0.32       27.67
   9. Skilled manual |        279        1.60       29.27
10. Semi skld manual |      1,039        5.97       35.24
11. Unskilled manual |        743        4.27       39.51
12. Work own account |        111        0.64       40.14
13. Farmer-emp,manag |          2        0.01       40.16
14. Farm-own account |          6        0.03       40.19
    15. Agric worker |         59        0.34       40.53
    16. Armed forces |          1        0.01       40.53
 17. Inadequate info |        104        0.60       41.13
                   . |     10,252       58.87      100.00
---------------------+-----------------------------------
               Total |     17,415      100.00

. 
. * return to jupyter

[4] There is a significant association between parental education and father’s social class (χ² = 4700, 21 d.f., p < 0.001), but the association is relatively weak (Cramér’s V = 0.30). The mean variance inflation factor (VIF) in the complete records model was 1.70. Following conventional methodological advice, we conclude that multicollinearity is not a concern in this model (see Menard, 2002).

In [3]:
use $path3\pooledNCDSBCS_v3.dta, clear

* return to jupyter
. use $path3\pooledNCDSBCS_v3.dta, clear

. 
. * return to jupyter

In [4]:
tab parented dadnssec if(samplenssec==0), chi V r
kap parented dadnssec if(samplenssec==0)
pwcorr parented dadnssec if(samplenssec==0), sig

* return to jupyter
. tab parented dadnssec if(samplenssec==0), chi V r

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

  Parent's |
   Highest |                                     Father's NSSEC
 Education | 1. Large   2. Higher  3. Lower   4. Interm  5. Small   6. Lower   7. Semi-R  8. Routin |     Total
-----------+----------------------------------------------------------------------------------------+----------
         1 |       210        141        831        747      1,529      2,099      2,219      3,350 |    11,126 
           |      1.89       1.27       7.47       6.71      13.74      18.87      19.94      30.11 |    100.00 
-----------+----------------------------------------------------------------------------------------+----------
         2 |       321        360        980        718        592        774        589        562 |     4,896 
           |      6.56       7.35      20.02      14.67      12.09      15.81      12.03      11.48 |    100.00 
-----------+----------------------------------------------------------------------------------------+----------
         3 |        47         78        146         82         62        102         54         42 |       613 
           |      7.67      12.72      23.82      13.38      10.11      16.64       8.81       6.85 |    100.00 
-----------+----------------------------------------------------------------------------------------+----------
         4 |        89        351        383         88         47         63         40         20 |     1,081 
           |      8.23      32.47      35.43       8.14       4.35       5.83       3.70       1.85 |    100.00 
-----------+----------------------------------------------------------------------------------------+----------
     Total |       667        930      2,340      1,635      2,230      3,038      2,902      3,974 |    17,716 
           |      3.76       5.25      13.21       9.23      12.59      17.15      16.38      22.43 |    100.00 

         Pearson chi2(21) =  4.7e+03   Pr = 0.000
               Cramér's V =   0.2968

. kap parented dadnssec if(samplenssec==0)

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
   4.54%       4.84%    -0.0031     0.0013      -2.34      0.9904

. pwcorr parented dadnssec if(samplenssec==0), sig

             | parented dadnssec
-------------+------------------
    parented |   1.0000 
             |
             |
    dadnssec |  -0.4370   1.0000 
             |   0.0000
             |

. 
. * return to jupyter
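
The Cramér's V in the log above can be checked by hand from the χ² statistic, since V = sqrt(χ² / (N × min(rows − 1, columns − 1))). The following is a minimal sketch rather than part of the original analysis: it re-enters the cell counts printed in the cross-tabulation with Stata's immediate command `tabi`, so it does not need the pooled data file. The result should agree with the reported value of about 0.30.

```stata
* Sketch: re-enter the frequencies from the parented x dadnssec table above.
* Rows are the four parental education levels, columns the eight NS-SEC classes.
tabi 210 141 831 747 1529 2099 2219 3350 \ ///
     321 360 980 718  592  774  589  562 \ ///
      47  78 146  82   62  102   54   42 \ ///
      89 351 383  88   47   63   40   20, chi2 V

* Recompute V from the stored results left behind by tabulate.
display "Cramer's V by hand = " sqrt(r(chi2) / (r(N) * min(r(r) - 1, r(c) - 1)))
```

Because `tabi` is used without the `replace` option, the data in memory are left untouched.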

[5] Quasi-variance comparison intervals allow comparisons between any pair of categories, whereas conventional confidence intervals support comparisons only with the reference category (see Firth, 2003; Gayle and Lambert, 2007). The quasi-variance estimates are themselves an approximation (see Firth, 2003).
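
As a brief sketch of the idea (the notation here is ours, introduced for illustration): for a factor with K categories and estimated coefficients \hat\beta_1, ..., \hat\beta_K, the quasi-variances q_1, ..., q_K are chosen so that the variance of every pairwise contrast is approximately the sum of the two quasi-variances.

```latex
\[
\operatorname{Var}\!\left(\hat\beta_i - \hat\beta_j\right) \;\approx\; q_i + q_j
\quad \text{for all } i \neq j,
\qquad
\text{comparison interval for category } i:\;
\hat\beta_i \pm 1.96\,\sqrt{q_i}.
\]
```

Because the approximation applies to every pair, the reference category also receives a quasi-variance and an interval of its own, which is what permits comparisons that do not involve the reference category.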

[6] See also here.


Supplementary Materials

Complete Records Models
In [5]:
use $path3\pooledNCDSBCS_v3.dta, clear

* return to jupyter
. use $path3\pooledNCDSBCS_v3.dta, clear

. 
. * return to jupyter

In [6]:
regress ability male i.parented ib4.dadnssec i.cohort if(samplenssec==0), allbaselevels

* return to jupyter
. regress ability male i.parented ib4.dadnssec i.cohort if(samplenssec==0), allbaselevels

      Source |       SS           df       MS      Number of obs   =    17,716
-------------+----------------------------------   F(12, 17703)    =    226.92
       Model |  512719.985        12  42726.6654   Prob > F        =    0.0000
    Residual |  3333217.85    17,703   188.28548   R-squared       =    0.1333
-------------+----------------------------------   Adj R-squared   =    0.1327
       Total |  3845937.84    17,715   217.10064   Root MSE        =    13.722

-----------------------------------------------------------------------------------------------------------
                                  ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------+----------------------------------------------------------------
                                     male |  -.6621118   .2063506    -3.21   0.001    -1.066579   -.2576445
                                          |
                                 parented |
                                       1  |          0  (base)
                                       2  |   5.629584   .2533412    22.22   0.000      5.13301    6.126158
                                       3  |   7.873391   .5870722    13.41   0.000     6.722672     9.02411
                                       4  |     10.239   .4896922    20.91   0.000     9.279153    11.19884
                                          |
                                 dadnssec |
1. Large Employers and Higher Managerial  |   1.564068   .6322793     2.47   0.013     .3247385    2.803397
                  2. Higher Professional  |   2.076091   .5853141     3.55   0.000     .9288175    3.223364
    3. Lower managerial and professional  |   1.152207   .4456866     2.59   0.010     .2786179    2.025797
                         4. Intermediate  |          0  (base)
      5. Small employers and own account  |  -3.436122   .4502615    -7.63   0.000    -4.318679   -2.553566
      6. Lower Supervisory and Technical  |  -3.255671   .4248892    -7.66   0.000    -4.088495   -2.422847
                         7. Semi-Routine  |  -4.664142   .4306101   -10.83   0.000     -5.50818   -3.820104
                              8. Routine  |    -6.9763   .4134982   -16.87   0.000    -7.786797   -6.165803
                                          |
                                   cohort |
                                 1. NCDS  |          0  (base)
                                  2. BCS  |  -2.004158   .2129956    -9.41   0.000     -2.42165   -1.586666
                                          |
                                    _cons |   102.6546   .3858041   266.08   0.000     101.8984    103.4108
-----------------------------------------------------------------------------------------------------------

. 
. * return to jupyter

In [7]:
vif

testparm ib4.dadnssec##i.cohort

predict r, resid
kdensity r, normal
pnorm r
qnorm r

rvfplot, yline(0)
estat imtest
estat hettest

* return to jupyter
. vif

    Variable |       VIF       1/VIF  
-------------+----------------------
        male |      1.00    0.999011
    parented |
          2  |      1.21    0.828021
          3  |      1.08    0.923138
          4  |      1.29    0.773549
    dadnssec |
          1  |      1.36    0.733736
          2  |      1.60    0.623698
          3  |      2.14    0.466728
          5  |      2.10    0.476441
          6  |      2.41    0.414359
          7  |      2.39    0.418451
          8  |      2.80    0.357238
    2.cohort |      1.06    0.943997
-------------+----------------------
    Mean VIF |      1.70

. 
. testparm ib4.dadnssec##i.cohort

 ( 1)  1.dadnssec = 0
 ( 2)  2.dadnssec = 0
 ( 3)  3.dadnssec = 0
 ( 4)  5.dadnssec = 0
 ( 5)  6.dadnssec = 0
 ( 6)  7.dadnssec = 0
 ( 7)  8.dadnssec = 0
 ( 8)  2.cohort = 0

       F(  8, 17703) =   99.17
            Prob > F =    0.0000

. 
. predict r, resid
(16,267 missing values generated)

. kdensity r, normal

. pnorm r

. qnorm r

. 
. rvfplot, yline(0)

. estat imtest

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |     158.55     54    0.0000
            Skewness |     153.80     12    0.0000
            Kurtosis |      51.61      1    0.0000
---------------------+-----------------------------
               Total |     363.96     67    0.0000
---------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
         Ho: Constant variance
         Variables: fitted values of ability

         chi2(1)      =    41.27
         Prob > chi2  =   0.0000

. 
. * return to jupyter
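
Both tests above reject the hypothesis of constant error variance. One common response, shown here only as a hedged sketch and not as the authors' procedure, is to refit the same specification with heteroskedasticity-robust (Huber-White) standard errors. The point estimates are unchanged; only the standard errors, t statistics and intervals differ.

```stata
* Assumes pooledNCDSBCS_v3.dta is already in memory, as in the cells above.
* Same complete records specification, with robust standard errors.
regress ability male i.parented ib4.dadnssec i.cohort if samplenssec==0, vce(robust) allbaselevels
```

Comparing these standard errors with those in the unadjusted output gives a quick sense of whether the heteroskedasticity matters for inference.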

In [8]:
*TABLE S4 MODEL 1
regress ability male i.parented  ib4.dadnssec i.cohort if(samplenssec==0), allbaselevels

* return to jupyter
. *TABLE S4 MODEL 1

. regress ability male i.parented  ib4.dadnssec i.cohort if(samplenssec==0), allbaselevels

      Source |       SS           df       MS      Number of obs   =    17,716
-------------+----------------------------------   F(12, 17703)    =    226.92
       Model |  512719.985        12  42726.6654   Prob > F        =    0.0000
    Residual |  3333217.85    17,703   188.28548   R-squared       =    0.1333
-------------+----------------------------------   Adj R-squared   =    0.1327
       Total |  3845937.84    17,715   217.10064   Root MSE        =    13.722

-----------------------------------------------------------------------------------------------------------
                                  ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------+----------------------------------------------------------------
                                     male |  -.6621118   .2063506    -3.21   0.001    -1.066579   -.2576445
                                          |
                                 parented |
                                       1  |          0  (base)
                                       2  |   5.629584   .2533412    22.22   0.000      5.13301    6.126158
                                       3  |   7.873391   .5870722    13.41   0.000     6.722672     9.02411
                                       4  |     10.239   .4896922    20.91   0.000     9.279153    11.19884
                                          |
                                 dadnssec |
1. Large Employers and Higher Managerial  |   1.564068   .6322793     2.47   0.013     .3247385    2.803397
                  2. Higher Professional  |   2.076091   .5853141     3.55   0.000     .9288175    3.223364
    3. Lower managerial and professional  |   1.152207   .4456866     2.59   0.010     .2786179    2.025797
                         4. Intermediate  |          0  (base)
      5. Small employers and own account  |  -3.436122   .4502615    -7.63   0.000    -4.318679   -2.553566
      6. Lower Supervisory and Technical  |  -3.255671   .4248892    -7.66   0.000    -4.088495   -2.422847
                         7. Semi-Routine  |  -4.664142   .4306101   -10.83   0.000     -5.50818   -3.820104
                              8. Routine  |    -6.9763   .4134982   -16.87   0.000    -7.786797   -6.165803
                                          |
                                   cohort |
                                 1. NCDS  |          0  (base)
                                  2. BCS  |  -2.004158   .2129956    -9.41   0.000     -2.42165   -1.586666
                                          |
                                    _cons |   102.6546   .3858041   266.08   0.000     101.8984    103.4108
-----------------------------------------------------------------------------------------------------------

. 
. * return to jupyter

In [9]:
est sto m1
fitstat, s(m1)

* return to jupyter
. est sto m1

. fitstat, s(m1)

Measures of Fit for regress of ability

Log-Lik Intercept Only:   -72796.653     Log-Lik Full Model:       -71529.256
D(17700):                 143058.513     LR(12):                     2534.793
                                         Prob > LR:                     0.000
R2:                            0.133     Adjusted R2:                   0.133
AIC:                           8.077     AIC*n:                    143090.513
BIC:                      -30086.843     BIC':                      -2417.407

(Indices saved in matrix fs_m1)

. 
. * return to jupyter

In [10]:
*TABLE S4 Model 2
regress ability male i.parented ib7.nsinteraction if(samplenssec==0), allbaselevels

* return to jupyter
. *TABLE S4 Model 2

. regress ability male i.parented ib7.nsinteraction if(samplenssec==0), allbaselevels

      Source |       SS           df       MS      Number of obs   =    17,716
-------------+----------------------------------   F(19, 17696)    =    143.71
       Model |  514115.506        19  27058.7109   Prob > F        =    0.0000
    Residual |  3331822.33    17,696  188.281099   R-squared       =    0.1337
-------------+----------------------------------   Adj R-squared   =    0.1327
       Total |  3845937.84    17,715   217.10064   Root MSE        =    13.722

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.6695247   .2064042    -3.24   0.001    -1.074097   -.2649522
              |
     parented |
           1  |          0  (base)
           2  |   5.635049   .2536317    22.22   0.000     5.137906    6.132192
           3  |   7.903272   .5880555    13.44   0.000     6.750625    9.055918
           4  |   10.27633   .4906155    20.95   0.000     9.314679    11.23799
              |
nsinteraction |
    NCDS 1.1  |      2.409   .9207012     2.62   0.009     .6043356    4.213665
     BCS 1.1  |  -1.037565   .8526089    -1.22   0.224    -2.708762    .6336323
    NCDS 1.2  |    1.43215   .8092031     1.77   0.077    -.1539669    3.018268
     BCS 1.2  |   .7326651   .8009008     0.91   0.360    -.8371791    2.302509
      NCDS 2  |   1.509022   .6158601     2.45   0.014     .3018761    2.716168
       BCS 2  |  -1.101743   .6127955    -1.80   0.072    -2.302883    .0993958
      NCDS 3  |          0  (base)
       BCS 3  |   -1.90088   .6853285    -2.77   0.006    -3.244191   -.5575688
      NCDS 4  |   -3.40209   .6086739    -5.59   0.000    -4.595151    -2.20903
       BCS 4  |  -5.373346   .6258719    -8.59   0.000    -6.600116   -4.146576
      NCDS 5  |  -3.033046   .5757856    -5.27   0.000    -4.161642    -1.90445
       BCS 5  |   -5.40104   .5834729    -9.26   0.000    -6.544704   -4.257376
      NCDS 6  |  -4.571711   .5696539    -8.03   0.000    -5.688288   -3.455133
       BCS 6  |  -6.679397    .607365   -11.00   0.000    -7.869892   -5.488902
      NCDS 7  |  -7.164837     .54458   -13.16   0.000    -8.232268   -6.097407
       BCS 7  |  -8.580919    .573249   -14.97   0.000    -9.704543   -7.457294
              |
        _cons |   102.6061   .4826751   212.58   0.000       101.66    103.5522
-------------------------------------------------------------------------------

. 
. * return to jupyter

In [11]:
est sto m2
fitstat

* return to jupyter
. est sto m2

. fitstat

Measures of Fit for regress of ability

Log-Lik Intercept Only:   -72796.653     Log-Lik Full Model:       -71525.547
D(17694):                 143051.094     LR(19):                     2542.212
                                         Prob > LR:                     0.000
R2:                            0.134     Adjusted R2:                   0.133
AIC:                           8.077     AIC*n:                    143095.094
BIC:                      -30035.568     BIC':                      -2356.350

. 
. * return to jupyter

In [12]:
fitstat, using(m1)

* return to jupyter
. fitstat, using(m1)

Measures of Fit for regress of ability

                             Current            Saved       Difference
Model:                       regress          regress
N:                             17716            17716                0
Log-Lik Intercept Only:   -72796.653       -72796.653            0.000
Log-Lik Full Model:       -71525.547       -71529.256            3.709
D:                        143051.094(17694) 143058.513(17700)   -7.419(-6)
LR:                         2542.212(19)     2534.793(12)        7.419(7)
Prob > LR:                     0.000            0.000            0.000
R2:                            0.134            0.133            0.000
Adjusted R2:                   0.133            0.133            0.000
AIC:                           8.077            8.077            0.000
AIC*n:                    143095.094       143090.513            4.581
BIC:                      -30035.568       -30086.843           51.275
BIC':                      -2356.350        -2417.407           61.057

Difference of   61.057 in BIC' provides very strong support for saved model.

. 
. * return to jupyter
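
The comparison above can be cross-checked with built-in commands. The sketch below is an illustration under stated assumptions, not the authors' code: it refits Model 2 using factor-variable interaction notation (which should give the same fit as the 16-category `nsinteraction` variable, in a different parameterisation), tests the seven class-by-cohort interaction terms jointly, and converts the likelihood-ratio difference reported by `fitstat` (7.419 on 7 d.f.) into a p-value.

```stata
* Assumes pooledNCDSBCS_v3.dta is in memory and m1, m2 are stored as above.
* Information criteria for the two stored models, using built-in commands.
estimates stats m1 m2

* Model 2 written with main effects plus the class-by-cohort interaction.
regress ability male i.parented ib4.dadnssec##i.cohort if samplenssec==0

* Joint Wald test of the seven interaction terms.
testparm ib4.dadnssec#i.cohort

* p-value for the LR difference between the nested models (values from the log above).
display chi2tail(7, 7.419)
```

Note that the refit replaces the active estimation results, so `estimates restore m2` would be needed before any further post-estimation work on the stored model.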

In [13]:
esttab m1 m2, replace cells(b(star fmt(2) label(Coef.)) se(par fmt(2))) stats(N) unstack

* return to jupyter
. esttab m1 m2, replace cells(b(star fmt(2) label(Coef.)) se(par fmt(2))) stats(N) unstack

--------------------------------------------
                      (1)             (2)   
                  ability         ability   
                 Coef./se        Coef./se   
--------------------------------------------
male                -0.66**         -0.67** 
                   (0.21)          (0.21)   
1.parented           0.00            0.00   
                      (.)             (.)   
2.parented           5.63***         5.64***
                   (0.25)          (0.25)   
3.parented           7.87***         7.90***
                   (0.59)          (0.59)   
4.parented          10.24***        10.28***
                   (0.49)          (0.49)   
1.dadnssec           1.56*                  
                   (0.63)                   
2.dadnssec           2.08***                
                   (0.59)                   
3.dadnssec           1.15**                 
                   (0.45)                   
4.dadnssec           0.00                   
                      (.)                   
5.dadnssec          -3.44***                
                   (0.45)                   
6.dadnssec          -3.26***                
                   (0.42)                   
7.dadnssec          -4.66***                
                   (0.43)                   
8.dadnssec          -6.98***                
                   (0.41)                   
1.cohort             0.00                   
                      (.)                   
2.cohort            -2.00***                
                   (0.21)                   
1.nsintera~n                         2.41** 
                                   (0.92)   
2.nsintera~n                        -1.04   
                                   (0.85)   
3.nsintera~n                         1.43   
                                   (0.81)   
4.nsintera~n                         0.73   
                                   (0.80)   
5.nsintera~n                         1.51*  
                                   (0.62)   
6.nsintera~n                        -1.10   
                                   (0.61)   
7.nsintera~n                         0.00   
                                      (.)   
8.nsintera~n                        -1.90** 
                                   (0.69)   
9.nsintera~n                        -3.40***
                                   (0.61)   
10.nsinter~n                        -5.37***
                                   (0.63)   
11.nsinter~n                        -3.03***
                                   (0.58)   
12.nsinter~n                        -5.40***
                                   (0.58)   
13.nsinter~n                        -4.57***
                                   (0.57)   
14.nsinter~n                        -6.68***
                                   (0.61)   
15.nsinter~n                        -7.16***
                                   (0.54)   
16.nsinter~n                        -8.58***
                                   (0.57)   
_cons              102.65***       102.61***
                   (0.39)          (0.48)   
--------------------------------------------
N                17716.00        17716.00   
--------------------------------------------

. 
. * return to jupyter

Table S4

Model with IPW (only) and MI (only)
In [14]:
*Table S5 Model 1

use $path3\pooledNCDSBCS_v3.dta, clear

* return to jupyter
. *Table S5 Model 1

. 
. use $path3\pooledNCDSBCS_v3.dta, clear

. 
. * return to jupyter

In [15]:
regress ability male i.parented  ib2.dadnssec i.cohort if(samplenssec==0) [pweight=ipw], allbaselevels

* return to jupyter
. regress ability male i.parented  ib2.dadnssec i.cohort if(samplenssec==0) [pweight=ipw], allbaselevels
(sum of wgt is   2.1096e+04)

Linear regression                               Number of obs     =     17,716
                                                F(12, 17703)      =     248.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1338
                                                Root MSE          =     13.723

-----------------------------------------------------------------------------------------------------------
                                          |               Robust
                                  ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------+----------------------------------------------------------------
                                     male |  -.6477882   .2063705    -3.14   0.002    -1.052295   -.2432817
                                          |
                                 parented |
                                       1  |          0  (base)
                                       2  |   5.651065   .2551178    22.15   0.000      5.15101    6.151121
                                       3  |   7.930292    .576153    13.76   0.000     6.800976    9.059608
                                       4  |   10.26015   .4696664    21.85   0.000      9.33956    11.18074
                                          |
                                 dadnssec |
1. Large Employers and Higher Managerial  |  -.4656737   .6546943    -0.71   0.477    -1.748939    .8175914
                  2. Higher Professional  |          0  (base)
    3. Lower managerial and professional  |  -.9255402    .492401    -1.88   0.060    -1.890694    .0396141
                         4. Intermediate  |  -2.029144   .5376404    -3.77   0.000    -3.082972    -.975316
      5. Small employers and own account  |  -5.509167   .5327768   -10.34   0.000    -6.553462   -4.464873
      6. Lower Supervisory and Technical  |  -5.326667   .5091641   -10.46   0.000    -6.324678   -4.328655
                         7. Semi-Routine  |  -6.735277   .5200197   -12.95   0.000    -7.754566   -5.715987
                              8. Routine  |  -9.037007   .5062525   -17.85   0.000    -10.02931   -8.044703
                                          |
                                   cohort |
                                 1. NCDS  |          0  (base)
                                  2. BCS  |  -2.029992   .2115781    -9.59   0.000    -2.444706   -1.615279
                                          |
                                    _cons |   104.7123   .4818961   217.29   0.000     103.7678    105.6569
-----------------------------------------------------------------------------------------------------------

. 
. * return to jupyter

In [16]:
regress ability male i.parented ib7.nsinteraction if(samplenssec==0) [pweight=ipw], allbaselevels

* return to jupyter
. regress ability male i.parented ib7.nsinteraction if(samplenssec==0) [pweight=ipw], allbaselevels
(sum of wgt is   2.1096e+04)

Linear regression                               Number of obs     =     17,716
                                                F(19, 17696)      =     157.98
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1341
                                                Root MSE          =     13.723

-------------------------------------------------------------------------------
              |               Robust
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.6557797   .2063706    -3.18   0.001    -1.060286   -.2512732
              |
     parented |
           1  |          0  (base)
           2  |    5.65727   .2551589    22.17   0.000     5.157134    6.157406
           3  |   7.955523    .577343    13.78   0.000     6.823874    9.087172
           4  |   10.29457   .4706686    21.87   0.000     9.372012    11.21712
              |
nsinteraction |
    NCDS 1.1  |   2.394681   .8726179     2.74   0.006     .6842641    4.105097
     BCS 1.1  |  -.9899849   .8255089    -1.20   0.230    -2.608063    .6280935
    NCDS 1.2  |   1.423045   .7383901     1.93   0.054    -.0242719    2.870362
     BCS 1.2  |   .6962341   .7361712     0.95   0.344    -.7467336    2.139202
      NCDS 2  |   1.497368   .5844982     2.56   0.010     .3516939    2.643041
       BCS 2  |  -1.139992     .59768    -1.91   0.056    -2.311504    .0315194
      NCDS 3  |          0  (base)
       BCS 3  |  -1.845178   .6617521    -2.79   0.005    -3.142277    -.548079
      NCDS 4  |  -3.411276   .6080687    -5.61   0.000    -4.603151   -2.219402
       BCS 4  |  -5.404264   .6235907    -8.67   0.000    -6.626563   -4.181965
      NCDS 5  |  -3.030302    .575939    -5.26   0.000    -4.159199   -1.901405
       BCS 5  |   -5.44063   .5753243    -9.46   0.000    -6.568322   -4.312938
      NCDS 6  |  -4.557855   .5661954    -8.05   0.000    -5.667654   -3.448057
       BCS 6  |  -6.737881   .6089291   -11.07   0.000    -7.931442    -5.54432
      NCDS 7  |  -7.159155   .5433949   -13.17   0.000    -8.224262   -6.094047
       BCS 7  |  -8.603699   .5704641   -15.08   0.000    -9.721864   -7.485533
              |
        _cons |   102.5984    .474524   216.21   0.000     101.6683    103.5285
-------------------------------------------------------------------------------

. 
. * return to jupyter

In [17]:
est sto m3
fitstat

* return to jupyter
. est sto m3

. fitstat

Measures of Fit for regress of ability

Log-Lik Intercept Only:   -72802.894     Log-Lik Full Model:       -71527.330
D(17694):                 143054.660     LR(19):                     2551.128
                                         Prob > LR:                     0.000
R2:                            0.134     Adjusted R2:                   0.133
AIC:                           8.077     AIC*n:                    143098.660
BIC:                      -30032.002     BIC':                      -2365.266

. 
. * return to jupyter

In [18]:
testparm ib7.nsinteraction
esttab m3, replace cells(b(star fmt(2) label(Coef.)) se(par fmt(2))) stats(N) unstack

* return to jupyter
. testparm ib7.nsinteraction

 ( 1)  1.nsinteraction = 0
 ( 2)  2.nsinteraction = 0
 ( 3)  3.nsinteraction = 0
 ( 4)  4.nsinteraction = 0
 ( 5)  5.nsinteraction = 0
 ( 6)  6.nsinteraction = 0
 ( 7)  8.nsinteraction = 0
 ( 8)  9.nsinteraction = 0
 ( 9)  10.nsinteraction = 0
 (10)  11.nsinteraction = 0
 (11)  12.nsinteraction = 0
 (12)  13.nsinteraction = 0
 (13)  14.nsinteraction = 0
 (14)  15.nsinteraction = 0
 (15)  16.nsinteraction = 0

       F( 15, 17696) =   54.19
            Prob > F =    0.0000

. esttab m3, replace cells(b(star fmt(2) label(Coef.)) se(par fmt(2))) stats(N) unstack

----------------------------
                      (1)   
                  ability   
                 Coef./se   
----------------------------
male                -0.66** 
                   (0.21)   
1.parented           0.00   
                      (.)   
2.parented           5.66***
                   (0.26)   
3.parented           7.96***
                   (0.58)   
4.parented          10.29***
                   (0.47)   
1.nsintera~n         2.39** 
                   (0.87)   
2.nsintera~n        -0.99   
                   (0.83)   
3.nsintera~n         1.42   
                   (0.74)   
4.nsintera~n         0.70   
                   (0.74)   
5.nsintera~n         1.50*  
                   (0.58)   
6.nsintera~n        -1.14   
                   (0.60)   
7.nsintera~n         0.00   
                      (.)   
8.nsintera~n        -1.85** 
                   (0.66)   
9.nsintera~n        -3.41***
                   (0.61)   
10.nsinter~n        -5.40***
                   (0.62)   
11.nsinter~n        -3.03***
                   (0.58)   
12.nsinter~n        -5.44***
                   (0.58)   
13.nsinter~n        -4.56***
                   (0.57)   
14.nsinter~n        -6.74***
                   (0.61)   
15.nsinter~n        -7.16***
                   (0.54)   
16.nsinter~n        -8.60***
                   (0.57)   
_cons              102.60***
                   (0.47)   
----------------------------
N                17716.00   
----------------------------

. 
. * return to jupyter

In [19]:
*Table S5 Model 2

use $path3\pooledNCDSBCS_v3_imputed.dta, clear

set seed 1485

* return to jupyter
. *Table S5 Model 2

. 
. use $path3\pooledNCDSBCS_v3_imputed.dta, clear

. 
. set seed 1485

. 
. * return to jupyter

In [20]:
*drop observations for cohort members who had died by the age 10/11 sweep
tab deadtestoutcome, mi
summ ability if (deadtestoutcome==1)
drop if deadtestoutcome==1

* return to jupyter
. *drop observations for cohort members who had died by the age 10/11 sweep

. tab deadtestoutcome, mi

Dead at age |
      10/11 |
     survey |      Freq.     Percent        Cum.
------------+-----------------------------------
      0. No |    925,948       91.36       91.36
     1. Yes |     87,535        8.64      100.00
------------+-----------------------------------
      Total |  1,013,483      100.00

. summ ability if (deadtestoutcome==1)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     ability |     86,100    99.38687    15.00202   28.07528   160.2226

. drop if deadtestoutcome==1
(87,535 observations deleted)

. 
. * return to jupyter

In [21]:
* Create the interaction
capture drop nsinteraction
gen nsinteraction = .
replace nsinteraction = 1 if ((dadnssec==1)&(cohort==1))
replace nsinteraction = 2 if ((dadnssec==1)&(cohort==2))
replace nsinteraction = 3 if ((dadnssec==2)&(cohort==1))
replace nsinteraction = 4 if ((dadnssec==2)&(cohort==2))
replace nsinteraction = 5 if ((dadnssec==3)&(cohort==1))
replace nsinteraction = 6 if ((dadnssec==3)&(cohort==2))
replace nsinteraction = 7 if ((dadnssec==4)&(cohort==1))
replace nsinteraction = 8 if ((dadnssec==4)&(cohort==2))
replace nsinteraction = 9 if ((dadnssec==5)&(cohort==1))
replace nsinteraction = 10 if ((dadnssec==5)&(cohort==2))
replace nsinteraction = 11 if ((dadnssec==6)&(cohort==1))
replace nsinteraction = 12 if ((dadnssec==6)&(cohort==2))
replace nsinteraction = 13 if ((dadnssec==7)&(cohort==1))
replace nsinteraction = 14 if ((dadnssec==7)&(cohort==2))
replace nsinteraction = 15 if ((dadnssec==8)&(cohort==1))
replace nsinteraction = 16 if ((dadnssec==8)&(cohort==2))
tab nsinteraction
label variable nsinteraction "NSSEC Interaction"
label define nsint 1 "NCDS 1.1" 2 "BCS 1.1" 3 "NCDS 1.2" 4 "BCS 1.2" 5 "NCDS 2" 6 "BCS 2" 7 "NCDS 3" 8 "BCS 3" 9 "NCDS 4" 10 "BCS 4" 11 "NCDS 5" 12 "BCS 5" 13 "NCDS 6" 14 "BCS 6" 15 "NCDS 7" 16 "BCS 7", replace
label values nsinteraction nsint

mi register passive nsinteraction

estimates clear

* return to jupyter
. * Create the interaction

. capture drop nsinteraction

. gen nsinteraction = .
(925,948 missing values generated)

. replace nsinteraction = 1 if ((dadnssec==1)&(cohort==1))
(13,535 real changes made)

. replace nsinteraction = 2 if ((dadnssec==1)&(cohort==2))
(22,498 real changes made)

. replace nsinteraction = 3 if ((dadnssec==2)&(cohort==1))
(18,391 real changes made)

. replace nsinteraction = 4 if ((dadnssec==2)&(cohort==2))
(27,622 real changes made)

. replace nsinteraction = 5 if ((dadnssec==3)&(cohort==1))
(47,752 real changes made)

. replace nsinteraction = 6 if ((dadnssec==3)&(cohort==2))
(69,934 real changes made)

. replace nsinteraction = 7 if ((dadnssec==4)&(cohort==1))
(39,097 real changes made)

. replace nsinteraction = 8 if ((dadnssec==4)&(cohort==2))
(39,944 real changes made)

. replace nsinteraction = 9 if ((dadnssec==5)&(cohort==1))
(51,654 real changes made)

. replace nsinteraction = 10 if ((dadnssec==5)&(cohort==2))
(66,476 real changes made)

. replace nsinteraction = 11 if ((dadnssec==6)&(cohort==1))
(70,532 real changes made)

. replace nsinteraction = 12 if ((dadnssec==6)&(cohort==2))
(79,389 real changes made)

. replace nsinteraction = 13 if ((dadnssec==7)&(cohort==1))
(77,046 real changes made)

. replace nsinteraction = 14 if ((dadnssec==7)&(cohort==2))
(73,037 real changes made)

. replace nsinteraction = 15 if ((dadnssec==8)&(cohort==1))
(110,371 real changes made)

. replace nsinteraction = 16 if ((dadnssec==8)&(cohort==2))
(107,913 real changes made)

. tab nsinteraction

nsinteracti |
         on |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     13,535        1.48        1.48
          2 |     22,498        2.46        3.94
          3 |     18,391        2.01        5.95
          4 |     27,622        3.02        8.96
          5 |     47,752        5.22       14.18
          6 |     69,934        7.64       21.82
          7 |     39,097        4.27       26.10
          8 |     39,944        4.36       30.46
          9 |     51,654        5.64       36.10
         10 |     66,476        7.26       43.37
         11 |     70,532        7.71       51.08
         12 |     79,389        8.67       59.75
         13 |     77,046        8.42       68.17
         14 |     73,037        7.98       76.15
         15 |    110,371       12.06       88.21
         16 |    107,913       11.79      100.00
------------+-----------------------------------
      Total |    915,191      100.00

. label variable nsinteraction "NSSEC Interaction"

. label define nsint 1 "NCDS 1.1" 2 "BCS 1.1" 3 "NCDS 1.2" 4 "BCS 1.2" 5 "NCDS 2" 6 "BCS 2" 7 "NCDS 3" 8 "BCS 3" 9 "NCDS 4" 10 "BCS 4" 11 "N
> CDS 5" 12 "BCS 5" 13 "NCDS 6" 14 "BCS 6" 15 "NCDS 7" 16 "BCS 7", replace

. label values nsinteraction nsint

. 
. mi register passive nsinteraction
(system variable _mi_id updated due to changed number of obs.)

. 
. estimates clear

. 
. * return to jupyter
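
The sixteen replace statements above follow a regular pattern: each NS-SEC category contributes two consecutive codes, the first for NCDS and the second for BCS. As a hedged cross-check, the same coding can be written as one arithmetic expression and compared with the variable already created. This is a sketch only. It assumes dadnssec takes the integer values 1 to 8 and cohort takes the values 1 (NCDS) and 2 (BCS), as the replace statements imply, and that the dataset from the preceding cells is in memory. The name nsinteraction_check is hypothetical and is dropped at the end.

```stata
* Sketch only: nsinteraction_check is a hypothetical temporary variable.
* Assumes dadnssec is coded 1-8 and cohort is coded 1 (NCDS) / 2 (BCS).
capture drop nsinteraction_check
gen nsinteraction_check = 2*(dadnssec - 1) + cohort if inrange(dadnssec, 1, 8) & inrange(cohort, 1, 2)

* Both codings should agree on every row, including rows left missing
assert nsinteraction_check == nsinteraction

* Remove the temporary variable so the data are unchanged
drop nsinteraction_check
```

If the assert passes silently, the explicit replace statements and the one-line expression define the same sixteen categories. The explicit version remains easier to read against the value labels.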

In [22]:
* TABLE S5: Model 2
estimates clear

mi estimate, post: regress ability male i.parented ib7.nsinteraction, allbaselevels

* return to jupyter
. * TABLE S5: Model 2

. estimates clear

. 
. mi estimate, post: regress ability male i.parented ib7.nsinteraction, allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     32,548
                                                Average RVI       =     0.6145
                                                Largest FMI       =     0.4773
                                                Complete DF       =      32528
DF adjustment:   Small sample                   DF:     min       =     259.36
                                                        avg       =     361.08
                                                        max       =     675.68
Model F test:       Equal FMI                   F(  19, 6063.8)   =     168.96
Within VCE type:          OLS                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5792898   .1829449    -3.17   0.002    -.9384987   -.2200809
              |
     parented |
           2  |   5.841913   .2323516    25.14   0.000     5.385475    6.298352
           3  |   8.215579   .5555597    14.79   0.000     7.123194    9.307965
           4  |   10.63947   .4667382    22.80   0.000     9.722092    11.55685
              |
nsinteraction |
    NCDS 1.1  |   2.907636   .9075401     3.20   0.001     1.123164    4.692108
     BCS 1.1  |  -.6489383   .7901949    -0.82   0.412    -2.202167    .9042902
    NCDS 1.2  |   1.863414   .8200751     2.27   0.024     .2504675     3.47636
     BCS 1.2  |   .9660809   .8012698     1.21   0.229    -.6109012    2.543063
      NCDS 2  |    1.71432   .6402835     2.68   0.008     .4542301    2.974411
       BCS 2  |  -.8967187   .5858467    -1.53   0.127    -2.048396    .2549588
       BCS 3  |  -1.642699   .6991959    -2.35   0.019    -3.018973   -.2664246
      NCDS 4  |  -3.264382   .6328476    -5.16   0.000    -4.509892   -2.018872
       BCS 4  |  -5.416053   .6137982    -8.82   0.000    -6.623595   -4.208511
      NCDS 5  |  -2.933258   .5881108    -4.99   0.000    -4.090345   -1.776171
       BCS 5  |  -5.307132   .5814633    -9.13   0.000    -6.450962   -4.163302
      NCDS 6  |  -4.485968   .6058989    -7.40   0.000    -5.679075    -3.29286
       BCS 6  |  -6.726404   .5894738   -11.41   0.000    -7.885691   -5.567118
      NCDS 7  |  -7.091002   .5762464   -12.31   0.000     -8.22565   -5.956353
       BCS 7  |  -8.764152   .5622134   -15.59   0.000    -9.870086   -7.658219
              |
        _cons |   101.7585   .4936476   206.14   0.000     100.7872    102.7298
-------------------------------------------------------------------------------

. 
. * return to jupyter
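
Every nsinteraction coefficient in the table above is a contrast with the base category (level 7, "NCDS 3"), so the table does not directly test whether the same class differs between cohorts. For example, the point estimates for "BCS 7" and "NCDS 7" differ by about -1.67 (-8.764 minus -7.091), but the standard error of that difference cannot be read from the table because it depends on the covariance of the two estimates. A minimal sketch of one way to obtain it, assuming the imputed dataset from the preceding cells is still in memory, is to ask mi estimate for a transformation of the coefficients. The label cohortgap7 is hypothetical and chosen only for illustration.

```stata
* Sketch only: "cohortgap7" is a hypothetical label for the transformation.
* Levels 15 and 16 are "NCDS 7" and "BCS 7" under the value labels defined above.
* The transformation is combined across the imputations along with the model.
mi estimate (cohortgap7: _b[16.nsinteraction] - _b[15.nsinteraction]): regress ability male i.parented ib7.nsinteraction
```

The same pattern could be repeated for any other pair of levels by changing the two level numbers. It is offered as an illustration and is not part of the original analysis.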

In [23]:
mibeta ability male i.parented ib7.nsinteraction, allbaselevels

* return to jupyter
. mibeta ability male i.parented ib7.nsinteraction, allbaselevels

Multiple-imputation estimates                   Imputations       =         60
Linear regression                               Number of obs     =     32,548
                                                Average RVI       =     0.6145
                                                Largest FMI       =     0.4773
                                                Complete DF       =      32528
DF adjustment:   Small sample                   DF:     min       =     259.36
                                                        avg       =     361.08
                                                        max       =     675.68
Model F test:       Equal FMI                   F(  19, 6063.8)   =     168.96
Within VCE type:          OLS                   Prob > F          =     0.0000

-------------------------------------------------------------------------------
      ability |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |  -.5792898   .1829449    -3.17   0.002    -.9384987   -.2200809
              |
     parented |
           2  |   5.841913   .2323516    25.14   0.000     5.385475    6.298352
           3  |   8.215579   .5555597    14.79   0.000     7.123194    9.307965
           4  |   10.63947   .4667382    22.80   0.000     9.722092    11.55685
              |
nsinteraction |
    NCDS 1.1  |   2.907636   .9075401     3.20   0.001     1.123164    4.692108
     BCS 1.1  |  -.6489383   .7901949    -0.82   0.412    -2.202167    .9042902
    NCDS 1.2  |   1.863414   .8200751     2.27   0.024     .2504675     3.47636
     BCS 1.2  |   .9660809   .8012698     1.21   0.229    -.6109012    2.543063
      NCDS 2  |    1.71432   .6402835     2.68   0.008     .4542301    2.974411
       BCS 2  |  -.8967187   .5858467    -1.53   0.127    -2.048396    .2549588
       BCS 3  |  -1.642699   .6991959    -2.35   0.019    -3.018973   -.2664246
      NCDS 4  |  -3.264382   .6328476    -5.16   0.000    -4.509892   -2.018872
       BCS 4  |  -5.416053   .6137982    -8.82   0.000    -6.623595   -4.208511
      NCDS 5  |  -2.933258   .5881108    -4.99   0.000    -4.090345   -1.776171
       BCS 5  |  -5.307132   .5814633    -9.13   0.000    -6.450962   -4.163302
      NCDS 6  |  -4.485968   .6058989    -7.40   0.000    -5.679075    -3.29286
       BCS 6  |  -6.726404   .5894738   -11.41   0.000    -7.885691   -5.567118
      NCDS 7  |  -7.091002   .5762464   -12.31   0.000     -8.22565   -5.956353
       BCS 7  |  -8.764152   .5622134   -15.59   0.000    -9.870086   -7.658219
              |
        _cons |   101.7585   .4936476   206.14   0.000     100.7872    102.7298
-------------------------------------------------------------------------------

Standardized coefficients and R-squared
Summary statistics over 60 imputations

             |       mean       min        p25     median        p75       max
-------------+----------------------------------------------------------------
        male |  -.0193603    -.0268  -.0217452  -.0193787  -.0175059    -.0102
             |
    parented |
          2  |   .1736487      .166   .1708019   .1734237    .176119      .184
          3  |   .1018874     .0913   .0994058   .1018867   .1049844       .11
          4  |   .1670908      .156   .1638327   .1671821    .169572      .178
             |
nsinteract~n |
   NCDS 1.1  |   .0242594     .0133   .0218914   .0248503   .0270079     .0341
    BCS 1.1  |  -.0064477     -.016  -.0093394  -.0063697  -.0043418    .00824
   NCDS 1.2  |   .0186347    .00605   .0153932   .0190033   .0216102     .0351
    BCS 1.2  |   .0107719   -.00212   .0068104   .0108283   .0142366     .0251
     NCDS 2  |   .0268754    .00936   .0222938   .0261917   .0313399     .0491
      BCS 2  |   -.015498    -.0317  -.0192108   -.014646  -.0114262   -.00158
      BCS 3  |  -.0221424    -.0368  -.0257876   -.023296   -.017001   -.00749
     NCDS 4  |   -.052821    -.0718   -.057195  -.0537473  -.0479207     -.036
      BCS 4  |   -.089176     -.104  -.0926978  -.0891446  -.0847819     -.076
     NCDS 5  |  -.0544292     -.067  -.0603177  -.0531146  -.0497369    -.0329
      BCS 5  |  -.0984158     -.117  -.1021296  -.0976271   -.093829    -.0862
     NCDS 6  |  -.0863433     -.107  -.0912311  -.0861199  -.0812531    -.0707
      BCS 6  |  -.1168845     -.129   -.120967  -.1166671   -.112037     -.104
     NCDS 7  |   -.158157     -.176  -.1649127  -.1569527  -.1523098     -.138
      BCS 7  |  -.1785939     -.194   -.183423   -.177937  -.1724781      -.16
-------------+----------------------------------------------------------------
    R-square |   .1389118      .134   .1373248   .1389225   .1405054      .144
Adj R-square |   .1384088      .134   .1368209   .1384195   .1400034      .144
------------------------------------------------------------------------------

. 
. * return to jupyter

Table S5


References

Atkinson, M. (2015). Millennium Cohort Study: Interpreting the CANTAB cognitive measures. London: UCL Centre for Longitudinal Studies.

Blanden, J., Goodman, A., Gregg, P., & Machin, S. (2004). Changes in Intergenerational mobility in Britain. In M. Corak (Ed.), Generational Income Mobility in North America and Europe. Cambridge: Cambridge University Press.

Blanden, J., & Gregg, P. (2004). Family income and educational attainment: a review of approaches and evidence for Britain. Oxford Review of Economic Policy, 20(2), 245-263.

Blanden, J., Gregg, P., & Machin, S. (2005). Educational Inequality and Intergenerational Mobility. In S. Machin & A. Vignoles (Eds.), What's The Good of Education? The Economics of Education In The UK. (pp. 99-114). Princeton: Princeton University Press.

Blanden, J., Gregg, P., & Macmillan, L. (2007). Accounting for intergenerational income persistence: non-cognitive skills, ability and education. Economic Journal, 117(519), 43-60.

Blanden, J., Gregg, P., & Macmillan, L. (2013). Intergenerational persistence in income and social class: the effect of within-group inequality. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 541-563.

Blanden, J., & Machin, S. (2004). Educational Inequality and the Expansion of UK Higher Education. Scottish Journal of Political Economy, 51(2), 230-249.

Blanden, J., & Machin, S. (2010). Millennium Cohort Study Briefing 13: Intergenerational inequality in early years assessments. London: Institute for Education.

Bourdieu, P., & Passeron, J.-C. (1977). Reproduction in education, culture and society. London: Sage.

Bradbury, B., Corak, M., Waldfogel, J., & Washbrook, E. (2015). Too many children left behind: The US achievement gap in comparative perspective. New York: Russell Sage Foundation.

Breen, R. (Ed.). (2004). Social Mobility in Europe. Oxford: Oxford University Press.

Breen, R., & Goldthorpe, J. H. (2001). Class, Mobility and Merit: The experience of Two Birth Cohorts. European Sociological Review, 17(2), 81-101.

Breen, R., Luijkx, R., Muller, W., & Pollak, R. (2010). Long-term Trends in Educational Inequality in Europe: Class Inequalities and Gender Differences. European Sociological Review, 26(1), 31-48.

Caldwell, T., Rodgers, B., Clark, C., Jefferis, B., Stansfeld, S., & Power, C. (2008). Lifecourse socioeconomic predictors of midlife drinking patterns, problems and abstention: findings from the 1958 British Birth Cohort Study. Drug and alcohol dependence, 95(3), 269-278.

Carpenter, J., & Kenward, M. (2012). Multiple imputation and its application. London: John Wiley & Sons.

Cheung, S. Y., & Egerton, M. (2007). Great Britain: Higher Education Expansion and Reform: Changing Educational Inequalities. In Y. Shavit, R. Arum & A. Gamoran (Eds.), Stratification in Higher Education: A Comparative Study (pp. 195-219). Stanford: Stanford University Press.

Conger, R. D., & Conger, K. J. (2002). Resilience in Midwestern families: Selected findings from the first decade of a prospective, longitudinal study. Journal of Marriage and Family, 64(2), 361-373.

Connelly, R., Gayle, V., & Lambert, P. S. (2016a). A review of educational attainment measures for social survey research. Methodological Innovations, 9, 2059799116638001.

Connelly, R., Gayle, V., & Lambert, P. S. (2016b). A Review of occupation-based social classifications for social survey research. Methodological Innovations, 9, 2059799116638003.

Connelly, R., & Platt, L. (2014). Cohort profile: UK millennium Cohort study (MCS). International journal of epidemiology, 43(6), 1719-1725.

Crompton, R. (2008). Class and Stratification. Cambridge: Polity Press.

Cunha, F., & Heckman, J. (2009). The Economics and Psychology of Inequality and Human Development. NBER Working Paper No. 14695. Cambridge: National Bureau of Economic Research.

Deary, I., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement. Intelligence, 35, 13-21.

Deary, I. J., Spinath, F. M., & Bates, T. C. (2006). Genetics of intelligence. European journal of human genetics: EJHG, 14(6), 690.

Dickerson, A., & Popli, G. (2012). Persistent Poverty and Children's Cognitive Development. CLS Working Paper 2012/2. London: Centre for Longitudinal Studies.

Dickerson, A., & Popli, G. (2016). Persistent poverty and children's cognitive development: Evidence from the UK Millennium Cohort Study. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(2), 534-558.

Diggle, P. J. (2015). Statistics: a data science for the 21st century. Journal of the Royal Statistical Society: Series A (Statistics in Society), 178(4), 793-813.

Duncan, G., Yeung, W., Brooks-Gunn, J., & Smith, J. (1998). How much does childhood poverty affect the life chances of children? American Sociological Review, 63(3), 406-423.

Elder, G. H. (1994). Families in troubled times: Adapting to change in rural America. Hawthorne, NY: De Gruyter Aldine.

Elliott, C., Murray, D., & Pearson, L. (1978). British Ability Scales. London: National Foundation for Educational Research.

Elliott, J., & Shepherd, P. (2006). Cohort profile: 1970 British birth cohort (BCS70). International journal of epidemiology, 35(4), 836-843.

Enders, C. K. (2010). Applied Missing Data Analysis. London: Guilford Press.

Erikson, R., & Goldthorpe, J. H. (1992). The Constant Flux: A Study of Class Mobility in Industrial Societies. Oxford: Clarendon.

Erikson, R., Goldthorpe, J. H., Jackson, M., Yaish, M., & Cox, D. R. (2005). On class differentials in educational attainment. Proceedings of the National Academy of Sciences of the United States of America, 102(27), 9730-9733.

Ermisch, J. (2008). Origins of social immobility and inequality: Parenting and early child development. National Institute Economic Review, 205(1), 62-71.

Feinstein, L. (2003). Inequality in the early cognitive development of British children in the 1970 cohort. Economica, 70(277), 73-97.

Firth, D. (2003). Overcoming the Reference Category Problem in the Presentation of Statistical Models. Sociological Methodology, 33(1), 1-18.

Flynn, J. R. (2012). Are We Getting Smarter? Rising IQ in the Twenty-First Century. Cambridge: Cambridge University Press.

Gayle, V., & Lambert, P. (2007). Using Quasi-Variance To Communicate Sociological Results From Statistical Models. Sociology, 41(6), 1191-1208.

Goisis, A., Özcan, B., & Myrskylä, M. (2017). Decline in the negative association between low birth weight and cognitive ability. Proceedings of the National Academy of Sciences, 114(1), 84-88.

Goldthorpe, J., & Jackson, M. (2007). Intergenerational Class Mobility in Contemporary Britain: Political Concerns And Empirical Findings. The British Journal of Sociology, 58(4), 525-546.

Goldthorpe, J., & McKnight, A. (2006). The Economic Basis of Social Class. In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality (pp. 109-136). Stanford: Stanford University Press.

Goldthorpe, J. H. (2016). Social class mobility in modern Britain: changing structure, constant process. Journal of the British Academy, 4, 89-111.

Goodman, A., & Gregg, P. (2010). Poorer children's educational attainment: how important are attitudes and behaviour. London: Joseph Rowntree Foundation.

Gottfried, A., Gottfried, A., Bathurst, K., Guerin, D., & Parramore, M. (2003). Socioeconomic status in children's development and family environment: infancy through adolescence. In M. Bornstein & R. Bradley (Eds.), Socioeconomic status, parenting and child development (pp. 189-207). Mahwah: Lawrence Erlbaum.

Gregg, P. (2012). Occupational Coding for the National Child Development Study (1969, 1991-2008) and the 1970 British Cohort Study (1980, 2000-2008). [data collection]. SN: 7023. Colchester: UK Data Archive.

Hawkes, D., & Plewis, I. (2006). Modelling non‐response in the national child development study. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3), 479-491.

Hill, W. D., Davies, G., Van De Lagemaat, L. N., Christoforou, A., Marioni, R., Fernandes, C., . . . Craig, L. C. (2014). Human cognitive ability is influenced by genetic variation in components of postsynaptic signalling complexes assembled by NMDA receptors and MAGUK proteins. Translational psychiatry, 4(1), e341.

Höfler, M., Pfister, H., Lieb, R., & Wittchen, H. (2005). The use of weights to account for non-response and drop-out. Social psychiatry and psychiatric epidemiology, 40(4), 291-299.

Kiernan, K., & Mensah, F. K. (2011). Poverty, family resources and children's educational attainment: The mediating role of parenting. British Educational Research Journal, 37(2), 317-336.

King, G. (1995). Replication, replication. PS: Political Science & Politics, 28(3), 444-452.

King, G. (2003). The future of replication. International Studies Perspectives, 4, 100–105.

Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B. E., Bussonnier, M., Frederic, J., . . . Corlay, S. (2016). Jupyter Notebooks-a publishing format for reproducible computational workflows. Paper presented at the ELPUB.

Knuth, D. E. (1992). Literate Programming. Stanford: Stanford University Press.

Lareau, A. (2011). Unequal childhoods: Class, race, and family life. University of California Press.

Lawlor, D. A., Batty, G. D., Morton, S., Deary, I., Macintyre, S., Ronalds, G., & Leon, D. A. (2005). Early life predictors of childhood intelligence: evidence from the Aberdeen children of the 1950s study. Journal of Epidemiology and Community Health, 59(8), 656-663.

Layte, R. (2017). Why Do Working-Class Kids Do Worse in School? An Empirical Test of Two Theories of Educational Disadvantage. European Sociological Review, Online Preview DOI: https://doi.org/10.1093/esr/jcx054.

Little, R., & Rubin, D. (2014). Statistical Analysis with Missing Data. Hoboken: John Wiley & Sons.

Machin, S., & Vignoles, A. (2004). Educational Inequality: The Widening Socio-Economic Gap. Fiscal Studies, 25(2), 22.

McCulloch, A., & Joshi, H. (2001). Neighbourhood and family influences on the cognitive ability of children in the British National Child Development Study. Social Science and Medicine, 53(5), 579-591.

McDermott, P. A., Fantuzzo, J. W., & Glutting, J. J. (1990). Just say no to subtest analysis: A critique on Wechsler theory and practice. Journal of Psychoeducational Assessment, 8(3), 290-302.

Menard, S. (2002). Applied logistic regression analysis (Vol. 106). Sage.

Mostafa, T., & Wiggins, R. (2014). Handling attrition and non-response in the 1970 British Cohort Study. London: Centre for Longitudinal Studies.

Mostafa, T., & Wiggins, R. (2015). The impact of attrition and non-response in birth cohort studies: a need to incorporate missingness strategies. Longitudinal and Life Course Studies, 6(2), 131-146.

Must, O., te Nijenhuis, J., Must, A., & van Vianen, A. E. M. (2009). Comparability of IQ scores over time. Intelligence, 37(1), 25-33.

Nature [Editorial]. (2016). Reality check on reproducibility. Nature, 533, 437.

Neisser, U., Boodoo, G., Bouchard, T., Boykin, A., Brody, N., Ceci, S., . . . Urbina, S. (1995). Intelligence: Knowns And Unknowns. American Psychologist, 51(2), 77-101.

Nettle, D. (2003). Intelligence and Class Mobility in the British Population. British Journal Of Psychology, 94(4), 551-561.

Parcel, T. L., & Menaghan, E. G. (1994). Parents' jobs and children's lives. New York: Aldine De Gruyter.

Parsons, S. (2014). Childhood Cognition in the 1970 British Cohort Study. London: Centre for Longitudinal Studies.

Plewis, I., Calderwood, L., Hawkes, D., & Nathan, G. (2004). Changes in the NCDS and BCS70 populations and samples over time. London: Centre for Longitudinal Studies.

Power, C., & Elliott, J. (2006). Cohort profile: 1958 British birth cohort (national child development study). International journal of epidemiology, 35(1), 34-41.

Rose, D., & Pevalin, D. J. (2003). The NS-SEC Explained. In D. Rose & D. J. Pevalin (Eds.), A Researcher's Guide to the National Statistics Socio-economic Classification (pp. 28-43). London: Sage.

Rose, D., & Pevalin, D. J. (2005). The National Statistics Socio-Economic Classification: Origins, Development and Use. Colchester: University of Essex.

Schoon, I. (2010). Childhood cognitive ability and adult academic attainment: Evidence from three British cohort studies. Longitudinal and Life Course Studies, 1(3), 241-258.

Schoon, I., Jones, E., Cheng, H., & Maughan, B. (2010). Resilience in children's development. In K. Hansen, H. Joshi & S. Dex (Eds.), Children of the 21st Century: The first five years. Bristol: Policy Press.

Schoon, I., Jones, E., Cheng, H., & Maughan, B. (2011). Family hardship, family instability and cognitive development. Journal of Epidemiology and Community Health, 643(1), 239-266.

Seaman, S., White, I., Copas, A., & Li, L. (2012). Combining multiple imputation and inverse‐probability weighting. Biometrics, 68(1), 129-137.

Shavit, Y., & Blossfeld, H. (1991). Persistent Inequality: Changing Educational Attainment in Thirteen Countries. Boulder, Colorado: Westview Press.

Shavit, Y., Yaish, M., & Bar-Haim, E. (2007). The Persistence Of Persistent Inequality. In S. Scherer, R. Pollak, G. Otte & M. Gangl (Eds.), From origin to destination: Trends and mechanisms in social stratification research. Frankfurt: Campus Verlag.

Shenkin, S., Starr, J., Pattie, A., Rush, M., Whalley, L., & Deary, I. (2001). Birth weight and cognitive function at age 11 years: the Scottish Mental Survey 1932. Archives of disease in childhood, 85(3), 189-196.

Shepherd, P. (2012). 1958 National Child Development Study User Guide: Measures of Ability At ages 7 to 16. London: Centre for Longitudinal Studies, University of London.

Stansfeld, S. A., Clark, C., Caldwell, T., Rodgers, B., & Power, C. (2008). Psychosocial work characteristics and anxiety and depressive disorders in midlife: the effects of prior psychological distress. Occupational and Environmental Medicine, 65(9), 634-642.

Sternberg, R., Grigorenko, E., & Bundy, D. (2001). The predictive value of IQ. Merrill-Palmer Quarterly, 47(1), 1-41.

Strand, S., Deary, I., & Smith, P. (2006). Sex differences in cognitive abilities test scores: A UK national picture. British Journal of Educational Psychology, 76(3), 463-480.

Sullivan, A., Ketende, S., & Joshi, H. (2013). Social Class and Inequalities in Early Cognitive Scores. Sociology, 47(6), 1187-1206.

Sullivan, T. R., Salter, A. B., Ryan, P., & Lee, K. J. (2015). Bias and precision of the “multiple imputation, then deletion” method for dealing with missing outcome data. American journal of epidemiology, 182(6), 528-534.

Tampubolon, G., & Savage, M. (2012). Intergenerational and Intragenerational Social Mobility in Britain. In P. S. Lambert, R. Connelly, M. Blackburn & V. Gayle (Eds.), Social Stratification: Trends and Processes (pp. 115-131). Aldershot: Ashgate.

Teasdale, T. W., & Owen, D. R. (2008). Secular declines in cognitive test scores: A reversal of the Flynn Effect. Intelligence, 36(2), 121-126.

Tucker-Drob, E. M., Briley, D. A., & Harden, K. P. (2013). Genetic and environmental influences on cognition across development and context. Current directions in psychological science, 22(5), 349-355.

University of London. (2013). 1970 British Cohort Study: Birth and 22-Month Subsample, 1970-1972. [data collection]. 3rd Edition. SN: 2666. UK Data Service.

University of London. (2014). National Child Development Study: Childhood Data, Sweeps 0-3, 1958-1974. [data collection]. 3rd Edition. SN: 5565. In [Original Data Producer(s)], National Birthday Trust Fund & National Children's Bureau (Eds.): UK Data Service.

University of London. (2015a). Millennium Cohort Study: Fifth Survey, 2012. [data collection]. 2nd Edition. SN: 7464. UK Data Service.

University of London. (2015b). National Child Development Study Response and Outcomes Dataset, 1958-2013. [data collection]. 5th Edition. SN: 5560. UK Data Service.

University of London. (2016a). 1970 British Cohort Study: Five-Year Follow-Up, 1975. [data collection]. 5th Edition. SN: 2699. UK Data Service.

University of London. (2016b). 1970 British Cohort Study: Ten-Year Follow-Up, 1980. [data collection]. 6th Edition. SN: 3723. UK Data Service.

Van der Sluis, S., Posthuma, D., Dolan, C., de Geus, E., Colom, R., & Boomsma, D. (2006). Sex differences on the Dutch WAIS-III. Intelligence, 34(3), 273-289.

Vanhanen, T. (2011). IQ and international wellbeing indexes. The Journal of Social, Political, and Economic Studies, 36(1), 80.

Vincent, C., & Ball, S. J. (2007). 'Making up' the middle-class child: Families, activities and class dispositions. Sociology, 41(6), 1061-1077.

Von Hippel, P. T. (2007). Regression with missing Ys: An improved strategy for analyzing multiply imputed data. Sociological Methodology, 37(1), 83-117.

Washbrook, E. (2011). Early Environments and Child Outcomes: An Analysis Commission for the Independent Review on Life Chances. Bristol: Centre for Market and Public Organization, University of Bristol.