This markdown contains the example code for the blog post "How to read and write Stata files in R". The code is, of course, explained more thoroughly in that blog post.

Note that the difference between this example and the code in the blog post is that "RScript" is removed from the `file.path()` call. This is because, when the code runs in the Jupyter notebook, the working directory (`getwd()`) is the folder containing the .ipynb file. That is, if the notebook is already in "RScript", we only need to tell R where the data files are (i.e., in "Data"). Remember to change the file path so that it corresponds to where the .dta files are. If we run the code as a script (in RStudio, for instance), we need "RScript" as the second argument, e.g., `dtafile <- file.path(getwd(), "RScript", "Data", "FifthDayData.dta")`.

## Install haven if missing

First, we need the packages. This first code chunk installs any of them that are missing. Note that we could instead install tidyverse, which includes all of these packages.

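```r
## Packages used in this notebook
list.of.packages <- c("haven", "readr", "readxl", "dplyr")
## Keep only the packages that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
## Install the missing ones, if any
if (length(new.packages)) install.packages(new.packages)
```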
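The same check can also be written with `requireNamespace()`, which tests whether a package can be loaded without attaching it. This is a minimal sketch. `missing_packages()` is a hypothetical helper name and is not part of any package.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE, without an error, if a package is not installed
  is.available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is.available]
}

to.install <- missing_packages(c("haven", "readr", "readxl", "dplyr"))
print(to.install)  # character(0) when all four packages are installed
# install.packages(to.install)  # uncomment to install whatever is missing
```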

## Path

Note: if you have cloned the Git repository (or downloaded its files) and the example data are in the same folder as in the repository, you can set the path as follows to get a running script:

```r
## The data folder (SimData) is one level above the notebook's folder
data.path <- file.path(getwd(), "..", "SimData")
```
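The notebook and a script run from RStudio start in different working directories (see the note at the top). A small helper can therefore try several candidate folders and use the first one that exists. This is a sketch: `first_existing_dir()` is a hypothetical helper, and it stops with an error if none of the candidates exist.

```r
# Hypothetical helper: return the first candidate directory that exists
first_existing_dir <- function(candidates) {
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) {
    stop("None of the candidate data folders exist: ",
         paste(candidates, collapse = ", "))
  }
  found[1]
}
```

In this notebook it could be called with the two layouts mentioned above: `data.path <- first_existing_dir(c(file.path(getwd(), "..", "SimData"), file.path(getwd(), "RScript", "Data")))`.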

## Load haven

```r
library(haven)
```

## Load a .dta file

Remember to change the path to where your .dta file is located.

```r
dtafile <- file.path(data.path, "FifthDayData.dta")

## Load the .dta file
fifthD.df <- read_dta(dtafile)

## Get the first six rows
head(fifthD.df)
```

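A tibble: 6 × 7

| index | ID | Name | Day | Age | Response | Gender |
|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 0 | 1 | John | Fifth | 23 | 0.4537330 | 0 |
| 1 | 2 | Billie | Fifth | 22 | 0.2573597 | 0 |
| 2 | 3 | Robert | Fifth | 20 | 0.4433932 | 0 |
| 3 | 4 | Don | Fifth | 27 | 0.4235921 | 0 |
| 4 | 5 | Joseph | Fifth | 21 | 0.5713554 | 0 |
| 5 | 6 | James | Fifth | 25 | 0.5577922 | 0 |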
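You may not have FifthDayData.dta at hand. In that case, a self-contained sketch can write a small data frame to a temporary .dta file with `write_dta()` and read it back with `read_dta()`. The rows reuse a few values from the table above and serve only as an illustration.

```r
library(haven)

# A few rows based on the FifthDayData output above
toy.df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("John", "Billie", "Robert"),
  Age = c(23, 22, 20),
  stringsAsFactors = FALSE
)

# Write it to a temporary .dta file and read it back
tmp.dta <- tempfile(fileext = ".dta")
write_dta(toy.df, tmp.dta)
roundtrip.df <- read_dta(tmp.dta)

# read_dta() returns a tibble
print(roundtrip.df)
```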

## Load a .dta file from a URL

Here we read the Stata file directly from a URL:

```r
url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"

## Read the .dta file directly from the URL
data.df <- read_dta(url)

head(data.df)
```

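A tibble: 6 × 12

| year | q | y | pchick | pbeef | pcor | pf | cpi | qproda | pop | meatex | time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1950 | 14.3 | 7863 | 69.5 | 31.2 | 59.8 | NA | 24.1 | 2628500 | 151.684 | NA | 41 |
| 1951 | 15.1 | 7953 | 72.9 | 36.5 | 72.1 | NA | 26.0 | 2843000 | 154.287 | NA | 42 |
| 1952 | 15.3 | 8071 | 73.1 | 36.2 | 71.3 | NA | 26.5 | 2851200 | 156.954 | NA | 43 |
| 1953 | 15.2 | 8319 | 71.3 | 28.5 | 62.7 | NA | 26.7 | 2953900 | 159.565 | NA | 44 |
| 1954 | 15.8 | 8276 | 64.4 | 27.4 | 63.4 | NA | 26.9 | 3099700 | 162.391 | NA | 45 |
| 1955 | 14.7 | 8675 | 67.0 | 27.1 | 56.1 | NA | 26.8 | 2958100 | 165.275 | NA | 46 |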
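Reading from a URL downloads the file every time the cell runs. A minimal sketch, assuming you would rather keep a local copy: download the file once with base R's `download.file()` and read the local copy afterwards. `read_dta_cached()` is a hypothetical helper, not part of haven.

```r
library(haven)

# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet,
# then read the local copy with read_dta()
read_dta_cached <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary .dta file intact on Windows
    download.file(url, dest, mode = "wb")
  }
  read_dta(dest)
}
```

For example: `broiler.df <- read_dta_cached(url, file.path(data.path, "broiler.dta"))`. The first call downloads broiler.dta and later calls reuse the local file.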

## Read a specific column

```r
## Read only the pbeef column
data.df <- read_dta(url, col_select = "pbeef")

head(data.df)
```

A tibble: 6 × 1

| pbeef |
|---|
| `<dbl>` |
| 31.2 |
| 36.5 |
| 36.2 |
| 28.5 |
| 27.4 |
| 27.1 |
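`col_select` also accepts tidyselect-style selections, not only column names. The sketch below is self-contained and uses values from the first two rows of the broiler output, saved to a temporary file. `starts_with("p")` keeps every column whose name starts with "p", such as the price columns pchick and pbeef.

```r
library(haven)

# First two rows of some broiler.dta columns, saved to a temporary .dta file
prices.df <- data.frame(year = c(1950, 1951),
                        pchick = c(69.5, 72.9),
                        pbeef = c(31.2, 36.5),
                        q = c(14.3, 15.1))
tmp.dta <- tempfile(fileext = ".dta")
write_dta(prices.df, tmp.dta)

# Keep only the columns whose names start with "p"
p.df <- read_dta(tmp.dta, col_select = tidyselect::starts_with("p"))
names(p.df)
```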

## Read multiple columns

```r
## Columns to read
cols <- c("year", "pbeef", "q", "pop")

## all_of() selects the columns named in a character vector
data.df <- read_dta(url, col_select = all_of(cols))

head(data.df)
```

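A tibble: 6 × 4

| year | q | pbeef | pop |
|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1950 | 14.3 | 31.2 | 151.684 |
| 1951 | 15.1 | 36.5 | 154.287 |
| 1952 | 15.3 | 36.2 | 156.954 |
| 1953 | 15.2 | 28.5 | 159.565 |
| 1954 | 15.8 | 27.4 | 162.391 |
| 1955 | 14.7 | 27.1 | 165.275 |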
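In the output above, the columns come back in the order they are stored in broiler.dta (year, q, pbeef, pop), not in the order listed in `cols`. If the order of `cols` matters, one simple option is to reorder after reading, using base R subsetting. This is a sketch: the data are the first two rows shown above, written to a temporary file so the example runs offline.

```r
library(haven)

cols <- c("year", "pbeef", "q", "pop")

# First two rows of the output above, saved to a temporary .dta file
broiler.small <- data.frame(year = c(1950, 1951), q = c(14.3, 15.1),
                            pbeef = c(31.2, 36.5), pop = c(151.684, 154.287))
tmp.dta <- tempfile(fileext = ".dta")
write_dta(broiler.small, tmp.dta)

# Read the selected columns, then put them in the order listed in `cols`
sel.df <- read_dta(tmp.dta, col_select = tidyselect::all_of(cols))
sel.df <- sel.df[cols]
names(sel.df)
```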

## Remove columns and write a .dta file

```r
library(haven)
library(dplyr)

## Dta file:
dtafile <- file.path(data.path, "FifthDayData.dta")

dta.df <- read_dta(dtafile)

## Drop the index and Day columns
newdta.df <- select(dta.df, -c(index, Day))

## Write the result to a new .dta file
write_dta(newdta.df, file.path(data.path, "NewFifthDayData.dta"))
```
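To check that the dropped columns are really gone from the new file, we can read it back and compare the column names. The sketch below is self-contained: it uses two rows taken from the FifthDayData table above and writes to a temporary file instead of `data.path`.

```r
library(haven)
library(dplyr)

# Two rows taken from the FifthDayData output above
dta.df <- data.frame(index = c(0, 1), ID = c(1, 2), Name = c("John", "Billie"),
                     Day = c("Fifth", "Fifth"), Age = c(23, 22))

# Drop index and Day, as in the cell above, and write to a temporary .dta file
newdta.df <- select(dta.df, -c(index, Day))
out.file <- tempfile(fileext = ".dta")
write_dta(newdta.df, out.file)

# Read the file back and list the columns that are no longer there
check.df <- read_dta(out.file)
setdiff(names(dta.df), names(check.df))  # returns "index" "Day"
```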

## Read a CSV file and write a .dta file

Here we use readr and read_csv to read a CSV file and then write it to a .dta file:

```r
library(readr)

csvfile <- file.path(data.path, "FirstDayData.csv")

## Read the CSV file
data.df <- read_csv(csvfile)

## Saving it as a dta
write_dta(data.df, file.path(data.path, "FirstDayData.dta"))
```

```
Parsed with column specification:
cols(
  ID = col_double(),
  Name = col_character(),
  Day = col_character(),
  Age = col_double(),
  Response = col_double(),
  Gender = col_double()
)
```
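The message above shows the column types that readr guessed. Passing the same specification explicitly through `col_types` documents the expected types and stops readr from printing the message. This is a self-contained sketch: the tiny CSV written to a temporary file is made up and only shares its column names with FirstDayData.csv.

```r
library(readr)
library(haven)

# Tiny made-up CSV with the same columns as FirstDayData.csv
csv.file <- tempfile(fileext = ".csv")
writeLines(c("ID,Name,Day,Age,Response,Gender",
             "1,John,First,23,0.45,0",
             "2,Billie,First,22,0.26,0"), csv.file)

# The specification from the message above, given explicitly
spec <- cols(
  ID = col_double(),
  Name = col_character(),
  Day = col_character(),
  Age = col_double(),
  Response = col_double(),
  Gender = col_double()
)
first.df <- read_csv(csv.file, col_types = spec)

# Save as .dta and read it back to inspect the stored types
dta.file <- tempfile(fileext = ".dta")
write_dta(first.df, dta.file)
str(read_dta(dta.file))
```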

## Read an Excel file and write a .dta file

Here we use readxl and read_excel to read an Excel file and then write it to a .dta file:

```r
library(readxl)

xlsxfile <- file.path(data.path, "example_concat.xlsx")

## Read the first sheet of the Excel file
data.df <- read_excel(xlsxfile)

## Saving it as a dta
write_dta(data.df, file.path(data.path, "play_data2.dta"))
```
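`read_excel()` reads the first sheet unless `sheet` is given. Excel column headers may also contain characters that Stata does not allow in variable names. This sketch lists the sheets with readxl's `excel_sheets()`, reads one sheet by name, and replaces the dots in its column names before calling `write_dta()`. It uses the `datasets.xlsx` example workbook bundled with readxl (found via `readxl_example()`) rather than example_concat.xlsx, and writes to a temporary file.

```r
library(readxl)
library(haven)

# Example workbook that ships with readxl
xlsx.path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx.path)  # names of all sheets in the workbook

# Read a specific sheet by name instead of the first one
iris.df <- read_excel(xlsx.path, sheet = "iris")

# Stata variable names may only use letters, digits, and underscores,
# so replace the dots in names such as Sepal.Length
names(iris.df) <- gsub("[^A-Za-z0-9_]", "_", names(iris.df))

out.file <- tempfile(fileext = ".dta")
write_dta(iris.df, out.file)
names(read_dta(out.file))
```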
