This markdown contains the example code for the blog post How to read and write Stata files in R. The code is explained more thoroughly in that blog post.
Note that the only difference between this example and the code in the blog post is that "RScript" is removed from the file.path() call. This is because a Jupyter notebook uses the folder containing the .ipynb file as its working directory. That is, if the notebook is already in "RScripts", we only need to tell R where the data files are (i.e., in "Data"). Remember to change the file path so it corresponds to where the .dta files are. If we instead run the code as a script, in RStudio for instance, "RScript" needs to be the second argument, e.g., dtafile <- file.path(getwd(), "RScript", "Data", "FifthDayData.dta").
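As a rough illustration of the path difference described above, the following sketch builds both variants with file.path(). The helper dta_path() and the base folder "project" are hypothetical names used only for this example; they are not part of the blog post or the repository.

```r
# Hypothetical helper (not from the blog post): build the path to a .dta file
# below a base directory, optionally inserting a sub-folder such as "RScript".
dta_path <- function(base, filename, subdir = NULL) {
  # c() silently drops NULL, so subdir is only included when it is given
  parts <- c(base, subdir, "Data", filename)
  do.call(file.path, as.list(parts))
}

# Notebook case: the working directory is already the script folder.
notebook_path <- dta_path("project/RScript", "FifthDayData.dta")

# Script case: the working directory is the project root,
# so "RScript" has to be added as the second path component.
script_path <- dta_path("project", "FifthDayData.dta", subdir = "RScript")

# Both calls point at the same file
print(notebook_path)
print(identical(notebook_path, script_path))
```

In a real session, "project" would be replaced by getwd(), and file.exists() is a quick way to confirm the path before calling read_dta().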
First, we need the packages. This first code chunk installs any of them that are missing. Note that we could instead just install tidyverse, which includes all of these packages.
list.of.packages <- c("haven", "readr", "readxl", "dplyr")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
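The same check can also be written with requireNamespace(), which avoids scanning the full installed.packages() table. This is only a sketch of an alternative; the function name missing_packages() is made up for this example, and the install.packages() call is left commented out so the chunk has no side effects.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!available])
}

to.install <- missing_packages(c("haven", "readr", "readxl", "dplyr"))
print(to.install)

# Uncomment to install whatever is missing:
# if (length(to.install)) install.packages(to.install)
```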
Note: if you have cloned the Git repository, or downloaded the files from it, and have the example data in the same folder as in the repo, you can do as follows to get a running script:
data.path <- file.path(getwd(), "..", "SimData")
library(haven)
Remember to change to the path where the .dta file is.
dtafile <- file.path(data.path, "FifthDayData.dta")
## Load the .dta file
fifthD.df <- read_dta(dtafile)
## Get the first six rows (the default for head())
head(fifthD.df)
index | ID | Name | Day | Age | Response | Gender |
---|---|---|---|---|---|---|
<dbl> | <dbl> | <chr> | <chr> | <dbl> | <dbl> | <dbl> |
0 | 1 | John | Fifth | 23 | 0.4537330 | 0 |
1 | 2 | Billie | Fifth | 22 | 0.2573597 | 0 |
2 | 3 | Robert | Fifth | 20 | 0.4433932 | 0 |
3 | 4 | Don | Fifth | 27 | 0.4235921 | 0 |
4 | 5 | Joseph | Fifth | 21 | 0.5713554 | 0 |
5 | 6 | James | Fifth | 25 | 0.5577922 | 0 |
Next, we read a Stata file from a URL:
url = "http://www.principlesofeconometrics.com/stata/broiler.dta"
data.df <- read_dta(url)
head(data.df)
year | q | y | pchick | pbeef | pcor | pf | cpi | qproda | pop | meatex | time |
---|---|---|---|---|---|---|---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
1950 | 14.3 | 7863 | 69.5 | 31.2 | 59.8 | NA | 24.1 | 2628500 | 151.684 | NA | 41 |
1951 | 15.1 | 7953 | 72.9 | 36.5 | 72.1 | NA | 26.0 | 2843000 | 154.287 | NA | 42 |
1952 | 15.3 | 8071 | 73.1 | 36.2 | 71.3 | NA | 26.5 | 2851200 | 156.954 | NA | 43 |
1953 | 15.2 | 8319 | 71.3 | 28.5 | 62.7 | NA | 26.7 | 2953900 | 159.565 | NA | 44 |
1954 | 15.8 | 8276 | 64.4 | 27.4 | 63.4 | NA | 26.9 | 3099700 | 162.391 | NA | 45 |
1955 | 14.7 | 8675 | 67.0 | 27.1 | 56.1 | NA | 26.8 | 2958100 | 165.275 | NA | 46 |
data.df <- read_dta(url, col_select="pbeef")
head(data.df)
pbeef |
---|
<dbl> |
31.2 |
36.5 |
36.2 |
28.5 |
27.4 |
27.1 |
cols <- c("year", "pbeef", "q", "pop")
data.df <- read_dta(url, col_select=all_of(cols))
head(data.df)
year | q | pbeef | pop |
---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> |
1950 | 14.3 | 31.2 | 151.684 |
1951 | 15.1 | 36.5 | 154.287 |
1952 | 15.3 | 36.2 | 156.954 |
1953 | 15.2 | 28.5 | 159.565 |
1954 | 15.8 | 27.4 | 162.391 |
1955 | 14.7 | 27.1 | 165.275 |
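The two col_select calls above take either a single column name or a character vector wrapped in all_of(). The sketch below reproduces the same pattern without a network connection, using a small made-up data frame written to a temporary file; the values and the name toy.df are invented for illustration. all_of() is called with an explicit tidyselect:: prefix here, on the assumption that tidyselect is installed as a dependency of haven.

```r
library(haven)

# Made-up data standing in for a real Stata file
toy.df <- data.frame(
  year  = c(1950, 1951),
  q     = c(14.3, 15.1),
  pbeef = c(31.2, 36.5)
)

# Write it to a temporary .dta file so the example is self-contained
tmp <- tempfile(fileext = ".dta")
write_dta(toy.df, tmp)

# Select a single column by name
one.df <- read_dta(tmp, col_select = "pbeef")

# Select several columns stored in a character vector
cols <- c("year", "pbeef")
two.df <- read_dta(tmp, col_select = tidyselect::all_of(cols))

print(names(one.df))
print(names(two.df))
```

Selecting columns at read time can be handy for wide Stata files, since the unused variables never have to be held in memory.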
library(haven)
library(dplyr)
## Dta file:
dtafile <- file.path(data.path, "FifthDayData.dta")
dta.df <- read_dta(dtafile)
newdta.df <- select(dta.df, -c(index, Day))
write_dta(newdta.df, file.path(data.path, "NewFifthDayData.dta"))
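One way to confirm that the write step above did what we expect is to read the new file back and inspect it. The following is a minimal sketch of that round trip, using an invented data frame and a temporary file instead of FifthDayData.dta, so it does not depend on the repository data.

```r
library(haven)
library(dplyr)

# Invented data with the same kind of columns as the example file
toy.df <- data.frame(
  index = 0:2,
  ID    = 1:3,
  Day   = "Fifth",
  Age   = c(23, 22, 20)
)

# Drop the columns we do not want to keep, as in the chunk above
trimmed.df <- select(toy.df, -c(index, Day))

# Write to a temporary .dta file and read it back
tmp <- tempfile(fileext = ".dta")
write_dta(trimmed.df, tmp)
check.df <- read_dta(tmp)

# The dropped columns should be gone and the row count unchanged
print(names(check.df))
print(nrow(check.df))
```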
Here we use readr and read_csv to read a CSV file and then write it to a .dta file:
library(readr)
csvfile <- file.path(data.path, "FirstDayData.csv")
data.df <- read_csv(csvfile)
## Saving it as a dta
write_dta(data.df, file.path(data.path, "FirstDayData.dta"))
Parsed with column specification: cols( ID = col_double(), Name = col_character(), Day = col_character(), Age = col_double(), Response = col_double(), Gender = col_double() )
Here we use readxl and read_excel to read an Excel (.xlsx) file and then write it to a .dta file:
library(readxl)
xlsxfile <- file.path(data.path, "example_concat.xlsx")
data.df <- read_excel(xlsxfile)
## Saving it as a dta
write_dta(data.df, file.path(data.path, "play_data2.dta"))