This notebook subsets the 2012 Medicare provider utilization and payment data by state.
The raw data are available here:
The files are downloaded as a zip archive. After extracting them, compress the main data file. We used gzip here; if you compress it another way, you will need to edit some of the code below.
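If a command-line gzip tool is not available, the compression step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `gzip_file` is hypothetical and not part of the original notebook.

```python
import gzip
import shutil

def gzip_file(src, dst=None):
    """Compress src into dst (default: src + '.gz') and return dst."""
    if dst is None:
        dst = src + ".gz"
    # Stream the bytes across so the whole file is never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `gzip_file("Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt")` would write a `.txt.gz` file next to the extracted data file.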
We will use these modules from the standard library.
import gzip
import os
import csv
Choose a state to subset.
state = "FL"
This should be the name of the compressed data file downloaded from the CMS web site. Edit it if needed.
fname = "Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt.gz"
Set up a reader for a tab-delimited file. If you compressed the file with something other than gzip, edit this cell to use the corresponding compressed file reader.
fid = gzip.open(fname, 'rt')
inp = csv.reader(fid, delimiter="\t")
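As one way to handle other compression formats, the reader can be chosen from the file extension. This is a sketch, not part of the original notebook: the `OPENERS` table and the `open_text` helper are hypothetical names, and only the standard-library `gzip`, `bz2`, and `lzma` modules are covered.

```python
import bz2
import gzip
import lzma

# Map file extensions to standard-library openers (all accept text mode "rt").
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def open_text(path, mode="rt"):
    """Open path in text mode, picking the reader from its extension."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            return opener(path, mode)
    # No known compressed extension: treat the file as plain text.
    return open(path, mode)
```

With this helper, `fid = open_text(fname)` could stand in for the `gzip.open` call above.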
Set up a writer for the state subset file to be created.
oname = state + "-subset.csv.gz"
oid = gzip.open(oname, "wt", newline="")  # newline="" as the csv module recommends
out = csv.writer(oid)
Always include the header.
head = next(inp)
out.writerow(head)
Read the rest of the file and write the selected records.
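# Find the state column from the header (assumed: the first column whose name ends in "state").
col = [i for i, h in enumerate(head) if h.lower().endswith("state")][0]
for line in inp:
    if len(line) > col and line[col] == state:
        out.writerow(line)
oid.close()  # close the output so the compressed file is fully written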
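The steps above can also be collected into a single function that closes both files automatically. This is a minimal sketch rather than the notebook's own code: the function name `subset_by_state` is hypothetical, and the default column name `nppes_provider_state` is an assumption about the header that should be checked against the actual file.

```python
import csv
import gzip

def subset_by_state(in_path, out_path, state, state_col="nppes_provider_state"):
    """Write the header plus all rows whose state_col equals state.

    Returns the number of data rows written. The default state_col is an
    assumed column name; pass the real one if the header differs.
    """
    n = 0
    with gzip.open(in_path, "rt", newline="") as fin, \
         gzip.open(out_path, "wt", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout)
        head = next(reader)
        writer.writerow(head)          # always include the header
        col = head.index(state_col)    # raises ValueError if the column is missing
        for row in reader:
            # Skip short rows instead of failing on them.
            if len(row) > col and row[col] == state:
                writer.writerow(row)
                n += 1
    return n
```

A call such as `subset_by_state(fname, state + "-subset.csv.gz", state)` would reproduce the cells above in one step, under the stated column-name assumption.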