Medicare payment database - subset by state

This notebook can be used to subset the 2012 Medicare provider utilization and payment data by state.

The raw data are available here:

http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html

The data are distributed as a zip archive. After extracting it, compress the main data file. We used gzip here; if you use a different compression format, you will need to edit some of the code below.
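If you would rather do the compression step from Python than with the gzip command-line tool, the following is a minimal sketch. The helper name gzip_file is hypothetical, and the commented-out call assumes the extracted file is named Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt, which matches the file name used later in this notebook without the .gz suffix.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress src with gzip, writing to src + '.gz' unless dst is given.

    shutil.copyfileobj streams the data in chunks, so the large
    text file never has to fit in memory.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Hypothetical usage, assuming the extracted file kept this name:
# gzip_file("Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt")
```

The original text file can be deleted once the .gz file has been written.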

We will use these modules from the standard library.

In [ ]:
import gzip
import os
import csv

Choose a state to subset.

In [ ]:
state = "FL"

This should be the name of the gzip-compressed data file downloaded from the CMS web site; edit it if needed.

In [ ]:
fname = "Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt.gz"

Set up a reader for the tab-delimited file. If you compressed the file with something other than gzip, edit this cell to use the corresponding reader from the standard library (for example, bz2.open for bzip2 or lzma.open for xz).

In [ ]:
# Open in text mode; newline="" lets the csv module handle line endings itself.
fid = gzip.open(fname, "rt", newline="")
inp = csv.reader(fid, delimiter="\t")
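To avoid editing this cell by hand for each compression format, one option is to pick the opener from the file extension. This is a sketch only: open_text is a hypothetical helper, and it assumes the extensions .gz, .bz2 and .xz correspond to gzip, bzip2 and xz compression. All three standard-library openers accept a text mode plus the newline keyword.

```python
import bz2
import gzip
import lzma

# Map file extensions to the matching standard-library opener.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(fname, mode="rt", **kwargs):
    """Open fname in text mode, choosing a decompressor from its extension.

    Extra keyword arguments (e.g. newline="", encoding=...) are passed through.
    """
    for ext, opener in _OPENERS.items():
        if fname.endswith(ext):
            return opener(fname, mode, **kwargs)
    # No recognised extension: treat it as an uncompressed file.
    return open(fname, mode, **kwargs)
```

With this helper the reader cell would become fid = open_text(fname, newline=""), and the same code would work whichever of the three formats you chose.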

Set up a writer for the state subset file to be created.

In [ ]:
oname = state + "-subset.csv.gz"
# newline="" stops "\r\n" from becoming "\r\r\n" on Windows.
oid = gzip.open(oname, "wt", newline="")
out = csv.writer(oid)

Always include the header.

In [ ]:
head = next(inp)
out.writerow(head)

Read the rest of the file and write the selected records.

In [ ]:
for line in inp:
    # Field 11 (0-based) holds the provider's state abbreviation.
    if line[11] == state:
        out.writerow(line)
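Hard-coding field 11 works only as long as the column layout does not change. A more defensive variant looks the state column up by name in the header. This is a sketch: subset_rows is a hypothetical helper, and the default column name nppes_provider_state is an assumption about the 2012 header that you should check against the header row you actually read.

```python
import csv


def subset_rows(reader, writer, state, column="nppes_provider_state"):
    """Copy the header and every row whose `column` equals `state`.

    `column` is an assumed header name; header.index raises ValueError
    if it is not present. Returns the number of data rows written.
    """
    header = next(reader)
    writer.writerow(header)
    names = [h.strip().lower() for h in header]
    idx = names.index(column.lower())
    n = 0
    for row in reader:
        # Skip short or malformed rows instead of raising IndexError.
        if len(row) > idx and row[idx] == state:
            writer.writerow(row)
            n += 1
    return n
```

Because it reads the header itself, it would be called on a freshly opened reader, e.g. subset_rows(inp, out, state), in place of both the header cell and the loop cell.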

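Clean up. Closing the writer flushes the remaining gzip data, so the output file is incomplete until this cell runs.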

In [ ]:
oid.close()
fid.close()
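The cells above can also be collapsed into a single function that uses with statements, so both files are closed even if an error occurs part-way through the loop. This is a sketch, not the notebook's own code: the function name subset_by_state and the outdir parameter are hypothetical additions, while the file naming and the field-11 test follow the cells above.

```python
import csv
import gzip
import os


def subset_by_state(fname, state, col=11, outdir="."):
    """Write rows of the gzipped, tab-delimited file `fname` whose field
    `col` equals `state` to <outdir>/<state>-subset.csv.gz.

    Returns the path of the file written.
    """
    oname = os.path.join(outdir, state + "-subset.csv.gz")
    with gzip.open(fname, "rt", newline="") as fid, \
         gzip.open(oname, "wt", newline="") as oid:
        inp = csv.reader(fid, delimiter="\t")
        out = csv.writer(oid)
        out.writerow(next(inp))  # always include the header
        for line in inp:
            if line[col] == state:
                out.writerow(line)
    return oname
```

For example, subset_by_state("Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt.gz", "FL") would produce FL-subset.csv.gz in the current directory.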