Medicare payment database - subset by state

This notebook subsets the 2012 Medicare Provider Utilization and Payment Data by state.

The raw data are available here:

The files are downloaded as a zip archive. After extracting the files, compress the main data file. We used gzip here; if you compress it a different way, you will need to edit some of the code below.
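
If you would rather do the compression step from Python than from the command line, the following is a minimal sketch. The helper name `gzip_file` is ours and is not part of the original notebook; it assumes the extracted data file sits in the working directory.

```python
import gzip
import shutil

def gzip_file(src, dst=None):
    """Compress src to dst (default: src + '.gz') and return the output path."""
    if dst is None:
        dst = src + ".gz"
    # Stream the data in chunks so the large file is never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `gzip_file` on the extracted `.txt` file should produce a `.txt.gz` file next to it, which is the form the rest of this notebook expects.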

We will use these modules from the standard library.

In [ ]:
import gzip
import os
import csv

Choose a state to subset.

In [ ]:
state = "FL"

This should be the name of the data file downloaded from the CMS web site. Edit it if needed.

In [ ]:
fname = "Medicare-Physician-and-Other-Supplier-PUF-CY2012.txt.gz"

Set up a reader for a tab-delimited file. If you compressed the file with something other than gzip, edit this cell to use the corresponding compressed-file reader.

In [ ]:
fid = gzip.open(fname, 'rt')
inp = csv.reader(fid, delimiter="\t")
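
If the file was compressed with something other than gzip, one option is to pick the opener from the file extension. This is only a sketch: the helper `open_text` and the extension table are our own assumptions, and they cover just the formats that have a standard-library module.

```python
import bz2
import gzip
import lzma

# Map file extensions to the standard-library opener for that format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}

def open_text(path, mode="rt"):
    """Open a possibly compressed file in text mode, choosing the opener by extension."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            # newline="" leaves line endings for the csv module to handle.
            return opener(path, mode, newline="")
    # No known compressed extension: treat the file as plain text.
    return open(path, mode, newline="")
```

With this helper, the reader cell would become `fid = open_text(fname)`, and the rest of the notebook stays the same.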

Set up a writer for the state subset file to be created.

In [ ]:
oname = state + "-subset.csv.gz"
oid = gzip.open(oname, "wt")
out = csv.writer(oid)

Always include the header.

In [ ]:
head = next(inp)
out.writerow(head)

Read the rest of the file and write the selected records.

In [ ]:
for line in inp:
    # Column 11 holds the provider's state code.
    if line[11] == state:
        out.writerow(line)
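
The loop selects on the hard-coded position 11. A slightly more defensive variant looks the position up in the header row instead. The sketch below assumes the state column is labeled `nppes_provider_state`; check the header of your download, since we have not verified the label here. The helper `find_column` and the short example header are ours, for illustration only.

```python
def find_column(header, name):
    """Return the position of `name` in the header row, ignoring case and padding."""
    cleaned = [h.strip().lower() for h in header]
    # list.index raises ValueError if the column is missing, which is what we want.
    return cleaned.index(name.lower())

# Illustrative header fragment, not the full CMS header.
example_head = ["npi", "nppes_provider_city", "nppes_provider_zip", "nppes_provider_state"]
state_col = find_column(example_head, "nppes_provider_state")
```

In the notebook you would call `find_column(head, "nppes_provider_state")` once after reading the header and then test `line[state_col] == state` inside the loop.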

Clean up.

In [ ]:
oid.close()
fid.close()
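
As an alternative to closing the files by hand, the steps in this notebook can be folded into one function that uses `with` blocks, so both files are closed even if an error occurs partway through. This is a sketch under the same assumptions as the cells above (gzip compression, tab-delimited input, state code in column 11); the function name `subset_by_state` and its parameters are ours.

```python
import csv
import gzip

def subset_by_state(in_path, out_path, state, state_col=11):
    """Write the header plus every row whose state column equals `state`.

    Returns the number of data rows written (the header is not counted).
    """
    kept = 0
    with gzip.open(in_path, "rt", newline="") as fid, \
         gzip.open(out_path, "wt", newline="") as oid:
        inp = csv.reader(fid, delimiter="\t")
        out = csv.writer(oid)
        # Always copy the header row first.
        out.writerow(next(inp))
        for line in inp:
            # Skip rows too short to have a state column.
            if len(line) > state_col and line[state_col] == state:
                out.writerow(line)
                kept += 1
    return kept
```

For example, `subset_by_state(fname, state + "-subset.csv.gz", state)` would reproduce the output of the cells above and report how many records were kept.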