Medicare payment data - provider types and service types

Each record in the Medicare payment database summarizes a single type of service provided by a single provider. Each provider belongs to a "provider type" category. There are around 2700 different service types and around 80 different provider types.

A provider of a given type can engage in various services. In this notebook, we explore the relationships between provider types and service types.

Since the provider type and service type are nominal variables, the main focus here will be on the large contingency table that can be formed by counting the number of services of a given type that are provided by all providers of a given type.

We will use these libraries:

In [1]:
import pandas as pd
import numpy as np

Next we read in the data file. This is the payment data for one state. We can check the size of the data set and the variable names (columns) to get a sense for what we have.

In [2]:
data = pd.read_csv("MI-subset.csv.gz", compression="gzip")
print(data.shape)
print(data.columns)
(339456, 27)
Index([u'npi', u'nppes_provider_last_org_name', u'nppes_provider_first_name', u'nppes_provider_mi', u'nppes_credentials', u'nppes_provider_gender', u'nppes_entity_code', u'nppes_provider_street1', u'nppes_provider_street2', u'nppes_provider_city', u'nppes_provider_zip', u'nppes_provider_state', u'nppes_provider_country', u'provider_type', u'medicare_participation_indicator', u'place_of_service', u'hcpcs_code', u'hcpcs_description', u'line_srvc_cnt', u'bene_unique_cnt', u'bene_day_srvc_cnt', u'average_Medicare_allowed_amt', u'stdev_Medicare_allowed_amt', u'average_submitted_chrg_amt', u'stdev_submitted_chrg_amt', u'average_Medicare_payment_amt', u'stdev_Medicare_payment_amt'], dtype='object')

Here is a simple way to produce a contingency table, however as discussed further below, it is not quite right for our purposes:

In [3]:
tab = pd.crosstab(data["hcpcs_description"], data["provider_type"])
print tab.shape
tab.iloc[0:3,0:3]
(2754, 83)
Out[3]:
provider_type Addiction Medicine All Other Suppliers Allergy/Immunology
hcpcs_description
3d render w/o postprocess 0 0 0
3d rendering w/postprocess 0 0 0
5% dextrose/normal saline 0 0 0

The difficulty is that each row in the data set reflects multiple instances in which a provider provided services to a patient (the exact number is given in the line_srvc_cnt variable). The contingency table above counts the number of rows of the data set with a given combination of provider type and service type. We want to weight these counts by the line_srvc_cnt variable. This is done by the code in the next cell:

In [4]:
tab = data.groupby(["hcpcs_description", "provider_type"]).agg({"line_srvc_cnt": np.sum})
tab.head()
Out[4]:
line_srvc_cnt
hcpcs_description provider_type
3d render w/o postprocess Cardiology 265
Diagnostic Radiology 4859
General Practice 62
Infectious Disease 38
Internal Medicine 284

What we have now is a contingency table represented as a Series object with a hierarchical index. To convert it to an actual table, we use unstack, which takes the second level of the hierarchical row index and moves it to become the column index.

In [5]:
tab = tab.unstack()
tab.head()
Out[5]:
line_srvc_cnt
provider_type Addiction Medicine All Other Suppliers Allergy/Immunology Ambulance Service Supplier Ambulatory Surgical Center Anesthesiologist Assistants Anesthesiology Audiologist (billing independently) CRNA Cardiac Electrophysiology ... Radiation Therapy Registered Dietician/Nutrition Professional Rheumatology Speech Language Pathologist Sports Medicine Surgical Oncology Thoracic Surgery Unknown Supplier/Provider Urology Vascular Surgery
hcpcs_description
3d render w/o postprocess NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 12 NaN
3d rendering w/postprocess NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 214 NaN
5% dextrose/normal saline NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5% dextrose/water NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
ALS1-emergency NaN NaN NaN 209322 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 83 columns

A minor issue is that unstack creates a hierarchical column index with the name line_srvc_cnt as the first level for all columns. Since line_srvc_cnt is the only variable being aggregated here, we can drop this and convert the hierarchical column index to a regular column index.

In [6]:
tab.columns = tab.columns.get_level_values(1)
tab.head()
Out[6]:
provider_type Addiction Medicine All Other Suppliers Allergy/Immunology Ambulance Service Supplier Ambulatory Surgical Center Anesthesiologist Assistants Anesthesiology Audiologist (billing independently) CRNA Cardiac Electrophysiology ... Radiation Therapy Registered Dietician/Nutrition Professional Rheumatology Speech Language Pathologist Sports Medicine Surgical Oncology Thoracic Surgery Unknown Supplier/Provider Urology Vascular Surgery
hcpcs_description
3d render w/o postprocess NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 12 NaN
3d rendering w/postprocess NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 214 NaN
5% dextrose/normal saline NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5% dextrose/water NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
ALS1-emergency NaN NaN NaN 209322 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 83 columns

Since some service type by provider type combinations don't appear in the data, they are counted as NaN in the data file. Here we replace them with zero.

In [7]:
tab = tab.replace(np.nan, 0)
tab.head()
Out[7]:
provider_type Addiction Medicine All Other Suppliers Allergy/Immunology Ambulance Service Supplier Ambulatory Surgical Center Anesthesiologist Assistants Anesthesiology Audiologist (billing independently) CRNA Cardiac Electrophysiology ... Radiation Therapy Registered Dietician/Nutrition Professional Rheumatology Speech Language Pathologist Sports Medicine Surgical Oncology Thoracic Surgery Unknown Supplier/Provider Urology Vascular Surgery
hcpcs_description
3d render w/o postprocess 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 12 0
3d rendering w/postprocess 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 214 0
5% dextrose/normal saline 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
5% dextrose/water 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ALS1-emergency 0 0 0 209322 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 83 columns

Now we're ready to explore the relationship between provider types and service types. First, we look at the number of distinct services that are performed at least once by providers of a given type.

In [8]:
sum_ptype = (tab > 0).sum(0)
print(len(sum_ptype))
print(sum_ptype.head())
83
provider_type
Addiction Medicine             33
All Other Suppliers             6
Allergy/Immunology             56
Ambulance Service Supplier     14
Ambulatory Surgical Center    212
dtype: int64
In [9]:
sum_ptype.plot(kind='hist')
plt.xlabel("Number of services provided", size=15)
plt.ylabel("Number of provider types", size=15)
Out[9]:
<matplotlib.text.Text at 0x7f70913a9fd0>

Here are the provider types that perform the greatest number of different types of services.

In [10]:
print(sum_ptype[sum_ptype > 400])
provider_type
Clinical Laboratory     549
Diagnostic Radiology    503
Family Practice         529
General Practice        409
General Surgery         401
Internal Medicine       715
Physician Assistant     488
dtype: int64

Here are the provider types that perform the fewest distinct service types.

In [11]:
print(sum_ptype[sum_ptype < 10])
provider_type
All Other Suppliers                            6
Anesthesiologist Assistants                    8
Chiropractic                                   1
Geriatric Psychiatry                           4
Licensed Clinical Social Worker                9
Multispecialty Clinic/Group Practice           4
Psychologist (billing independently)           1
Public Health Welfare Agency                   2
Registered Dietician/Nutrition Professional    6
Speech Language Pathologist                    7
Unknown Supplier/Provider                      7
dtype: int64

Two of the provider types only provide one service type, let's see what that is:

In [12]:
for svc in sum_ptype[sum_ptype == 1].index:
    tab1 = tab.loc[:, svc].copy()
    tab1.sort(ascending=False)
    print(svc)
    print(tab1[0:3])
    print
Chiropractic
hcpcs_description
Chiropractic manipulation    1206635
bls                                0
Dx bronchoscope/wash               0
Name: Chiropractic, dtype: float64

Psychologist (billing independently)
hcpcs_description
Neuropsych tst by psych/phys    291
bls                               0
Dx bronchoscope/lavage            0
Name: Psychologist (billing independently), dtype: float64

Now we perform the same analysis on the service types. For each service type, we count the number of distinct provider types who provided that service at least once.

In [13]:
sum_stype = (tab > 0).sum(1)
print(sum_stype.head())
sum_stype.plot(kind='hist')
plt.xlabel("Number of providers that provide service", size=15)
plt.ylabel("Number of services", size=15)
hcpcs_description
3d render w/o postprocess     7
3d rendering w/postprocess    9
5% dextrose/normal saline     3
5% dextrose/water             5
ALS1-emergency                1
dtype: int64
Out[13]:
<matplotlib.text.Text at 0x7f70913a90d0>

Here are the service types that are provided by many different provider types:

In [14]:
print(sum_stype[sum_stype > 30])
hcpcs_description
Admin influenza virus vac       38
Behav chng smoking 3-10 min     32
Dexamethasone sodium phos       31
Echo guide for biopsy           32
Electrocardiogram complete      35
Extracranial study              32
Hospital discharge day          38
Initial hospital care           52
Initial observation care        31
MD certification HHA patient    34
Nursing fac care subseq         31
Office/outpatient visit est     61
Office/outpatient visit new     61
Routine venipuncture            42
Subsequent hospital care        53
Ther/proph/diag inj sc/im       40
Vitamin b12 injection           31
dtype: int64

Here are the service types that are provided by only a single provider type:

In [15]:
print(sum_stype[sum_stype == 1])
hcpcs_description
ALS1-emergency                 1
AbobotulinumtoxinA             1
Abrasion lesion single         1
Acetone assay                  1
Acoustic refl threshold tst    1
Adenovirus assay w/optic       1
Admin ecg contrast agent       1
Ag detect nos eia mult         1
Alpha-1-antitrypsin pheno      1
Alpha-1-antitrypsin total      1
Alpha-fetoprotein l3           1
Als 1                          1
Alteplase recombinant          1
Amifostine                     1
Amikacin sulfate injection     1
...
X-ray exam of body section      1
X-ray exam of eye sockets       1
X-ray exam of jaw joint         1
X-ray exam of mastoids          1
X-ray exam of shoulders         1
X-ray guide gu dilation         1
X-ray head for orthodontia      1
X-ray stress view               1
X-rays bone survey complete     1
X-rays bone survey limited      1
Xm archive tissue molec anal    1
Xpose endoprosth brachial       1
Xray endovasc thor ao repr      1
als 2                           1
bls                             1
Length: 1034, dtype: int64

For the service types that are provided by only one type of provider, we can tabulate which provider it is that provides the service, and also display the number of times that the service was provided.

In [16]:
ii = (sum_stype == 1)
tab1 = tab.loc[ii, :]
usv = tab1.apply(np.argmax, 1)
usv = pd.DataFrame(usv)
usv["Num"] = tab1.apply(np.max, 1)
usv = usv.rename(columns={0: "Service"})
print(usv)
                                                          Service     Num
hcpcs_description                                                        
ALS1-emergency                         Ambulance Service Supplier  209322
AbobotulinumtoxinA                                      Neurology   13541
Abrasion lesion single                             Otolaryngology      20
Acetone assay                                 Clinical Laboratory      18
Acoustic refl threshold tst   Audiologist (billing independently)      73
Adenovirus assay w/optic                                Optometry      18
Admin ecg contrast agent                               Cardiology      93
Ag detect nos eia mult                        Clinical Laboratory     213
Alpha-1-antitrypsin pheno                     Clinical Laboratory      50
Alpha-1-antitrypsin total                     Clinical Laboratory     280
Alpha-fetoprotein l3                          Clinical Laboratory      47
Als 1                                  Ambulance Service Supplier   14697
Alteplase recombinant                         Hematology/Oncology     149
Amifostine                                       Medical Oncology      28
Amikacin sulfate injection                                Urology      93
Amines vaginal fluid qual                   Obstetrics/Gynecology      13
Amputate leg at thigh                            Vascular Surgery      48
Amputation toe & metatarsal                      Vascular Surgery      50
Analysis skeletal muscle                                Pathology      17
Analyz neurostim brain addon                            Neurology      18
Analyze spine infus pump                           Anesthesiology      33
Anesth abdominal wall surg                         Anesthesiology      15
Anesth cabg w/o pump                               Anesthesiology      14
Anesth dx knee arthroscopy                                   CRNA      22
Anesth ear surgery                                 Anesthesiology      79
Anesth genitalia surgery                           Anesthesiology      24
Anesth kidney transplant                           Anesthesiology      14
Anesth kidney/ureter surg                          Anesthesiology     148
Anesth knee joint procedure                        Anesthesiology      13
Anesth lower leg surgery                           Anesthesiology      52
...                                                           ...     ...
Verteporfin injection                               Ophthalmology   15600
Vinorelbine tartrate inj                      Hematology/Oncology     963
Virus antibody nos                            Clinical Laboratory      95
Virus inoculation shell via                   Clinical Laboratory     198
Virus inoculation tissue                      Clinical Laboratory      61
Visual audiometry (vra)       Audiologist (billing independently)      19
Vit for membrane dissect                            Ophthalmology      17
Wcd device interrogate                                 Cardiology     222
Wedge resect of lung initial                      Cardiac Surgery      12
West nile virus ab igm                        Clinical Laboratory      54
West nile virus antibody                      Clinical Laboratory      46
Wheelchair mngment training                    Physical Therapist     361
Withdrawal of arterial blood                    Pulmonary Disease     452
Wound closure by adhesive                      Emergency Medicine      51
Wound prep addl 100 cm         Plastic and Reconstructive Surgery      24
X-ray exam of body section           Oral Surgery (dentists only)     280
X-ray exam of eye sockets                    Diagnostic Radiology      15
X-ray exam of jaw joint              Oral Surgery (dentists only)     165
X-ray exam of mastoids                          Internal Medicine      77
X-ray exam of shoulders                        Orthopedic Surgery     411
X-ray guide gu dilation                      Diagnostic Radiology      19
X-ray head for orthodontia           Oral Surgery (dentists only)      31
X-ray stress view                              Orthopedic Surgery      90
X-rays bone survey complete                  Diagnostic Radiology     531
X-rays bone survey limited                   Diagnostic Radiology     122
Xm archive tissue molec anal                            Pathology     513
Xpose endoprosth brachial                                 Urology      13
Xray endovasc thor ao repr               Interventional Radiology      12
als 2                                  Ambulance Service Supplier    4210
bls                                    Ambulance Service Supplier  196370

[1034 rows x 2 columns]

Exercises

  • Produce sorted lists of the 10 most common procedures in Michigan and in Florida.

  • Produce sorted lists of the 10 most common provider types in Michigan and in Florida.

  • Compute the difference between the number of times that each service was provided in Michigan and in Florida. Then compute the average of the number of times that each service was provided in Michigan and in Florida. Make a scatterplot of the difference against the average. If you think it is more helpful, do the analysis and plotting on the log scale.