Now that we've covered the Pandas fundamentals, let's dive into some of the other tasks you'll want to be able to cover with your dataset.
Cleaning is arguably the most important part of the data analysis process. If your dataset has fields that were entered by hand, duplicated rows or missing data, not even the best analysis is going to yield worthwhile results. Garbage in, garbage out.
Some things to look for when you're diving into a dataset:
So, let's keep these things in mind and take another look at the datasets we used in the last session.
#import main libraries and read in files (you know the drill)
import pandas as pd
import numpy as np
facilities = pd.read_csv("data/facilities.csv")
complaints = pd.read_excel("data/complaints.xlsx")
facilities.head()
| | facid | fac_type | capacity | fac_name | fac_address | city_state_zip | Unnamed: 6 | owner | operator |
---|---|---|---|---|---|---|---|---|---|
0 | 385008 | NF | 96.0 | Presbyterian Community Care Center | 1085 N Oregon St | Ontario, OR 97914 | NaN | Presbyterian Nursing Home, Inc. | Presbyterian Nursing Home, Inc. |
1 | 385010 | NF | 159.0 | Laurelhurst Village Rehabilitation Center | 3060 SE Stark St | Portland, OR 97214 | NaN | Laurelhurst Operations, LLC | Laurelhurst Operations, LLC |
2 | 385015 | NF | 128.0 | Regency Gresham Nursing & Rehabilitation Center | 5905 SE Powell Valley Rd | Gresham, OR 97080 | NaN | Regency Gresham Nursing & Rehabilitation Cente... | Regency Pacific Management, LLC |
3 | 385018 | NF | 98.0 | Providence Benedictine Nursing Center | 540 South Main St | Mt. Angel, OR 97362 | NaN | Providence Health & Services - Oregon | Providence Health & Services - Oregon |
4 | 385024 | NF | 91.0 | Avamere Health Services of Rogue Valley | 625 Stevens St | Medford, OR 97504 | NaN | Medford Operations, LLC | Medford Operations, LLC |
There are a couple things we could do to clean this dataset up and make it easier to scan. First, let's take a look at that "Unnamed" column. It looks like we could just drop it to simplify this dataset, but let's double-check to see if there are any actual values in that column before we make a decision.
facilities["Unnamed: 6"].unique()
array([nan])
Nope, nothing to see here. Let's get rid of it!
facilities = facilities.drop("Unnamed: 6", axis=1)
facilities.head()
| | facid | fac_type | capacity | fac_name | fac_address | city_state_zip | owner | operator |
---|---|---|---|---|---|---|---|---|
0 | 385008 | NF | 96.0 | Presbyterian Community Care Center | 1085 N Oregon St | Ontario, OR 97914 | Presbyterian Nursing Home, Inc. | Presbyterian Nursing Home, Inc. |
1 | 385010 | NF | 159.0 | Laurelhurst Village Rehabilitation Center | 3060 SE Stark St | Portland, OR 97214 | Laurelhurst Operations, LLC | Laurelhurst Operations, LLC |
2 | 385015 | NF | 128.0 | Regency Gresham Nursing & Rehabilitation Center | 5905 SE Powell Valley Rd | Gresham, OR 97080 | Regency Gresham Nursing & Rehabilitation Cente... | Regency Pacific Management, LLC |
3 | 385018 | NF | 98.0 | Providence Benedictine Nursing Center | 540 South Main St | Mt. Angel, OR 97362 | Providence Health & Services - Oregon | Providence Health & Services - Oregon |
4 | 385024 | NF | 91.0 | Avamere Health Services of Rogue Valley | 625 Stevens St | Medford, OR 97504 | Medford Operations, LLC | Medford Operations, LLC |
Looking better already!
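If a file has several empty columns, checking them one at a time gets tedious. As a rough sketch (using a tiny made-up frame rather than the real facilities data), `isna().all()` reports which columns hold nothing but missing values, and `dropna(axis=1, how="all")` drops them in one step:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the facilities data, for illustration only
demo = pd.DataFrame({
    "facid": ["385008", "385010"],
    "capacity": [96.0, np.nan],
    "Unnamed: 6": [np.nan, np.nan],
})

# Select the columns where every value is missing
empty_cols = demo.columns[demo.isna().all()]
print(list(empty_cols))

# Drop every all-NaN column at once
demo_trimmed = demo.dropna(axis=1, how="all")
print(list(demo_trimmed.columns))
```

Columns that are only partly empty, like `capacity` here, are left alone.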
Next, let's take a look at this `facid` column. It appears to be a unique id, which is always helpful to have when dealing with a dataset. So let's see if it really is unique.
facilities["facid"].value_counts().head()
50M098    2
385120    1
70A263    1
5MA240    1
70A012    1
Name: facid, dtype: int64
OK, so there are a few things going on here.
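First, don't be fooled by the int64 at the end of that output: it describes the counts, not the ids. Some of these ids contain letters, so Pandas can't have stored the whole column as integers, and since they're identifiers rather than quantities, we never want to add or subtract them anyway.
Let's explicitly cast that column to strings so every id is treated the same way.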
facilities["facid"] = facilities["facid"].astype(str)
facilities["facid"].dtype
dtype('O')
Cool. Now let's take a closer look at that ID that appears to have two entries — are there two facilities with the same ID, or do we just have a duplicate in here?
Try filtering the dataset to find out.
Have group filter and report back.
facilities[facilities["facid"]=="50M098"]
| | facid | fac_type | capacity | fac_name | fac_address | city_state_zip | owner | operator |
---|---|---|---|---|---|---|---|---|
165 | 50M098 | RCF | NaN | Aaren Brooke Place | 995 N Oregon St | Ontario, OR 97914 | Ashley Manor, L.L.C. | Ashley Manor, L.L.C. |
166 | 50M098 | RCF | 14.0 | Aaren Brooke Place | 995 N Oregon St | Ontario, OR 97914 | Ashley Manor, L.L.C. | Ashley Manor, L.L.C. |
Yep, we've got a duplicate. Let's drop the one that's missing `capacity`.
We're going to drop the row we don't want by selecting its index number, 165, and then resetting the index so that it's numbered sequentially again. For safety, we're also going to write the result to a new variable so that, if we mess up and have to rerun this cell, we won't run the risk of dropping any other row that gets numbered 165.
Speaking of messing up: In your Pandas adventures, there's probably going to be a point where you've got a lot of variables saved and things just aren't doing what you think they should be doing. There's an easy way to start fresh in those cases. Just select Kernel > Restart & Clear Output to restart so you can run everything from the beginning, or select Kernel > Restart & Run All to clear all variables and rerun your whole notebook.
facilities_clean = facilities.drop([165]).reset_index(drop=True)
facilities_clean[facilities_clean["facid"]=="50M098"]
| | facid | fac_type | capacity | fac_name | fac_address | city_state_zip | owner | operator |
---|---|---|---|---|---|---|---|---|
165 | 50M098 | RCF | 14.0 | Aaren Brooke Place | 995 N Oregon St | Ontario, OR 97914 | Ashley Manor, L.L.C. | Ashley Manor, L.L.C. |
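Dropping by index label works fine for a single row. If there were many duplicated ids, one possible approach (a sketch on made-up rows, not the real data) is to list every repeated id with `duplicated(keep=False)`, then keep the most complete row per id by sorting so that missing capacities fall last:

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the duplicate we found
demo = pd.DataFrame({
    "facid": ["50M098", "50M098", "385120"],
    "capacity": [np.nan, 14.0, 30.0],
})

# Show all rows whose facid appears more than once
dupes = demo[demo.duplicated(subset="facid", keep=False)]
print(dupes)

# Sort so rows with a missing capacity come last, then keep the first row per facid
deduped = (
    demo.sort_values("capacity", na_position="last")
        .drop_duplicates(subset="facid", keep="first")
        .sort_index()
        .reset_index(drop=True)
)
print(deduped)
```

This assumes that the row with a capacity is always the one worth keeping, which is a judgment call you'd want to confirm against the data.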
Great, we got rid of that sucker. Now we're duplicate-free, so let's tackle that combined `city_state_zip` column. We'd really like to be able to take a look at these facilities by city or zip code, so we're going to want them in separate columns.
facilities_clean["zip"] = facilities_clean["city_state_zip"].str[-5:]
facilities_clean["zip"].unique()
array(['97914', '97214', '97080', '97362', '97504', '97239', '97459', '97230', '97218', '97405', '97058', '97202', '97526', '97030', '97330', '97477', '97031', '97321', '97219', '97630', '97071', '97045', '97224', '97220', '97601', '97701', '97215', '97439', '97470', '97103', '97221', '97038', '97471', '97424', '97116', '97401', '97420', '97862', '97365', '97467', '97415', '97007', '97403', '97355', '97128', '97132', '97741', '97426', '97233', '97236', '97351', '97302', '97005', '97520', '97222', '97801', '97027', '97322', '97338', '97211', '97850', '97216', '97124', '97051', '97301', '97448', '97756', '97303', '97317', '97086', '97385', '97386', '97147', '97123', '97458', '97232', '97013', '97754', '97381', '97838', '97266', '97034', '97070', '97035', '97918', '97378', '97213', '97225', '97068', '97056', '97702', '97869', '97060', '97062', '97206', '97603', '97223', '97203', '97205', '97411', '97404', '97201', '97830', '97814', '97417', '97720', '97487', '97527', '97442', '97307', '97478', '97392', '97306', '97006', '97267', '97402', '97308', '97304', '97008', '97015', '97138', '97305', '97367', '97739', '97229', '97503', '97528', '97479', '97823', '97210', '97141', '97828', '97845', '97140', '97502', '97146', '97055', '97444', '97498', '97333', '97836', '97016', '97037', '97530', '97457', '97846', '97738', '97023', '97540', '97131', '97501', '97361', '97209', '97537', '97524', '97913', '97761', '97383', '05400'], dtype=object)
Anyone have observations about what we're looking at here?
Format-wise, this actually worked out pretty well — looks like there weren't any items in that column entered wrong, so all of these are actually zip codes. As an added bonus, it's always best to read zip codes in as string or categorical data, since integer or float formats will drop the leading zeros.
Speaking of leading zeros, though, that 05400 zip code doesn't quite look right. Let's take a closer look.
facilities_clean[facilities_clean["zip"]=='05400']
| | facid | fac_type | capacity | fac_name | fac_address | city_state_zip | owner | operator | zip |
---|---|---|---|---|---|---|---|---|---|
642 | 0O0O0O | NaN | 57.0 | Fake Facility | 1234 Fake St | Nowheresville, NY 05400 | Fake Company | Not a Company, LLC | 05400 |
Pretty sneaky!
Full disclosure, I threw this one in. If it were a little less fake-looking, you might double-check the `facid` against the complaints dataset to make sure it doesn't match any entries there. This entry definitely doesn't belong here, so let's just drop it.
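# .copy() gives us an independent frame, so adding columns later won't trigger a SettingWithCopyWarning
facilities_clean = facilities_clean[facilities_clean["zip"] != "05400"].copy()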
facilities_clean["zip"].unique()
array(['97914', '97214', '97080', '97362', '97504', '97239', '97459', '97230', '97218', '97405', '97058', '97202', '97526', '97030', '97330', '97477', '97031', '97321', '97219', '97630', '97071', '97045', '97224', '97220', '97601', '97701', '97215', '97439', '97470', '97103', '97221', '97038', '97471', '97424', '97116', '97401', '97420', '97862', '97365', '97467', '97415', '97007', '97403', '97355', '97128', '97132', '97741', '97426', '97233', '97236', '97351', '97302', '97005', '97520', '97222', '97801', '97027', '97322', '97338', '97211', '97850', '97216', '97124', '97051', '97301', '97448', '97756', '97303', '97317', '97086', '97385', '97386', '97147', '97123', '97458', '97232', '97013', '97754', '97381', '97838', '97266', '97034', '97070', '97035', '97918', '97378', '97213', '97225', '97068', '97056', '97702', '97869', '97060', '97062', '97206', '97603', '97223', '97203', '97205', '97411', '97404', '97201', '97830', '97814', '97417', '97720', '97487', '97527', '97442', '97307', '97478', '97392', '97306', '97006', '97267', '97402', '97308', '97304', '97008', '97015', '97138', '97305', '97367', '97739', '97229', '97503', '97528', '97479', '97823', '97210', '97141', '97828', '97845', '97140', '97502', '97146', '97055', '97444', '97498', '97333', '97836', '97016', '97037', '97530', '97457', '97846', '97738', '97023', '97540', '97131', '97501', '97361', '97209', '97537', '97524', '97913', '97761', '97383'], dtype=object)
A note on zip codes: We separated out our zip codes from a column in the string format, which means they're recognized as strings. That's not always the case; if our dataset had had a separate column of zip codes, there's a 99 percent chance they would have imported as numbers. Especially in my neck of the woods, where most zip codes begin with a zero, we end up with four-digit zip codes — no bueno! Here's how you can import a column of zip codes as strings.
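A minimal sketch, assuming a CSV with a dedicated `zip` column (the inline sample below is made up so the snippet runs on its own): passing `dtype` to `read_csv` tells Pandas to keep that column as text.

```python
import io
import pandas as pd

# Made-up sample standing in for a file on disk
csv_text = "name,zip\nSample Home,05401\nAnother Home,97914\n"

# Default parsing treats the column as numbers and loses the leading zero
as_numbers = pd.read_csv(io.StringIO(csv_text))
print(as_numbers["zip"].tolist())

# dtype={"zip": str} keeps the values exactly as written
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})
print(as_strings["zip"].tolist())
```

With a real file you would pass the file path in place of the `io.StringIO` object.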
Now let's separate out the city name.
# separate beginning of entry to comma
facilities_clean["city"] = facilities_clean["city_state_zip"].str.split(",").str[0]
facilities_clean["city"].unique()
array(['Ontario', 'Portland', 'Gresham', 'Mt. Angel', 'Medford', 'North Bend', 'Eugene', 'The Dalles', 'Grants Pass', 'Corvallis', 'Springfield', 'Hood River', 'Albany', 'Lakeview', 'Woodburn', 'Oregon City', 'Tigard', 'Klamath Falls', 'Bend', 'Florence', 'Roseburg', 'Astoria', 'Molalla', 'Cottage Grove', 'Forest Grove', 'Coos Bay', 'Milton-Freewater', 'Newport', 'Reedsport', 'Brookings', 'Beaverton', 'Lebanon', 'McMinnville', 'Newberg', 'Madras', 'Creswell', 'Independence', 'Salem', 'Ashland', 'Milwaukie', 'Pendleton', 'Gladstone', 'Dallas', 'La Grande', 'Hillsboro', 'St Helens', 'Junction City', 'Redmond', 'Keizer', 'Sublimity', 'Sweet Home', 'Wheeler', 'Myrtle Point', 'Canby', 'Prineville', 'Silverton', 'Hermiston', 'Lake Oswego', 'Wilsonville', 'Vale', 'Sheridan', 'West Linn', 'Scappoose', 'Prairie City', 'Wood Village', 'Tualatin', 'Troutdale', 'Bandon', 'Fossil', 'Baker City', 'Canyonville', 'Burns', 'Veneta', 'Glendale', 'Turner', 'Happy Valley', 'St. Helens', 'Klamath Falls OR', 'Clackamas', 'Seaside', 'Lincoln City', 'La Pine', 'White City', 'Sutherlin', 'Condon', 'Tillamook', 'Enterprise', 'John Day', 'Sherwood', 'Central Point', 'Warrenton', 'Sandy', 'LaGrande', 'Gold Beach', 'Yachats', 'Heppner', 'Clatskanie', 'Maupin', 'Jacksonville', 'Myrtle Creek', 'Joseph', 'Hines', 'Estacada', 'Talent', 'Nehalem', 'Monmouth', 'Rogue River', 'Eagle Point', 'Nyssa', 'Warm Springs', 'Stayton'], dtype=object)
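The slicing and splitting used here handle zip and city separately. An alternative sketch, assuming every entry follows a "City, ST 12345" pattern (the sample values and the group names `city`, `state` and `zip` are illustrative), uses `str.extract` with named groups to pull all three pieces at once. Rows that don't fit the pattern come back as NaN, which makes odd entries easy to spot:

```python
import pandas as pd

# Made-up sample values; the last one deliberately breaks the pattern
sample = pd.Series([
    "Ontario, OR 97914",
    "Mt. Angel, OR 97362",
    "not an address",
])

# city: everything before the comma; state: two capital letters; zip: five digits
pattern = r"^(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$"
parts = sample.str.extract(pattern)
print(parts)
```

The extracted zip stays a string, so leading zeros survive.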
Now let's take a look at our operator and owner fields.
sorted(facilities_clean["operator"].unique())
['A Touch of Grace, LLC', 'AIM Senior Management, LLC', 'ASPEN COURT AID OPCO, LLC', 'ASTOR AID OPCO, LLC', 'AWBREY AID OPCO, LLC', 'Adara Oaks Manor, LLC', 'Advocate Care, LLC', 'Ageia Health Services, LLC', 'Aidan Health Services, Inc.', 'Aidan Senior Living at Reedsport, Inc', 'Angeline Senior Living, LLC', 'Artegan at Hawthorne Gardens, LLC', 'Asa Care, Inc.', 'Ashland View Manor, Inc.', 'Ashley Manor LLC', 'Ashley Manor, L.L.C.', 'Aspen Foundation', 'Aspen Foundation III', 'Autumn Garden Home RCF, LLC', 'Avamere Bethany Operations, LLC', 'Avamere Lake Oswego Operations Investors, LLC', 'Avamere Sandy Operations, LLC', 'Avamere Stafford Operations, LLC', 'Avamere-Bethany Operations, LLC', 'Avamere-Hillsboro Operations, LLC', 'Avamere-Sandy Operations, LLC', 'Avamere-Sherwood Operations, LLC', 'Avamere-St. Helens Operations, LLC ', 'Avamere-St.Helens Operations, LLC', 'Avant Senior Housing Managers & Consultants, LLC', 'BME Enterprises, Inc.', 'BPM Senior Living Company', 'Beaverton Rehab & Specialty Care, LLC', 'Bee Hive Homes of Baker City', 'Benecia Senior Living, LLC', 'Benicia Senior Living, LLC', 'Bible Mennonite Fellowship, Inc.', 'Blue Haven Residential Care Facilities, Inc.', 'Blue Mountain Hospital District', 'Bonaventure Senior Living', 'Brookdale Senior Living Communities, Inc.', 'CARE 3, LLC', 'CARRIAGE AID OPCO, LLC', 'CHG Management Company I, LLC', 'CML, Inc.', 'Cameo Care Management, LLC', 'Capri Senior Living , LLC', 'Care Wise Management, Inc.', 'Caring Places Management, LLC', 'Cascade Living Group, Inc.', 'Cecile Molden', 'Century Park Associates, LLC', 'Chancellor Health Care, Inc,', "Chantele's Loving Touch Memory Care, Inc.", 'Charles Lawrence', 'Cherrywood, Inc., The', 'Chetco Inn RCF Incorporated', 'Churchill Retirement Services, LLC', 'Clackamas Rehabilitation, LLC.', "Clarendon Court Alzheimer's Residence, LLC", 'Clatsop Care Center Health District', 'Concepts in Community Living, Inc.', 'Confederated Tribes of Warm Springs 
Reservation of Oregon', 'Coos Bay Rehabilitation, LLC.', 'Cornell Investors Group, Inc.', 'Cornerstone Care Option, Inc.', 'Countryside Living of Canby, LLC', 'Countryside Living of Redmond, LLC', 'Courtyard Fountains Care Properties, LLC', 'Crestview Operations, LLC', 'Crown Two Development, LLC', 'DAVENPORT AID OPCO, LLC', 'Dakavia Management, Corp.', 'Dallas Care Center, Inc.', 'Dharma Healthcare Management, Inc.', 'Donham Place, LLC', 'EEA Company', 'ElderHealth & Living Corporation', 'Elderly Care Home, Inc.', 'Elite Care Management Group, LLC', 'Elite Care OE2, LLC', "Elman's House Corp.", 'EmeriCare, Inc.', 'Emeritus Corporation', 'EmpRes Healthcare Management, LLC', 'Eugene Rehabilitation, LLC.', 'Evangelical Lutheran Good Sam. Society', 'Evergreen Healthcare Management, LLC', 'FM Cedar Village, LLC', 'FM Ocean Crest, LLC', 'FM Ocean Ridge, LLC', 'FM Pelican, LLC', 'FM Pheasant Pointe, LLC', 'FM Princeton, LLC', 'FM Redwood Heights, LLC', 'FMG Northeast Weidler Street Oregon LLC', 'FMG Shore Drive Oregon, LLC', 'Fircrest Community Living, Inc.', 'Forest Drive Operations, LLC', 'Forest Meadows RCF, Inc.', 'Fossil Elderly Housing Committee, Inc.', 'Four Seasons RCF Evergreen', 'Four Seasons RCF Fairgrounds', 'Fred T. & Elizabeth C. 
Asa', 'Friendship Health Center, Inc.', 'Friendsview Manor, Inc.', 'Fronline Management', 'Frontier Management, LLC', 'Frontline Management', 'GRACE AID OPCO, LLC', 'Gateway Assisted Living, Inc.', 'Gateway Gardens Assisted Living, Inc.', 'Geistlinger Enterprises, Inc.', 'Generations, LLC', 'Genesis Newberg Operations Company, LLC', 'Golden Age Living LLC', 'Greenridge Estates at Mountain Park, LLC', 'H & L Care Centers, Inc.', 'Harmony Estates, Inc.', 'Harmony Guest Home, Inc.', 'Harmony Living, Inc.', 'Harvest Homes, Inc.', 'Hawthorn Retirement Group, LLC', 'Hearthstone Management Services, LLC', 'Heights Management, Inc.', 'Heirloom Living Centers, LLC', 'HiLLSIDE AID OPCO, LLC', 'Hillsboro Care Properties, LLC', 'Integral Senior Living, LLC', 'Ivy Court Senior Living, Inc.', 'JACKSON AID OPCO, LLC', 'JEA Senior Living', 'Jerry Erwin Associates, Inc.', 'Joseph ALF, Inc.', 'Junction City Rehabilitation, LLC.', 'Kathleen Howard', 'Keizer Campus Operations, LLC', 'Keizer Care Properties, LLC', 'Keizer River Operations, LLC', 'King City Rehab, LLC', 'Kinsel Ameri Properties, Inc.', 'Lakeview Gardens, LLC', 'Laurel Parc AL at Bethany, LLC', 'Laurelhurst Operations, LLC', 'Lebanon Care Center, LLC', 'Leisure Care, LLC', 'Life Care Centers Of America, Inc.', 'Life Care Services', 'Life Care Services LLC', 'Life Care Services, LLC', 'Living Care Lifestyles', 'Living Care Management', 'Lynn-Ann Development, LLC', 'MACKLYN AID OPCO, LLC', 'MP, LLC', 'Magnolia Gardens L.L.C.', 'Magnolia Gardens, LLC', 'Malheur Memorial Health District', 'Maple Valley Dementia Care, Inc.', 'Marathon Enterprise, LLC', 'Marian Estates Support Services', 'Marjorie House McMinnville, LLC', 'Marquis Companies I, Inc.', 'Marquis Companies II, Inc.', 'Mary Nork', "Mary's Woods at Marylhurst, Inc.", 'McKenzie Living, Inc. 
', 'Meadows Courtyard, Inc.', 'Medford Operations, LLC', 'Mennonite Home Of Albany, Inc.', 'Mennonite Management Services, Inc.', 'Milestone Retirement Communities, LLC', 'Milwaukie Care Center, Inc.', 'Mission Senior Living, LLC', 'Morrow County Health District', 'Mosaic Management, Inc.', 'Mountain View Rehab, LLC', 'Mountain West Retirement Corp.', 'Mt. Angel Towers, Oregon, Ltd.', 'Neawanna Care Properties, LLC', 'Necanicum Operations, LLC', 'Newport Rehabilitation, LLC.', 'Norma Ann Hubbard', 'Northridge Center, Inc.', "O'Hara's Manor, Inc.", 'Ocean Park Care Properties', 'Odd Fellows Home of Oregon', 'Ohana Harmony House, LLC', 'Oregon Baptist Retirement Homes', 'Our House of Portland, Inc.', 'PAR, LLC', 'PARKHURST AID OPCO, LLC', 'Pacific Living Centers, Inc.', 'Pacific Retirement Services, Inc.', 'Pacifica Senior Living, LLC', 'Peaks and Valleys, LLC', 'Peckham-Miller, Inc.', 'Pinnacle Healthcare Management, Inc.', 'Pioneer Nursing Home Health District', 'Premier Living Center, Inc.', 'Presbyterian Nursing Home, Inc.', 'Prestige Care Inc.', 'Prestige Care, Inc.', 'Prestige Senior Living', 'Prestige Senior Living, L.L.C', 'Prestige Senior Living, L.L.C.', 'Prestige Senior Living, LLC', 'Providence Health & Services - Oregon', 'Providence Health System - Oregon', 'RN Villa Care Center, LLC', 'Radiant Senior Living, Inc.', 'Raleigh Hills Management, LLC', 'Regency Pacific Management, LLC', 'Regency Park Management, LLC', 'Regent Court Management LLC', 'Ridgeline Management Co.', 'Ridgeview Assisted Living Center, LLC', 'River Run AID OPCO, LLC', 'Riverpark Operations, LLC', 'Riverside Living, Inc.', 'Robison Jewish Home', 'Rose Villa, Inc.', 'Rosewood ALF, LLC', 'Roxy Ann Memory Community, LLC', 'Royalton Place Management, LLC', 'S.A.G.E.', 'SAGE', 'SLH Rainier Manager, LLC', 'SRG Management, LLC', 'Sage AID OPCO, LLC', 'Sandra Tidwell', 'Sapphire at Firwood, LLC', 'Sapphire at Gateway, LLC', 'Sea Aire Assisted Living, LLC', 'Seasons Management, LLC', 'Senior 
Haven RCF, LLC', 'Senior Housing Managers, LLC', 'Senior Living Management, Inc.', 'Senior Living Services, Inc.', 'Sheldon Park Management LLC', 'Sherwood Park Nursing Home, Inc.', 'Sherwood Pines Residential Care, Inc.', 'Silver Creek Care Properties, LLC', "Silvia & John's Residential Care, Inc.", 'Sistere, Inc.', 'Sisters of St. Mary of Oregon Maryville Corp.', 'South Salem Rehabilitation, LLC.', 'Summit Springs Village Corporation', 'Sunnyside Operations, LLC', 'Suttle Care & Retirement, Inc.', 'Sweet Bye N Bye AFC & RCF Facilities, Inc.', "Sylvia's Legacy, Inc.", 'TS Management, LLC', 'Tabor Crest Residential Care, LLC', 'Tailored Management Services, LLC', 'Terwilliger Plaza, Inc.', 'The Evangelical Lutheran Good Samaritan', 'The Griffin House LLC', 'The Maren Corporation', 'The Springs Living, LLC', 'Tierra Senior Living, LLC', 'Tillamook County CARE, Inc.', 'Touchmark Living Centers, Inc.', 'Touchstone Communities, LLC', 'Trinity Mission Health & Rehab of Portland, LLC', 'Turner Retirement Homes, Inc.', 'Twin Oaks Rehab, LLC.', 'V.M.C., Inc.', 'Valley View Care Centers, Inc.', 'Veterans Care Centers of Oregon', 'Village Health Care I, LLC', 'Vintage Investment Prop., Inc.', 'Waterford Operations, LLC', 'West Hills Convalescent Center Limited Partnership', 'West Hills Village, LP', 'West Wind Court Corporation', 'Westmont Living, Inc.', 'Whitewood Group, LLC', 'Willamette Lutheran Homes, Inc.', 'Willamette Manor, Inc.', 'Willamette View, Inc.', 'Woodland Heights, LLC', 'Woollard Ipsen Management, LLC']
Definitely some duplicates in there. Let's see how many entries there are for operator and owner, then standardize both to take care of some of those issues.
print(len(facilities_clean["operator"].unique()))
print(len(facilities_clean["owner"].unique()))
283
437
The biggest issues we saw in that summary were capitalization and punctuation differences. So instead of going through and fixing duplicates one by one, let's set all owner and operator names to uppercase and remove commas and periods. That should take care of most of the issues we're seeing.
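# regex=True so "[,.]" is treated as a character class (newer Pandas versions default to literal matching)
facilities_clean["operator"] = facilities_clean["operator"].str.upper().str.replace(r"[,.]", "", regex=True)
facilities_clean["owner"] = facilities_clean["owner"].str.upper().str.replace(r"[,.]", "", regex=True)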
sorted(facilities_clean["operator"].unique())
['A TOUCH OF GRACE LLC', 'ADARA OAKS MANOR LLC', 'ADVOCATE CARE LLC', 'AGEIA HEALTH SERVICES LLC', 'AIDAN HEALTH SERVICES INC', 'AIDAN SENIOR LIVING AT REEDSPORT INC', 'AIM SENIOR MANAGEMENT LLC', 'ANGELINE SENIOR LIVING LLC', 'ARTEGAN AT HAWTHORNE GARDENS LLC', 'ASA CARE INC', 'ASHLAND VIEW MANOR INC', 'ASHLEY MANOR LLC', 'ASPEN COURT AID OPCO LLC', 'ASPEN FOUNDATION', 'ASPEN FOUNDATION III', 'ASTOR AID OPCO LLC', 'AUTUMN GARDEN HOME RCF LLC', 'AVAMERE BETHANY OPERATIONS LLC', 'AVAMERE LAKE OSWEGO OPERATIONS INVESTORS LLC', 'AVAMERE SANDY OPERATIONS LLC', 'AVAMERE STAFFORD OPERATIONS LLC', 'AVAMERE-BETHANY OPERATIONS LLC', 'AVAMERE-HILLSBORO OPERATIONS LLC', 'AVAMERE-SANDY OPERATIONS LLC', 'AVAMERE-SHERWOOD OPERATIONS LLC', 'AVAMERE-ST HELENS OPERATIONS LLC ', 'AVAMERE-STHELENS OPERATIONS LLC', 'AVANT SENIOR HOUSING MANAGERS & CONSULTANTS LLC', 'AWBREY AID OPCO LLC', 'BEAVERTON REHAB & SPECIALTY CARE LLC', 'BEE HIVE HOMES OF BAKER CITY', 'BENECIA SENIOR LIVING LLC', 'BENICIA SENIOR LIVING LLC', 'BIBLE MENNONITE FELLOWSHIP INC', 'BLUE HAVEN RESIDENTIAL CARE FACILITIES INC', 'BLUE MOUNTAIN HOSPITAL DISTRICT', 'BME ENTERPRISES INC', 'BONAVENTURE SENIOR LIVING', 'BPM SENIOR LIVING COMPANY', 'BROOKDALE SENIOR LIVING COMMUNITIES INC', 'CAMEO CARE MANAGEMENT LLC', 'CAPRI SENIOR LIVING LLC', 'CARE 3 LLC', 'CARE WISE MANAGEMENT INC', 'CARING PLACES MANAGEMENT LLC', 'CARRIAGE AID OPCO LLC', 'CASCADE LIVING GROUP INC', 'CECILE MOLDEN', 'CENTURY PARK ASSOCIATES LLC', 'CHANCELLOR HEALTH CARE INC', "CHANTELE'S LOVING TOUCH MEMORY CARE INC", 'CHARLES LAWRENCE', 'CHERRYWOOD INC THE', 'CHETCO INN RCF INCORPORATED', 'CHG MANAGEMENT COMPANY I LLC', 'CHURCHILL RETIREMENT SERVICES LLC', 'CLACKAMAS REHABILITATION LLC', "CLARENDON COURT ALZHEIMER'S RESIDENCE LLC", 'CLATSOP CARE CENTER HEALTH DISTRICT', 'CML INC', 'CONCEPTS IN COMMUNITY LIVING INC', 'CONFEDERATED TRIBES OF WARM SPRINGS RESERVATION OF OREGON', 'COOS BAY REHABILITATION LLC', 'CORNELL INVESTORS GROUP INC', 'CORNERSTONE CARE 
OPTION INC', 'COUNTRYSIDE LIVING OF CANBY LLC', 'COUNTRYSIDE LIVING OF REDMOND LLC', 'COURTYARD FOUNTAINS CARE PROPERTIES LLC', 'CRESTVIEW OPERATIONS LLC', 'CROWN TWO DEVELOPMENT LLC', 'DAKAVIA MANAGEMENT CORP', 'DALLAS CARE CENTER INC', 'DAVENPORT AID OPCO LLC', 'DHARMA HEALTHCARE MANAGEMENT INC', 'DONHAM PLACE LLC', 'EEA COMPANY', 'ELDERHEALTH & LIVING CORPORATION', 'ELDERLY CARE HOME INC', 'ELITE CARE MANAGEMENT GROUP LLC', 'ELITE CARE OE2 LLC', "ELMAN'S HOUSE CORP", 'EMERICARE INC', 'EMERITUS CORPORATION', 'EMPRES HEALTHCARE MANAGEMENT LLC', 'EUGENE REHABILITATION LLC', 'EVANGELICAL LUTHERAN GOOD SAM SOCIETY', 'EVERGREEN HEALTHCARE MANAGEMENT LLC', 'FIRCREST COMMUNITY LIVING INC', 'FM CEDAR VILLAGE LLC', 'FM OCEAN CREST LLC', 'FM OCEAN RIDGE LLC', 'FM PELICAN LLC', 'FM PHEASANT POINTE LLC', 'FM PRINCETON LLC', 'FM REDWOOD HEIGHTS LLC', 'FMG NORTHEAST WEIDLER STREET OREGON LLC', 'FMG SHORE DRIVE OREGON LLC', 'FOREST DRIVE OPERATIONS LLC', 'FOREST MEADOWS RCF INC', 'FOSSIL ELDERLY HOUSING COMMITTEE INC', 'FOUR SEASONS RCF EVERGREEN', 'FOUR SEASONS RCF FAIRGROUNDS', 'FRED T & ELIZABETH C ASA', 'FRIENDSHIP HEALTH CENTER INC', 'FRIENDSVIEW MANOR INC', 'FRONLINE MANAGEMENT', 'FRONTIER MANAGEMENT LLC', 'FRONTLINE MANAGEMENT', 'GATEWAY ASSISTED LIVING INC', 'GATEWAY GARDENS ASSISTED LIVING INC', 'GEISTLINGER ENTERPRISES INC', 'GENERATIONS LLC', 'GENESIS NEWBERG OPERATIONS COMPANY LLC', 'GOLDEN AGE LIVING LLC', 'GRACE AID OPCO LLC', 'GREENRIDGE ESTATES AT MOUNTAIN PARK LLC', 'H & L CARE CENTERS INC', 'HARMONY ESTATES INC', 'HARMONY GUEST HOME INC', 'HARMONY LIVING INC', 'HARVEST HOMES INC', 'HAWTHORN RETIREMENT GROUP LLC', 'HEARTHSTONE MANAGEMENT SERVICES LLC', 'HEIGHTS MANAGEMENT INC', 'HEIRLOOM LIVING CENTERS LLC', 'HILLSBORO CARE PROPERTIES LLC', 'HILLSIDE AID OPCO LLC', 'INTEGRAL SENIOR LIVING LLC', 'IVY COURT SENIOR LIVING INC', 'JACKSON AID OPCO LLC', 'JEA SENIOR LIVING', 'JERRY ERWIN ASSOCIATES INC', 'JOSEPH ALF INC', 'JUNCTION CITY REHABILITATION LLC', 'KATHLEEN 
HOWARD', 'KEIZER CAMPUS OPERATIONS LLC', 'KEIZER CARE PROPERTIES LLC', 'KEIZER RIVER OPERATIONS LLC', 'KING CITY REHAB LLC', 'KINSEL AMERI PROPERTIES INC', 'LAKEVIEW GARDENS LLC', 'LAUREL PARC AL AT BETHANY LLC', 'LAURELHURST OPERATIONS LLC', 'LEBANON CARE CENTER LLC', 'LEISURE CARE LLC', 'LIFE CARE CENTERS OF AMERICA INC', 'LIFE CARE SERVICES', 'LIFE CARE SERVICES LLC', 'LIVING CARE LIFESTYLES', 'LIVING CARE MANAGEMENT', 'LYNN-ANN DEVELOPMENT LLC', 'MACKLYN AID OPCO LLC', 'MAGNOLIA GARDENS LLC', 'MALHEUR MEMORIAL HEALTH DISTRICT', 'MAPLE VALLEY DEMENTIA CARE INC', 'MARATHON ENTERPRISE LLC', 'MARIAN ESTATES SUPPORT SERVICES', 'MARJORIE HOUSE MCMINNVILLE LLC', 'MARQUIS COMPANIES I INC', 'MARQUIS COMPANIES II INC', 'MARY NORK', "MARY'S WOODS AT MARYLHURST INC", 'MCKENZIE LIVING INC ', 'MEADOWS COURTYARD INC', 'MEDFORD OPERATIONS LLC', 'MENNONITE HOME OF ALBANY INC', 'MENNONITE MANAGEMENT SERVICES INC', 'MILESTONE RETIREMENT COMMUNITIES LLC', 'MILWAUKIE CARE CENTER INC', 'MISSION SENIOR LIVING LLC', 'MORROW COUNTY HEALTH DISTRICT', 'MOSAIC MANAGEMENT INC', 'MOUNTAIN VIEW REHAB LLC', 'MOUNTAIN WEST RETIREMENT CORP', 'MP LLC', 'MT ANGEL TOWERS OREGON LTD', 'NEAWANNA CARE PROPERTIES LLC', 'NECANICUM OPERATIONS LLC', 'NEWPORT REHABILITATION LLC', 'NORMA ANN HUBBARD', 'NORTHRIDGE CENTER INC', "O'HARA'S MANOR INC", 'OCEAN PARK CARE PROPERTIES', 'ODD FELLOWS HOME OF OREGON', 'OHANA HARMONY HOUSE LLC', 'OREGON BAPTIST RETIREMENT HOMES', 'OUR HOUSE OF PORTLAND INC', 'PACIFIC LIVING CENTERS INC', 'PACIFIC RETIREMENT SERVICES INC', 'PACIFICA SENIOR LIVING LLC', 'PAR LLC', 'PARKHURST AID OPCO LLC', 'PEAKS AND VALLEYS LLC', 'PECKHAM-MILLER INC', 'PINNACLE HEALTHCARE MANAGEMENT INC', 'PIONEER NURSING HOME HEALTH DISTRICT', 'PREMIER LIVING CENTER INC', 'PRESBYTERIAN NURSING HOME INC', 'PRESTIGE CARE INC', 'PRESTIGE SENIOR LIVING', 'PRESTIGE SENIOR LIVING LLC', 'PROVIDENCE HEALTH & SERVICES - OREGON', 'PROVIDENCE HEALTH SYSTEM - OREGON', 'RADIANT SENIOR LIVING INC', 'RALEIGH HILLS 
MANAGEMENT LLC', 'REGENCY PACIFIC MANAGEMENT LLC', 'REGENCY PARK MANAGEMENT LLC', 'REGENT COURT MANAGEMENT LLC', 'RIDGELINE MANAGEMENT CO', 'RIDGEVIEW ASSISTED LIVING CENTER LLC', 'RIVER RUN AID OPCO LLC', 'RIVERPARK OPERATIONS LLC', 'RIVERSIDE LIVING INC', 'RN VILLA CARE CENTER LLC', 'ROBISON JEWISH HOME', 'ROSE VILLA INC', 'ROSEWOOD ALF LLC', 'ROXY ANN MEMORY COMMUNITY LLC', 'ROYALTON PLACE MANAGEMENT LLC', 'SAGE', 'SAGE AID OPCO LLC', 'SANDRA TIDWELL', 'SAPPHIRE AT FIRWOOD LLC', 'SAPPHIRE AT GATEWAY LLC', 'SEA AIRE ASSISTED LIVING LLC', 'SEASONS MANAGEMENT LLC', 'SENIOR HAVEN RCF LLC', 'SENIOR HOUSING MANAGERS LLC', 'SENIOR LIVING MANAGEMENT INC', 'SENIOR LIVING SERVICES INC', 'SHELDON PARK MANAGEMENT LLC', 'SHERWOOD PARK NURSING HOME INC', 'SHERWOOD PINES RESIDENTIAL CARE INC', 'SILVER CREEK CARE PROPERTIES LLC', "SILVIA & JOHN'S RESIDENTIAL CARE INC", 'SISTERE INC', 'SISTERS OF ST MARY OF OREGON MARYVILLE CORP', 'SLH RAINIER MANAGER LLC', 'SOUTH SALEM REHABILITATION LLC', 'SRG MANAGEMENT LLC', 'SUMMIT SPRINGS VILLAGE CORPORATION', 'SUNNYSIDE OPERATIONS LLC', 'SUTTLE CARE & RETIREMENT INC', 'SWEET BYE N BYE AFC & RCF FACILITIES INC', "SYLVIA'S LEGACY INC", 'TABOR CREST RESIDENTIAL CARE LLC', 'TAILORED MANAGEMENT SERVICES LLC', 'TERWILLIGER PLAZA INC', 'THE EVANGELICAL LUTHERAN GOOD SAMARITAN', 'THE GRIFFIN HOUSE LLC', 'THE MAREN CORPORATION', 'THE SPRINGS LIVING LLC', 'TIERRA SENIOR LIVING LLC', 'TILLAMOOK COUNTY CARE INC', 'TOUCHMARK LIVING CENTERS INC', 'TOUCHSTONE COMMUNITIES LLC', 'TRINITY MISSION HEALTH & REHAB OF PORTLAND LLC', 'TS MANAGEMENT LLC', 'TURNER RETIREMENT HOMES INC', 'TWIN OAKS REHAB LLC', 'VALLEY VIEW CARE CENTERS INC', 'VETERANS CARE CENTERS OF OREGON', 'VILLAGE HEALTH CARE I LLC', 'VINTAGE INVESTMENT PROP INC', 'VMC INC', 'WATERFORD OPERATIONS LLC', 'WEST HILLS CONVALESCENT CENTER LIMITED PARTNERSHIP', 'WEST HILLS VILLAGE LP', 'WEST WIND COURT CORPORATION', 'WESTMONT LIVING INC', 'WHITEWOOD GROUP LLC', 'WILLAMETTE LUTHERAN HOMES INC', 
'WILLAMETTE MANOR INC', 'WILLAMETTE VIEW INC', 'WOODLAND HEIGHTS LLC', 'WOOLLARD IPSEN MANAGEMENT LLC']
Now let's run that count again and see how we did on deduping.
print(len(facilities_clean["operator"].unique()))
print(len(facilities_clean["owner"].unique()))
276
433
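That pass trimmed the lists a little, though the sorted output still shows entries that differ only by a trailing space (for example 'MCKENZIE LIVING INC '). A possible next step, sketched here on a few made-up strings with a hypothetical helper named `normalize_names`, is to also collapse repeated whitespace and strip the ends:

```python
import pandas as pd

def normalize_names(names: pd.Series) -> pd.Series:
    """Uppercase, drop commas and periods, and tidy whitespace."""
    return (
        names.str.upper()
             .str.replace(r"[,.]", "", regex=True)
             .str.replace(r"\s+", " ", regex=True)
             .str.strip()
    )

# Made-up variants of two operator names
raw = pd.Series([
    "McKenzie Living, Inc. ",
    "McKenzie Living, Inc.",
    "Capri Senior Living , LLC",
    "Capri Senior Living, LLC",
])
cleaned = normalize_names(raw)
print(cleaned.nunique())
```

Misspellings such as FRONLINE versus FRONTLINE would still need a manual look or a fuzzy-matching step.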
complaints.head()
| | complaint_id | facility_id | facility_type | incident_date | notes | severity | fine | Facility Invest Results Abuse | Facility Invest Results Rule | Type Of Abuse |
---|---|---|---|---|---|---|---|---|---|---|
0 | OT105179A | 385008 | NF | 2010-08-31 | RV reported asking staff to change him/her pri... | 2.0 | 0.0 | Not Substantiated | Substantiated | NaN |
1 | OT105179B | 385008 | NF | 2010-08-31 | RV reported staff answered his/her call light,... | 2.0 | 0.0 | Not Substantiated | Substantiated | NaN |
2 | OT105179C | 385008 | NF | 2010-08-31 | RV reported an unknown "not RV's regular staff... | 2.0 | 0.0 | Not Substantiated | Substantiated | NaN |
3 | OR0000656000 | 385008 | NF | 2010-12-21 | Resident 1 was admitted with multiple diagnose... | 3.0 | 0.0 | Substantiated | Substantiated | Neglect |
4 | OT105397 | 385008 | NF | 2010-09-17 | RV was admitted 9/17/10 with multiple diagnose... | 2.0 | 0.0 | Not Substantiated | Substantiated | NaN |
We're going to want `facility_id` in this frame to match up with `facid` in the facilities list, and we have to make sure that date column came in as a date, so let's see what our datatypes are here.
complaints.dtypes
complaint_id                             object
facility_id                              object
facility_type                            object
incident_date                    datetime64[ns]
notes                                    object
severity                                float64
fine                                    float64
Facility Invest Results Abuse            object
Facility Invest Results Rule             object
Type Of Abuse                            object
dtype: object
Huzzah! Our `facility_id` is already a string, and that date field is formatted as datetime, which means we can easily separate out the year. Let's do that now.
complaints["incident_year"] = complaints["incident_date"].dt.year
complaints["incident_year"].unique()
array([2010, 2012, 2011, 2013, 2015, 2014, 2016, 2009, 2001, 2105, 2003])
complaints[complaints["incident_year"]==2105]
| | complaint_id | facility_id | facility_type | incident_date | notes | severity | fine | Facility Invest Results Abuse | Facility Invest Results Rule | Type Of Abuse | incident_year |
---|---|---|---|---|---|---|---|---|---|---|---|
4284 | CO15241 | 516637 | AFH | 2105-12-02 | Voluntarily reduced capacity - Condition not n... | 4.0 | NaN | NaN | NaN | NaN | 2105 |
There are a lot of options for how you might fix this, but this is just one entry and we're doing a pretty quick pass that's not for production, so let's just drop this errant entry for now.
complaints = complaints[complaints["incident_year"] != 2105]
complaints["incident_year"].unique()
array([2010, 2012, 2011, 2013, 2015, 2014, 2016, 2009, 2001, 2003])
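Dropping the row is the quickest of those options. Another is to flag suspicious dates and set them aside for follow-up instead of deleting them. Here's a rough sketch of that idea on a tiny made-up frame; the cutoff year and the suspect_date column are assumptions for illustration, not something in the original data.

```python
import pandas as pd

# Hypothetical mini version of the complaints frame; values are made up
toy = pd.DataFrame({
    "complaint_id": ["A1", "A2", "A3"],
    "incident_date": pd.to_datetime(["2014-05-01", "2105-12-02", "2015-03-15"]),
})
toy["incident_year"] = toy["incident_date"].dt.year

# Assumption: no incident can be dated after the year the data was pulled
CUTOFF_YEAR = 2016
toy["suspect_date"] = toy["incident_year"] > CUTOFF_YEAR

# Keep flagged rows aside for a call to the source; analyze only the rest
suspect = toy[toy["suspect_date"]]
clean = toy[~toy["suspect_date"]]
print(suspect[["complaint_id", "incident_date"]])
```

This way the odd record is still on hand if the agency can tell you what the date should have been.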
Now that our data is sparkling clean, let's see what else we can do with it.
First, let's see how many complaints Oregon's gotten by year.
complaints_year = complaints[["incident_year","complaint_id"]].groupby(["incident_year"]).count()
complaints_year.index.names=["year"]
complaints_year
complaint_id | |
---|---|
year | |
2001 | 1 |
2003 | 1 |
2009 | 11 |
2010 | 383 |
2011 | 1396 |
2012 | 1391 |
2013 | 1552 |
2014 | 1792 |
2015 | 1667 |
2016 | 157 |
Interesting. We'll come back to that in a little bit, but right now let's pivot back to the individual facility complaint rates we calculated in the last session.
complaints_by_facility = complaints.groupby("facility_id").count()[["complaint_id"]].reset_index()
complaints_by_facility = complaints_by_facility.rename(columns={"complaint_id":"complaints"})
complaints_by_facility.head()
facility_id | complaints | |
---|---|---|
0 | 385008 | 9 |
1 | 385010 | 8 |
2 | 385015 | 17 |
3 | 385018 | 17 |
4 | 385024 | 40 |
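One thing to keep in mind with count(): it tallies non-null values in each column, so the column you pick matters. The sketch below uses a made-up frame (the names and values are assumptions) to show how it can differ from size(), which counts rows.

```python
import pandas as pd
import numpy as np

# Made-up complaints: facility F1 has two rows, one of them missing a fine
toy = pd.DataFrame({
    "facility_id": ["F1", "F1", "F2"],
    "complaint_id": ["A1", "A2", "A3"],
    "fine": [0.0, np.nan, 100.0],
})

# count() returns non-null values per column, so "fine" undercounts F1
per_column = toy.groupby("facility_id").count()

# size() returns the number of rows in each group, nulls included
per_row = toy.groupby("facility_id").size()

print(per_column)
print(per_row)
```

Counting on complaint_id works here as long as every complaint actually has an ID, which is worth confirming before you trust the totals.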
Let's merge that with our facilities dataset.
facilities_merge = facilities_clean.merge(complaints_by_facility, left_on="facid", right_on="facility_id",how="left")
facilities_merge = facilities_merge.drop(["facility_id","city_state_zip"],axis=1)
facilities_merge.head()
facid | fac_type | capacity | fac_name | fac_address | owner | operator | zip | city | complaints | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 385008 | NF | 96.0 | Presbyterian Community Care Center | 1085 N Oregon St | PRESBYTERIAN NURSING HOME INC | PRESBYTERIAN NURSING HOME INC | 97914 | Ontario | 9.0 |
1 | 385010 | NF | 159.0 | Laurelhurst Village Rehabilitation Center | 3060 SE Stark St | LAURELHURST OPERATIONS LLC | LAURELHURST OPERATIONS LLC | 97214 | Portland | 8.0 |
2 | 385015 | NF | 128.0 | Regency Gresham Nursing & Rehabilitation Center | 5905 SE Powell Valley Rd | REGENCY GRESHAM NURSING & REHABILITATION CENTE... | REGENCY PACIFIC MANAGEMENT LLC | 97080 | Gresham | 17.0 |
3 | 385018 | NF | 98.0 | Providence Benedictine Nursing Center | 540 South Main St | PROVIDENCE HEALTH & SERVICES - OREGON | PROVIDENCE HEALTH & SERVICES - OREGON | 97362 | Mt. Angel | 17.0 |
4 | 385024 | NF | 91.0 | Avamere Health Services of Rogue Valley | 625 Stevens St | MEDFORD OPERATIONS LLC | MEDFORD OPERATIONS LLC | 97504 | Medford | 40.0 |
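Notice the complaint counts turned into decimals. That happens because a left merge keeps every facility, and any facility with no match on the right gets NaN, which forces the column to float. A small sketch with made-up frames (fac and counts are hypothetical names) shows this, along with the indicator option for seeing which rows found a match.

```python
import pandas as pd

# Made-up facilities and complaint counts; F2 has no complaints on record
fac = pd.DataFrame({"facid": ["F1", "F2", "F3"], "capacity": [10.0, 20.0, 30.0]})
counts = pd.DataFrame({"facility_id": ["F1", "F3"], "complaints": [4, 9]})

# how="left" keeps all rows from fac; indicator=True adds a _merge column
merged = fac.merge(counts, left_on="facid", right_on="facility_id",
                   how="left", indicator=True)

# Rows marked left_only had no matching facility_id in counts
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(unmatched["facid"].tolist())
```

Checking the unmatched rows is a cheap way to catch ID formatting problems before they quietly turn into missing values.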
Remember that ratio we calculated last time around? Now we can do that again with our clean dataset.
facilities_merge["comp_rate"] = facilities_merge["complaints"]/facilities_merge["capacity"]
facilities_merge.sort_values("comp_rate",ascending=False)
facid | fac_type | capacity | fac_name | fac_address | owner | operator | zip | city | complaints | comp_rate | |
---|---|---|---|---|---|---|---|---|---|---|---|
392 | 5MA170 | RCF | 37.0 | Brookdale McMinnville Westside | 320 SW Hill Road | BROOKDALE SENIOR LIVING COMMUNITIES INC | BROOKDALE SENIOR LIVING COMMUNITIES INC | 97128 | McMinnville | 59.0 | 1.594595 |
402 | 5MA233 | RCF | 30.0 | Ashley Manor - Roseburg | 427 SE Ramp St. | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97470 | Roseburg | 47.0 | 1.566667 |
389 | 5MA161 | RCF | 32.0 | Skylark Memory Care | 950 Skylark Place | ASHLAND ASSISTED LIVING LLC | MISSION SENIOR LIVING LLC | 97520 | Ashland | 48.0 | 1.500000 |
319 | 50R367 | RCF | 48.0 | Arbor Oaks Terrace Memory Care | 317 Werth Blvd. | NEWBERG MEMORY ASSOCIATES LLC | FRONTIER MANAGEMENT LLC | 97132 | Newberg | 71.0 | 1.479167 |
397 | 5MA215 | RCF | 55.0 | Baycrest Memory Care | 955 Kentucky Avenue | BAY AREA PROPERTIES LLC | RADIANT SENIOR LIVING INC | 97420 | Coos Bay | 77.0 | 1.400000 |
184 | 50M220 | RCF | 15.0 | Ashley Manor - Shasta | 475 S Shasta Pl. Longview Div. | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97720 | Burns | 19.0 | 1.266667 |
243 | 50R275 | RCF | 28.0 | Avamere at St. Helens | 2400 Gable Rd. | AVAMERE - ST HELENS OPERATIONS LLC | AVAMERE-ST HELENS OPERATIONS LLC | 97051 | St. Helens | 33.0 | 1.178571 |
331 | 50R379 | RCF | 84.0 | Fern Gardens Memory Care | 2636 Table Rock Rd | FERN GARDENS MEMORY CARE LLC | RIDGELINE MANAGEMENT CO | 97504 | Medford | 97.0 | 1.154762 |
248 | 50R280 | RCF | 48.0 | Elderberry Square Community | 3321 Oak St | ELDERBERRY SQUARE COMMUNITY LLC | SENIOR HOUSING MANAGERS LLC | 97439 | Florence | 52.0 | 1.083333 |
285 | 50R323 | RCF | 18.0 | Prestige Senior Living Arbor Place Memory Care | 3150 Juanipero Way | CHP MEDFORD -ARBOR PLACE OR TENANT CORP | PRESTIGE SENIOR LIVING LLC | 97504 | Medford | 19.0 | 1.055556 |
181 | 50M209 | RCF | 36.0 | Forest Glen Senior Residence | 200 SW Frontage Rd. | ASPEN FOUNDATION | ASPEN FOUNDATION | 97417 | Canyonville | 37.0 | 1.027778 |
359 | 50R408 | RCF | 15.0 | Bee Hive Homes of Baker City | 3078 Resort St. | THE HOME PLACE IN BAKER LLC | BEE HIVE HOMES OF BAKER CITY | 97814 | Baker City | 15.0 | 1.000000 |
150 | 50M026 | RCF | 34.0 | Ellendale Residential Care Center | 511 E Ellendale Ave | DALLAS CARE CENTER INC | DALLAS CARE CENTER INC | 97338 | Dallas | 32.0 | 0.941176 |
254 | 50R288 | RCF | 15.0 | Ashley Manor - Athens | 1514 Athens Ave. | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97801 | Pendleton | 14.0 | 0.933333 |
373 | 5MA003 | RCF | 40.0 | Ashley Manor - Sage | 1355 SW Sage | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97838 | Hermiston | 37.0 | 0.925000 |
312 | 50R359 | RCF | 25.0 | Spruce Point Memory Care | 375 9th Street | SPRUCE POINT INC | PRESTIGE CARE INC | 97439 | Florence | 23.0 | 0.920000 |
404 | 5MA240 | RCF | 48.0 | Callahan Court Memory Care Comm. | 1770 NW Valley View Drive | LSREF GOLDEN OPS 14(OR) LLC | FRONTIER MANAGEMENT LLC | 97470 | Roseburg | 43.0 | 0.895833 |
169 | 50M132 | RCF | 60.0 | River Grove Memory Care | 140 Green Lane | RIVER GROVE OPERATING COMPANY | BENECIA SENIOR LIVING LLC | 97404 | Eugene | 53.0 | 0.883333 |
299 | 50R345 | RCF | 36.0 | Middlefield Oaks Memory Care Community | 1500 Village Drive | MIDDLEFIELD OAKS ASSISTED LIVING LLC | FRONTLINE MANAGEMENT | 97424 | Cottage Grove | 31.0 | 0.861111 |
399 | 5MA221 | RCF | 42.0 | Aspen Ridge Memory Care | 1025 NE Purcell Blvd | FM ASPEN MC LLC | FRONTIER MANAGEMENT LLC | 97701 | Bend | 35.0 | 0.833333 |
405 | 5MA241 | RCF | 15.0 | Ashley Manor - Oak | 572 NE Oak Street | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97741 | Madras | 12.0 | 0.800000 |
413 | 5MA255 | RCF | 15.0 | Ashley Manor - Heidi Lane | 2144 NW Heidi Lane | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97526 | Grants Pass | 12.0 | 0.800000 |
247 | 50R279 | RCF | 64.0 | Brookdale Eugene Alpine Court Memory Care | 3720 N. Clarey St | EMERITUS CORPORATION | BROOKDALE SENIOR LIVING COMMUNITIES INC | 97402 | Eugene | 48.0 | 0.750000 |
416 | 5MA266 | RCF | 30.0 | Wildflower Lodge | 508 16th St | LSREF GOLDEN OPS 14 (OR) LLC | SLH RAINIER MANAGER LLC | 97850 | LaGrande | 22.0 | 0.733333 |
618 | 70M234 | ALF | 70.0 | Brookdale Eagle Point | 261 Loto Street | EMERITUS CORPORATION | BROOKDALE SENIOR LIVING COMMUNITIES INC | 97524 | Eagle Point | 50.0 | 0.714286 |
307 | 50R353 | RCF | 24.0 | Cedar Village Memory Care Community | 4452 Lancaster Drive NE | ARHC CVSALOR01 TRS LLC | FM CEDAR VILLAGE LLC | 97301 | Salem | 17.0 | 0.708333 |
275 | 50R311 | RCF | 43.0 | Gardens, The | 2690 NE Yacht | LAKEVIEW OPERATIONS LLC | WESTMONT LIVING INC | 97367 | Lincoln City | 30.0 | 0.697674 |
269 | 50R305 | RCF | 52.0 | Evergreen Court of Molalla | 250 Kennel St. | MOLALLA SENIOR LIVING LLC | AVANT SENIOR HOUSING MANAGERS & CONSULTANTS LLC | 97038 | Molalla | 36.0 | 0.692308 |
403 | 5MA238 | RCF | 15.0 | Ashley Manor - Meadow Lakes | 228 SW Meadow Lakes Drive | ASHLEY MANOR LLC | ASHLEY MANOR LLC | 97754 | Prineville | 10.0 | 0.666667 |
393 | 5MA205 | RCF | 60.0 | Brookdale Salem | 1355 Boone Rd SE | BROOKDALE SENIOR LIVING COMMUNITIES INC | BROOKDALE SENIOR LIVING COMMUNITIES INC | 97302 | Salem | 38.0 | 0.633333 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
347 | 50R396 | RCF | 39.0 | Whitewood Gardens | 2027 SE 174th Ave. | WHITEWOOD GROUP LLC | WHITEWOOD GROUP LLC | 97233 | Portland | NaN | NaN |
348 | 50R397 | RCF | 31.0 | Footsteps at Sherwood | 15677 SW Oregon | HSRE-SPRINGS II AT SHERWOOD SUB-TRS LLC | THE SPRINGS LIVING LLC | 97140 | Sherwood | NaN | NaN |
349 | 50R398 | RCF | 16.0 | Footsteps at Carman Oaks | 3900 SW Carman Drive | HSRE-SPRINGS II AT LAKE OSWEGO SUB-TRS LLC | THE SPRINGS LIVING LLC | 97035 | Lake Oswego | NaN | NaN |
350 | 50R399 | RCF | 25.0 | Footsteps at Mill Creek | 1021 W 10th Street | HSRE-SPRINGS II AT THE DALLES LLP | THE SPRINGS LIVING LLC | 97058 | The Dalles | NaN | NaN |
356 | 50R405 | RCF | 15.0 | Elite Care Oatfield Estates Larch House | 4405 SE Oatfield Hill Rd | ELITE CARE OE2 LLC | ELITE CARE OE2 LLC | 97267 | Milwaukie | NaN | NaN |
357 | 50R406 | RCF | 15.0 | Elite Care Oatfield Estates Tabor House | 4425 SE Oatfield Hill Rd | ELITE CARE OE2 LLC | ELITE CARE OE2 LLC | 97267 | Milwaukie | NaN | NaN |
361 | 50R410 | RCF | 21.0 | McKenzie Living | 6452 "A" Street | MCKENZIE LIVING INC | MCKENZIE LIVING INC | 97478 | Springfield | NaN | NaN |
362 | 50R411 | RCF | 64.0 | Waterford Grand Memory Care | 600 Waterford Way | BDC/EUGENE LLC | BPM SENIOR LIVING COMPANY | 97401 | Eugene | NaN | NaN |
368 | 50R418 | RCF | 64.0 | Windsong at Eola Hills | 2030 Wallace Road NW | WEST SALEM MEMORY CARE LLC | AIDAN HEALTH SERVICES INC | 97304 | Salem | NaN | NaN |
369 | 50R421 | RCF | 32.0 | Clatsop Care Memory Community | 2219 SE Dolphin Road | CLATSOP CARE CENTER HEALTH DISTRICT | CLATSOP CARE CENTER HEALTH DISTRICT | 97146 | Warrenton | NaN | NaN |
370 | 50R430 | RCF | 23.0 | Bonaventure of Albany Memory Care | 420 Geri Street | MOUNTAIN WEST RETIREMENT CORPORATION | BONAVENTURE SENIOR LIVING | 97321 | Albany | NaN | NaN |
371 | 50R432 | RCF | 23.0 | Bonaventure of Tigard Memory Care | 15000 SW Hall Blvd | BONAVENTURE OF TIGARD LLC | BONAVENTURE SENIOR LIVING | 97224 | Tigard | NaN | NaN |
372 | 50R433 | RCF | 68.0 | Waterhouse Ridge Memory Care Community | 1115 NW 158th Avenue | WATERHOUSE RIDGE MEMORY CARE LLC | FRONTLINE MANAGEMENT | 97006 | Beaverton | NaN | NaN |
419 | 5ME175 | RCF | 16.0 | Premier Living Center | 5120 SE 118th | PREMIER LIVING CENTER INC | PREMIER LIVING CENTER INC | 97266 | Portland | NaN | NaN |
423 | 70A057 | ALF | 63.0 | Markham House Retirement Community | 10606 SW Capitol Hwy | PORTMH LLC | LEISURE CARE LLC | 97219 | Portland | NaN | NaN |
427 | 70A097 | ALF | 70.0 | Timberhill Place | 989 NW Spruce Ave | VINTAGE INVESTMENT PROP INC | VINTAGE INVESTMENT PROP INC | 97330 | Corvallis | NaN | NaN |
432 | 70A235 | ALF | 60.0 | Bonaventure of Albany Assisted Living | 420 Geri Street | MOUNTAIN WEST RETIREMENT CORPORATION | BONAVENTURE SENIOR LIVING | 97321 | Albany | NaN | NaN |
434 | 70A249 | ALF | 20.0 | Providence Brookside Manor | 1550 Brookside Dr. | PROVIDENCE HEALTH & SERVICES - OREGON | PROVIDENCE HEALTH SYSTEM - OREGON | 97031 | Hood River | NaN | NaN |
457 | 70A283 | ALF | 18.0 | Willow Creek Terrace | 400 Frank Gilliam Dr. | WILLOW CREEK VALLEY ASSISTED LIVING CORP | MORROW COUNTY HEALTH DISTRICT | 97836 | Heppner | NaN | NaN |
474 | 70A300 | ALF | 44.0 | Springs at Veranda Park, The | 1641 NE Veranda Park Drive | HSRE-SPRINGS III AT MEDFORD VP SUB TRS LLC | THE SPRINGS LIVING LLC | 97504 | Medford | NaN | NaN |
475 | 70A301 | ALF | 69.0 | Bonaventure of Tigard Assisted Living | 15000 SW Hall Blvd | BONAVENTURE OF TIGARD LLC | BONAVENTURE SENIOR LIVING | 97224 | Tigard | NaN | NaN |
484 | 70A310 | ALF | 16.0 | Stafford Assisted Living Facility, The | 1200 Overlook Dr | AVAMERE STAFFORD OPERATIONS LLC | AVAMERE STAFFORD OPERATIONS LLC | 97034 | Lake Oswego | NaN | NaN |
490 | 70A317 | ALF | 24.0 | Mirabella at South Waterfront | 3550 SW Bond Ave. | MIRABELLA AT SOUTH WATERFRONT | PACIFIC RETIREMENT SERVICES INC | 97239 | Portland | NaN | NaN |
491 | 70A318 | ALF | 22.0 | Countryside Village | 1700 Kellenbeck Rd. | LYNN-ANN DEVELOPMENT LLC | LYNN-ANN DEVELOPMENT LLC | 97528 | Grants Pass | NaN | NaN |
503 | 70M006 | ALF | 96.0 | Summerplace Assisted Living Community | 15727 NE Russell St | SUMMERPLACE ASSISTED LIVING LLC | PRESTIGE SENIOR LIVING LLC | 97230 | Portland | NaN | NaN |
522 | 70M027 | ALF | 44.0 | Oregon Retirement Center | 1010 NE 3rd | EVERGREEN OREGON HEALTHCARE ORCHARDS RETIREMEN... | EVERGREEN HEALTHCARE MANAGEMENT LLC | 97862 | Milton-Freewater | NaN | NaN |
538 | 70M043 | ALF | 30.0 | Willow Place | 1307 N College | ASSISTED LIVING FACILITIES INC | CONCEPTS IN COMMUNITY LIVING INC | 97132 | Newberg | NaN | NaN |
581 | 70M094 | ALF | 87.0 | Fountains At Town Center Village, The | 8607 SE Causey Ave | TCV EMPLOYEES LLC | GENERATIONS LLC | 97086 | Happy Valley | NaN | NaN |
588 | 70M103 | ALF | 60.0 | Wiley Creek Community | 5050 Mountain Fir Street | MID-VALLEY HEALTHCARE | AIDAN HEALTH SERVICES INC | 97386 | Sweet Home | NaN | NaN |
640 | 70M350 | ALF | 119.0 | Village at Keizer Ridge, The | 1165 McGee Court | VKR LLC | KEIZER CARE PROPERTIES LLC | 97303 | Keizer | NaN | NaN |
642 rows × 11 columns
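The NaN rows at the bottom are facilities that had no match in the complaints counts. If you're comfortable assuming that means zero complaints (and that's an assumption you'd want to confirm with the source, since it could also mean a mismatched ID), you could fill them in before calculating the rate. A sketch on made-up data:

```python
import pandas as pd
import numpy as np

# Made-up merged frame; Beta Manor had no matching complaints
toy = pd.DataFrame({
    "fac_name": ["Alpha House", "Beta Manor", "Gamma Court"],
    "capacity": [20.0, 40.0, 10.0],
    "complaints": [30.0, np.nan, 5.0],
})

# Assumption: a missing count means zero complaints on record
toy["complaints"] = toy["complaints"].fillna(0)
toy["comp_rate"] = toy["complaints"] / toy["capacity"]

ranked = toy.sort_values("comp_rate", ascending=False)
print(ranked)
```

With the gaps filled, those facilities show up with a rate of 0 rather than NaN.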
facility_ratio = facilities_merge[["fac_name","comp_rate"]].set_index("fac_name").sort_values("comp_rate",ascending=False)
facility_ratio
comp_rate | |
---|---|
fac_name | |
Brookdale McMinnville Westside | 1.594595 |
Ashley Manor - Roseburg | 1.566667 |
Skylark Memory Care | 1.500000 |
Arbor Oaks Terrace Memory Care | 1.479167 |
Baycrest Memory Care | 1.400000 |
Ashley Manor - Shasta | 1.266667 |
Avamere at St. Helens | 1.178571 |
Fern Gardens Memory Care | 1.154762 |
Elderberry Square Community | 1.083333 |
Prestige Senior Living Arbor Place Memory Care | 1.055556 |
Forest Glen Senior Residence | 1.027778 |
Bee Hive Homes of Baker City | 1.000000 |
Ellendale Residential Care Center | 0.941176 |
Ashley Manor - Athens | 0.933333 |
Ashley Manor - Sage | 0.925000 |
Spruce Point Memory Care | 0.920000 |
Callahan Court Memory Care Comm. | 0.895833 |
River Grove Memory Care | 0.883333 |
Middlefield Oaks Memory Care Community | 0.861111 |
Aspen Ridge Memory Care | 0.833333 |
Ashley Manor - Oak | 0.800000 |
Ashley Manor - Heidi Lane | 0.800000 |
Brookdale Eugene Alpine Court Memory Care | 0.750000 |
Wildflower Lodge | 0.733333 |
Brookdale Eagle Point | 0.714286 |
Cedar Village Memory Care Community | 0.708333 |
Gardens, The | 0.697674 |
Evergreen Court of Molalla | 0.692308 |
Ashley Manor - Meadow Lakes | 0.666667 |
Brookdale Salem | 0.633333 |
... | ... |
Whitewood Gardens | NaN |
Footsteps at Sherwood | NaN |
Footsteps at Carman Oaks | NaN |
Footsteps at Mill Creek | NaN |
Elite Care Oatfield Estates Larch House | NaN |
Elite Care Oatfield Estates Tabor House | NaN |
McKenzie Living | NaN |
Waterford Grand Memory Care | NaN |
Windsong at Eola Hills | NaN |
Clatsop Care Memory Community | NaN |
Bonaventure of Albany Memory Care | NaN |
Bonaventure of Tigard Memory Care | NaN |
Waterhouse Ridge Memory Care Community | NaN |
Premier Living Center | NaN |
Markham House Retirement Community | NaN |
Timberhill Place | NaN |
Bonaventure of Albany Assisted Living | NaN |
Providence Brookside Manor | NaN |
Willow Creek Terrace | NaN |
Springs at Veranda Park, The | NaN |
Bonaventure of Tigard Assisted Living | NaN |
Stafford Assisted Living Facility, The | NaN |
Mirabella at South Waterfront | NaN |
Countryside Village | NaN |
Summerplace Assisted Living Community | NaN |
Oregon Retirement Center | NaN |
Willow Place | NaN |
Fountains At Town Center Village, The | NaN |
Wiley Creek Community | NaN |
Village at Keizer Ridge, The | NaN |
642 rows × 1 columns
First, we'll look at complaints over time.
import matplotlib.pyplot as plt
import pylab
%matplotlib inline
complaints_year.plot()
#pylab.ylim([0,1900])
Cool, we made a picture!
This visualization raises a lot of questions: why so few reports pre-2010? Are 2010 and 2016 really aberrations, or is this partial data for some reason? Before you get much further, you're probably going to want to circle back to your data source or another expert to find out exactly how these data were collected. But of course, we don't have time for that today!
Instead, let's filter those older records so we can get a better picture of complaints since 2010 and throw a title on that chart. And heck, for good measure let's export it to a file so we can share it.
complaints_year = complaints_year[complaints_year.index >2009]
complaints_year.plot(title="Oregon Long-Term Care Facility Complaints")
pylab.ylim([0,1900])
#and for good measure, let's save it to a file
plt.savefig("output/plot.png",format="png")
plt.savefig("output/plot.svg",format="svg")
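One gotcha: savefig will raise an error if the folder you're writing to doesn't exist. Here's a minimal sketch of the same save step outside a notebook, using made-up yearly counts and a hypothetical folder name (output_demo), that creates the folder first and works with the Axes object that plot() returns.

```python
import os
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly counts standing in for complaints_year
toy_year = pd.DataFrame({"complaint_id": [5, 8, 6]},
                        index=pd.Index([2011, 2012, 2013], name="year"))

# Hypothetical output folder; create it if it isn't there yet
out_dir = "output_demo"
os.makedirs(out_dir, exist_ok=True)

# DataFrame.plot returns a matplotlib Axes we can adjust directly
ax = toy_year.plot(title="Toy complaints by year")
ax.set_ylim(0, 10)

# Save through the figure that owns the Axes, then release it
fig = ax.get_figure()
out_path = os.path.join(out_dir, "plot.png")
fig.savefig(out_path, format="png")
plt.close(fig)
```

Working with the returned Axes does the same job as the pylab.ylim call above; it just makes it explicit which chart you're changing.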
Nice! Now let's see which facilities have the highest ratio of complaints.
facility_ratio[facility_ratio["comp_rate"]>1].plot.barh(title="Oregon Long-Term Care Facilities With the Highest Ratio of Complaints Per Patient").invert_yaxis()
These are not the prettiest charts ever, by a long shot, but in a pinch, they're a very fast way to visualize the data you're working with, and if you're going to end up putting the chart in D3 or handing it off to a designer for print purposes, this may suffice.
If you're looking to up your visualization game, check out seaborn, which will give you prettier static charts right out of the box, or bokeh for interactive graphics.