Auditing Federal Contractors Part III

By Leon Yin Last Updated 2018-06-24

View this notebook in NBViewer or Github.

The output is availalble publically on Google Drive and Github.

ICE Contractor Analysis

Following my analysis of federal contracts to the private prisons CoreCivic and Geo Group, this notebook analyzes publically avaiable contracts from the Immigration and Customs Enforcement Agency (ICE). This analysis uses data downloaded from the new USASpending.gov dashboard filtered on "U.S. Immigration and Customs Enforcement (ICE) | Sub-Agency".

In [2]:
import glob
import datetime
import pandas as pd
In [3]:
# data downloaded from USAspending.gov
input_file_pattern = 'ice_data/data_in/*'
In [4]:
files = glob.glob(input_file_pattern)
files
Out[4]:
['ice_data/data_in/all_contracts_subawards_1.csv',
 'ice_data/data_in/all_assistance_prime_transactions_1.csv',
 'ice_data/data_in/all_contracts_prime_transactions_1.csv',
 'ice_data/data_in/all_assistance_subawards_1.csv']

Let's just look at prime contracts today.

In [5]:
df_prime_contracts = pd.read_csv(files[2])
/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2698: DtypeWarning: Columns (19,32,38,46,47,48,49,56,65,78,79,87,106,107,135,136,204) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
In [6]:
# there are a lot of columns!
prime_cols = df_prime_contracts.columns
num_cols = len(prime_cols)
print(f"There are {num_cols} columns in this file!\nHere are some of the column names:")
prime_cols.values.tolist()[:20]
There are 258 columns in this file!
Here are some of the column names:
Out[6]:
['award_id_piid',
 'modification_number',
 'transaction_number',
 'parent_award_agency_id',
 'parent_award_agency_name',
 'parent_award_id',
 'parent_award_modification_number',
 'federal_action_obligation',
 'base_and_exercised_options_value',
 'current_total_value_of_award',
 'base_and_all_options_value',
 'potential_total_value_of_award',
 'action_date',
 'period_of_performance_start_date',
 'period_of_performance_current_end_date',
 'period_of_performance_potential_end_date',
 'ordering_period_end_date',
 'awarding_agency_code',
 'awarding_agency_name',
 'awarding_sub_agency_code']

That is a lot of columns, here are the definitions of the columns we'll be using here:

Glossary of fields used in this Analysis with Definitions from USASpending:

recipient_duns
The unique identification number for the ultimate parent of an awardee or recipient. Currently the identifier is the 9-digit number maintained by Dun & Bradstreet as the global parent DUNSĀ® number.

potential_total_value_of_award
The total amount that could be obligated on a contract. This total includes the base plus options amount. For example, if a recipient is awarded 10M on a base contract with 3 option years at 1M each, the potential award amount is 13M.

current_total_value_of_award
The amount of money that the government has promised (obligated) to pay a recipient for a contract. This means the base amount and any exercised options.

period_of_performance_start_date
The date that the award begins.

period_of_performance_potential_end_date
For procurement, the date on which, for the award referred to by the action being reported if all potential pre-determined or pre-negotiated options were exercised, awardee effort is completed or the award is otherwise ended. Administrative actions related to this award may continue to occur after this date. This date does not apply to procurement indefinite delivery vehicles under which definitive orders may be awarded.


Thanks for providing that glossary USASpending, let's do some minor data wrangling and get some summary statistics.

In [7]:
date_cols = ['period_of_performance_start_date', 'period_of_performance_potential_end_date']
for col in date_cols:
    df_prime_contracts[col] = pd.to_datetime(df_prime_contracts[col])
In [8]:
total_spend = df_prime_contracts['current_total_value_of_award'].sum()
potential_spend = df_prime_contracts['potential_total_value_of_award'].sum()
at_stake =  potential_spend - total_spend

num_contractors = len(df_prime_contracts['recipient_duns'].unique())
first_contract = df_prime_contracts['period_of_performance_start_date'].min().strftime('%Y-%m-%d')
print("ICE has invested ${:,.2f} in contracts with {} Prime contractors "
      "since {}.\nICE has ${:,.2f} in potential payouts with active contracts.".format(
          total_spend, num_contractors, first_contract, at_stake))
ICE has invested $76,815,594,715.20 in contracts with 4294 Prime contractors since 2004-10-01.
ICE has $50,376,993,084.23 in potential payouts with active contracts.

Let's sort this data by the contract end date

In [9]:
df_prime_contracts.sort_values(by=['period_of_performance_potential_end_date'], 
                               ascending=False, 
                               inplace=True)

There are typos in the recipient names, so let's use their DUNs IDs for aggregate analysis.
We can create a lookup_table containing the latest name and contact info

In [10]:
lookup_cols = [
    'recipient_parent_duns',
    'recipient_parent_name', 
    'recipient_address_line_1', 
    'recipient_address_line_2',
    'recipient_city_name', 
    'recipient_state_code', 
    'recipient_state_name',
    'recipient_zip_4_code', 
    'recipient_congressional_district',
    'recipient_phone_number', 
    'recipient_fax_number'
]
In [11]:
lookup_table = df_prime_contracts.drop_duplicates(
    subset = ['recipient_parent_duns']
)[lookup_cols]

Now let's get how much each company has been rewarded, how much they have the potential to be rewarded, how many awards they're been granted, and when their contracts are finished. This code is chained, and kind of messy. Each line aggreagates by the recipient ID, and performs some sort of agg function, and then sorts the values according to the total of the agg function.

In [12]:
contract_amounts = (df_prime_contracts.groupby(
    ['recipient_parent_duns'])
    ['current_total_value_of_award'].sum()
    .sort_values(ascending=False))
In [13]:
contract_potential = (df_prime_contracts.groupby(
    ['recipient_parent_duns'])
    ['potential_total_value_of_award'].sum()
    .sort_values(ascending=False))
In [14]:
num_awards = (df_prime_contracts.groupby(
    ['recipient_parent_duns'])
    ['current_total_value_of_award'].count()
    .sort_values(ascending=False))
In [15]:
contract_enddate = (df_prime_contracts[
    ~df_prime_contracts['period_of_performance_potential_end_date'].isnull()]
    .groupby(['recipient_parent_duns'])
    ['period_of_performance_potential_end_date'].max()
    .sort_values(ascending=False))
In [16]:
contract_startdate = (df_prime_contracts[
    ~df_prime_contracts['period_of_performance_start_date'].isnull()]
    .groupby(['recipient_parent_duns'])
    ['period_of_performance_start_date'].min()
    .sort_values(ascending=False))

For context, let's see what kinds of services each of these companies provide by counting the most frequent words in service descriptions per contractor.

In [24]:
from collections import Counter
from nltk.corpus import stopwords
In [142]:
def count_words(row, counter, sw):
    '''
    Splits sentences, casts words to lowercase, and counts all words that aren't in stopwords (sw)
    '''
    row_clean = str(row).replace('-', ' ').replace('/', ' ').lower()
    counter.update([w for w in row_clean.split() if w not in sw])
In [99]:
sw = stopwords.words('English')
In [143]:
top_words = []
for dun_id_, df_ in df_prime_contracts.groupby('recipient_parent_duns'):
    word_counter_ = Counter()
    df_['product_or_service_code_description'].apply(count_words, args=(word_counter_, sw,))
    
    top_words_ = ', '.join([w for w, c in word_counter_.most_common(5)])
    d_ = dict(top_words_from_serv_desc  = top_words_,
              recipient_parent_duns = dun_id_)
    
    top_words.append(d_)
    
top_words_per_service_description = pd.DataFrame(top_words)

We can join them all together using this craziness below:

In [144]:
output = (contract_amounts.reset_index()
          .merge(
            num_awards.reset_index(), on='recipient_parent_duns')
          .merge(
            contract_potential.reset_index(), on='recipient_parent_duns')
          .merge(
            contract_enddate.reset_index(), on='recipient_parent_duns')
          .merge(
            contract_startdate.reset_index(), on='recipient_parent_duns')
          .merge(
            top_words_per_service_description, on='recipient_parent_duns')
          .merge(
            lookup_table))

The last merge above is the look_up table, containing metadata for each recipient.
Let's change some of the aggregate column names to be more decriptive:

In [145]:
remapping = {
    'current_total_value_of_award_x' : 'current_total_value_of_awards_USD',
    'current_total_value_of_award_y' : 'number_of_prime_awards_USD',
    'potential_total_value_of_award' : 'potential_total_value_of_awards',
    'period_of_performance_potential_end_date' : 'last_contract_end_date',
    'period_of_performance_start_date' : 'first_contract_start_date'
}
In [146]:
output.columns = [remapping.get(c, c) for c in output.columns]

Let's timestamp when this file was made.

In [147]:
output['analysis_date'] = datetime.datetime.now() # when was this file made?
output['raw_data_download_date'] = datetime.datetime(2018,6,24) # when was raw data downloaded?

And lastly, let's re-order the columns so they're readable

In [114]:
order_cols = [
    'recipient_parent_name',
    'top_words_from_serv_desc',
    'number_of_prime_awards',
    'current_total_value_of_awards_USD',
    'potential_total_value_of_awards',
    'first_contract_start_date',
    'last_contract_end_date',
    'recipient_address_line_1',
    'recipient_address_line_2',
    'recipient_city_name', 
    'recipient_state_code', 
    'recipient_state_name',
    'recipient_zip_4_code',
    'recipient_congressional_district',
    'recipient_phone_number', 
    'recipient_fax_number',
    'analysis_date',
    'raw_data_download_date'
]

Here is the output:

In [ ]:
for contractor, df_ in output.groupby('recipient_parent_name'):
    df_['number_of_pride_awards'].sum()
    df_['current_total_value_of_awards_USD'].sum()
    df_['potential_total_value_of_awards'].sum()
In [149]:
output[order_cols].head(50)
Out[149]:
recipient_parent_name top_words_from_serv_desc number_of_prime_awards current_total_value_of_awards_USD potential_total_value_of_awards first_contract_start_date last_contract_end_date recipient_address_line_1 recipient_address_line_2 recipient_city_name recipient_state_code recipient_state_name recipient_zip_4_code recipient_congressional_district recipient_phone_number recipient_fax_number analysis_date raw_data_download_date
0 BATTELLE MEMORIAL INSTITUTE INC oper, govt, r&d, goco, facilities 1 2.258840e+10 2.258840e+10 2016-05-27 00:00:00 2017-09-30 902 BATTELLE BLVD NaN RICHLAND WA WASHINGTON 993521873 NaN NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
1 SPECTRUM SECURITY SERVICES INC. guard, housekeeping, services 128 1.731890e+10 1.902971e+10 2008-09-08 00:00:00 2018-12-11 13967 HIGHWAY 94 STE 101 NaN JAMUL CA CALIFORNIA 919353232 50 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
2 CSI AVIATION SERVICES INC. travel, air, passenger, transportation, reloca... 63 5.897378e+09 7.426944e+09 2009-03-31 00:00:00 2018-06-30 3700 RIO GRANDE BLVD NW NaN ALBUQUERQUE NM NEW MEXICO 871072876 01 5057619000 5053427377 2018-06-24 21:06:38.085717 2018-06-24
3 INGENESIS INC. medical, general, health, care 20 3.530665e+09 4.874056e+09 2012-07-16 00:00:00 2018-07-15 10231 KOTZEBUE ST NaN SAN ANTONIO TX TEXAS 782174430 21 2103660033 2105684582 2018-06-24 21:06:38.085717 2018-06-24
4 THE GEO GROUP INC guard, housekeeping, support, services, program 115 3.335807e+09 1.663010e+10 2007-06-01 00:00:00 2029-10-11 621 NW 53RD ST STE 700 NaN BOCA RATON FL FLORIDA 334878242 22 5619997359 5619997648 2018-06-24 21:06:38.085717 2018-06-24
5 PHACIL INC. telecom, help, desk 47 2.889115e+09 3.027381e+09 2012-09-10 00:00:00 2018-09-09 800 N GLEBE RD STE 700 NaN ARLINGTON VA VIRGINIA 222032149 8 7035261800 7033662900 2018-06-24 21:06:38.085717 2018-06-24
6 SPECTRUM SECURITY SERVICES INC. housekeeping, guard 13 1.864530e+09 1.997933e+09 2011-12-21 00:00:00 2016-02-29 13967 HIGHWAY 94 STE 101 NaN JAMUL CA CALIFORNIA 919353232 52 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
7 ASSET PROTECTION & SECURITY SERVICES L.P. guard, housekeeping, oper, govt, misc 19 1.400854e+09 1.633666e+09 2009-09-30 00:00:00 2018-09-30 5502 BURNHAM DR NaN CORPUS CHRISTI TX TEXAS 784133787 27 3619061552 3619061844 2018-06-24 21:06:38.085717 2018-06-24
8 WIDEPOINT CORPORATION support, professional:, program, management 25 1.318187e+09 5.204160e+09 2015-04-09 00:00:00 2018-12-08 7926 JONES BRANCH DR STE 520 NaN MC LEAN VA VIRGINIA 221023371 11 7033495644 7038483560 2018-06-24 21:06:38.085717 2018-06-24
9 TRAILBOSS ENTERPRISES INC. housekeeping, facilities, operations, support,... 13 1.266682e+09 3.871661e+09 2012-06-01 00:00:00 2018-09-30 201 EAST 3RD AVE NaN ANCHORAGE AK ALASKA 995012503 00 9073388243 9073380869 2018-06-24 21:06:38.085717 2018-06-24
10 CORRECTIONS CORPORATION OF AMERICA guard, housekeeping, services, support, facili... 58 1.230570e+09 2.311516e+09 2009-04-01 00:00:00 2019-06-30 10 BURTON HILLS BLVD NaN NASHVILLE TN TENNESSEE 372156105 05 6152636713 6152633090 2018-06-24 21:06:38.085717 2018-06-24
11 AHTNA INCORPORATED guard, housekeeping, services, penal, facilities 29 9.644808e+08 3.692711e+09 2010-02-01 00:00:00 2018-11-30 110 W 38TH AVE STE 200D NaN ANCHORAGE AK ALASKA 995035677 0 9073346477 9073346750 2018-06-24 21:06:38.085717 2018-06-24
12 G4S PLC guard, housekeeping, services, social, rehabil... 181 9.630253e+08 1.101117e+09 2009-08-12 00:00:00 2019-01-31 4200 WACKENHUT DR STE 100 NaN PALM BEACH GARDENS FL FLORIDA 334104243 18 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
13 CLASSIC AIR CHARTER INC. travel, transportation, relocation, lodging, r... 6 7.483224e+08 3.851914e+09 2017-10-20 00:00:00 2023-06-30 460 NEW YORK AVE NaN HUNTINGTON NY NEW YORK 117433432 03 6315495021 NaN 2018-06-24 21:06:38.085717 2018-06-24
14 NANA REGIONAL CORPORATION INC housekeeping, guard, relocation, services, tra... 22 5.979669e+08 4.151455e+09 2008-09-25 00:00:00 2025-01-31 3901 OLD INTERNATIONAL AIRPORT RD STE 200-G NaN ANCHORAGE AK ALASKA 99502 0 7037666837 7037666906 2018-06-24 21:06:38.085717 2018-06-24
15 SRA INTERNATIONAL INC. support, professional:, communications, progra... 45 5.940203e+08 1.370448e+09 2016-06-20 00:00:00 2022-09-29 4300 FAIR LAKES CT NaN FAIRFAX VA VIRGINIA 220334232 11 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
16 SCIENCE APPLICATIONS INTERNATIONAL CORPORATION telecom, systems, development, equipment, tele... 4 5.322573e+08 7.048101e+08 2014-09-25 00:00:00 2015-09-24 1710 SAIC DR NaN MCLEAN VA VIRGINIA 22102 NaN 7036765681 7036765063 2018-06-24 21:06:38.085717 2018-06-24
17 M V M INC. guard, housekeeping, support, administrative:,... 41 4.726719e+08 1.767799e+09 2008-06-27 00:00:00 2022-08-03 44620 GUILFORD DR STE 150 NaN ASHBURN VA VIRGINIA 201476063 10 5712234620 5712234487 2018-06-24 21:06:38.085717 2018-06-24
18 UNIVERSAL LANGUAGE CORPORATION translation, admin, svcs, sign, languag 41 4.180052e+08 8.500482e+08 2007-09-10 00:00:00 2014-03-22 97 12 63RD DR STE 2 A NaN REGO PARK NY NEW YORK 113742244 06 7189979369 7184590331 2018-06-24 21:06:38.085717 2018-06-24
19 THE MITRE CORPORATION programming, r&d, research, development, (engi... 27 4.062568e+08 4.108850e+08 2008-08-29 00:00:00 2018-01-12 7515 COLSHIRE DR NaN MC LEAN VA VIRGINIA 221027539 8 7039836474 7039837121 2018-06-24 21:06:38.085717 2018-06-24
20 ENTERPRISE SERVICES LLC telecom, data, centers, storage 9 3.831977e+08 6.851440e+08 2017-06-09 00:00:00 2020-06-26 13600 EDS DR A3S-B53 NaN HERNDON VA VIRGINIA 201713299 10 7037333793 7037422674 2018-06-24 21:06:38.085717 2018-06-24
21 SAVANTAGE FINANCIAL SERVICES INC. telecommunications, telecom, adp, &, svcs 11 3.674644e+08 6.826124e+08 2009-12-08 00:00:00 2020-12-31 1355 PICCARD DR STE 425 NaN ROCKVILLE MD MARYLAND 208504339 8 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
22 BOOZ ALLEN HAMILTON HOLDING CORPORATION support, professional:, program, management, t... 43 3.664082e+08 1.161358e+09 2009-09-28 00:00:00 2023-01-16 8283 GREENSBORO DR NaN MCLEAN VA VIRGINIA 221023830 11 7033770667 7039023200 2018-06-24 21:06:38.085717 2018-06-24
23 MAXIM HEALTHCARE SERVICES INC. medical, general, health, care 12 3.577394e+08 3.577394e+08 2014-09-11 00:00:00 2018-09-22 7227 LEE DEFOREST DR NaN COLUMBIA MD MARYLAND 210463236 03 4109101553 4109101615 2018-06-24 21:06:38.085717 2018-06-24
24 DELOITTE LLP support, management, professional:, program, f... 70 3.086653e+08 7.260085e+08 2009-08-19 00:00:00 2022-09-22 1725 DUKE ST NaN ALEXANDRIA VA VIRGINIA 223143456 08 5718826190 7039963387 2018-06-24 21:06:38.085717 2018-06-24
25 KNIGHT POINT SYSTEMS LLC telecom, help, desk, telecommunications, support 19 2.907098e+08 8.810253e+08 2011-08-15 00:00:00 2021-06-29 1775 WIEHLE AVE STE 101 NaN RESTON VA VIRGINIA 201905109 11 7036577051 5712663106 2018-06-24 21:06:38.085717 2018-06-24
26 GLOBAL PRECISION SYSTEMS LLC housekeeping, guard 20 2.860719e+08 1.683557e+09 2015-09-23 00:00:00 2019-07-27 4600 DEBARR RD STE 200 NaN ANCHORAGE AK ALASKA 995083103 00 9073348348 9075632742 2018-06-24 21:06:38.085717 2018-06-24
27 CACI INTERNATIONAL INC support, professional:, miscellaneous, communi... 7 2.709178e+08 6.560159e+08 2007-10-22 00:00:00 2021-03-17 14370 NEWBROOK DRIVE NaN CHANTILLY VA VIRGINIA 201512218 10 7036794137 7036793402 2018-06-24 21:06:38.085717 2018-06-24
28 MCKINSEY & COMPANY INC. support, management:, professional: 14 2.189934e+08 2.845323e+08 2015-10-21 00:00:00 2019-05-23 1200 19TH ST NW STE 1100 NaN WASHINGTON DC DISTRICT OF COLUMBIA 200362412 0 2026621185 2026623256 2018-06-24 21:06:38.085717 2018-06-24
29 PALANTIR TECHNOLOGIES INC. data, telecom, information, broadcasting, dist... 17 2.100309e+08 5.111377e+08 2011-02-24 00:00:00 2018-11-27 100 HAMILTON AVE STE 300 NaN PALO ALTO CA CALIFORNIA 943011651 18 6508150200 6506182298 2018-06-24 21:06:38.085717 2018-06-24
30 BEN FITZGERALD REAL ESTATE SERVICES L.L.C. facilities, operations, support, housekeeping,... 165 1.981641e+08 2.393186e+08 2008-09-25 00:00:00 2019-04-30 1530 S SOUTHWEST LOOP 323 NaN TYLER TX TEXAS 757012556 NaN NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
31 FORFEITURE SUPPORT ASSOCIATES LLC support, legal, professional:, services, program 24 1.971348e+08 1.971348e+08 2010-03-23 00:00:00 2016-09-30 20110 ASHBROOK PL STE 220 NaN ASHBURN VA VIRGINIA 201475060 10 5712918900 5712918957 2018-06-24 21:06:38.085717 2018-06-24
32 CAP GEMINI support, professional:, services, professional... 13 1.926498e+08 2.366901e+08 2009-09-25 00:00:00 2023-06-30 1900 CAMPUS COMMONS DR #250 NaN RESTON VA VIRGINIA 201911560 11 5713361618 5713361700 2018-06-24 21:06:38.085717 2018-06-24
33 PHACIL INC. telecom, help, desk 3 1.840695e+08 1.932425e+08 2013-07-04 00:00:00 2018-09-09 950 N GLEBE RD STE 910 NaN ARLINGTON VA VIRGINIA 222034173 08 7035261800 7037387100 2018-06-24 21:06:38.085717 2018-06-24
34 STG INTERNATIONAL INC. support, professional:, program, management, s... 2 1.790737e+08 9.042798e+08 2007-08-22 00:00:00 2023-02-19 99 CANAL CTR PLZ STE 500 NaN ALEXANDRIA VA VIRGINIA 223145504 8 7035786030 5712556545 2018-06-24 21:06:38.085717 2018-06-24
35 DEV TECHNOLOGY GROUP INC. telecom, programming, services, systems, devel... 24 1.648354e+08 4.167145e+08 2011-05-23 00:00:00 2022-02-28 1984 ISAAC NEWTON SQ W STE 302 NaN RESTON VA VIRGINIA 201905041 11 7038608135 7038608138 2018-06-24 21:06:38.085717 2018-06-24
36 GRANT THORNTON INTERNATIONAL LTD support, management, program, professional:, m... 13 1.291985e+08 2.115375e+08 2008-09-17 00:00:00 2023-04-23 333 JOHN CARLYLE ST STE 500 NaN ALEXANDRIA VA VIRGINIA 223145767 08 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
37 GENERAL DYNAMICS CORPORATION ammunition,, 30mm, equipment, public, relations 24 1.289099e+08 3.557973e+08 2010-05-11 00:00:00 2020-12-31 3170 FAIRVIEW PARK DR NaN FALLS CHURCH VA VIRGINIA 220424516 11 7038761243 NaN 2018-06-24 21:06:38.085717 2018-06-24
38 UNIVERSAL LANGUAGE CORPORATION support, administrative:, translation, interpr... 10 1.186904e+08 2.421312e+08 2011-12-28 00:00:00 2013-09-07 97 12 63RD DR STE 2 A NaN REGO PARK NY NEW YORK 113742244 06 7189979369 7184590331 2018-06-24 21:06:38.085717 2018-06-24
39 ISYS LLC support, professional:, program, management 6 1.099893e+08 4.345540e+08 2014-10-16 00:00:00 2018-12-08 11715 CRIPPEN CT NaN GREAT FALLS VA VIRGINIA 220661624 10 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
40 AFOGNAK NATIVE CORPORATION support, housekeeping, facilities, operations,... 25 1.065148e+08 1.392211e+08 2011-08-05 00:00:00 2019-10-31 3909 ARCTIC BLVD STE 400 NaN ANCHORAGE AK ALASKA 995030000 NaN 9072229500 9072229501 2018-06-24 21:06:38.085717 2018-06-24
41 360 IT INTEGRATED SOLUTIONS telecom, facility, operation, maintenance, tel... 7 9.905847e+07 1.318722e+08 2014-09-15 00:00:00 2020-09-29 1943 ISAAC NEWTON SQ E #200 NaN RESTON VA VIRGINIA 201905014 11 8884580345 7034373709 2018-06-24 21:06:38.085717 2018-06-24
42 NET DIRECT SYSTEMS LLC equipment, support, adp, information, technology 1261 9.505704e+07 1.345279e+08 2010-02-01 00:00:00 2018-12-13 2524 RELIANCE AVE NaN APEX NC NORTH CAROLINA 275396346 4 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24
43 NATIVE ENERGY & TECHNOLOGY INC. government, maintenance, owned, operated, (gogo) 9 9.187506e+07 3.440581e+08 2017-06-06 08:27:49 2027-06-05 12793 COGBURN AVE NaN SAN ANTONIO TX TEXAS 782492239 20 2102316060 2102316098 2018-06-24 21:06:38.085717 2018-06-24
44 HARRIS CORPORATION equipment, electronic, quick, countermeasures,... 15 8.126279e+07 3.818081e+08 2010-05-03 00:00:00 2022-10-02 2400 PALM BAY RD NE NaN PALM BAY FL FLORIDA 329053377 8 3217276361 NaN 2018-06-24 21:06:38.085717 2018-06-24
45 U.S. BANCORP support, management:, financial 4 8.077518e+07 1.165076e+08 2012-09-28 00:00:00 2023-04-27 800 NICOLLET MALL STE 1500 NaN MINNEAPOLIS MN MINNESOTA 554027000 05 2022610806 2022610800 2018-06-24 21:06:38.085717 2018-06-24
46 AIRCRAFT TRANSPORT SERVICES INC. travel, transportation, relocation, lodging, r... 8 7.895843e+07 1.744191e+08 2016-12-01 00:00:00 2020-11-30 4285 SW MARTIN HWY NaN PALM CITY FL FLORIDA 349908615 18 5612104373 NaN 2018-06-24 21:06:38.085717 2018-06-24
47 JATOM SYSTEMS INC software, information, technology, adp, telephone 5 7.623178e+07 9.659306e+07 2011-07-20 00:00:00 2020-01-19 15004 NORTHRIDGE DR # 100 NaN CHANTILLY VA VIRGINIA 201513817 10 7034665220 7034665299 2018-06-24 21:06:38.085717 2018-06-24
48 XL ASSOCIATES INC. support, professional:, program, management, m... 28 7.325194e+07 8.224773e+07 2014-07-11 00:00:00 2021-11-24 8614 WESTWOOD CENTER DR STE 700 NaN VIENNA VA VIRGINIA 221822264 11 7038480400 7032569508 2018-06-24 21:06:38.085717 2018-06-24
49 NET DIRECT SYSTEMS LLC adp, support, equipment, central, processing 73 7.168369e+07 1.407577e+08 2011-12-28 00:00:00 2016-02-25 2524 RELIANCE AVE NaN APEX NC NORTH CAROLINA 275396346 04 NaN NaN 2018-06-24 21:06:38.085717 2018-06-24

There are still a lot of duplicates, future work will be fuzzy matching these accounts.
Let's save this as a csv.

In [150]:
outfile = 'ice_data/data_out/ice_prime_contractors_aggregated.csv'
output[order_cols].to_csv(outfile, index=False)

Plotting Top Contractors

In [209]:
%matplotlib inline
import math
import numpy as np
import matplotlib.pyplot as plt
In [201]:
millnames = ['',' K',' M',' B',' T']

def millify(n):
    n = float(n)
    millidx = max(0,min(len(millnames)-1,
                        int(math.floor(0 if n == 0 else math.log10(abs(n))/3))))

    return '{:.1f}{}'.format(n / 10**(3 * millidx), millnames[millidx])
In [247]:
ax = output.head(30).plot(x='recipient_parent_name', 
                          y='current_total_value_of_awards_USD', 
                          kind='barh',
                          logx=True,
                          figsize=(8,12),
                          title= "30 Most Valuable ICE Contracts since 2004",
                          legend=False)
# annotate each bar
for rect in ax.patches: 
    width = rect.get_width()
    ax.text(1.35*rect.get_width(), rect.get_y()+ 0.5 * rect.get_height(),
            '$' + millify(width),
            ha='center', va='center')

# remove boarder    
for spine in plt.gca().spines.values():
    spine.set_visible(False)

# remoce axis titles
ax.set_xlabel("")
ax.set_ylabel("")

# remove ticks
ax.tick_params(axis=u'both', which=u'both',length=0)
ax.get_xaxis().set_ticks([])
ax.invert_yaxis();

ax.annotate('Source: USASpending.gov and @leonyin', 
            xy=(1, 0), xycoords='axes fraction',
            xytext=(-20, 10), textcoords='offset pixels',
            horizontalalignment='right',
            verticalalignment='bottom');
In [ ]: