Following my analysis of federal contracts to the private prisons CoreCivic and Geo Group, this notebook analyzes publically avaiable contracts from the Immigration and Customs Enforcement Agency (ICE). This analysis uses data downloaded from the new USASpending.gov dashboard filtered on "U.S. Immigration and Customs Enforcement (ICE) | Sub-Agency".
import glob
import datetime
import pandas as pd
# data downloaded from USAspending.gov
input_file_pattern = 'ice_data/data_in/*'
files = glob.glob(input_file_pattern)
files
['ice_data/data_in/all_contracts_subawards_1.csv', 'ice_data/data_in/all_assistance_prime_transactions_1.csv', 'ice_data/data_in/all_contracts_prime_transactions_1.csv', 'ice_data/data_in/all_assistance_subawards_1.csv']
Let's just look at prime contracts today.
df_prime_contracts = pd.read_csv(files[2])
/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2698: DtypeWarning: Columns (19,32,38,46,47,48,49,56,65,78,79,87,106,107,135,136,204) have mixed types. Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)
# there are a lot of columns!
prime_cols = df_prime_contracts.columns
num_cols = len(prime_cols)
print(f"There are {num_cols} columns in this file!\nHere are some of the column names:")
prime_cols.values.tolist()[:20]
There are 258 columns in this file! Here are some of the column names:
['award_id_piid', 'modification_number', 'transaction_number', 'parent_award_agency_id', 'parent_award_agency_name', 'parent_award_id', 'parent_award_modification_number', 'federal_action_obligation', 'base_and_exercised_options_value', 'current_total_value_of_award', 'base_and_all_options_value', 'potential_total_value_of_award', 'action_date', 'period_of_performance_start_date', 'period_of_performance_current_end_date', 'period_of_performance_potential_end_date', 'ordering_period_end_date', 'awarding_agency_code', 'awarding_agency_name', 'awarding_sub_agency_code']
That is a lot of columns, here are the definitions of the columns we'll be using here:
recipient_duns
The unique identification number for the ultimate parent of an awardee or recipient. Currently the identifier is the 9-digit number maintained by Dun & Bradstreet as the global parent DUNS® number.
potential_total_value_of_award
The total amount that could be obligated on a contract. This total includes the base plus options amount. For example, if a recipient is awarded 10M on a base contract with 3 option years at 1M each, the potential award amount is 13M.
current_total_value_of_award
The amount of money that the government has promised (obligated) to pay a recipient for a contract. This means the base amount and any exercised options.
period_of_performance_start_date
The date that the award begins.
period_of_performance_potential_end_date
For procurement, the date on which, for the award referred to by the action being reported if all potential pre-determined or pre-negotiated options were exercised, awardee effort is completed or the award is otherwise ended. Administrative actions related to this award may continue to occur after this date. This date does not apply to procurement indefinite delivery vehicles under which definitive orders may be awarded.
Thanks for providing that glossary USASpending, let's do some minor data wrangling and get some summary statistics.
date_cols = ['period_of_performance_start_date', 'period_of_performance_potential_end_date']
for col in date_cols:
df_prime_contracts[col] = pd.to_datetime(df_prime_contracts[col])
total_spend = df_prime_contracts['current_total_value_of_award'].sum()
potential_spend = df_prime_contracts['potential_total_value_of_award'].sum()
at_stake = potential_spend - total_spend
num_contractors = len(df_prime_contracts['recipient_duns'].unique())
first_contract = df_prime_contracts['period_of_performance_start_date'].min().strftime('%Y-%m-%d')
print("ICE has invested ${:,.2f} in contracts with {} Prime contractors "
"since {}.\nICE has ${:,.2f} in potential payouts with active contracts.".format(
total_spend, num_contractors, first_contract, at_stake))
ICE has invested $76,815,594,715.20 in contracts with 4294 Prime contractors since 2004-10-01. ICE has $50,376,993,084.23 in potential payouts with active contracts.
Let's sort this data by the contract end date
df_prime_contracts.sort_values(by=['period_of_performance_potential_end_date'],
ascending=False,
inplace=True)
There are typos in the recipient names, so let's use their DUNs IDs for aggregate analysis.
We can create a lookup_table
containing the latest name and contact info
lookup_cols = [
'recipient_parent_duns',
'recipient_parent_name',
'recipient_address_line_1',
'recipient_address_line_2',
'recipient_city_name',
'recipient_state_code',
'recipient_state_name',
'recipient_zip_4_code',
'recipient_congressional_district',
'recipient_phone_number',
'recipient_fax_number'
]
lookup_table = df_prime_contracts.drop_duplicates(
subset = ['recipient_parent_duns']
)[lookup_cols]
Now let's get how much each company has been rewarded, how much they have the potential to be rewarded, how many awards they're been granted, and when their contracts are finished. This code is chained, and kind of messy. Each line aggreagates by the recipient ID, and performs some sort of agg function, and then sorts the values according to the total of the agg function.
contract_amounts = (df_prime_contracts.groupby(
['recipient_parent_duns'])
['current_total_value_of_award'].sum()
.sort_values(ascending=False))
contract_potential = (df_prime_contracts.groupby(
['recipient_parent_duns'])
['potential_total_value_of_award'].sum()
.sort_values(ascending=False))
num_awards = (df_prime_contracts.groupby(
['recipient_parent_duns'])
['current_total_value_of_award'].count()
.sort_values(ascending=False))
contract_enddate = (df_prime_contracts[
~df_prime_contracts['period_of_performance_potential_end_date'].isnull()]
.groupby(['recipient_parent_duns'])
['period_of_performance_potential_end_date'].max()
.sort_values(ascending=False))
contract_startdate = (df_prime_contracts[
~df_prime_contracts['period_of_performance_start_date'].isnull()]
.groupby(['recipient_parent_duns'])
['period_of_performance_start_date'].min()
.sort_values(ascending=False))
For context, let's see what kinds of services each of these companies provide by counting the most frequent words in service descriptions per contractor.
from collections import Counter
from nltk.corpus import stopwords
def count_words(row, counter, sw):
'''
Splits sentences, casts words to lowercase, and counts all words that aren't in stopwords (sw)
'''
row_clean = str(row).replace('-', ' ').replace('/', ' ').lower()
counter.update([w for w in row_clean.split() if w not in sw])
sw = stopwords.words('English')
top_words = []
for dun_id_, df_ in df_prime_contracts.groupby('recipient_parent_duns'):
word_counter_ = Counter()
df_['product_or_service_code_description'].apply(count_words, args=(word_counter_, sw,))
top_words_ = ', '.join([w for w, c in word_counter_.most_common(5)])
d_ = dict(top_words_from_serv_desc = top_words_,
recipient_parent_duns = dun_id_)
top_words.append(d_)
top_words_per_service_description = pd.DataFrame(top_words)
We can join them all together using this craziness below:
output = (contract_amounts.reset_index()
.merge(
num_awards.reset_index(), on='recipient_parent_duns')
.merge(
contract_potential.reset_index(), on='recipient_parent_duns')
.merge(
contract_enddate.reset_index(), on='recipient_parent_duns')
.merge(
contract_startdate.reset_index(), on='recipient_parent_duns')
.merge(
top_words_per_service_description, on='recipient_parent_duns')
.merge(
lookup_table))
The last merge above is the look_up table, containing metadata for each recipient.
Let's change some of the aggregate column names to be more decriptive:
remapping = {
'current_total_value_of_award_x' : 'current_total_value_of_awards_USD',
'current_total_value_of_award_y' : 'number_of_prime_awards_USD',
'potential_total_value_of_award' : 'potential_total_value_of_awards',
'period_of_performance_potential_end_date' : 'last_contract_end_date',
'period_of_performance_start_date' : 'first_contract_start_date'
}
output.columns = [remapping.get(c, c) for c in output.columns]
Let's timestamp when this file was made.
output['analysis_date'] = datetime.datetime.now() # when was this file made?
output['raw_data_download_date'] = datetime.datetime(2018,6,24) # when was raw data downloaded?
And lastly, let's re-order the columns so they're readable
order_cols = [
'recipient_parent_name',
'top_words_from_serv_desc',
'number_of_prime_awards',
'current_total_value_of_awards_USD',
'potential_total_value_of_awards',
'first_contract_start_date',
'last_contract_end_date',
'recipient_address_line_1',
'recipient_address_line_2',
'recipient_city_name',
'recipient_state_code',
'recipient_state_name',
'recipient_zip_4_code',
'recipient_congressional_district',
'recipient_phone_number',
'recipient_fax_number',
'analysis_date',
'raw_data_download_date'
]
Here is the output:
for contractor, df_ in output.groupby('recipient_parent_name'):
df_['number_of_pride_awards'].sum()
df_['current_total_value_of_awards_USD'].sum()
df_['potential_total_value_of_awards'].sum()
output[order_cols].head(50)
recipient_parent_name | top_words_from_serv_desc | number_of_prime_awards | current_total_value_of_awards_USD | potential_total_value_of_awards | first_contract_start_date | last_contract_end_date | recipient_address_line_1 | recipient_address_line_2 | recipient_city_name | recipient_state_code | recipient_state_name | recipient_zip_4_code | recipient_congressional_district | recipient_phone_number | recipient_fax_number | analysis_date | raw_data_download_date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | BATTELLE MEMORIAL INSTITUTE INC | oper, govt, r&d, goco, facilities | 1 | 2.258840e+10 | 2.258840e+10 | 2016-05-27 00:00:00 | 2017-09-30 | 902 BATTELLE BLVD | NaN | RICHLAND | WA | WASHINGTON | 993521873 | NaN | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
1 | SPECTRUM SECURITY SERVICES INC. | guard, housekeeping, services | 128 | 1.731890e+10 | 1.902971e+10 | 2008-09-08 00:00:00 | 2018-12-11 | 13967 HIGHWAY 94 STE 101 | NaN | JAMUL | CA | CALIFORNIA | 919353232 | 50 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
2 | CSI AVIATION SERVICES INC. | travel, air, passenger, transportation, reloca... | 63 | 5.897378e+09 | 7.426944e+09 | 2009-03-31 00:00:00 | 2018-06-30 | 3700 RIO GRANDE BLVD NW | NaN | ALBUQUERQUE | NM | NEW MEXICO | 871072876 | 01 | 5057619000 | 5053427377 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
3 | INGENESIS INC. | medical, general, health, care | 20 | 3.530665e+09 | 4.874056e+09 | 2012-07-16 00:00:00 | 2018-07-15 | 10231 KOTZEBUE ST | NaN | SAN ANTONIO | TX | TEXAS | 782174430 | 21 | 2103660033 | 2105684582 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
4 | THE GEO GROUP INC | guard, housekeeping, support, services, program | 115 | 3.335807e+09 | 1.663010e+10 | 2007-06-01 00:00:00 | 2029-10-11 | 621 NW 53RD ST STE 700 | NaN | BOCA RATON | FL | FLORIDA | 334878242 | 22 | 5619997359 | 5619997648 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
5 | PHACIL INC. | telecom, help, desk | 47 | 2.889115e+09 | 3.027381e+09 | 2012-09-10 00:00:00 | 2018-09-09 | 800 N GLEBE RD STE 700 | NaN | ARLINGTON | VA | VIRGINIA | 222032149 | 8 | 7035261800 | 7033662900 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
6 | SPECTRUM SECURITY SERVICES INC. | housekeeping, guard | 13 | 1.864530e+09 | 1.997933e+09 | 2011-12-21 00:00:00 | 2016-02-29 | 13967 HIGHWAY 94 STE 101 | NaN | JAMUL | CA | CALIFORNIA | 919353232 | 52 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
7 | ASSET PROTECTION & SECURITY SERVICES L.P. | guard, housekeeping, oper, govt, misc | 19 | 1.400854e+09 | 1.633666e+09 | 2009-09-30 00:00:00 | 2018-09-30 | 5502 BURNHAM DR | NaN | CORPUS CHRISTI | TX | TEXAS | 784133787 | 27 | 3619061552 | 3619061844 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
8 | WIDEPOINT CORPORATION | support, professional:, program, management | 25 | 1.318187e+09 | 5.204160e+09 | 2015-04-09 00:00:00 | 2018-12-08 | 7926 JONES BRANCH DR STE 520 | NaN | MC LEAN | VA | VIRGINIA | 221023371 | 11 | 7033495644 | 7038483560 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
9 | TRAILBOSS ENTERPRISES INC. | housekeeping, facilities, operations, support,... | 13 | 1.266682e+09 | 3.871661e+09 | 2012-06-01 00:00:00 | 2018-09-30 | 201 EAST 3RD AVE | NaN | ANCHORAGE | AK | ALASKA | 995012503 | 00 | 9073388243 | 9073380869 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
10 | CORRECTIONS CORPORATION OF AMERICA | guard, housekeeping, services, support, facili... | 58 | 1.230570e+09 | 2.311516e+09 | 2009-04-01 00:00:00 | 2019-06-30 | 10 BURTON HILLS BLVD | NaN | NASHVILLE | TN | TENNESSEE | 372156105 | 05 | 6152636713 | 6152633090 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
11 | AHTNA INCORPORATED | guard, housekeeping, services, penal, facilities | 29 | 9.644808e+08 | 3.692711e+09 | 2010-02-01 00:00:00 | 2018-11-30 | 110 W 38TH AVE STE 200D | NaN | ANCHORAGE | AK | ALASKA | 995035677 | 0 | 9073346477 | 9073346750 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
12 | G4S PLC | guard, housekeeping, services, social, rehabil... | 181 | 9.630253e+08 | 1.101117e+09 | 2009-08-12 00:00:00 | 2019-01-31 | 4200 WACKENHUT DR STE 100 | NaN | PALM BEACH GARDENS | FL | FLORIDA | 334104243 | 18 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
13 | CLASSIC AIR CHARTER INC. | travel, transportation, relocation, lodging, r... | 6 | 7.483224e+08 | 3.851914e+09 | 2017-10-20 00:00:00 | 2023-06-30 | 460 NEW YORK AVE | NaN | HUNTINGTON | NY | NEW YORK | 117433432 | 03 | 6315495021 | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
14 | NANA REGIONAL CORPORATION INC | housekeeping, guard, relocation, services, tra... | 22 | 5.979669e+08 | 4.151455e+09 | 2008-09-25 00:00:00 | 2025-01-31 | 3901 OLD INTERNATIONAL AIRPORT RD STE 200-G | NaN | ANCHORAGE | AK | ALASKA | 99502 | 0 | 7037666837 | 7037666906 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
15 | SRA INTERNATIONAL INC. | support, professional:, communications, progra... | 45 | 5.940203e+08 | 1.370448e+09 | 2016-06-20 00:00:00 | 2022-09-29 | 4300 FAIR LAKES CT | NaN | FAIRFAX | VA | VIRGINIA | 220334232 | 11 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
16 | SCIENCE APPLICATIONS INTERNATIONAL CORPORATION | telecom, systems, development, equipment, tele... | 4 | 5.322573e+08 | 7.048101e+08 | 2014-09-25 00:00:00 | 2015-09-24 | 1710 SAIC DR | NaN | MCLEAN | VA | VIRGINIA | 22102 | NaN | 7036765681 | 7036765063 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
17 | M V M INC. | guard, housekeeping, support, administrative:,... | 41 | 4.726719e+08 | 1.767799e+09 | 2008-06-27 00:00:00 | 2022-08-03 | 44620 GUILFORD DR STE 150 | NaN | ASHBURN | VA | VIRGINIA | 201476063 | 10 | 5712234620 | 5712234487 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
18 | UNIVERSAL LANGUAGE CORPORATION | translation, admin, svcs, sign, languag | 41 | 4.180052e+08 | 8.500482e+08 | 2007-09-10 00:00:00 | 2014-03-22 | 97 12 63RD DR STE 2 A | NaN | REGO PARK | NY | NEW YORK | 113742244 | 06 | 7189979369 | 7184590331 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
19 | THE MITRE CORPORATION | programming, r&d, research, development, (engi... | 27 | 4.062568e+08 | 4.108850e+08 | 2008-08-29 00:00:00 | 2018-01-12 | 7515 COLSHIRE DR | NaN | MC LEAN | VA | VIRGINIA | 221027539 | 8 | 7039836474 | 7039837121 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
20 | ENTERPRISE SERVICES LLC | telecom, data, centers, storage | 9 | 3.831977e+08 | 6.851440e+08 | 2017-06-09 00:00:00 | 2020-06-26 | 13600 EDS DR A3S-B53 | NaN | HERNDON | VA | VIRGINIA | 201713299 | 10 | 7037333793 | 7037422674 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
21 | SAVANTAGE FINANCIAL SERVICES INC. | telecommunications, telecom, adp, &, svcs | 11 | 3.674644e+08 | 6.826124e+08 | 2009-12-08 00:00:00 | 2020-12-31 | 1355 PICCARD DR STE 425 | NaN | ROCKVILLE | MD | MARYLAND | 208504339 | 8 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
22 | BOOZ ALLEN HAMILTON HOLDING CORPORATION | support, professional:, program, management, t... | 43 | 3.664082e+08 | 1.161358e+09 | 2009-09-28 00:00:00 | 2023-01-16 | 8283 GREENSBORO DR | NaN | MCLEAN | VA | VIRGINIA | 221023830 | 11 | 7033770667 | 7039023200 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
23 | MAXIM HEALTHCARE SERVICES INC. | medical, general, health, care | 12 | 3.577394e+08 | 3.577394e+08 | 2014-09-11 00:00:00 | 2018-09-22 | 7227 LEE DEFOREST DR | NaN | COLUMBIA | MD | MARYLAND | 210463236 | 03 | 4109101553 | 4109101615 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
24 | DELOITTE LLP | support, management, professional:, program, f... | 70 | 3.086653e+08 | 7.260085e+08 | 2009-08-19 00:00:00 | 2022-09-22 | 1725 DUKE ST | NaN | ALEXANDRIA | VA | VIRGINIA | 223143456 | 08 | 5718826190 | 7039963387 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
25 | KNIGHT POINT SYSTEMS LLC | telecom, help, desk, telecommunications, support | 19 | 2.907098e+08 | 8.810253e+08 | 2011-08-15 00:00:00 | 2021-06-29 | 1775 WIEHLE AVE STE 101 | NaN | RESTON | VA | VIRGINIA | 201905109 | 11 | 7036577051 | 5712663106 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
26 | GLOBAL PRECISION SYSTEMS LLC | housekeeping, guard | 20 | 2.860719e+08 | 1.683557e+09 | 2015-09-23 00:00:00 | 2019-07-27 | 4600 DEBARR RD STE 200 | NaN | ANCHORAGE | AK | ALASKA | 995083103 | 00 | 9073348348 | 9075632742 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
27 | CACI INTERNATIONAL INC | support, professional:, miscellaneous, communi... | 7 | 2.709178e+08 | 6.560159e+08 | 2007-10-22 00:00:00 | 2021-03-17 | 14370 NEWBROOK DRIVE | NaN | CHANTILLY | VA | VIRGINIA | 201512218 | 10 | 7036794137 | 7036793402 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
28 | MCKINSEY & COMPANY INC. | support, management:, professional: | 14 | 2.189934e+08 | 2.845323e+08 | 2015-10-21 00:00:00 | 2019-05-23 | 1200 19TH ST NW STE 1100 | NaN | WASHINGTON | DC | DISTRICT OF COLUMBIA | 200362412 | 0 | 2026621185 | 2026623256 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
29 | PALANTIR TECHNOLOGIES INC. | data, telecom, information, broadcasting, dist... | 17 | 2.100309e+08 | 5.111377e+08 | 2011-02-24 00:00:00 | 2018-11-27 | 100 HAMILTON AVE STE 300 | NaN | PALO ALTO | CA | CALIFORNIA | 943011651 | 18 | 6508150200 | 6506182298 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
30 | BEN FITZGERALD REAL ESTATE SERVICES L.L.C. | facilities, operations, support, housekeeping,... | 165 | 1.981641e+08 | 2.393186e+08 | 2008-09-25 00:00:00 | 2019-04-30 | 1530 S SOUTHWEST LOOP 323 | NaN | TYLER | TX | TEXAS | 757012556 | NaN | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
31 | FORFEITURE SUPPORT ASSOCIATES LLC | support, legal, professional:, services, program | 24 | 1.971348e+08 | 1.971348e+08 | 2010-03-23 00:00:00 | 2016-09-30 | 20110 ASHBROOK PL STE 220 | NaN | ASHBURN | VA | VIRGINIA | 201475060 | 10 | 5712918900 | 5712918957 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
32 | CAP GEMINI | support, professional:, services, professional... | 13 | 1.926498e+08 | 2.366901e+08 | 2009-09-25 00:00:00 | 2023-06-30 | 1900 CAMPUS COMMONS DR #250 | NaN | RESTON | VA | VIRGINIA | 201911560 | 11 | 5713361618 | 5713361700 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
33 | PHACIL INC. | telecom, help, desk | 3 | 1.840695e+08 | 1.932425e+08 | 2013-07-04 00:00:00 | 2018-09-09 | 950 N GLEBE RD STE 910 | NaN | ARLINGTON | VA | VIRGINIA | 222034173 | 08 | 7035261800 | 7037387100 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
34 | STG INTERNATIONAL INC. | support, professional:, program, management, s... | 2 | 1.790737e+08 | 9.042798e+08 | 2007-08-22 00:00:00 | 2023-02-19 | 99 CANAL CTR PLZ STE 500 | NaN | ALEXANDRIA | VA | VIRGINIA | 223145504 | 8 | 7035786030 | 5712556545 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
35 | DEV TECHNOLOGY GROUP INC. | telecom, programming, services, systems, devel... | 24 | 1.648354e+08 | 4.167145e+08 | 2011-05-23 00:00:00 | 2022-02-28 | 1984 ISAAC NEWTON SQ W STE 302 | NaN | RESTON | VA | VIRGINIA | 201905041 | 11 | 7038608135 | 7038608138 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
36 | GRANT THORNTON INTERNATIONAL LTD | support, management, program, professional:, m... | 13 | 1.291985e+08 | 2.115375e+08 | 2008-09-17 00:00:00 | 2023-04-23 | 333 JOHN CARLYLE ST STE 500 | NaN | ALEXANDRIA | VA | VIRGINIA | 223145767 | 08 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
37 | GENERAL DYNAMICS CORPORATION | ammunition,, 30mm, equipment, public, relations | 24 | 1.289099e+08 | 3.557973e+08 | 2010-05-11 00:00:00 | 2020-12-31 | 3170 FAIRVIEW PARK DR | NaN | FALLS CHURCH | VA | VIRGINIA | 220424516 | 11 | 7038761243 | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
38 | UNIVERSAL LANGUAGE CORPORATION | support, administrative:, translation, interpr... | 10 | 1.186904e+08 | 2.421312e+08 | 2011-12-28 00:00:00 | 2013-09-07 | 97 12 63RD DR STE 2 A | NaN | REGO PARK | NY | NEW YORK | 113742244 | 06 | 7189979369 | 7184590331 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
39 | ISYS LLC | support, professional:, program, management | 6 | 1.099893e+08 | 4.345540e+08 | 2014-10-16 00:00:00 | 2018-12-08 | 11715 CRIPPEN CT | NaN | GREAT FALLS | VA | VIRGINIA | 220661624 | 10 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
40 | AFOGNAK NATIVE CORPORATION | support, housekeeping, facilities, operations,... | 25 | 1.065148e+08 | 1.392211e+08 | 2011-08-05 00:00:00 | 2019-10-31 | 3909 ARCTIC BLVD STE 400 | NaN | ANCHORAGE | AK | ALASKA | 995030000 | NaN | 9072229500 | 9072229501 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
41 | 360 IT INTEGRATED SOLUTIONS | telecom, facility, operation, maintenance, tel... | 7 | 9.905847e+07 | 1.318722e+08 | 2014-09-15 00:00:00 | 2020-09-29 | 1943 ISAAC NEWTON SQ E #200 | NaN | RESTON | VA | VIRGINIA | 201905014 | 11 | 8884580345 | 7034373709 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
42 | NET DIRECT SYSTEMS LLC | equipment, support, adp, information, technology | 1261 | 9.505704e+07 | 1.345279e+08 | 2010-02-01 00:00:00 | 2018-12-13 | 2524 RELIANCE AVE | NaN | APEX | NC | NORTH CAROLINA | 275396346 | 4 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
43 | NATIVE ENERGY & TECHNOLOGY INC. | government, maintenance, owned, operated, (gogo) | 9 | 9.187506e+07 | 3.440581e+08 | 2017-06-06 08:27:49 | 2027-06-05 | 12793 COGBURN AVE | NaN | SAN ANTONIO | TX | TEXAS | 782492239 | 20 | 2102316060 | 2102316098 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
44 | HARRIS CORPORATION | equipment, electronic, quick, countermeasures,... | 15 | 8.126279e+07 | 3.818081e+08 | 2010-05-03 00:00:00 | 2022-10-02 | 2400 PALM BAY RD NE | NaN | PALM BAY | FL | FLORIDA | 329053377 | 8 | 3217276361 | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
45 | U.S. BANCORP | support, management:, financial | 4 | 8.077518e+07 | 1.165076e+08 | 2012-09-28 00:00:00 | 2023-04-27 | 800 NICOLLET MALL STE 1500 | NaN | MINNEAPOLIS | MN | MINNESOTA | 554027000 | 05 | 2022610806 | 2022610800 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
46 | AIRCRAFT TRANSPORT SERVICES INC. | travel, transportation, relocation, lodging, r... | 8 | 7.895843e+07 | 1.744191e+08 | 2016-12-01 00:00:00 | 2020-11-30 | 4285 SW MARTIN HWY | NaN | PALM CITY | FL | FLORIDA | 349908615 | 18 | 5612104373 | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
47 | JATOM SYSTEMS INC | software, information, technology, adp, telephone | 5 | 7.623178e+07 | 9.659306e+07 | 2011-07-20 00:00:00 | 2020-01-19 | 15004 NORTHRIDGE DR # 100 | NaN | CHANTILLY | VA | VIRGINIA | 201513817 | 10 | 7034665220 | 7034665299 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
48 | XL ASSOCIATES INC. | support, professional:, program, management, m... | 28 | 7.325194e+07 | 8.224773e+07 | 2014-07-11 00:00:00 | 2021-11-24 | 8614 WESTWOOD CENTER DR STE 700 | NaN | VIENNA | VA | VIRGINIA | 221822264 | 11 | 7038480400 | 7032569508 | 2018-06-24 21:06:38.085717 | 2018-06-24 |
49 | NET DIRECT SYSTEMS LLC | adp, support, equipment, central, processing | 73 | 7.168369e+07 | 1.407577e+08 | 2011-12-28 00:00:00 | 2016-02-25 | 2524 RELIANCE AVE | NaN | APEX | NC | NORTH CAROLINA | 275396346 | 04 | NaN | NaN | 2018-06-24 21:06:38.085717 | 2018-06-24 |
There are still a lot of duplicates, future work will be fuzzy matching these accounts.
Let's save this as a csv.
outfile = 'ice_data/data_out/ice_prime_contractors_aggregated.csv'
output[order_cols].to_csv(outfile, index=False)
%matplotlib inline
import math
import numpy as np
import matplotlib.pyplot as plt
millnames = ['',' K',' M',' B',' T']
def millify(n):
n = float(n)
millidx = max(0,min(len(millnames)-1,
int(math.floor(0 if n == 0 else math.log10(abs(n))/3))))
return '{:.1f}{}'.format(n / 10**(3 * millidx), millnames[millidx])
ax = output.head(30).plot(x='recipient_parent_name',
y='current_total_value_of_awards_USD',
kind='barh',
logx=True,
figsize=(8,12),
title= "30 Most Valuable ICE Contracts since 2004",
legend=False)
# annotate each bar
for rect in ax.patches:
width = rect.get_width()
ax.text(1.35*rect.get_width(), rect.get_y()+ 0.5 * rect.get_height(),
'$' + millify(width),
ha='center', va='center')
# remove boarder
for spine in plt.gca().spines.values():
spine.set_visible(False)
# remoce axis titles
ax.set_xlabel("")
ax.set_ylabel("")
# remove ticks
ax.tick_params(axis=u'both', which=u'both',length=0)
ax.get_xaxis().set_ticks([])
ax.invert_yaxis();
ax.annotate('Source: USASpending.gov and @leonyin',
xy=(1, 0), xycoords='axes fraction',
xytext=(-20, 10), textcoords='offset pixels',
horizontalalignment='right',
verticalalignment='bottom');