This is Exploratory Descriptive Analisys of expendure with local transportation. So we will create three notebook to explore the data of the follows subquotas: 'Taxi, toll and parking', 'Automotive vehicle renting or charter' and 'Fuels and lubricants'. We basically used the same analysis used by Irio in his descriptive analysis of all dataset in https://github.com/datasciencebr/serenata-de-amor/blob/master/develop/2016-08-13-irio-descriptive-analysis.ipynb
The anlisys of others subquota ara in:
'Automotive vehicle renting or charter': work in progress
'Fuels and lubricants': work in progress
Importing the dataset
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import math
data = pd.read_csv('../data/2017-03-14-reimbursements.xz',
parse_dates=[16],
low_memory=False,
dtype={'document_id': np.str,
'congressperson_id': np.str,
'congressperson_document': np.str,
'term_id': np.str,
'cnpj_cpf': np.str,
'reimbursement_number': np.str,})
All reimbursement of subquota 'Taxi, toll and parking' will be set in data_taxi variable, this is the main data who we will will use in this notebook.
data_taxi = data[data['subquota_description'] == 'Taxi, toll and parking']
Since 2013 was 70.088 documents of reimbursement.
print(data_taxi.shape)
(70088, 31)
data_taxi.head()
year | applicant_id | document_id | reimbursement_value_total | total_net_value | reimbursement_numbers | congressperson_name | congressperson_id | congressperson_document | term | ... | issue_date | document_value | remark_value | net_values | month | installment | passenger | leg_of_the_trip | batch_number | reimbursement_values | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
787345 | 2013 | 1001 | 5279620 | NaN | 462.98 | 4455 | DILCEU SPERAFICO | 73768 | 444 | 2015.0 | ... | 2013-11-21T00:00:00 | 462.98 | 0.0 | 462.98 | 11 | 0 | NaN | NaN | 1074463 | NaN |
787413 | 2013 | 1001 | 5322894 | NaN | 463.48 | 4530 | DILCEU SPERAFICO | 73768 | 444 | 2015.0 | ... | 2013-12-23T00:00:00 | 463.48 | 0.0 | 463.48 | 12 | 0 | NaN | NaN | 1086947 | NaN |
788066 | 2013 | 1003 | 5284413 | NaN | 80.00 | 4460 | DOMINGOS DUTRA | 74197 | 72 | 2011.0 | ... | 2013-11-25T00:00:00 | 80.00 | 0.0 | 80.00 | 11 | 0 | NaN | NaN | 1075791 | NaN |
788102 | 2013 | 1003 | 5292845 | NaN | 46.00 | 4465 | DOMINGOS DUTRA | 74197 | 72 | 2011.0 | ... | 2013-12-02T00:00:00 | 46.00 | 0.0 | 46.00 | 12 | 0 | NaN | NaN | 1078244 | NaN |
789539 | 2013 | 1006 | 5286676 | NaN | 18.00 | 4461 | EDINHO ARAÚJO | 73447 | 344 | 2015.0 | ... | 2013-12-02T00:00:00 | 18.00 | 0.0 | 18.00 | 12 | 0 | NaN | NaN | 1076442 | NaN |
5 rows × 31 columns
data_taxi.iloc[0]
year 2013 applicant_id 1001 document_id 5279620 reimbursement_value_total NaN total_net_value 462.98 reimbursement_numbers 4455 congressperson_name DILCEU SPERAFICO congressperson_id 73768 congressperson_document 444 term 2015 state PR party PP term_id 55 subquota_number 122 subquota_description Taxi, toll and parking subquota_group_id 0 subquota_group_description nan supplier VIA FACIL - SEM PARAR cnpj_cpf 04088208000165 document_number 131051827 document_type 0 issue_date 2013-11-21T00:00:00 document_value 462.98 remark_value 0 net_values 462.98 month 11 installment 0 passenger NaN leg_of_the_trip NaN batch_number 1074463 reimbursement_values NaN Name: 787345, dtype: object
All the expedures sums more than R$2.2 Million
data_taxi['net_values'].sum()
2224660.8799999999
data_taxi['net_values'].describe()
count 70088.000000 mean 31.740967 std 88.476322 min 0.300000 25% 5.500000 50% 10.300000 75% 26.000000 max 2500.000000 Name: net_values, dtype: float64
sns.distplot(data_taxi['net_values'])
<matplotlib.axes._subplots.AxesSubplot at 0x1428206b940>
The most expensive reimbursement was R$2.500
most_expensive_reimbursement = \
data_taxi[data_taxi['net_values'] == data_taxi['net_values'].max()].iloc[0]
most_expensive_reimbursement
year 2014 applicant_id 1167 document_id 5563397 reimbursement_value_total NaN total_net_value 2500 reimbursement_numbers 4850 congressperson_name SALVADOR ZIMBALDI congressperson_id 73559 congressperson_document 391 term 2011 state SP party PROS term_id 54 subquota_number 122 subquota_description Taxi, toll and parking subquota_group_id 0 subquota_group_description nan supplier CENTRO DE GESTÃO DE MEIOS DE PGTO. S/A cnpj_cpf 04088208000165 document_number 161776103 document_type 0 issue_date 2014-09-23T00:00:00 document_value 3206.2 remark_value 706.2 net_values 2500 month 9 installment 0 passenger NaN leg_of_the_trip NaN batch_number 1155664 reimbursement_values NaN Name: 993866, dtype: object
Here we considered outliers all 0,5% biggest reimbursement values. We found 350 outliers in a total of R$338,060.46.
data_taxi = data_taxi.sort_values('net_values', ascending=False)
outliers = data_taxi.head(math.floor(len(data_taxi)*0.005))
sns.distplot(outliers['net_values'])
<matplotlib.axes._subplots.AxesSubplot at 0x142a5eec390>
len(outliers)
350
outliers['net_values'].sum()
338060.45999999996
outliers
year | applicant_id | document_id | reimbursement_value_total | total_net_value | reimbursement_numbers | congressperson_name | congressperson_id | congressperson_document | term | ... | issue_date | document_value | remark_value | net_values | month | installment | passenger | leg_of_the_trip | batch_number | reimbursement_values | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1036569 | 2014 | 1815 | 5533245 | NaN | 2500.00 | 4808 | DR. UBIALI | 141498 | 541 | 2011.0 | ... | 2014-09-18T00:00:00 | 2823.86 | 323.86 | 2500.00 | 9 | 0 | NaN | NaN | 1146085 | NaN |
993866 | 2014 | 1167 | 5563397 | NaN | 2500.00 | 4850 | SALVADOR ZIMBALDI | 73559 | 391 | 2011.0 | ... | 2014-09-23T00:00:00 | 3206.20 | 706.20 | 2500.00 | 9 | 0 | NaN | NaN | 1155664 | NaN |
1497821 | 2016 | 3013 | 5972150 | NaN | 2413.85 | 5418 | MARCELO SQUASSONI | 118594 | 369 | 2015.0 | ... | 2016-03-31T00:00:00 | 2413.85 | 0.00 | 2413.85 | 3 | 0 | NaN | NaN | 1282605 | NaN |
1483507 | 2016 | 2973 | 6051588 | NaN | 2400.00 | 5512 | JOAQUIM PASSARINHO | 178910 | 33 | 2015.0 | ... | 2016-06-20T00:00:00 | 2400.00 | 0.00 | 2400.00 | 7 | 0 | NaN | NaN | 1307898 | NaN |
1497918 | 2016 | 3013 | 6095827 | NaN | 2317.56 | 5546 | MARCELO SQUASSONI | 118594 | 369 | 2015.0 | ... | 2016-07-30T00:00:00 | 2317.56 | 0.00 | 2317.56 | 7 | 0 | NaN | NaN | 1321471 | NaN |
1497919 | 2016 | 3013 | 6095852 | NaN | 2305.90 | 5547 | MARCELO SQUASSONI | 118594 | 369 | 2015.0 | ... | 2016-08-30T00:00:00 | 2305.90 | 0.00 | 2305.90 | 8 | 0 | NaN | NaN | 1321481 | NaN |
1413679 | 2016 | 1947 | 6101806 | NaN | 2203.74 | 5554 | VANDERLEI MACRIS | 141553 | 394 | 2015.0 | ... | 2016-09-17T00:00:00 | 2203.74 | 0.00 | 2203.74 | 9 | 0 | NaN | NaN | 1323386 | NaN |
1459941 | 2016 | 2895 | 5981974 | NaN | 2173.19 | 5435 | VALMIR PRASCIDELLI | 113247 | 393 | 2015.0 | ... | 2016-04-08T00:00:00 | 2173.19 | 0.00 | 2173.19 | 4 | 0 | NaN | NaN | 1285617 | NaN |
1497876 | 2016 | 3013 | 6037369 | NaN | 2002.41 | 5490 | MARCELO SQUASSONI | 118594 | 369 | 2015.0 | ... | 2016-05-31T00:00:00 | 2002.41 | 0.00 | 2002.41 | 5 | 0 | NaN | NaN | 1303026 | NaN |
1413585 | 2016 | 1947 | 6077062 | NaN | 1986.91 | 5529 | VANDERLEI MACRIS | 141553 | 394 | 2015.0 | ... | 2016-08-17T00:00:00 | 1986.91 | 0.00 | 1986.91 | 8 | 0 | NaN | NaN | 1315693 | NaN |
993867 | 2014 | 1167 | 5564907 | NaN | 1952.66 | 4850 | SALVADOR ZIMBALDI | 73559 | 391 | 2011.0 | ... | 2014-10-23T00:00:00 | 1952.66 | 0.00 | 1952.66 | 10 | 0 | NaN | NaN | 1155664 | NaN |
1497842 | 2016 | 3013 | 5992542 | NaN | 1890.04 | 5441 | MARCELO SQUASSONI | 118594 | 369 | 2015.0 | ... | 2016-04-30T00:00:00 | 1890.04 | 0.00 | 1890.04 | 4 | 0 | NaN | NaN | 1289215 | NaN |
1082179 | 2014 | 2268 | 5495314 | NaN | 1839.93 | 4754 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-08-09T00:00:00 | 1868.75 | 28.82 | 1839.93 | 8 | 0 | NaN | NaN | 1135781 | NaN |
1082278 | 2014 | 2268 | 5546709 | NaN | 1830.54 | 4823 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-11-08T00:00:00 | 1830.54 | 0.00 | 1830.54 | 11 | 0 | NaN | NaN | 1150135 | NaN |
1451936 | 2016 | 2437 | 5995280 | NaN | 1800.00 | 5447 | MAGDA MOFATTO | 166402 | 431 | 2015.0 | ... | 2016-04-04T00:00:00 | 1800.00 | 0.00 | 1800.00 | 4 | 0 | NaN | NaN | 1290174 | NaN |
1413751 | 2016 | 1947 | 6128503 | NaN | 1740.94 | 5586 | VANDERLEI MACRIS | 141553 | 394 | 2015.0 | ... | 2016-10-17T00:00:00 | 1740.94 | 0.00 | 1740.94 | 10 | 0 | NaN | NaN | 1331927 | NaN |
1082251 | 2014 | 2268 | 5524456 | NaN | 1719.14 | 4800 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-10-08T00:00:00 | 1719.14 | 0.00 | 1719.14 | 10 | 0 | NaN | NaN | 1143704 | NaN |
1036391 | 2014 | 1815 | 5495120 | NaN | 1681.90 | 4754 | DR. UBIALI | 141498 | 541 | 2011.0 | ... | 2014-06-18T00:00:00 | 1681.90 | 0.00 | 1681.90 | 6 | 0 | NaN | NaN | 1135727 | NaN |
1237164 | 2015 | 2268 | 5581444 | NaN | 1681.89 | 4882 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2015-01-08T00:00:00 | 1681.89 | 0.00 | 1681.89 | 1 | 0 | NaN | NaN | 1160683 | NaN |
1323695 | 2015 | 3013 | 5861827 | NaN | 1660.15 | 5249 | MARCELO SQUASSONI | 118594 | 369 | 2015.0 | ... | 2015-09-30T00:00:00 | 1660.15 | 0.00 | 1660.15 | 9 | 0 | NaN | NaN | 1247498 | NaN |
1476896 | 2016 | 2950 | 6126561 | NaN | 1658.61 | 5582 | CAPITÃO AUGUSTO | 178829 | 346 | 2015.0 | ... | 2016-10-12T00:00:00 | 1658.61 | 0.00 | 1658.61 | 10 | 0 | NaN | NaN | 1331197 | NaN |
1508873 | 2016 | 3052 | 6132061 | NaN | 1616.48 | 5586 | LUIZ LAURO FILHO | 178982 | 365 | 2015.0 | ... | 2016-10-17T00:00:00 | 1616.48 | 0.00 | 1616.48 | 10 | 0 | NaN | NaN | 1332896 | NaN |
1036297 | 2014 | 1815 | 5461040 | NaN | 1613.90 | 4717 | DR. UBIALI | 141498 | 541 | 2011.0 | ... | 2014-04-18T00:00:00 | 1613.90 | 0.00 | 1613.90 | 4 | 0 | NaN | NaN | 1126706 | NaN |
1267400 | 2015 | 2437 | 5889725 | NaN | 1600.00 | 5303 | MAGDA MOFATTO | 166402 | 431 | 2015.0 | ... | 2015-12-02T00:00:00 | 1600.00 | 0.00 | 1600.00 | 12 | 0 | NaN | NaN | 1256339 | NaN |
1082108 | 2014 | 2268 | 5458401 | NaN | 1588.52 | 4716 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-06-08T00:00:00 | 1650.23 | 61.71 | 1588.52 | 6 | 0 | NaN | NaN | 1125994 | NaN |
1082211 | 2014 | 2268 | 5507431 | NaN | 1555.57 | 4770 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-09-08T00:00:00 | 1596.05 | 40.48 | 1555.57 | 9 | 0 | NaN | NaN | 1139094 | NaN |
1413384 | 2016 | 1947 | 6027361 | NaN | 1548.71 | 5478 | VANDERLEI MACRIS | 141553 | 394 | 2015.0 | ... | 2016-06-18T00:00:00 | 1548.71 | 0.00 | 1548.71 | 6 | 0 | NaN | NaN | 1300241 | NaN |
1267258 | 2015 | 2437 | 5809088 | NaN | 1542.90 | 5192 | MAGDA MOFATTO | 166402 | 431 | 2015.0 | ... | 2015-09-01T00:00:00 | 1600.00 | 57.10 | 1542.90 | 8 | 0 | NaN | NaN | 1231280 | NaN |
1297319 | 2015 | 2950 | 5740573 | NaN | 1537.74 | 5100 | CAPITÃO AUGUSTO | 178829 | 346 | 2015.0 | ... | 2015-07-13T00:00:00 | 1537.74 | 0.00 | 1537.74 | 7 | 0 | NaN | NaN | 1210845 | NaN |
1082058 | 2014 | 2268 | 5431456 | NaN | 1528.13 | 4690 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-05-08T00:00:00 | 1528.13 | 0.00 | 1528.13 | 5 | 0 | NaN | NaN | 1118329 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1184933 | 2015 | 1669 | 5593964 | NaN | 636.29 | 4902 | JEFFERSON CAMPOS | 74273 | 349 | 2015.0 | ... | 2015-01-18T00:00:00 | 636.29 | 0.00 | 636.29 | 1 | 0 | NaN | NaN | 1164495 | NaN |
1200426 | 2015 | 1846 | 5589815 | NaN | 635.60 | 4894 | GUILHERME CAMPOS | 141447 | 357 | 2011.0 | ... | 2015-01-23T00:00:00 | 635.60 | 0.00 | 635.60 | 1 | 0 | NaN | NaN | 1163195 | NaN |
1020920 | 2014 | 1669 | 5404151 | NaN | 631.06 | 4659 | JEFFERSON CAMPOS | 74273 | 349 | 2015.0 | ... | 2014-04-18T00:00:00 | 631.06 | 0.00 | 631.06 | 4 | 0 | NaN | NaN | 1110963 | NaN |
1444625 | 2016 | 2388 | 5947278 | NaN | 630.27 | 5393 | MISSIONÁRIO JOSÉ OLIMPIO | 160561 | 375 | 2015.0 | ... | 2016-03-03T00:00:00 | 630.27 | 0.00 | 630.27 | 2 | 0 | NaN | NaN | 1275037 | NaN |
982269 | 2014 | 1001 | 5527835 | NaN | 628.68 | 4805 | DILCEU SPERAFICO | 73768 | 444 | 2015.0 | ... | 2014-10-23T00:00:00 | 628.68 | 0.00 | 628.68 | 10 | 0 | NaN | NaN | 1144671 | NaN |
1077175 | 2014 | 2253 | 5372507 | NaN | 627.48 | 4626 | RICARDO IZAR | 160655 | 383 | 2015.0 | ... | 2014-03-18T00:00:00 | 627.48 | 0.00 | 627.48 | 3 | 0 | NaN | NaN | 1101693 | NaN |
1082209 | 2014 | 2268 | 5506689 | NaN | 626.70 | 4769 | NEWTON LIMA | 160657 | 377 | 2011.0 | ... | 2014-09-03T00:00:00 | 626.70 | 0.00 | 626.70 | 9 | 0 | NaN | NaN | 1138904 | NaN |
993524 | 2014 | 116 | 5465220 | NaN | 626.61 | 4719 | ROBERTO FREIRE | 73805 | 546 | 2015.0 | ... | 2014-06-18T00:00:00 | 626.61 | 0.00 | 626.61 | 6 | 0 | NaN | NaN | 1127708 | NaN |
1484654 | 2016 | 2977 | 5929792 | NaN | 626.45 | 5379 | EVANDRO ROMAN | 178930 | 452 | 2015.0 | ... | 2016-02-03T00:00:00 | 626.45 | 0.00 | 626.45 | 2 | 0 | NaN | NaN | 1269219 | NaN |
1483261 | 2016 | 2971 | 6129110 | NaN | 625.81 | 5584 | FLAVINHO | 178978 | 357 | 2015.0 | ... | 2016-10-07T00:00:00 | 625.81 | 0.00 | 625.81 | 10 | 0 | NaN | NaN | 1331868 | NaN |
1430000 | 2016 | 2277 | 6063426 | NaN | 625.80 | 5517 | SANDRO ALEX | 160621 | 465 | 2015.0 | ... | 2016-07-18T00:00:00 | 625.80 | 0.00 | 625.80 | 7 | 0 | NaN | NaN | 1311292 | NaN |
1238618 | 2015 | 2277 | 5806722 | NaN | 625.40 | 5175 | SANDRO ALEX | 160621 | 465 | 2015.0 | ... | 2015-09-18T00:00:00 | 625.40 | 0.00 | 625.40 | 9 | 0 | NaN | NaN | 1230782 | NaN |
1456171 | 2016 | 288 | 6018956 | NaN | 625.30 | 5468 | ANTONIO CARLOS MENDES THAME | 73584 | 519 | 2015.0 | ... | 2016-05-31T00:00:00 | 625.30 | 0.00 | 625.30 | 5 | 0 | NaN | NaN | 1297863 | NaN |
1006960 | 2014 | 1455 | 5528411 | NaN | 624.80 | 4805 | LUIS CARLOS HEINZE | 73483 | 500 | 2015.0 | ... | 2014-10-18T00:00:00 | 624.80 | 0.00 | 624.80 | 10 | 0 | NaN | NaN | 1144678 | NaN |
1185077 | 2015 | 1669 | 5767750 | NaN | 623.90 | 5129 | JEFFERSON CAMPOS | 74273 | 349 | 2015.0 | ... | 2015-08-19T00:00:00 | 623.90 | 0.00 | 623.90 | 8 | 0 | NaN | NaN | 1219056 | NaN |
1414320 | 2016 | 1951 | 6129130 | NaN | 622.39 | 5579 | VINICIUS CARVALHO | 141555 | 397 | 2015.0 | ... | 2016-10-02T00:00:00 | 622.39 | 0.00 | 622.39 | 10 | 0 | NaN | NaN | 1331878 | NaN |
1445049 | 2016 | 2388 | 6113697 | NaN | 614.91 | 5568 | MISSIONÁRIO JOSÉ OLIMPIO | 160561 | 375 | 2015.0 | ... | 2016-09-02T00:00:00 | 614.91 | 0.00 | 614.91 | 8 | 0 | NaN | NaN | 1327277 | NaN |
1429916 | 2016 | 2277 | 5978074 | NaN | 612.80 | 5430 | SANDRO ALEX | 160621 | 465 | 2015.0 | ... | 2016-04-18T00:00:00 | 612.80 | 0.00 | 612.80 | 4 | 0 | NaN | NaN | 1285026 | NaN |
1079538 | 2014 | 2261 | 5517003 | NaN | 610.00 | 4792 | KEIKO OTA | 160659 | 356 | 2015.0 | ... | 2014-08-06T00:00:00 | 610.00 | 0.00 | 610.00 | 8 | 0 | NaN | NaN | 1141599 | NaN |
1381959 | 2016 | 1467 | 6129413 | NaN | 608.41 | 5584 | CELSO JACOB | 73568 | 537 | 2015.0 | ... | 2016-10-02T00:00:00 | 608.41 | 0.00 | 608.41 | 9 | 0 | NaN | NaN | 1331957 | NaN |
1414888 | 2016 | 1963 | 5958647 | NaN | 605.26 | 5407 | CARLOS ZARATTINI | 141398 | 398 | 2015.0 | ... | 2016-02-23T00:00:00 | 605.26 | 0.00 | 605.26 | 2 | 0 | NaN | NaN | 1278127 | NaN |
1077357 | 2014 | 2253 | 5427810 | NaN | 603.48 | 4683 | RICARDO IZAR | 160655 | 383 | 2015.0 | ... | 2014-05-18T00:00:00 | 603.48 | 0.00 | 603.48 | 5 | 0 | NaN | NaN | 1117447 | NaN |
1303750 | 2015 | 2965 | 5874991 | NaN | 602.70 | 5303 | SAMUEL MOREIRA | 178990 | 389 | 2015.0 | ... | 2015-12-08T00:00:00 | 602.70 | 0.00 | 602.70 | 12 | 0 | NaN | NaN | 1251641 | NaN |
1097024 | 2014 | 2332 | 5381192 | NaN | 602.68 | 4637 | JUNJI ABE | 160544 | 355 | 2011.0 | ... | 2014-03-08T00:00:00 | 602.68 | 0.00 | 602.68 | 3 | 0 | NaN | NaN | 1104103 | NaN |
904291 | 2013 | 2277 | 5304094 | NaN | 602.63 | 4479 | SANDRO ALEX | 160621 | 465 | 2015.0 | ... | 2013-11-18T00:00:00 | 602.63 | 0.00 | 602.63 | 11 | 0 | NaN | NaN | 1082019 | NaN |
1084481 | 2014 | 2277 | 5357580 | NaN | 602.33 | 4626 | SANDRO ALEX | 160621 | 465 | 2015.0 | ... | 2014-02-18T00:00:00 | 602.33 | 0.00 | 602.33 | 2 | 0 | NaN | NaN | 1097320 | NaN |
1175258 | 2015 | 1467 | 5687061 | NaN | 602.24 | 5039 | CELSO JACOB | 73568 | 537 | 2015.0 | ... | 2015-05-03T00:00:00 | 602.24 | 0.00 | 602.24 | 5 | 0 | NaN | NaN | 1193458 | NaN |
1444879 | 2016 | 2388 | 6042414 | NaN | 601.92 | 5492 | MISSIONÁRIO JOSÉ OLIMPIO | 160561 | 375 | 2015.0 | ... | 2016-06-03T00:00:00 | 601.92 | 0.00 | 601.92 | 6 | 0 | NaN | NaN | 1305168 | NaN |
1142623 | 2014 | 288 | 5539620 | NaN | 601.38 | 4820 | ANTONIO CARLOS MENDES THAME | 73584 | 519 | 2015.0 | ... | 2014-10-31T00:00:00 | 601.38 | 0.00 | 601.38 | 10 | 0 | NaN | NaN | 1149401 | NaN |
1081768 | 2014 | 2267 | 5569694 | NaN | 601.14 | 4850 | FRANCISCO FLORIANO | 160677 | 297 | 2015.0 | ... | 2014-12-13T00:00:00 | 601.14 | 0.00 | 601.14 | 12 | 0 | NaN | NaN | 1156989 | NaN |
350 rows × 31 columns
In total 509 congressperson ask by reimbursement and most of them (75%) received R$5,364.90.
But some congresperson received more them R$50.000,00 in the same perriod.
len(data_taxi['applicant_id'].unique())
509
applicants_by_net_value = \
pd.DataFrame(data_taxi.groupby(['applicant_id'], as_index=False).sum()[['applicant_id', 'net_values']])
congressperson_list = data_taxi[
['applicant_id', 'congressperson_name', 'party', 'state']]
congressperson_list = congressperson_list. \
drop_duplicates('applicant_id', keep='first')
ranking = pd.merge(applicants_by_net_value,
congressperson_list,
how='left',
on='applicant_id').sort_values('net_values', ascending=False)
ranking.head(10)
applicant_id | net_values | congressperson_name | party | state | |
---|---|---|---|---|---|
215 | 2238 | 53715.68 | ZECA DIRCEU | PT | PR |
217 | 2240 | 47110.69 | ALESSANDRO MOLON | REDE | RJ |
162 | 1854 | 45732.20 | JÔ MORAES | PCdoB | MG |
185 | 1947 | 41641.57 | VANDERLEI MACRIS | PSDB | SP |
308 | 2409 | 41414.50 | NELSON MARCHEZAN JUNIOR | PSDB | RS |
145 | 1799 | 32989.44 | CHICO D'ANGELO | PT | RJ |
321 | 2437 | 31355.95 | MAGDA MOFATTO | PR | GO |
464 | 3043 | 28171.36 | DIEGO GARCIA | PHS | PR |
233 | 2268 | 26985.88 | NEWTON LIMA | PT | SP |
358 | 2887 | 26589.28 | DR. SINVAL MALHEIROS | PTN | SP |
ranking['net_values'].describe()
count 509.000000 mean 4370.650059 std 7104.358064 min 2.500000 25% 322.500000 50% 1492.000000 75% 5364.900000 max 53715.680000 Name: net_values, dtype: float64
graph = sns.barplot(x='congressperson_name',
y='net_values',
data=ranking)
graph.axes.get_xaxis().set_ticks([]); None
There are 1.797 suppliers, two of them are more than R$480.000. They are SINPETAXI, syndicate of taxi from Brasilia and CENTRO DE GESTAO DE MEIOS DE PAGAMENTO S.A., owner of SEM PARAR, company of payment of toll.
suppliers_list = pd.DataFrame(data_taxi.groupby(['cnpj_cpf'], as_index=False)['net_values'].sum())
suppliers_list = suppliers_list.sort_values('net_values', ascending=False)
suppliers_list
cnpj_cpf | net_values | |
---|---|---|
2 | 00031708000100 | 521118.19 |
402 | 04088208000165 | 487818.29 |
672 | 07424109000103 | 45667.10 |
1394 | 37990298000134 | 41582.23 |
1476 | 52378239000101 | 39189.57 |
251 | 02451848000162 | 37425.90 |
995 | 12623662000105 | 37156.15 |
1588 | 60537263089981 | 33757.50 |
1401 | 38069159000135 | 32617.62 |
4 | 00031708000371 | 30692.50 |
34 | 00521294000105 | 24681.63 |
714 | 08046163000117 | 22973.87 |
950 | 11385706000135 | 20797.00 |
360 | 03669310000191 | 17488.00 |
1115 | 16809351000188 | 16381.58 |
1451 | 46553947000120 | 15459.80 |
1375 | 35919919000168 | 15000.00 |
162 | 02103836000147 | 14219.00 |
172 | 02104236010247 | 13271.00 |
201 | 02221358000170 | 13127.20 |
1128 | 16987837000106 | 12984.40 |
387 | 03960014000145 | 11965.00 |
504 | 04994250000145 | 11831.00 |
1101 | 15695357000109 | 11055.18 |
882 | 10490515000170 | 10940.26 |
256 | 02509491000126 | 10689.60 |
124 | 01654604000386 | 10245.10 |
854 | 09704248334 | 9913.00 |
258 | 02511048000190 | 9863.60 |
625 | 06794783000109 | 9531.00 |
... | ... | ... |
1776 | 93015006004968 | 5.00 |
1534 | 60537263011877 | 5.00 |
1150 | 17559790000142 | 5.00 |
608 | 06249384000167 | 5.00 |
761 | 08718043000623 | 5.00 |
1582 | 60537263082200 | 5.00 |
288 | 02865183000133 | 5.00 |
1022 | 13540559000156 | 5.00 |
1561 | 60537263053529 | 5.00 |
71 | 01045156000151 | 5.00 |
1062 | 14753717000119 | 5.00 |
589 | 06070835000285 | 5.00 |
250 | 02451202000185 | 4.90 |
1023 | 13540559001128 | 4.50 |
803 | 09109567001902 | 4.05 |
1010 | 13220737000161 | 4.00 |
1780 | 93785210009501 | 4.00 |
975 | 11964158000289 | 4.00 |
74 | 01086379000167 | 4.00 |
1271 | 22468215000174 | 4.00 |
361 | 03684434000227 | 3.50 |
176 | 02110429000167 | 3.00 |
29 | 00452590000193 | 3.00 |
661 | 07281356000351 | 3.00 |
1293 | 24144040000175 | 2.40 |
154 | 02017619001610 | 2.35 |
873 | 10387986000158 | 2.30 |
181 | 02169187000187 | 2.00 |
133 | 01808151002268 | 1.25 |
143 | 01839320000100 | 1.10 |
1797 rows × 2 columns
suppliers_list['net_values'].describe()
count 1797.000000 mean 1234.805965 std 17086.954787 min 1.100000 25% 20.000000 50% 52.000000 75% 181.000000 max 521118.190000 Name: net_values, dtype: float64
sns.distplot(suppliers_list['net_values'])
<matplotlib.axes._subplots.AxesSubplot at 0x142a660c668>
Here we considered outliers all 1% biggest suppliers. We found 17 outliers in a total of R$1.439.806.92
R$1.008.936,47 from SINPETAXI and SEM PARAR
R$430.870,44 from other companies
outliers_suppliers = suppliers_list.head(math.floor(len(suppliers_list)*0.01))
sns.distplot(outliers_suppliers['net_values'])
<matplotlib.axes._subplots.AxesSubplot at 0x142883311d0>
outliers_suppliers['net_values'].describe()
count 17.000000 mean 84694.525294 std 158404.535386 min 15000.000000 25% 20797.000000 50% 32617.620000 75% 39189.570000 max 521118.190000 Name: net_values, dtype: float64
outliers_suppliers['net_values'].sum()
1439806.9299999997
outliers_suppliers['net_values'].head(2).sum()
1008936.4799999999
outliers_suppliers['net_values'].tail(len(outliers_suppliers)-2).sum()
430870.44999999984
outliers_suppliers = suppliers_list[~suppliers_list.isin(data_wo_outliers_suppliers)['cnpj_cpf']]
print(len(outliers_suppliers), len(outliers_suppliers) / len(suppliers_list))
2 0.0011129660545353367
We could note there a long tail in the reimbursements, there a group of congresspeople who expend much more then others. We could observe too, two big suppliers, a taxi syndicate from Brasilia and a company of payment of toll. The sugestion of next stepes are: