First we import agate. Then we create an agate Table by loading data from a CSV file.
import agate
table = agate.Table.from_csv('examples/realdata/ks_1033_data.csv')
print(table)
<agate.table.Table object at 0x106a929b0>
To answer this question, we first must filter the table to only those rows which refer to a Kansas City area county
.
kansas_city = table.where(lambda r: r['county'] in ('JACKSON', 'CLAY', 'CASS', 'PLATTE'))
print(len(table.rows))
print(len(kansas_city.rows))
1575 15
We can then print the Sum of the costs of all those rows. (The cost column is named total_cost
.)
print('$%d' % kansas_city.columns['total_cost'].aggregate(agate.Sum()))
$3716
This question is more complicated. First we group the data by county
, which gives us a TableSet named counties
. A TableSet is a group of tables with the same columns.
# Group by county
counties = table.group_by('county')
print(counties.keys())
('ALLEN', 'ANDERSON', 'BARTON', 'BOURBON', 'BROWN', 'BUTLER', 'CHAUTAUQUA', 'CHEROKEE', 'CHEYENNE', 'CLAY', 'COFFEY', 'COMANCHE', 'COWLEY', 'CRAWFORD', 'DECATUR', 'DICKINSON', 'DOUGLAS', 'ELLIS', 'ELLSWORTH', 'FINNEY', 'FORD', 'FRANKLIN', 'GEARY', 'PRATT', 'GRAHAM', 'GRANT', 'GRAY', 'GREELEY', 'GREENWOOD', 'HAMILTON', 'HARVEY', 'HASKELL', 'JACKSON', 'JEFFERSON', 'JOHNSON', 'KIOWA', 'LABETTE', 'LEAVENWORTH', 'LINN', 'LOGAN', 'LYON', 'MARION', 'MARSHALL', 'MCPHERSON', 'MEADE', 'MIAMI', 'MONTGOMERY', 'NEMAHA', 'NEOSHO', 'NORTON', 'OSAGE', 'OTTAWA', 'PAWNEE', 'POTTAWATOMIE', 'RAWLINS', 'RENO', 'RICE', 'RILEY', 'ROOKS', 'SALINE', 'SCOTT', 'SEDGWICK', 'SHAWNEE', 'SHERMAN', 'SMITH', 'STAFFORD', 'SUMNER', 'THOMAS', 'WABAUNSEE', 'WICHITA', 'WILSON', 'WOODSON', 'WYANDOTTE')
We then use the aggregate
function to sum the total_cost
column for each table in the group. The resulting values are collapsed into a new table, totals
, which has a row for each county and a column named total_cost_sum
containing the new total.
# Aggregate totals for all counties
totals = counties.aggregate([
('total_cost', agate.Sum(), 'total_cost_sum')
])
print(totals.column_names)
('county', 'total_cost_sum')
Finally, we sort the counties by their total cost, limit the results to the top 10 and then print the results as a text bar chart.
totals.order_by('total_cost_sum', reverse=True).limit(20).print_bars('county', 'total_cost_sum', width=100)
county total_cost_sum SEDGWICK 977,174.45 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ COFFEY 691,749.03 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ MONTGOMERY 447,581.20 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ JOHNSON 420,628.00 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ SALINE 245,450.24 ▓░░░░░░░░░░░░░░░░░░ FINNEY 171,862.20 ▓░░░░░░░░░░░░░ BROWN 145,254.96 ▓░░░░░░░░░░░ KIOWA 97,974.00 ▓░░░░░░░ WILSON 74,747.10 ▓░░░░░ FORD 70,780.00 ▓░░░░░ GREENWOOD 69,722.00 ▓░░░░░ DOUGLAS 68,069.42 ▓░░░░░ MIAMI 64,691.09 ▓░░░░░ LYON 51,236.00 ▓░░░░ HAMILTON 47,989.00 ▓░░░░ WYANDOTTE 45,259.50 ▓░░░ CHEYENNE 35,970.92 ▓░░░ PRATT 35,745.60 ▓░░░ FRANKLIN 34,309.00 ▓░░░ HARVEY 27,387.42 ▓░░ +-----------------+-----------------+------------------+-----------------+ 0 250,000 500,000 750,000 1,000,000