Using agate in a Jupyter notebook

First we import agate. Then we create an agate Table by loading data from a CSV file.

In [2]:
import agate

table = agate.Table.from_csv('examples/realdata/ks_1033_data.csv')

print(table)
<agate.table.Table object at 0x106a929b0>

Question 1: What was the total cost to Kansas City area counties?

To answer this question, we first must filter the table to only those rows which refer to a Kansas City area county.

In [3]:
kansas_city = table.where(lambda r: r['county'] in ('JACKSON', 'CLAY', 'CASS', 'PLATTE'))

print(len(table.rows))
print(len(kansas_city.rows))
1575
15

We can then print the Sum of the costs of all those rows. (The cost column is named total_cost.)

In [4]:
print('$%d' % kansas_city.columns['total_cost'].aggregate(agate.Sum()))
$3716

Question 2: Which counties spent the most?

This question is more complicated. First we group the data by county, which gives us a TableSet named counties. A TableSet is a group of tables with the same columns.

In [5]:
# Group by county
counties = table.group_by('county')

print(counties.keys())
('ALLEN', 'ANDERSON', 'BARTON', 'BOURBON', 'BROWN', 'BUTLER', 'CHAUTAUQUA', 'CHEROKEE', 'CHEYENNE', 'CLAY', 'COFFEY', 'COMANCHE', 'COWLEY', 'CRAWFORD', 'DECATUR', 'DICKINSON', 'DOUGLAS', 'ELLIS', 'ELLSWORTH', 'FINNEY', 'FORD', 'FRANKLIN', 'GEARY', 'PRATT', 'GRAHAM', 'GRANT', 'GRAY', 'GREELEY', 'GREENWOOD', 'HAMILTON', 'HARVEY', 'HASKELL', 'JACKSON', 'JEFFERSON', 'JOHNSON', 'KIOWA', 'LABETTE', 'LEAVENWORTH', 'LINN', 'LOGAN', 'LYON', 'MARION', 'MARSHALL', 'MCPHERSON', 'MEADE', 'MIAMI', 'MONTGOMERY', 'NEMAHA', 'NEOSHO', 'NORTON', 'OSAGE', 'OTTAWA', 'PAWNEE', 'POTTAWATOMIE', 'RAWLINS', 'RENO', 'RICE', 'RILEY', 'ROOKS', 'SALINE', 'SCOTT', 'SEDGWICK', 'SHAWNEE', 'SHERMAN', 'SMITH', 'STAFFORD', 'SUMNER', 'THOMAS', 'WABAUNSEE', 'WICHITA', 'WILSON', 'WOODSON', 'WYANDOTTE')

We then use the aggregate function to sum the total_cost column for each table in the group. The resulting values are collapsed into a new table, totals, which has a row for each county and a column named total_cost_sum containing the new total.

In [6]:
# Aggregate totals for all counties
totals = counties.aggregate([
    ('total_cost', agate.Sum(), 'total_cost_sum')
])

print(totals.column_names)
('county', 'total_cost_sum')

Finally, we sort the counties by their total cost, limit the results to the top 10 and then print the results as a text bar chart.

In [8]:
totals.order_by('total_cost_sum', reverse=True).limit(20).print_bars('county', 'total_cost_sum', width=100)
county     total_cost_sum
SEDGWICK       977,174.45 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  
COFFEY         691,749.03 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                       
MONTGOMERY     447,581.20 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                                        
JOHNSON        420,628.00 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                                          
SALINE         245,450.24 ▓░░░░░░░░░░░░░░░░░░                                                       
FINNEY         171,862.20 ▓░░░░░░░░░░░░░                                                            
BROWN          145,254.96 ▓░░░░░░░░░░░                                                              
KIOWA           97,974.00 ▓░░░░░░░                                                                  
WILSON          74,747.10 ▓░░░░░                                                                    
FORD            70,780.00 ▓░░░░░                                                                    
GREENWOOD       69,722.00 ▓░░░░░                                                                    
DOUGLAS         68,069.42 ▓░░░░░                                                                    
MIAMI           64,691.09 ▓░░░░░                                                                    
LYON            51,236.00 ▓░░░░                                                                     
HAMILTON        47,989.00 ▓░░░░                                                                     
WYANDOTTE       45,259.50 ▓░░░                                                                      
CHEYENNE        35,970.92 ▓░░░                                                                      
PRATT           35,745.60 ▓░░░                                                                      
FRANKLIN        34,309.00 ▓░░░                                                                      
HARVEY          27,387.42 ▓░░                                                                       
                          +-----------------+-----------------+------------------+-----------------+
                          0              250,000           500,000            750,000      1,000,000