Functions can be used to compute new columns in a table based on existing column values. For example, we can transform the codes in the SEX column to strings that are easier to interpret.

In [3]:

def male_female(code):
    if code == 0:
        return 'Total'
    elif code == 1:
        return 'Male'
    elif code == 2:
        return 'Female'

This function takes an individual code — 0, 1, or 2 — and returns a string that describes its meaning.

In [4]:

male_female(0)

Out[4]:

'Total'

In [5]:

male_female(1)

Out[5]:

'Male'

In [6]:

male_female(2)

Out[6]:

'Female'

We could also transform ages into age categories.

In [7]:

def age_group(age):
    if age < 2:
        return 'Baby'
    elif age < 13:
        return 'Child'
    elif age < 20:
        return 'Teen'
    else:
        return 'Adult'
    
age_group(15)

Out[7]:

'Teen'

Apply. The apply method of a table calls a function on each element of a column, forming a new array of return values. To indicate which function to call, just name it (without quotation marks). The name of the column of input values must still appear within quotation marks.

In [8]:

census_2014.apply(male_female, 'SEX')

Out[8]:

array(['Total', 'Total', 'Total', ..., 'Female', 'Female', 'Female'], 
      dtype='<U6')

This array, which has the same length as the original SEX column of the population table, can be used as the values in a new column called Male/Female alongside the existing AGE and Population columns.

In [9]:

population = Table().with_columns([
        'Male/Female', census_2014.apply(male_female, 'SEX'),
        'Age Group',   census_2014.apply(age_group, 'AGE'),
        'Population',  census_2014.column('2014')
    ])
population

Out[9]:

Male/Female	Age Group	Population
Total	Baby	3948350
Total	Baby	3962123
Total	Child	3957772
Total	Child	4005190
Total	Child	4003448
Total	Child	4004858
Total	Child	4134352
Total	Child	4154000
Total	Child	4119524
Total	Child	4106832

... (293 rows omitted)

Groups. The group method with a single argument counts the number of rows for each category in a column. The result contains one row per unique value in the grouped column.

In [10]:

population.group('Age Group')

Out[10]:

Age Group	count
Adult	243
Baby	6
Child	33
Teen	21

The optional second argument names the function that will be used to aggregate values in other columns for all of those rows. For instance, sum will sum up the populations in all rows that match each category. This result also contains one row per unique value in the grouped column, but it has the same number of columns as the original table.

In [11]:

totals = population.where(0, 'Total').select(['Age Group', 'Population'])
totals

Out[11]:

Age Group	Population
Baby	3948350
Baby	3962123
Child	3957772
Child	4005190
Child	4003448
Child	4004858
Child	4134352
Child	4154000
Child	4119524
Child	4106832

... (91 rows omitted)

In [12]:

totals.group('Age Group', sum)

Out[12]:

Age Group	Population sum
Adult	236721454
Baby	7910473
Child	44755656
Teen	29469473

The groups method behaves in the same way, but accepts a list of columns as its first argument. The resulting table has one row for every unique combination of values that appear together in the grouped columns. Again, a single argument (a list, in this case) gives row counts.

In [13]:

population.groups(['Male/Female', 'Age Group'])

Out[13]:

Male/Female	Age Group	count
Female	Adult	81
Female	Baby	2
Female	Child	11
Female	Teen	7
Male	Adult	81
Male	Baby	2
Male	Child	11
Male	Teen	7
Total	Adult	81
Total	Baby	2

... (2 rows omitted)

A second argument to groups aggregates all other columns that do not appear in the list of grouped columns.

In [14]:

population.groups(['Male/Female', 'Age Group'], sum)

Out[14]:

Male/Female	Age Group	Population sum
Female	Adult	121754366
Female	Baby	3869363
Female	Child	21903805
Female	Teen	14393035
Male	Adult	114967088
Male	Baby	4041110
Male	Child	22851851
Male	Teen	15076438
Total	Adult	236721454
Total	Baby	7910473

... (2 rows omitted)

Pivot. The pivot method is closely related to the groups method: it groups together rows that share a combination of values. It differs from groups because it organizes the resulting values in a grid. The first argument to pivot is a column that contains the values that will be used to form new columns in the result. The second argument is a column used for grouping rows. The result gives the count of all rows that share the combination of column and row values.

In [15]:

population.pivot('Male/Female', 'Age Group')

Out[15]:

Age Group	Female	Male	Total
Adult	81	81	81
Baby	2	2	2
Child	11	11	11
Teen	7	7	7

An optional third argument indicates a column of values that will replace the counts in each cell of the grid. The fourth argument indicates how to aggregate all of the values that match the combination of column and row values.

In [17]:

pivoted = population.pivot('Male/Female', 'Age Group', 'Population', sum)
pivoted

Out[17]:

Age Group	Female	Male	Total
Adult	121754366	114967088	236721454
Baby	3869363	4041110	7910473
Child	21903805	22851851	44755656
Teen	14393035	15076438	29469473

The advantage of pivot is that it places grouped values into adjacent columns, so that they can be combined. For instance, this pivoted table allows us to compute the proportion of each age group that is male. We find the surprising result that younger age groups are predominantly male, but among adults there are substantially more females.

In [21]:

pivoted.with_column('Male Percentage', pivoted.column('Male')/pivoted.column('Total')).set_format(4, PercentFormatter)

Out[21]:

Age Group	Female	Male	Total	Male Percentage
Adult	121754366	114967088	236721454	48.57%
Baby	3869363	4041110	7910473	51.09%
Child	21903805	22851851	44755656	51.06%
Teen	14393035	15076438	29469473	51.16%

SEX	AGE	2014
0	0	3,948,350
0	1	3,962,123
0	2	3,957,772
0	3	4,005,190
0	4	4,003,448
0	5	4,004,858
0	6	4,134,352
0	7	4,154,000
0	8	4,119,524
0	9	4,106,832