A common need is to grab a subset of records that meet certain criteria. You can do this by indexing the DataFrame, much like you've seen done with a NumPy ndarray.
import os
import pandas as pd
users = pd.read_csv(os.path.join('data', 'users.csv'), index_col=0)
# Pop out a quick sanity check
len(users)
CashBox uses a referral system: everyone you refer earns you $5 credit. Let's see if we can find everyone who has not yet taken advantage of that deal. The number of referrals a user has made is stored in the referral_count column.
# This vectorized comparison returns a new `Series`, which we are naming so we can use it later
no_referrals_index = users['referral_count'] < 1
# See how the boolean `Series` returned includes all rows from the `DataFrame`.
# The value is the result of each comparison
no_referrals_index.head()
Using the boolean Series we just created, no_referrals_index, we can retrieve all rows where that comparison was True.
users[no_referrals_index].head()
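To see the mechanics without the CSV file, here is a minimal sketch on a tiny hand-built table. The name toy_users and every value in it are invented for illustration; they are not part of the CashBox data.

```python
import pandas as pd

# Hypothetical stand-in for the users table; names and values are made up
toy_users = pd.DataFrame(
    {
        'referral_count': [0, 3, 0, 1],
        'balance': [10.0, 25.5, 0.0, 7.25],
    },
    index=['alice', 'bob', 'carol', 'dave'],
)

# The comparison yields one boolean per row, aligned on the same index
toy_mask = toy_users['referral_count'] < 1

# Indexing with the mask keeps only the rows where the mask is True
toy_no_referrals = toy_users[toy_mask]
print(toy_no_referrals)
```

The mask has exactly one entry per row of the DataFrame, which is what lets pandas line the two up by index label.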
A handy shortcut is to prefix the index with a ~ (tilde). This returns the inverse of the boolean Series. While I wish that the ~ was called "the opposite day" operator, it is in fact called the bitwise not operator.
# Careful, double negative here. We don't need no education.
~no_referrals_index.head()
# Use the inverse of the index to find where referral values DO NOT equal zero
users[~no_referrals_index].head()
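One detail worth checking on a small example is how ~ interacts with a method call such as head(). The sketch below uses an invented Series named toy_counts; it is only meant to illustrate the operator, not the real data.

```python
import pandas as pd

# Hypothetical referral counts; values are made up for illustration
toy_counts = pd.Series([0, 3, 0, 1], index=['alice', 'bob', 'carol', 'dave'])

toy_mask = toy_counts < 1

# ~ flips every boolean in the Series
toy_inverse = ~toy_mask

# A method call binds more tightly than ~, so this is ~(toy_mask.head(2)).
# Inverting element by element gives the same values either way.
toy_first_two = ~toy_mask.head(2)
print(toy_first_two)
```

Because the inversion is applied element by element, inverting first and taking the head second produces the same two values.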
loc
A boolean Series may also be used as an index with the DataFrame.loc object.
# Select rows where there are no referrals, and select only the following ordered columns
users.loc[no_referrals_index, ['balance', 'email']].head()
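As a rough illustration of the row-and-column form, the sketch below runs the same kind of selection on an invented table. The toy_users frame and its email addresses are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the users table; values are made up
toy_users = pd.DataFrame(
    {
        'email': ['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com'],
        'referral_count': [0, 3, 0, 1],
        'balance': [10.0, 25.5, 0.0, 7.25],
    },
    index=['alice', 'bob', 'carol', 'dave'],
)

toy_mask = toy_users['referral_count'] < 1

# First argument to loc: which rows (boolean mask)
# Second argument: which columns, returned in the order listed
toy_selected = toy_users.loc[toy_mask, ['balance', 'email']]
print(toy_selected)
```

Note that the columns come back in the order given in the list, not the order they have in the original DataFrame.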
It is also possible to do the comparison inline, without storing the index in a variable.
users[users['referral_count'] == 0].head()
Just like a NumPy ndarray, it's possible for a boolean Series to be combined with another boolean Series using bitwise operators.
Don't forget to surround your expressions with parentheses to control the order of operations, because & and | bind more tightly than comparison operators such as ==.
# Select all users where they haven't made a referral AND their email has been verified
users[(users['referral_count'] == 0) & (users['email_verified'] == True)].head()
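Naming each condition before combining it can make longer filters easier to read. The following is a small sketch on invented data; toy_users, no_ref, and verified are hypothetical names, and it assumes email_verified is a plain boolean column with no missing values.

```python
import pandas as pd

# Hypothetical stand-in for the users table; values are made up
toy_users = pd.DataFrame(
    {
        'referral_count': [0, 3, 0, 1],
        'email_verified': [True, True, False, False],
    },
    index=['alice', 'bob', 'carol', 'dave'],
)

# Each condition is its own boolean Series
no_ref = toy_users['referral_count'] == 0
verified = toy_users['email_verified']  # already boolean, so no == True needed

# & is element-wise AND, | is element-wise OR
toy_both = toy_users[no_ref & verified]
toy_either = toy_users[no_ref | verified]

print(toy_both)
print(toy_either)
```

Once the conditions are stored in variables, the parentheses are no longer needed, since each name already holds a finished boolean Series.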