Series Vectorization and Broadcasting

Just like NumPy, pandas offers powerful vectorized methods. It also leans on broadcasting.

Let's explore!

In [1]:
import pandas as pd

test_balance_data = {
    'pasan': 20.00,
    'treasure': 20.18,
    'ashley': 1.05,
    'craig': 42.42,
}

test_deposit_data = {
    'pasan': 20,
    'treasure': 10,
    'ashley': 100,
    'craig': 55,
}

balances = pd.Series(test_balance_data)
deposits = pd.Series(test_deposit_data)

Vectorization

While it is indeed possible to loop through each item and apply it to another...

In [2]:
for label, value in deposits.items():
    balances[label] += value
balances
Out[2]:
pasan        40.00
treasure     30.18
ashley      101.05
craig        97.42
dtype: float64

...it's important to remember to lean on vectorization and skip the loops altogether. Vectorization is faster and, as you can see, easier to read and write.

In [3]:
# Undo the change using in-place subtraction
balances -= deposits

# This is the same as the loop above, using in-place addition
balances += deposits
balances
Out[3]:
pasan        40.00
treasure     30.18
ashley      101.05
craig        97.42
dtype: float64
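
As a quick sanity check, here is a minimal sketch comparing the loop with the vectorized form. It rebuilds the starting data from cell 1 so it can run on its own; the names looped, vectorized, and same_result are just illustrative. It uses plain + rather than += so the original balances stay untouched.

```python
import pandas as pd

# Starting data, copied from the first cell
balances = pd.Series({'pasan': 20.00, 'treasure': 20.18, 'ashley': 1.05, 'craig': 42.42})
deposits = pd.Series({'pasan': 20, 'treasure': 10, 'ashley': 100, 'craig': 55})

# Explicit loop: update a copy one label at a time
looped = balances.copy()
for label, value in deposits.items():
    looped[label] += value

# Vectorized: one expression, no Python-level loop; returns a new Series
vectorized = balances + deposits

# Both approaches should produce the same labels and values
same_result = looped.equals(vectorized)
print(same_result)
```

If both approaches agree, same_result is True, and the vectorized version got there in a single line.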

Broadcasting

Broadcasting a Scalar

Just as with NumPy arrays, the arithmetic operators on a Series are overloaded to perform the vectorized version of the operation, so a scalar operand is broadcast to every value.

In [4]:
# 5 is broadcast and added to each and every value. This returns a new Series.
balances + 5
Out[4]:
pasan        45.00
treasure     35.18
ashley      106.05
craig       102.42
dtype: float64
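
Scalar broadcasting is not limited to addition. The sketch below tries multiplication and a comparison, assuming the balances shown in Out[3]; the names doubled and over_fifty are illustrative only.

```python
import pandas as pd

# Balances as they stand after the deposits (see Out[3])
balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})

# The scalar 2 is broadcast to every value
doubled = balances * 2

# Comparisons broadcast too, giving a boolean Series with the same labels
over_fifty = balances > 50

print(doubled)
print(over_fifty)
```

Each expression returns a new Series, so balances itself is left unchanged.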

Broadcasting a Series

Labels are used to line up entries. When a label exists on only one side, np.nan (not a number) is put in its place.

CashBox is giving out free coupons that users can scan into the app to get $1 added to their accounts.

In [5]:
coupons = pd.Series(1, ['craig', 'ashley', 'james'])
coupons
Out[5]:
craig     1
ashley    1
james     1
dtype: int64

Now we are going to add the coupons to people who cashed them in. This addition will return a new Series.

In [6]:
# Returns a new Series
balances + coupons
Out[6]:
ashley      102.05
craig        98.42
james          NaN
pasan          NaN
treasure       NaN
dtype: float64

Notice how labels that are not in both Series have their values set to np.nan. This isn't what we want! Pasan had $40.00 and now he has nothing. He is going to be so bummed!

Also take note that James is not in the balances Series but he is in the coupons Series. Note how he is now added to the new Series, but his value is also set to np.nan.
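
To see exactly which labels failed to line up, one option is to filter the result with isna(). This is a small sketch that assumes the balances from Out[3]; combined and missing are illustrative names.

```python
import pandas as pd

# Balances after the deposits (see Out[3]) and the coupons from cell 5
balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])

# Labels are aligned first; any label present on only one side becomes NaN
combined = balances + coupons

# Keep only the NaN entries and collect their labels
missing = combined[combined.isna()].index.tolist()
print(missing)
```

The list contains every label that appeared in only one of the two Series.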

Using the fill_value parameter

It is possible to fill missing values so that everything aligns. The idea is to call the add method directly, along with the keyword argument fill_value.

In [7]:
# Returns a new Series
balances.add(coupons, fill_value=0)
Out[7]:
ashley      102.05
craig        98.42
james         1.00
pasan        40.00
treasure     30.18
dtype: float64
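
A tempting alternative is to add first and patch the NaN values afterward with fillna. The sketch below, again assuming the balances from Out[3], shows why that is not equivalent; filled_after and filled_before are illustrative names.

```python
import pandas as pd

# Balances after the deposits (see Out[3]) and the coupons from cell 5
balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])

# Filling after the addition replaces the NaN results with 0,
# so Pasan's balance is wiped out
filled_after = (balances + coupons).fillna(0)

# fill_value substitutes 0 for the missing side before adding,
# so Pasan keeps his balance and James gets his coupon
filled_before = balances.add(coupons, fill_value=0)

print(filled_after['pasan'], filled_before['pasan'])
```

With fillna, the missing operand has already turned the whole sum into NaN, so there is nothing left to recover. With fill_value, the missing operand is treated as 0 and the other side survives.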