Series Vectorization and Broadcasting

Just like NumPy, pandas offers powerful vectorized methods and leans on broadcasting.

Let's explore!

In [1]:
import pandas as pd

test_balance_data = {
    'pasan': 20.00,
    'treasure': 20.18,
    'ashley': 1.05,
    'craig': 42.42,
}

test_deposit_data = {
    'pasan': 20,
    'treasure': 10,
    'ashley': 100,
    'craig': 55,
}

balances = pd.Series(test_balance_data)
deposits = pd.Series(test_deposit_data)


While it is indeed possible to loop through each item and apply it to another...

In [2]:
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for label, value in deposits.items():
    balances[label] += value
balances
pasan        40.00
treasure     30.18
ashley      101.05
craig        97.42
dtype: float64

...it's important to remember to lean on vectorization and skip the loops altogether.

In [3]:
# Undo the change using in-place subtraction
balances -= deposits

# This is the same as the loop above, using in-place addition
balances += deposits
balances
pasan        40.00
treasure     30.18
ashley      101.05
craig        97.42
dtype: float64
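
As a quick check of the claim above, here is a minimal sketch that runs the loop and the vectorized addition side by side on the same starting data. The names looped and vectorized are just illustrative, and the comparison uses a small tolerance because the values are floats.

```python
import pandas as pd

balances = pd.Series({'pasan': 20.00, 'treasure': 20.18, 'ashley': 1.05, 'craig': 42.42})
deposits = pd.Series({'pasan': 20, 'treasure': 10, 'ashley': 100, 'craig': 55})

# Loop version: update a copy one label at a time
looped = balances.copy()
for label, value in deposits.items():
    looped[label] += value

# Vectorized version: a single expression, aligned by label
vectorized = balances + deposits

# Both approaches should agree (compared with a tolerance, since these are floats)
same = bool(((looped - vectorized).abs() < 1e-9).all())
print(same)
```

The vectorized line is shorter and hands the work to pandas, which is the main reason to prefer it over the explicit loop.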


Broadcasting a Scalar

Also just like NumPy arrays, the mathematical operators have been overloaded to use the vectorized versions of the same operation.

In [4]:
# 5 is broadcast and added to each and every value. This returns a new Series.
balances + 5
pasan        45.00
treasure     35.18
ashley      106.05
craig       102.42
dtype: float64
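
Addition is not special here. As a small sketch (the variable names bumped, doubled and over_fifty are made up for illustration), other operators broadcast a scalar the same way, and the original Series is left untouched.

```python
import pandas as pd

balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})

bumped = balances + 5        # new Series; balances itself is unchanged
doubled = balances * 2       # multiplication broadcasts the scalar the same way
over_fifty = balances > 50   # comparisons broadcast too, giving a boolean Series

print(balances['pasan'])
print(bumped['pasan'])
print(over_fifty.tolist())
```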

Broadcasting a Series

Labels are used to line up entries. When a label exists on only one side, np.nan (not a number) is put in its place.

CashBox is giving out free coupons that users can scan into the app to get $1 added to their accounts.

In [5]:
# The scalar 1 is repeated for every label in the index
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])
coupons
craig     1
ashley    1
james     1
dtype: int64

Now we are going to add the coupons to people who cashed them in. This addition will return a new Series.

In [6]:
# Returns a new Series
balances + coupons
ashley      102.05
craig        98.42
james          NaN
pasan          NaN
treasure       NaN
dtype: float64

Notice how labels that are not in both Series have their values set to np.nan. This isn't what we want! Pasan had $40.00 and now he has nothing. He is going to be so bummed!

Also take note that James is not in the balances Series but he is in the coupons Series. Note how he is now added to the new Series, but his value is also set to np.nan.
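
To see exactly which labels failed to line up, one option is to ask the result where it is missing. This is only a sketch; combined and missing are illustrative names, and isna() is the standard pandas check for np.nan.

```python
import pandas as pd

balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])

combined = balances + coupons

# Labels present in only one of the two Series come back as NaN
missing = sorted(combined[combined.isna()].index)
print(missing)
```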

Using the fill_value

It is possible to fill missing values so that everything aligns. The idea is to call the add method directly and pass the keyword argument fill_value.

In [7]:
# Returns a new Series
balances.add(coupons, fill_value=0)
ashley      102.05
craig        98.42
james         1.00
pasan        40.00
treasure     30.18
dtype: float64
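
Because add returns a new Series rather than changing balances, the result has to be kept somewhere if it is needed later. A minimal sketch, assuming we store it under the illustrative name updated:

```python
import pandas as pd

balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])

# add() returns a new Series, so assign it to keep the result
updated = balances.add(coupons, fill_value=0)

print(updated.isna().any())   # checks whether any label is still missing
print(updated['pasan'])       # no coupon, so 0 is added to the balance
print(updated['james'])       # no balance, so the coupon is added to 0
```

Note that James now appears in updated even though he was never in balances, so fill_value=0 effectively opens an account for him.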