Just like NumPy, pandas offers powerful vectorized methods and leans on broadcasting.
Let's explore!
import pandas as pd
test_balance_data = {
    'pasan': 20.00,
    'treasure': 20.18,
    'ashley': 1.05,
    'craig': 42.42,
}
test_deposit_data = {
    'pasan': 20,
    'treasure': 10,
    'ashley': 100,
    'craig': 55,
}
balances = pd.Series(test_balance_data)
deposits = pd.Series(test_deposit_data)
While it is indeed possible to loop through each item and apply it to another...
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for label, value in deposits.items():
    balances[label] += value
balances
pasan        40.00
treasure     30.18
ashley      101.05
craig        97.42
dtype: float64
...it's important to remember to lean on vectorization and skip the loops altogether.
# Undo the change using inplace subtraction
balances -= deposits
# This is the same as the loop above using inplace addition
balances += deposits
balances
pasan        40.00
treasure     30.18
ashley      101.05
craig        97.42
dtype: float64
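As a quick sanity check, the loop and the vectorized form can be compared side by side. This is a minimal sketch that rebuilds the sample data so it runs on its own; the names `looped`, `vectorized`, and `same` are introduced here purely for illustration.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained.
balances = pd.Series({'pasan': 20.00, 'treasure': 20.18, 'ashley': 1.05, 'craig': 42.42})
deposits = pd.Series({'pasan': 20, 'treasure': 10, 'ashley': 100, 'craig': 55})

# Loop version: update a copy one label at a time.
looped = balances.copy()
for label, value in deposits.items():
    looped[label] += value

# Vectorized version: one expression, no explicit loop.
vectorized = balances + deposits

# Compare with a small tolerance rather than exact float equality.
same = bool(((looped - vectorized).abs() < 1e-9).all())
print(same)
```

Both approaches should give the same balances; the vectorized one is shorter and pushes the iteration down into pandas.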
Also just like NumPy arrays, the mathematical operators have been overridden to use the vectorized versions of the same operation.
# 5 is broadcast and added to each and every value. This returns a new Series.
balances + 5
pasan        45.00
treasure     35.18
ashley      106.05
craig       102.42
dtype: float64
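Scalar broadcasting is not limited to addition. The sketch below, using the balances as they stand at this point, shows that multiplication and comparison broadcast the same way and that the original Series is left untouched. The names `bumped`, `doubled`, and `over_50` are illustrative only.

```python
import pandas as pd

# Balances as they stand after the deposits were applied.
balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})

bumped = balances + 5     # 5 is broadcast to every entry
doubled = balances * 2    # other arithmetic operators broadcast the same way
over_50 = balances > 50   # comparisons broadcast too, producing booleans

# Each expression built a new Series; balances itself did not change.
print(balances['pasan'], bumped['pasan'], doubled['pasan'])
print(over_50.tolist())
```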
Labels are used to line up entries. When a label exists on only one side, np.nan (not a number) is put in its place.
CashBox is giving out free coupons that users can scan into the app to get $1 added to their accounts.
coupons = pd.Series(1, ['craig', 'ashley', 'james'])
coupons
craig     1
ashley    1
james     1
dtype: int64
Now we are going to add the coupons to the balances of the people who cashed them in. This addition will return a new Series.
# Returns a new Series
balances + coupons
ashley      102.05
craig        98.42
james          NaN
pasan          NaN
treasure       NaN
dtype: float64
Notice how values that are not in both Series are set to np.nan. This isn't what we want! Pasan had $40.00 and now he has nothing. He is going to be so bummed!
Also take note that James is not in the balances Series but he is in the coupons Series. Note how he is now added to the new Series, but his value is also set to np.nan.
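To see exactly which labels failed to line up, the result can be filtered with `isna()`. This is a small sketch under the same sample data; `total` and `missing` are names made up for the example.

```python
import pandas as pd

balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])

# Plain + aligns on labels; the result covers the union of both indexes.
total = balances + coupons

# Labels present on only one side come back as NaN.
missing = sorted(total[total.isna()].index)
print(missing)

# Labels present on both sides are added as expected.
print(total['ashley'], total['craig'])
```

Only `ashley` and `craig` appear in both Series, so they are the only entries with a real value in the result.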
fill_value
It is possible to fill missing values so that everything aligns. The idea is to call the add method directly, passing the keyword argument fill_value.
# Returns a new Series
balances.add(coupons, fill_value=0)
ashley      102.05
craig        98.42
james         1.00
pasan        40.00
treasure     30.18
dtype: float64
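One caveat worth knowing: `fill_value` substitutes for a value that is missing on one side only. If an entry is missing on both sides, the result stays NaN. The sketch below first confirms that the CashBox result has no gaps, then shows the caveat with a second pair of Series (`left` and `right`) that is hypothetical data invented for this illustration.

```python
import numpy as np
import pandas as pd

balances = pd.Series({'pasan': 40.00, 'treasure': 30.18, 'ashley': 101.05, 'craig': 97.42})
coupons = pd.Series(1, index=['craig', 'ashley', 'james'])

# With fill_value=0, a label missing from one side is treated as 0 there.
with_coupons = balances.add(coupons, fill_value=0)
no_gaps = bool(with_coupons.notna().all())
print(no_gaps)

# Hypothetical data: 'ashley' is NaN in both Series.
left = pd.Series({'pasan': 40.00, 'ashley': np.nan})
right = pd.Series({'ashley': np.nan, 'james': 1.0})

# 'pasan' and 'james' are filled; 'ashley' has nothing to fill against.
still_missing = left.add(right, fill_value=0)
print(still_missing)
```

The same keyword is accepted by the sibling methods such as `sub`, `mul`, and `div`, so the pattern carries over to other arithmetic.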