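Given two dataframes, df1 and df2, each with two columns `a` and `b`, the idea is to create a new dataframe that contains every value of `a` from either frame, with `b` taken from df2 wherever that `a` appears in df2 and from df1 otherwise.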
```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [18, 19, 20, 21, 22]})
print('df1')
print(df1)

df2 = pd.DataFrame({'a': [5, 4, 6], 'b': [23, 24, 25]})
print('df2')
print(df2)
```
```
df1
   a   b
0  1  18
1  2  19
2  3  20
3  4  21
4  5  22
df2
   a   b
0  5  23
1  4  24
2  6  25
```
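```python
# Outer join on 'a' keeps keys from both frames; the two 'b' columns
# are disambiguated with the default suffixes _x (df1) and _y (df2).
df3 = pd.merge(df1, df2, how='outer', on='a')
df3
```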
| | a | b_x | b_y |
|---|---|---|---|
| 0 | 1 | 18.0 | NaN |
| 1 | 2 | 19.0 | NaN |
| 2 | 3 | 20.0 | NaN |
| 3 | 4 | 21.0 | 24.0 |
| 4 | 5 | 22.0 | 23.0 |
| 5 | 6 | NaN | 25.0 |
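To see why `b_y` is NaN for `a` = 1, 2, 3 and `b_x` is NaN for `a` = 6, the same merge can be run with `indicator=True`, which adds a `_merge` column recording whether each key was found only in the left frame, only in the right frame, or in both. A minimal sketch that rebuilds the two example frames so it runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [18, 19, 20, 21, 22]})
df2 = pd.DataFrame({'a': [5, 4, 6], 'b': [23, 24, 25]})

# indicator=True adds a categorical '_merge' column with the values
# 'left_only' (key only in df1), 'right_only' (only in df2) or 'both'.
merged = pd.merge(df1, df2, how='outer', on='a', indicator=True)
print(merged[['a', 'b_x', 'b_y', '_merge']])
```

Rows marked `left_only` are exactly the ones where `b_y` is missing and must fall back to `b_x`.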
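```python
# Where df2 had no matching row (b_y is NaN), fall back to df1's value
df3.loc[df3['b_y'].isna(), 'b_y'] = df3['b_x']
# Keep only the combined column and give it the original name
df3.drop(['b_x'], axis=1, inplace=True)
df3.rename(columns={'b_y': 'b'}, inplace=True)
print('df3')
print(df3)
```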
```
df3
   a     b
0  1  18.0
1  2  19.0
2  3  20.0
3  4  24.0
4  5  23.0
5  6  25.0
```
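The `.loc`/`drop`/`rename` sequence amounts to coalescing `b_y` with `b_x`. Below is a sketch of two shorter alternatives, assuming the same df1 and df2 and that `a` is unique within each frame (as in the example): `Series.fillna` given another Series fills each NaN from the same row by index alignment, and `DataFrame.combine_first` on frames indexed by `a` keeps df2's values and fills missing keys from df1. The names `merged`, `coalesced` and `combined` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [18, 19, 20, 21, 22]})
df2 = pd.DataFrame({'a': [5, 4, 6], 'b': [23, 24, 25]})

# Alternative 1: outer merge as before, then coalesce in one step.
# fillna(other_series) replaces each NaN in b_y with b_x from the same row.
merged = pd.merge(df1, df2, how='outer', on='a')
coalesced = merged[['a']].assign(b=merged['b_y'].fillna(merged['b_x']))

# Alternative 2: no merge; align both frames on 'a' as the index.
# combine_first keeps df2's values and fills missing keys from df1.
combined = (
    df2.set_index('a')
    .combine_first(df1.set_index('a'))
    .sort_index()   # order rows by 'a' explicitly
    .reset_index()  # turn 'a' back into a regular column
)

print(coalesced)
print(combined)
```

Both produce the same `a`/`b` pairs as df3 above; the `combine_first` version avoids the `_x`/`_y` suffix columns entirely.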