Given a table that initially looks like the output of cell 2, we can use pandas to transform the data into the output of cell 11:
import pandas as pd
dictionary = {'VendorID': [6, 7, 8, 9],
'VendorTagID': ['1,2',
'2,3,4',
'1',
'1,2,3,4']
}
given_df = pd.DataFrame.from_dict(dictionary)
given_df
VendorID | VendorTagID | |
---|---|---|
0 | 6 | 1,2 |
1 | 7 | 2,3,4 |
2 | 8 | 1 |
3 | 9 | 1,2,3,4 |
tag_qty = given_df['VendorTagID'].apply(lambda l: len(l.split(',')))
max_cols_needed = max(tag_qty)
print(max_cols_needed)
4
for index in given_df.index:
tags = given_df.loc[index, 'VendorTagID'].split(',')
for col, tag in enumerate(tags):
given_df.loc[index, 'Tag ' + str(col)] = tag
given_df
VendorID | VendorTagID | Tag 0 | Tag 1 | Tag 2 | Tag 3 | |
---|---|---|---|---|---|---|
0 | 6 | 1,2 | 1 | 2 | NaN | NaN |
1 | 7 | 2,3,4 | 2 | 3 | 4 | NaN |
2 | 8 | 1 | 1 | NaN | NaN | NaN |
3 | 9 | 1,2,3,4 | 1 | 2 | 3 | 4 |
output_df = given_df.drop(columns='VendorTagID')
output_df
VendorID | Tag 0 | Tag 1 | Tag 2 | Tag 3 | |
---|---|---|---|---|---|
0 | 6 | 1 | 2 | NaN | NaN |
1 | 7 | 2 | 3 | 4 | NaN |
2 | 8 | 1 | NaN | NaN | NaN |
3 | 9 | 1 | 2 | 3 | 4 |
output_df = output_df.melt(id_vars='VendorID')
output_df
VendorID | variable | value | |
---|---|---|---|
0 | 6 | Tag 0 | 1 |
1 | 7 | Tag 0 | 2 |
2 | 8 | Tag 0 | 1 |
3 | 9 | Tag 0 | 1 |
4 | 6 | Tag 1 | 2 |
5 | 7 | Tag 1 | 3 |
6 | 8 | Tag 1 | NaN |
7 | 9 | Tag 1 | 2 |
8 | 6 | Tag 2 | NaN |
9 | 7 | Tag 2 | 4 |
10 | 8 | Tag 2 | NaN |
11 | 9 | Tag 2 | 3 |
12 | 6 | Tag 3 | NaN |
13 | 7 | Tag 3 | NaN |
14 | 8 | Tag 3 | NaN |
15 | 9 | Tag 3 | 4 |
At this point, we need to:
variable
column containing the original temporary column headersvalue
is NaNvalue
column to the original `VendorTagID'output_df.drop(columns='variable', inplace=True)
output_df.rename(columns={'value': 'VendorTagID'}, inplace=True)
output_df.dropna(inplace=True)
output_df.sort_values('VendorID', inplace=True)
output_df
VendorID | VendorTagID | |
---|---|---|
0 | 6 | 1 |
4 | 6 | 2 |
1 | 7 | 2 |
5 | 7 | 3 |
9 | 7 | 4 |
2 | 8 | 1 |
3 | 9 | 1 |
7 | 9 | 2 |
11 | 9 | 3 |
15 | 9 | 4 |