Skip to content
Advertisement

How can I consolidate multiple rows into a single row based off their values in a Pandas Dataframe?

I have a dataframe called Traffic:

   Source   Method    Destination    Weight
0     LA    Ground             NY        20 
1     LA    Ground             NY        15
2     LA    Ground             LV        10
3     LA       Air             LV         5

I’d like to end up with a dataframe like so:

   Source            Ground        Air
0      LA    LV: 10, NY: 35      LV: 5

Where the 4 rows are combined into 1 based off the Source. The traffic methods are then further broken up by their destinations in ascending order. If there are multiple entries from say LA->NY of type Ground, add the weights.

Ground/Air columns would be strings following the format of “Destination:TotalWeight”.

This pattern would continue all the way down for any other Sources such as ‘WA’…

Advertisement

Answer

initialize

df = pd.DataFrame(data= [["LA","Ground","NY",20],
                         ["LA","Ground","NY",15],
                         ["LA","Ground","LV",10],
                         ["LA","Air","LV",5]],
                  columns=["Source","Method","Destination","Weight"])

do stuff

# add weights when same source, method & destination
df_ = df.groupby(["Source","Method","Destination"], as_index=False)['Weight'].sum()
# conatenate destination and weight strings
df_['Dest_Wt'] = df_['Destination'].str.cat(df_['Weight'].astype(str), sep = ': ')

shipping_summary = (df_.groupby(["Source","Method"])['Dest_Wt']
           .apply(', '.join)
           .unstack(level=1)
           .reset_index())
shipping_summary.columns.name = None

print(shipping_summary)

output

   Source      Air            Ground
0      LA    LV: 5    LV: 10, NY: 35
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement