Here is my dataframe following the merging of two separate dataframes.
In the [order]
column, orders 2, 4 and 5, have multiple corresponding amounts in the [value]
column. This is replicated elsewhere in the dataframe at random points. What can I do to merge these amounts so that I am left with one row in the dataframe per order:
Essentially, how do I sum the values against each order line leaving me with one amount in the [value]
columns for each row of the dataframe?
So my desired output will look something like:
order customer value ordered delivered 1 832 3450.0 2015-01-01 2015-01-02 2 803 607.0 2015-01-01 2015-01-02 3 105 135.0 2015-01-01 2015-01-02 4 57 667.8 2015-01-01 2015-01-02 5 905 1056.0 2015-01-01 2015-01-02
Advertisement
Answer
Use if order
is column use GroupBy.transform
with DataFrame.drop_duplicates
:
df['value'] = df.groupby(df['order'].replace('', np.nan).ffill())['value'].transform('sum') df = df.drop_duplicates('order')
If order
is level of MultiIndex
:
df['value'] = df.groupby(level=0)['value'].transform('sum') df = df[~df.index.get_level_values('order').duplicated()]