Here is my dataframe following the merging of two separate dataframes.
In the [order]
column, orders 2, 4 and 5, have multiple corresponding amounts in the [value]
column. This is replicated elsewhere in the dataframe at random points. What can I do to merge these amounts so that I am left with one row in the dataframe per order:
Essentially, how do I sum the values against each order line leaving me with one amount in the [value]
columns for each row of the dataframe?
So my desired output will look something like:
JavaScript
x
7
1
order customer value ordered delivered
2
1 832 3450.0 2015-01-01 2015-01-02
3
2 803 607.0 2015-01-01 2015-01-02
4
3 105 135.0 2015-01-01 2015-01-02
5
4 57 667.8 2015-01-01 2015-01-02
6
5 905 1056.0 2015-01-01 2015-01-02
7
Advertisement
Answer
Use if order
is column use GroupBy.transform
with DataFrame.drop_duplicates
:
JavaScript
1
4
1
df['value'] = df.groupby(df['order'].replace('', np.nan).ffill())['value'].transform('sum')
2
3
df = df.drop_duplicates('order')
4
If order
is level of MultiIndex
:
JavaScript
1
4
1
df['value'] = df.groupby(level=0)['value'].transform('sum')
2
3
df = df[~df.index.get_level_values('order').duplicated()]
4