I have a data frame that looks something like this:
df =
    date      name   val1  val2
    -----------------------------------
    14:55:00  name1     1     2
    14:55:00  name1     2     4
    15:00:00  name2     3     6
    15:00:00  name3     4     8
    15:05:00  name4     5    10
    15:05:00  name5     6    12
What I would like to do is aggregate the data if the dates are the same – but only if the name is different. So the above data frame should actually become:
df_new =
    date      name         val1  val2
    -----------------------------------------
    15:00:00  name2+name3     7    14
    15:05:00  name4+name5    11    22
Currently I am almost doing it with:
df_new = df.groupby("date", as_index=False).agg({"name" : "+".join, "val1" : "sum", "val2" : "sum"})
However, this will also aggregate the rows where the name is the same, which it shouldn't.
EDIT: It should also be noted that there are only a few different names. The names are repeated in each date interval; it's just that when the dates are aggregated, the names can't be the same.
Can this be fixed?
Answer
Look for the duplicates, drop them, and then aggregate on the date column:
    (df.drop_duplicates(subset=['date', 'name'], keep=False)
       .groupby('date')
       .sum()
    )

              val1  val2
    date
    15:00:00     7    14
    15:05:00    11    22
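If the joined name column from the question is also wanted, the same drop-then-group idea can probably be combined with the `agg` call the question already uses. The following is a minimal sketch: the sample frame is rebuilt by hand from the table in the question, and the intermediate name `unique_rows` is only an illustrative assumption.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "date": ["14:55:00", "14:55:00", "15:00:00", "15:00:00", "15:05:00", "15:05:00"],
    "name": ["name1", "name1", "name2", "name3", "name4", "name5"],
    "val1": [1, 2, 3, 4, 5, 6],
    "val2": [2, 4, 6, 8, 10, 12],
})

# keep=False drops every row whose (date, name) pair occurs more than once,
# so both 14:55:00/name1 rows are removed.
unique_rows = df.drop_duplicates(subset=["date", "name"], keep=False)

# Aggregate the remaining rows per date, joining the names with "+"
# and summing the value columns, as in the question's own attempt.
df_new = unique_rows.groupby("date", as_index=False).agg(
    {"name": "+".join, "val1": "sum", "val2": "sum"}
)

print(df_new)
```

Note that this only removes the repeated rows themselves: if a date contained name1 twice plus a single name2, the name2 row would survive and appear on its own in the result. Whether that is acceptable depends on the data.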