Aggregate data with two conditions

I have a data frame that looks something like this:

df =

date        name      val1     val2
-----------------------------------
14:55:00    name1     1        2
14:55:00    name1     2        4
15:00:00    name2     3        6
15:00:00    name3     4        8
15:05:00    name4     5        10
15:05:00    name5     6        12

What I would like to do is aggregate the data if the dates are the same – but only if the name is different. So the above data frame should actually become:

df_new =

date        name            val1     val2
-----------------------------------------
15:00:00    name2+name3     7        14
15:05:00    name4+name5     11       22

Currently I get close with:

df_new = df.groupby("date", as_index=False).agg({"name" : "+".join, "val1" : "sum", "val2" : "sum"})

However, this also aggregates the rows where the name is the same, which it shouldn't. EDIT: Note that there are only a few different names, and they repeat in each date interval. The only constraint is that rows aggregated under the same date must not share a name.

Can this be fixed?

Answer

Find the rows that share both date and name, drop all of them (keep=False removes every copy, not just the repeats), and then aggregate on the date column:

(df.drop_duplicates(subset=['date', 'name'],
                    keep=False)          # drop every copy of a repeated (date, name) pair
   .groupby('date')[['val1', 'val2']]    # sum only the numeric columns
   .sum()
  )
 
          val1  val2
date                
15:00:00     7    14
15:05:00    11    22
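
The output above has no name column, while the question asks for the names joined with "+". A minimal sketch that combines the drop_duplicates step with the agg call from the question could look like this. The sample frame is reconstructed from the question, and the variable name unique_rows is only illustrative:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "date": ["14:55:00", "14:55:00", "15:00:00", "15:00:00", "15:05:00", "15:05:00"],
    "name": ["name1", "name1", "name2", "name3", "name4", "name5"],
    "val1": [1, 2, 3, 4, 5, 6],
    "val2": [2, 4, 6, 8, 10, 12],
})

# keep=False drops every row whose (date, name) pair occurs more than once,
# so no copy of a repeated name survives within a date.
unique_rows = df.drop_duplicates(subset=["date", "name"], keep=False)

# Same aggregation as in the question, applied to the filtered rows.
df_new = unique_rows.groupby("date", as_index=False).agg(
    {"name": "+".join, "val1": "sum", "val2": "sum"}
)
print(df_new)
```

One caveat: if a date holds both repeated and non-repeated names (say name1, name1, name2), only the non-repeated rows remain, so that date still appears in the result with the leftover name on its own. Whether that is the desired behavior depends on the data.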
User contributions licensed under: CC BY-SA