How to merge rows in a Dataframe based on a previous row?

I have a sequentially ordered dataframe that represent two events measured over time – the measurements are the start and end times of the event. They should be ordered in an ABABAB sequence, but in some cases I may have consecutive events of the same type (i.e. ABABAABABB). I am looking for a way to check the event label (A or B) in each row with the previous event label, and if they are the same to merge the rows in such a way that I maintain the start time of the first event and the end time of the second event. Consider the following:

myDF = pd.DataFrame({"Event": ["A","B","A","A","B","B","A"], 
                 "Start": [1,3,5,7,9,11,13], 
                 "End": [2,4,6,8,10,12,14]})

JavaScript
​x
 
myDF = pd.DataFrame({"Event": ["A","B","A","A","B","B","A"], 
                 "Start": [1,3,5,7,9,11,13], 
                 "End": [2,4,6,8,10,12,14]})
​

What I currently have…

==============================
  Event      Start      End
==============================
    A          1         2
    B          3         4
    A          5         6
    A          7         8
    B          9         10
    B          11        12
    A          13        14
==============================

JavaScript
 
==============================
  Event      Start      End
==============================
    A          1         2
    B          3         4
    A          5         6
    A          7         8
    B          9         10
    B          11        12
    A          13        14
==============================
​

What I need…

Note: The two A events at index position 2-3 have been merged into one, as have the two B events originally at positions 4-5.

==============================
  Event      Start      End
==============================
    A          1         2
    B          3         4
    A          5         8
    B          9         12
    A          13        14
==============================

JavaScript
 
==============================
  Event      Start      End
==============================
    A          1         2
    B          3         4
    A          5         8
    B          9         12
    A          13        14
==============================
​

I had initially thought to use groupby but I don’t think this right as this will group over the entire dataframe. Similarly I have tried using iteritems but have not had any success. Apologies for the lack of code but I’m at a loss as to how to approach the problem.

Answer

You can use GroupBy.agg with first and last.

g = df["Event"].ne(df["Event"].shift()).cumsum()
df.groupby(g, as_index = False).agg({
  "Event": "first",
  "Start": "first",
  "End": "last"
})

  Event  Start  End
0     A      1    2
1     B      3    4
2     A      5    8
3     B      9   12
4     A     13   14

JavaScript
 
g = df["Event"].ne(df["Event"].shift()).cumsum()
df.groupby(g, as_index = False).agg({
  "Event": "first",
  "Start": "first",
  "End": "last"
})
​
  Event  Start  End
0     A      1    2
1     B      3    4
2     A      5    8
3     B      9   12
4     A     13   14
​

Advertisement

Answer