I have a sequentially ordered dataframe that represent two events measured over time – the measurements are the start and end times of the event. They should be ordered in an ABABAB sequence, but in some cases I may have consecutive events of the same type (i.e. ABABAABABB). I am looking for a way to check the event label (A or B) in each row with the previous event label, and if they are the same to merge the rows in such a way that I maintain the start time of the first event and the end time of the second event. Consider the following:
myDF = pd.DataFrame({"Event": ["A","B","A","A","B","B","A"], "Start": [1,3,5,7,9,11,13], "End": [2,4,6,8,10,12,14]})
What I currently have…
============================== Event Start End ============================== A 1 2 B 3 4 A 5 6 A 7 8 B 9 10 B 11 12 A 13 14 ==============================
What I need…
Note: The two A events at index position 2-3 have been merged into one, as have the two B events originally at positions 4-5.
============================== Event Start End ============================== A 1 2 B 3 4 A 5 8 B 9 12 A 13 14 ==============================
I had initially thought to use groupby
but I don’t think this right as this will group over the entire dataframe. Similarly I have tried using iteritems
but have not had any success. Apologies for the lack of code but I’m at a loss as to how to approach the problem.
Advertisement
Answer
You can use GroupBy.agg
with first
and last
.
g = df["Event"].ne(df["Event"].shift()).cumsum() df.groupby(g, as_index = False).agg({ "Event": "first", "Start": "first", "End": "last" }) Event Start End 0 A 1 2 1 B 3 4 2 A 5 8 3 B 9 12 4 A 13 14