
Panel data: take first observation of each group, repeat row and adjust certain values

I have a large Pandas dataframe that looks as follows (85k rows):


My goal is the following: for the first observation of each ID where BEGDT > Inception, copy the row, then set the copy's BEGDT to Inception and its ENDDT to the original row's BEGDT minus one day.

Accordingly, the final output should look as follows:


I assume that first, I have to group the data with df1.groupby("ID").first(), next do the calculations and finally, insert these rows into df1. However, I am not sure if this is the best way to do it.

Any help would be appreciated.


Answer

Rather than editing values inside the groupby for each individual group, it is faster to make the edits once on a copy of the dataframe (we’ll call it tmp). We can then filter df1 by BEGDT > Inception, take groupby.first as you suggested, get the index values of those rows, fetch the matching rows from tmp, and combine them with df1.
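The steps described here could be sketched roughly as follows. This is a minimal reconstruction, not necessarily the exact code from the answer: the column names ID, BEGDT, ENDDT and Inception and the names df1 and tmp come from the post, while the sample rows and the names late, first_idx, new_rows and out are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "ID": [1, 1, 2],
    "BEGDT": pd.to_datetime(["2020-03-01", "2020-06-01", "2019-01-01"]),
    "ENDDT": pd.to_datetime(["2020-05-31", "2020-12-31", "2019-12-31"]),
    "Inception": pd.to_datetime(["2020-01-01", "2020-01-01", "2019-01-01"]),
})

# Edit every row of a copy in one vectorized step:
# ENDDT becomes the original BEGDT minus one day, BEGDT becomes Inception.
tmp = df1.copy()
tmp["ENDDT"] = df1["BEGDT"] - pd.Timedelta(days=1)
tmp["BEGDT"] = df1["Inception"]

# Keep only rows that start after Inception, then take the original
# index label of the first such row per ID.
late = df1[df1["BEGDT"] > df1["Inception"]]
first_idx = late.index.to_series().groupby(late["ID"]).first()

# Fetch the edited versions of those rows and combine with the original.
new_rows = tmp.loc[first_idx.to_numpy()]
out = (
    pd.concat([df1, new_rows])
    .sort_values(["ID", "BEGDT"])
    .reset_index(drop=True)
)
print(out)
```

One caveat on this sketch: filtering before taking the first row means that an ID whose earliest row already starts at Inception, but whose later rows do not, would still get a new row built from a later observation. If that is not wanted, taking the first row per ID and then applying the BEGDT > Inception filter avoids it.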
