I have the below df
:
ID Number Number 2 Number 3 1 10001 NaN NaN 5 2 10001 25 NaN 12 3 10001 78 4 NaN 4 10002 3 NaN NaN 5 10002 234 201 NaN 6 10002 NaN 510 34 7 10003 NaN 765 NaN 8 10003 NaN 422 NaN 9 10003 NaN 753 56 10 10003 231 7587 2345
I want to structure the data so the first NaN rows are deleted by column.
Resuling df
:
ID Number Number 2 Number 3 1 10001 25 4 5 2 10001 78 NaN 12 3 10001 NaN NaN NaN 4 10002 3 201 34 5 10002 234 510 NaN 6 10002 NaN NaN NaN 7 10003 231 765 56 8 10003 NaN 422 2345 9 10003 NaN 753 NaN 10 10003 NaN 7587 NaN
I’m essentially trying to shift the column data up by n rows depending on where the data starts for that column, so at the first rows of ID
there is always data in at least 1 of the Number
columns.
I’ve tried first_row_index
but this doesn’t work by individual column
I’ve tried dropna
but I can’t find a solution where I’m defining what number of rows to drop per column.
Advertisement
Answer
df1 = df.melt('ID').dropna() df1['var1'] = df1.groupby(['variable', 'ID']).cumcount() df1.pivot(['ID', 'var1'], 'variable', 'value').reset_index(0)
variable ID Number Number 2 Number 3 var1 0 10001 25.0 4.0 5.0 1 10001 78.0 NaN 12.0 0 10002 3.0 201.0 34.0 1 10002 234.0 510.0 NaN 0 10003 231.0 765.0 56.0 1 10003 NaN 422.0 2345.0 2 10003 NaN 753.0 NaN 3 10003 NaN 7587.0 NaN