Skip to content
Advertisement

Drop first nan rows in multiple columns

I have the below df:

    ID      Number  Number 2  Number 3
1   10001   NaN     NaN       5
2   10001   25      NaN       12
3   10001   78      4         NaN
4   10002   3       NaN       NaN
5   10002   234     201       NaN
6   10002   NaN     510       34
7   10003   NaN     765       NaN
8   10003   NaN     422       NaN
9   10003   NaN     753       56
10  10003   231     7587      2345

I want to structure the data so the first NaN rows are deleted by column.

Resuling df:

    ID      Number Number 2  Number 3
1   10001   25     4         5
2   10001   78     NaN       12
3   10001   NaN    NaN       NaN
4   10002   3      201       34
5   10002   234    510       NaN
6   10002   NaN    NaN       NaN
7   10003   231    765       56
8   10003   NaN    422       2345
9   10003   NaN    753       NaN
10  10003   NaN    7587      NaN

I’m essentially trying to shift the column data up by n rows depending on where the data starts for that column, so at the first rows of ID there is always data in at least 1 of the Number columns.

I’ve tried first_row_index but this doesn’t work by individual column

I’ve tried dropna but I can’t find a solution where I’m defining what number of rows to drop per column.

Advertisement

Answer

df1 = df.melt('ID').dropna()
df1['var1'] = df1.groupby(['variable', 'ID']).cumcount()
df1.pivot(['ID', 'var1'], 'variable', 'value').reset_index(0)

variable     ID  Number  Number 2  Number 3
var1                                       
0         10001    25.0       4.0       5.0
1         10001    78.0       NaN      12.0
0         10002     3.0     201.0      34.0
1         10002   234.0     510.0       NaN
0         10003   231.0     765.0      56.0
1         10003     NaN     422.0    2345.0
2         10003     NaN     753.0       NaN
3         10003     NaN    7587.0       NaN
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement