I have the below df
:
JavaScript
x
12
12
1
ID Number Number 2 Number 3
2
1 10001 NaN NaN 5
3
2 10001 25 NaN 12
4
3 10001 78 4 NaN
5
4 10002 3 NaN NaN
6
5 10002 234 201 NaN
7
6 10002 NaN 510 34
8
7 10003 NaN 765 NaN
9
8 10003 NaN 422 NaN
10
9 10003 NaN 753 56
11
10 10003 231 7587 2345
12
I want to structure the data so the first NaN rows are deleted by column.
Resuling df
:
JavaScript
1
12
12
1
ID Number Number 2 Number 3
2
1 10001 25 4 5
3
2 10001 78 NaN 12
4
3 10001 NaN NaN NaN
5
4 10002 3 201 34
6
5 10002 234 510 NaN
7
6 10002 NaN NaN NaN
8
7 10003 231 765 56
9
8 10003 NaN 422 2345
10
9 10003 NaN 753 NaN
11
10 10003 NaN 7587 NaN
12
I’m essentially trying to shift the column data up by n rows depending on where the data starts for that column, so at the first rows of ID
there is always data in at least 1 of the Number
columns.
I’ve tried first_row_index
but this doesn’t work by individual column
I’ve tried dropna
but I can’t find a solution where I’m defining what number of rows to drop per column.
Advertisement
Answer
JavaScript
1
4
1
df1 = df.melt('ID').dropna()
2
df1['var1'] = df1.groupby(['variable', 'ID']).cumcount()
3
df1.pivot(['ID', 'var1'], 'variable', 'value').reset_index(0)
4
JavaScript
1
11
11
1
variable ID Number Number 2 Number 3
2
var1
3
0 10001 25.0 4.0 5.0
4
1 10001 78.0 NaN 12.0
5
0 10002 3.0 201.0 34.0
6
1 10002 234.0 510.0 NaN
7
0 10003 231.0 765.0 56.0
8
1 10003 NaN 422.0 2345.0
9
2 10003 NaN 753.0 NaN
10
3 10003 NaN 7587.0 NaN
11