I have an excel file which has multiple title names as columns within the same row where the data is given, I need to sort the data and convert the column names to rows and assign it to the data under the “column names”
enter image description here
My expected output is for it to turn out like this:
enter image description here
Im not sure how to start with this, can someone help out here?
Edit: sorry about the img description, im new here
Advertisement
Answer
You can check rows with names of new column values by column b
testing missing values, replace non matched a
column values to missing values by Series.where
and forward filling missing values, last filter with inverted mask and columns a,c
in DataFrame.loc
:
JavaScript
x
21
21
1
df = pd.read_excel('file.xlsx')
2
3
#sample data
4
print (df)
5
a b c
6
0 IT NaN NaN
7
1 User 1.0 user1@gmail.com
8
2 Data NaN NaN
9
3 User 1.0 user5@gmail.com
10
4 User 1.0 user2@gmail.com
11
12
m = df['c'].isna()
13
df['a'] = df['a'].where(m).ffill()
14
15
df = df.loc[~m, ['a','c']]
16
print (df)
17
a c
18
1 IT user1@gmail.com
19
3 Data user5@gmail.com
20
4 Data user2@gmail.com
21