I have an excel file which has multiple title names as columns within the same row where the data is given, I need to sort the data and convert the column names to rows and assign it to the data under the “column names”
enter image description here
My expected output is for it to turn out like this:
enter image description here
Im not sure how to start with this, can someone help out here?
Edit: sorry about the img description, im new here
Advertisement
Answer
You can check rows with names of new column values by column b
testing missing values, replace non matched a
column values to missing values by Series.where
and forward filling missing values, last filter with inverted mask and columns a,c
in DataFrame.loc
:
df = pd.read_excel('file.xlsx') #sample data print (df) a b c 0 IT NaN NaN 1 User 1.0 user1@gmail.com 2 Data NaN NaN 3 User 1.0 user5@gmail.com 4 User 1.0 user2@gmail.com m = df['c'].isna() df['a'] = df['a'].where(m).ffill() df = df.loc[~m, ['a','c']] print (df) a c 1 IT user1@gmail.com 3 Data user5@gmail.com 4 Data user2@gmail.com