I’m a beginner with python in combination with pandas, and I understand the basics. But I received a couple days ago 3 strange datasets in excel. As image below:
import pandas as pd dfinput = pd.DataFrame([ ["uuid", "79876081-099b-474f-9e8f-ff917fd7394c", "uuid", "a96bc7cb-02b1-4d13-823a-908531cda095", "uuid", "38bc7d20-10be-4774-973c-b3b00234a645", "uuid", "e7b12da6-a47f-4c24-8545-faa24e249a03", "uuid", "6b2c9426-bd6f-4bda-9c53-a86200e051f8"], ["variable 1", "value", "variable 1", "value", "variable 1", "value", "variable 1", "value", "variable 1", "value"], ["variable 2", "value", "variable 2", "value", "variable 2", "value", "variable 2", "value", "variable 2", "value"], ["variable 3", "value", "variable 3", "value", "variable 3", "value", "variable 3", "value", "variable 3", "value"], ["variable 4", "value", "variable 4", "value", "variable 4", "value", "variable 4", "value", "variable 4", "value"], ["variable 5", "value", "variable 5", "value", "variable 5", "value", "variable 5", "value", "variable 5", "value"], ["variable 6", "value", "variable 6", "value", "variable 6", "value", "variable 6", "value", "variable 6", "value"], ["variable 7", "value", "variable 7", "value", "variable 7", "value", "variable 7", "value", "variable 7", "value"], ["variable 8", "value", "variable 8", "value", "variable 8", "value", "variable 8", "value", "variable 8", "value"], ["variable 9", "value", "variable 9", "value", "variable 9", "value", "variable 9", "value", "variable 9", "value"], ["variable 10", "value", "variable 10", "value", "variable 10", "value", "variable 10", "value", "variable 10", "value"], ["variable A", "value", "variable B", "value", "variable A", "value", "variable A", "value", "variable A", "value"], ["variable B", "value", "variable C", "value", "variable C", "value", "variable B", "value", "variable B", "value"], ["variable C", "value", "variable D", "value", "variable D", "value", "variable D", "value", "variable C", "value"], ["variable D", "value", "Variable E", "value", "Variable E", "value", "Variable F", "value", "Variable E", "value"], ["Variable E", "value", "Variable F", "value", "Variable H", "value", "Variable G", "value", "Variable F", "value"], ["Variable F", "value", "Variable H", "value", "", "", "Variable H", "value", "Variable G", "value"], ["Variable G", "value", "", "", "", "", "", "", "Variable H", "value"] ])
dfoutput = pd.DataFrame([["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null"], ["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null", "value", "value", "value", "value", "value", "null", "value"], ["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null", "value", "value", "value", "null", "null", "value"], ["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null", "value", "null", "value", "value", "value"], ["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null", "value", "value", "value", "value"]], index=['79876081-099b-474f-9e8f-ff917fd7394c', 'a96bc7cb-02b1-4d13-823a-908531cda095', '38bc7d20-10be-4774-973c-b3b00234a645', 'e7b12da6-a47f-4c24-8545-faa24e249a03', '6b2c9426-bd6f-4bda-9c53-a86200e051f8'], columns=['variable 1', 'variable 2', 'variable 3', 'variable 4', 'variable 5', 'variable 6', 'variable 7', 'variable 8', 'variable 9', 'variable 10', 'variable A', 'variable B', 'variable C', 'variable D', 'Variable E', 'Variable F', 'Variable G', 'Variable H'])
I did try to loop the columns and create a new dataframe, but got stuck and think I make it unnecessary complex. I can’t get my head around it. Someone dealt with this before? and have a useful direction for me to go?
Advertisement
Answer
You can re-structure your data to your desired outcome with a rather simple manipulation. Note that I am using the dataframe
(dfinput) you posted:
# Change first row to headers and Transpose headers = dfinput.iloc[0] one = (pd.DataFrame(dfinput.values[1:], columns=headers)).T # Change first row to headers again one.columns = one.iloc[0] # Keep only odd indexed rows res = one.iloc[1::2, :]
res uuid variable 1 variable 2 variable 3 variable 4 variable 5 variable 6 variable 7 variable 8 variable 9 variable 10 variable A variable B variable C variable D Variable E Variable F Variable G 79876081-099b-474f-9e8f-ff917fd7394c value value value value value value value value value value value value value value value value value a96bc7cb-02b1-4d13-823a-908531cda095 value value value value value value value value value value value value value value value value 38bc7d20-10be-4774-973c-b3b00234a645 value value value value value value value value value value value value value value value e7b12da6-a47f-4c24-8545-faa24e249a03 value value value value value value value value value value value value value value value value 6b2c9426-bd6f-4bda-9c53-a86200e051f8 value value value value value value value value value value value value value value value value value