Dataframe’s with strange structure with variables in even columns

Tags: , ,



I’m a beginner with python in combination with pandas, and I understand the basics. But I received a couple days ago 3 strange datasets in excel. As image below: enter image description here

import pandas as pd

dfinput = pd.DataFrame([
    ["uuid", "79876081-099b-474f-9e8f-ff917fd7394c", "uuid", "a96bc7cb-02b1-4d13-823a-908531cda095", "uuid",
        "38bc7d20-10be-4774-973c-b3b00234a645", "uuid", "e7b12da6-a47f-4c24-8545-faa24e249a03", "uuid", "6b2c9426-bd6f-4bda-9c53-a86200e051f8"],
    ["variable 1", "value", "variable 1", "value", "variable 1",
        "value", "variable 1", "value", "variable 1", "value"],
    ["variable 2", "value", "variable 2", "value", "variable 2",
        "value", "variable 2", "value", "variable 2", "value"],
    ["variable 3", "value", "variable 3", "value", "variable 3",
        "value", "variable 3", "value", "variable 3", "value"],
    ["variable 4", "value", "variable 4", "value", "variable 4",
        "value", "variable 4", "value", "variable 4", "value"],
    ["variable 5", "value", "variable 5", "value", "variable 5",
        "value", "variable 5", "value", "variable 5", "value"],
    ["variable 6", "value", "variable 6", "value", "variable 6",
        "value", "variable 6", "value", "variable 6", "value"],
    ["variable 7", "value", "variable 7", "value", "variable 7",
        "value", "variable 7", "value", "variable 7", "value"],
    ["variable 8", "value", "variable 8", "value", "variable 8",
        "value", "variable 8", "value", "variable 8", "value"],
    ["variable 9", "value", "variable 9", "value", "variable 9",
        "value", "variable 9", "value", "variable 9", "value"],
    ["variable 10", "value", "variable 10", "value", "variable 10",
        "value", "variable 10", "value", "variable 10", "value"],
    ["variable A", "value", "variable B", "value", "variable A",
        "value", "variable A", "value", "variable A", "value"],
    ["variable B", "value", "variable C", "value", "variable C",
        "value", "variable B", "value", "variable B", "value"],
    ["variable C", "value", "variable D", "value", "variable D",
        "value", "variable D", "value", "variable C", "value"],
    ["variable D", "value", "Variable E", "value", "Variable E",
        "value", "Variable F", "value", "Variable E", "value"],
    ["Variable E", "value", "Variable F", "value", "Variable H",
        "value", "Variable G", "value", "Variable F", "value"],
    ["Variable F", "value", "Variable H", "value", "",
        "", "Variable H", "value", "Variable G", "value"],
    ["Variable G", "value", "", "", "", "", "", "", "Variable H", "value"]
])

I want the following result: enter image description here

dfoutput = pd.DataFrame([["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null"],
                         ["value", "value", "value", "value", "value", "value", "value", "value", "value",
                             "value", "null", "value", "value", "value", "value", "value", "null", "value"],
                         ["value", "value", "value", "value", "value", "value", "value", "value", "value",
                             "value", "value", "null", "value", "value", "value", "null", "null", "value"],
                         ["value", "value", "value", "value", "value", "value", "value", "value", "value",
                             "value", "value", "value", "null", "value", "null", "value", "value", "value"],
                         ["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null", "value", "value", "value", "value"]],
                        index=['79876081-099b-474f-9e8f-ff917fd7394c', 'a96bc7cb-02b1-4d13-823a-908531cda095',
                               '38bc7d20-10be-4774-973c-b3b00234a645', 'e7b12da6-a47f-4c24-8545-faa24e249a03', '6b2c9426-bd6f-4bda-9c53-a86200e051f8'],
                        columns=['variable 1', 'variable 2', 'variable 3', 'variable 4', 'variable 5', 'variable 6', 'variable 7', 'variable 8', 'variable 9', 'variable 10', 'variable A', 'variable B', 'variable C', 'variable D', 'Variable E', 'Variable F', 'Variable G', 'Variable H'])

I did try to loop the columns and create a new dataframe, but got stuck and think I make it unnecessary complex. I can’t get my head around it. Someone dealt with this before? and have a useful direction for me to go?

Answer

You can re-structure your data to your desired outcome with a rather simple manipulation. Note that I am using the dataframe (dfinput) you posted:

# Change first row to headers and Transpose
headers = dfinput.iloc[0]
one  = (pd.DataFrame(dfinput.values[1:], columns=headers)).T

# Change first row to headers again
one.columns = one.iloc[0]

# Keep only odd indexed rows
res = one.iloc[1::2, :]

res

uuid                                 variable 1 variable 2 variable 3 variable 4 variable 5 variable 6 variable 7 variable 8 variable 9 variable 10 variable A variable B variable C variable D Variable E Variable F Variable G
                                                                                                                                                                                                                            
79876081-099b-474f-9e8f-ff917fd7394c      value      value      value      value      value      value      value      value      value       value      value      value      value      value      value      value      value
a96bc7cb-02b1-4d13-823a-908531cda095      value      value      value      value      value      value      value      value      value       value      value      value      value      value      value      value           
38bc7d20-10be-4774-973c-b3b00234a645      value      value      value      value      value      value      value      value      value       value      value      value      value      value      value                      
e7b12da6-a47f-4c24-8545-faa24e249a03      value      value      value      value      value      value      value      value      value       value      value      value      value      value      value      value           
6b2c9426-bd6f-4bda-9c53-a86200e051f8      value      value      value      value      value      value      value      value      value       value      value      value      value      value      value      value      value


Source: stackoverflow