Pandas: Cannot address column from previously merged multi level data frame

After a data frame aggregation with group by, I’m trying to “flatten” the headers into one level so I can export the data properly as CSV:

df.columns = [' '.join(col).strip() for col in df.columns.values]
df.columns

The output looks like this:

Index(['count', 'average', 'mean',
       'sum'],
      dtype='object')

If I display the data frame directly, I get different information:

df

Output:

                 count average mean sum
col1 col2 col3 
...

It seems like pandas merged the column names, but I still have two levels of column description. If I try to address the columns by level, it raises an error:

df.drop('col1', axis = 'columns', level = 0)

Output:

AssertionError: axis must be a MultiIndex

df.drop('col1', axis = 'columns')

Output:

KeyError: "['col1'] not found in axis"

So it seems like I’m stuck with something in between. If I export the data frame to CSV and import it again, everything is fine:

df.to_csv('data.csv')

And then:

import pandas as pd
df = pd.read_csv('data.csv')
df.drop('col1', axis = 'columns')

So, what am I misunderstanding and doing wrong here?

Answer

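You probably want to call df.reset_index() after the df.groupby statement to “flatten” the headers as requested. The display shows col1, col2 and col3 on their own row because they are the group keys and sit in the row index, not in the columns; that is why drop cannot find 'col1'. reset_index() turns them back into ordinary columns. The CSV round trip works for the same reason: to_csv writes the index levels out as regular columns, and reading the file back loads them as such. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html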
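
A minimal sketch of the reset_index() approach, assuming a small made-up data frame: the sample values, the 'value' column and the names agg, flat and result are hypothetical and not taken from the question. It shows that the group keys live in the row index after the aggregation and become droppable columns once reset_index() is called.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the question, values are made up.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1, 1, 2],
    "value": [10, 20, 30],
})

# Aggregating with a dict of lists yields two-level (MultiIndex) column headers,
# and the group keys become levels of the row index.
agg = df.groupby(["col1", "col2", "col3"]).agg({"value": ["count", "mean", "sum"]})

# Flatten the two header levels into single strings, as in the question.
agg.columns = [" ".join(col).strip() for col in agg.columns.values]

# col1..col3 are still index levels here, so they are not in agg.columns.
# reset_index() moves them back into ordinary columns.
flat = agg.reset_index()

# Now the key can be dropped like any other column.
result = flat.drop("col1", axis="columns")
print(result)
```

If the keys are never needed as an index, passing as_index=False to groupby is an alternative that avoids the extra reset_index() call.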
