After a data frame aggregation with groupby, I’m trying to “flatten” the headers into a single level so I can export the data properly as CSV:
df.columns = [' '.join(col).strip() for col in df.columns.values]
df.columns
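A minimal reproduction, assuming hypothetical sample data (the names col1, col2, col3 and value are illustrative, not taken from the question). Aggregating one column with several functions yields MultiIndex columns, and the list comprehension flattens them. The group keys, however, stay in the row index.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only
raw = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1, 1, 2],
    "value": [10, 20, 30],
})

# Aggregating with a list of functions produces MultiIndex columns,
# e.g. ('value', 'count'), ('value', 'mean'), ('value', 'sum')
agg = raw.groupby(["col1", "col2", "col3"]).agg({"value": ["count", "mean", "sum"]})

# Flatten the column MultiIndex into single strings
agg.columns = [" ".join(col).strip() for col in agg.columns.values]

print(agg.columns.tolist())   # ['value count', 'value mean', 'value sum']
print(list(agg.index.names))  # ['col1', 'col2', 'col3'] (still the row index)
```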
The output looks like this:
Index(['count', 'average', 'mean', 'sum'], dtype='object')
If I display the data frame directly, I get different information:
df
Output:
count average mean sum col1 col2 col3 ...
It seems like pandas merged the column names, but I still have two levels of column headers. If I try to address the second-level columns, it raises an error:
df.drop('col1', axis='columns', level=0)
Output:
AssertionError: axis must be a MultiIndex
Or
df.drop('col1', axis='columns')
Output:
KeyError: "['col1'] not found in axis"
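One way to check where col1 actually lives, again on hypothetical sample data: after the flattening step the columns have a single level, and col1 is a level name of the row index rather than a column label. That is consistent with both errors above.

```python
import pandas as pd

# Hypothetical sample data reproducing the grouped, flattened frame
raw = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1, 1, 2],
    "value": [10, 20, 30],
})
agg = raw.groupby(["col1", "col2", "col3"]).agg({"value": ["count", "mean", "sum"]})
agg.columns = [" ".join(col).strip() for col in agg.columns.values]

print(agg.columns.nlevels, agg.index.nlevels)  # 1 3
print("col1" in agg.columns)      # False: not a column label
print("col1" in agg.index.names)  # True: it is a level of the row index

# Reproduce both errors from the question and record their types
errors = []
for kwargs in ({"axis": "columns", "level": 0}, {"axis": "columns"}):
    try:
        agg.drop("col1", **kwargs)
    except (AssertionError, KeyError) as exc:
        errors.append(type(exc).__name__)
print(errors)  # ['AssertionError', 'KeyError']
```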
So it seems like I’m stuck with something in between. If I export the data frame to CSV and import it again, everything is fine:
df.to_csv('data.csv')
And
df = pd.read_csv('data.csv')
df.drop('col1', axis='columns')
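The round trip appears to work because to_csv (with its default index=True) writes the index levels out as ordinary leading columns, and read_csv reads them back as plain columns with a fresh default index. A sketch with hypothetical data, using an in-memory io.StringIO buffer instead of the data.csv file:

```python
import io

import pandas as pd

# Hypothetical sample data mirroring the grouped, flattened frame
raw = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1, 1, 2],
    "value": [10, 20, 30],
})
agg = raw.groupby(["col1", "col2", "col3"]).agg({"value": ["count", "mean", "sum"]})
agg.columns = [" ".join(col).strip() for col in agg.columns.values]

# to_csv() with no path returns the CSV text; the index names become header cells
csv_text = agg.to_csv()

# read_csv treats every header cell, including col1..col3, as a regular column
reloaded = pd.read_csv(io.StringIO(csv_text))
print(reloaded.columns.tolist())
# ['col1', 'col2', 'col3', 'value count', 'value mean', 'value sum']
```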
So, what am I misunderstanding and doing wrong here?
Answer
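You probably want to run df = df.reset_index() after the df.groupby statement to “flatten” the headers as requested. The grouping keys (col1, col2, col3) end up in the row index rather than in the columns, which is why they are shown on a separate header line and why drop(..., axis='columns') cannot find them. Exporting to CSV and reading it back only works because to_csv writes the index out as ordinary columns. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html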
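A minimal sketch of that fix on hypothetical sample data (column names are illustrative): reset_index moves the group keys from the row index into regular columns, after which drop(..., axis='columns') behaves as expected.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only
raw = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1, 1, 2],
    "value": [10, 20, 30],
})
agg = raw.groupby(["col1", "col2", "col3"]).agg({"value": ["count", "mean", "sum"]})
agg.columns = [" ".join(col).strip() for col in agg.columns.values]

# Move the group keys out of the row index into ordinary columns
flat = agg.reset_index()
print(flat.columns.tolist())
# ['col1', 'col2', 'col3', 'value count', 'value mean', 'value sum']

# col1 is now a real column label, so it can be dropped like any other
trimmed = flat.drop("col1", axis="columns")
print(trimmed.columns.tolist())
# ['col2', 'col3', 'value count', 'value mean', 'value sum']
```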