I really struggle with tidying up the table into a “normal” dataframe again after having aggregated something. I had a table like that (columns):
JavaScript
x
2
1
RnnSize EmbSize RnnLayer Epochs Alpha Eval Run Result
2
So I calculated average and std of the Result column over multiple runs using that command:
JavaScript
1
2
1
df.groupby(["RnnSize", "EmbSize", "RnnLayer", "Epochs", "Alpha", "Eval"]).agg({'Result': ['mean', 'std']})
2
The output is a DataFrame like that:
JavaScript
1
4
1
Result
2
mean std
3
RnnSize EmbSize RnnLayer Epochs Alpha Eval
4
It looks a bit like three levels.
df.columns outputs the following multiindex:
JavaScript
1
11
11
1
MultiIndex([( 'index', ''),
2
( 'RnnSize', ''),
3
( 'EmbSize', ''),
4
('RnnLayer', ''),
5
( 'Epochs', ''),
6
( 'Alpha', ''),
7
( 'Eval', ''),
8
( 'Result', 'std'),
9
( 'Result', 'std')],
10
)
11
How do I flatten that again, removing “Result” and putting mean and std into the same “level” as the rest? There are so many commands like reset_index, drop_level and so on, but I did not find out yet how to fix that. It quite confuses me.
Edit: For reproducability, here is my entire code:
JavaScript
1
10
10
1
import numpy as np
2
import pandas as pd
3
import matplotlib.pyplot as plt
4
5
dfRuns = pd.read_csv("Results.csv", encoding="utf-8")
6
dfRuns
7
8
dfAv = dfRuns.copy()
9
dfAv = dfAv.groupby(["RnnSize", "EmbSize", "RnnLayer", "Epochs", "Alpha", "Eval"]).agg({'Result': ['mean', 'std']})
10
And the (shortened) csv file Results.csv:
JavaScript
1
8
1
RnnSize,EmbSize,RnnLayer,Epochs,Alpha,Eval,Run,Result
2
128,200,2,150,0.1,Precision,1,0.5940
3
128,200,2,150,0.1,Recall,1,0.5038
4
128,200,2,150,0.1,F1,1,0.5144
5
128,200,2,150,0.1,Precision,2,0.5851
6
128,200,2,150,0.1,Recall,2,0.4995
7
128,200,2,150,0.1,F1,2,0.5082
8
Advertisement
Answer
Use reset_index()
and then flatten the indexes:
JavaScript
1
3
1
df = df.reset_index()
2
df.columns = [' '.join(col).rstrip() for col in df.columns.to_numpy()]
3