I am trying to transform a DataFrame by combining its extra columns into a dictionary.
My DataFrame will always have at least four columns (record, yhat, residual, and hat), plus additional columns that vary from case to case. The head of my current df looks like this:
       record    yhat  residual      hat   RinvRes   AOMstat
    0       1  6.7272  -0.57130  0.04985  0.009825   0.02041
    1       2  6.5568   0.19460  0.09771 -0.014930  -0.03078
    2       3  6.5457   0.16190  0.09765  0.272800   0.56260
If we look at the top row, we see that there are 2 additional columns, RinvRes and AOMstat:
       record    yhat  residual      hat   RinvRes  AOMstat
    0       1  6.7272  -0.57130  0.04985  0.009825  0.02041
I would like to combine those columns into a single dictionary column, where each column name becomes a key, e.g.:
       record    yhat  residual      hat  additional
    0       1  6.7272  -0.57130  0.04985  {"RinvRes": "0.2291E-01", "AOMstat": "0.3224E-01"}
Answer
You can do this in one step with .join, .agg(dict), and .drop.
First, create your list of columns to aggregate:
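    # Columns to fold into the dictionary column
    agg_cols = ['RinvRes', 'AOMstat']

    # Build one dict per row, join it back as 'additional',
    # then drop the original columns. drop() is called with columns=
    # because recent pandas versions no longer accept axis positionally.
    df1 = (df.join(df[agg_cols].agg(dict, axis=1).to_frame('additional'))
             .drop(columns=agg_cols))
    print(df1)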
       record    yhat  residual      hat  additional
    0       1  6.7272   -0.5713  0.04985  {'RinvRes': 0.009825, 'AOMstat': 0.02041}
    1       2  6.5568    0.1946  0.09771  {'RinvRes': -0.01493, 'AOMstat': -0.03078}
    2       3  6.5457    0.1619  0.09765  {'RinvRes': 0.2728, 'AOMstat': 0.5626}
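
Since the question says the extra columns differ from case to case, the list of columns to aggregate could also be derived rather than hard-coded. The following is only a sketch under that assumption: `base_cols` and the helper `pack_extra_columns` are hypothetical names introduced here, and it swaps `.agg(dict, axis=1)` for `DataFrame.to_dict("records")`, which likewise gives one dictionary per row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "record": [1, 2, 3],
    "yhat": [6.7272, 6.5568, 6.5457],
    "residual": [-0.57130, 0.19460, 0.16190],
    "hat": [0.04985, 0.09771, 0.09765],
    "RinvRes": [0.009825, -0.014930, 0.272800],
    "AOMstat": [0.02041, -0.03078, 0.56260],
})

# Assumption: these are the four columns that are always present.
base_cols = ["record", "yhat", "residual", "hat"]


def pack_extra_columns(frame, keep, name="additional"):
    """Hypothetical helper: fold every column not in `keep` into one dict column."""
    # Preserve the original column order of the extras.
    extra = [c for c in frame.columns if c not in keep]
    out = frame[keep].copy()
    # to_dict("records") yields one {column: value} dict per row, in row order.
    out[name] = frame[extra].to_dict("records")
    return out


df1 = pack_extra_columns(df, base_cols)
print(df1)
```

With this approach the same call should work whichever extra columns a given DataFrame happens to carry, as long as the four base columns are present.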