I am trying to transform a DataFrame by combining its extra columns into a dictionary.
My DataFrame will always have at least four columns: record, yhat, residual, and hat. Depending on the case, it can also have additional columns. The head of my current df looks like this:
   record    yhat  residual      hat   RinvRes  AOMstat
0       1  6.7272  -0.57130  0.04985  0.009825  0.02041
1       2  6.5568   0.19460  0.09771 -0.014930 -0.03078
2       3  6.5457   0.16190  0.09765  0.272800  0.56260
If we look at the top row, we see that there are two additional columns, RinvRes and AOMstat:
   record    yhat  residual      hat   RinvRes  AOMstat
0       1  6.7272  -0.57130  0.04985  0.009825  0.02041
I would like to combine those columns into a single dictionary column, where each column name becomes a key in the dictionary, e.g.:
   record    yhat  residual      hat                                          additional
0       1  6.7272  -0.57130  0.04985  {"RinvRes": "0.2291E-01", "AOMstat": "0.3224E-01"}
Answer
You can do this in one step with .join, .agg(dict), and .drop.
First, create the list of columns you want to aggregate:
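agg_cols = ['RinvRes', 'AOMstat']
# Build one dict per row from agg_cols, join it back as 'additional', then drop the originals.
# drop(columns=...) is used because pandas 2.x no longer accepts the positional form drop(agg_cols, 1).
df1 = df.join(df[agg_cols].agg(dict, axis=1).to_frame('additional')).drop(columns=agg_cols)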
print(df1)
   record    yhat  residual      hat                                  additional
0       1  6.7272   -0.5713  0.04985   {'RinvRes': 0.009825, 'AOMstat': 0.02041}
1       2  6.5568    0.1946  0.09771  {'RinvRes': -0.01493, 'AOMstat': -0.03078}
2       3  6.5457    0.1619  0.09765     {'RinvRes': 0.2728, 'AOMstat': 0.5626}
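Since the question says the extra columns vary from case to case, it may be more convenient to derive them from the four fixed columns instead of hard-coding agg_cols. Below is a minimal sketch of that idea. It swaps .agg(dict, axis=1) for DataFrame.to_dict(orient="records"), which also produces one dict per row. The helper name pack_extra_columns, the base_cols list, and the target parameter are made up for illustration, and the sample data is copied from the question. It assumes at least one extra column is present.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "record": [1, 2, 3],
    "yhat": [6.7272, 6.5568, 6.5457],
    "residual": [-0.57130, 0.19460, 0.16190],
    "hat": [0.04985, 0.09771, 0.09765],
    "RinvRes": [0.009825, -0.014930, 0.272800],
    "AOMstat": [0.02041, -0.03078, 0.56260],
})

# The four columns that are always present, per the question.
base_cols = ["record", "yhat", "residual", "hat"]


def pack_extra_columns(frame, base_cols, target="additional"):
    """Hypothetical helper: fold every non-base column into one dict column."""
    # Anything that is not a base column counts as "additional".
    extra_cols = [c for c in frame.columns if c not in base_cols]
    out = frame[base_cols].copy()
    # to_dict(orient="records") returns one {column: value} dict per row.
    out[target] = frame[extra_cols].to_dict(orient="records")
    return out


df1 = pack_extra_columns(df, base_cols)
print(df1)
```

The values in each dict stay numeric. If you need strings, as in the example in the question, you would have to format them yourself before packing.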