I’m trying to get from a to b. I got a Pandas data frame similar to the a below.
data={'col1':['N1','N1','N2','N2', 'N2','N3'],
        'col2':['DE','NO','DE','NO', 'IT','DE'],
        'col3':[7, 5, 4, 1, 2, 8],
        'col3_sum':[12, 12, 7, 7, 7, 8],
        'col4':[0.6, 0.2, 0.7, 0.1, 0.2, 0.6],
        'col4_sum':[0.8, 0.8, 1.0, 1.0, 1.0, 0.6],
        'col5':[1,2,3,4,5,6]}
a=pd.DataFrame(data)
print(a)
  col1 col2  col3  col3_sum  col4  col4_sum  col5
0   N1   DE     7        12   0.6       0.8     1
1   N1   NO     5        12   0.2       0.8     2
2   N2   DE     4         7   0.7       1.0     3
3   N2   NO     1         7   0.1       1.0     4
4   N2   IT     2         7   0.2       1.0     5
5   N3   DE     8         8   0.6       0.6     6
I realize I’ve backed myself into a corner by computing sums in a flat file. I’m new to Python. I guess I should create the sums when I’m done pivoting?
What I am stuck in is this wrong b struggle,
b = df.pivot_table(index=['col1'], 
                    values=['col3', 'col3_sum','col4', 'col4_sum'],
                    columns='col2')
# or 
b = pd.pivot_table(a,index=['col1', 'col2'], columns=['col3', 'col4'],
                    aggfunc=len).reset_index()
# this makes senst to me, but not to python
a.pivot(index=['col1', 'col2'], columns='col2', values=['col3', 'col4'])
# print(b) # where I'm stuck at ... 
I would like to get to something like this b,
print(b) # my goal col1 var var_sum DE NO IT N1 col3 12 7 5 N1 col4 0.8 0.6 0.2 N2 col3 7 4 1 2 N2 col4 1.0 0.7 0.1 0.2 N3 col3 8 8 N3 col4 0.6 0.6
I’m not sure what to search for (some of the maybe relevant questions has way too much complexity for my to be able to extract what I need, at least at the moment). I’ve looked a lot at this answer, maybe I should find a way using .groupby()
Advertisement
Answer
Maybe you can compute the sum afterwards:
out = pd.melt(a, ["col1", "col2"], ["col3", "col4"]).pivot(
    ["col1", "variable"], "col2"
)
out["var_sum"] = out.sum(axis=1)
out = out.reset_index()
out.index.name, out.columns.name = None, None
out.columns = [
    f"{a}_{b}".replace("value", "").strip("_") for a, b in out.columns
]
print(out)
Prints:
col1 variable DE IT NO var_sum 0 N1 col3 7.0 NaN 5.0 12.0 1 N1 col4 0.6 NaN 0.2 0.8 2 N2 col3 4.0 2.0 1.0 7.0 3 N2 col4 0.7 0.2 0.1 1.0 4 N3 col3 8.0 NaN NaN 8.0 5 N3 col4 0.6 NaN NaN 0.6