Skip to content
Advertisement

Python (Pandas) pivot datframe, some sums keeping the order

I’m trying to get from a to b. I got a Pandas data frame similar to the a below.

data={'col1':['N1','N1','N2','N2', 'N2','N3'],
        'col2':['DE','NO','DE','NO', 'IT','DE'],
        'col3':[7, 5, 4, 1, 2, 8],
        'col3_sum':[12, 12, 7, 7, 7, 8],
        'col4':[0.6, 0.2, 0.7, 0.1, 0.2, 0.6],
        'col4_sum':[0.8, 0.8, 1.0, 1.0, 1.0, 0.6],
        'col5':[1,2,3,4,5,6]}
a=pd.DataFrame(data)
print(a)
  col1 col2  col3  col3_sum  col4  col4_sum  col5
0   N1   DE     7        12   0.6       0.8     1
1   N1   NO     5        12   0.2       0.8     2
2   N2   DE     4         7   0.7       1.0     3
3   N2   NO     1         7   0.1       1.0     4
4   N2   IT     2         7   0.2       1.0     5
5   N3   DE     8         8   0.6       0.6     6

I realize I’ve backed myself into a corner by computing sums in a flat file. I’m new to Python. I guess I should create the sums when I’m done pivoting?

What I am stuck in is this wrong b struggle,

b = df.pivot_table(index=['col1'], 
                    values=['col3', 'col3_sum','col4', 'col4_sum'],
                    columns='col2')
# or 
b = pd.pivot_table(a,index=['col1', 'col2'], columns=['col3', 'col4'],
                    aggfunc=len).reset_index()
# this makes senst to me, but not to python
a.pivot(index=['col1', 'col2'], columns='col2', values=['col3', 'col4'])
# print(b) # where I'm stuck at ... 

I would like to get to something like this b,

print(b) # my goal
col1    var     var_sum     DE      NO      IT
N1      col3    12          7       5   
N1      col4    0.8         0.6     0.2 
N2      col3    7           4       1       2
N2      col4    1.0         0.7     0.1     0.2
N3      col3    8           8       
N3      col4    0.6         0.6     

I’m not sure what to search for (some of the maybe relevant questions has way too much complexity for my to be able to extract what I need, at least at the moment). I’ve looked a lot at this answer, maybe I should find a way using .groupby()

Advertisement

Answer

Maybe you can compute the sum afterwards:

out = pd.melt(a, ["col1", "col2"], ["col3", "col4"]).pivot(
    ["col1", "variable"], "col2"
)
out["var_sum"] = out.sum(axis=1)

out = out.reset_index()
out.index.name, out.columns.name = None, None
out.columns = [
    f"{a}_{b}".replace("value", "").strip("_") for a, b in out.columns
]

print(out)

Prints:

  col1 variable   DE   IT   NO  var_sum
0   N1     col3  7.0  NaN  5.0     12.0
1   N1     col4  0.6  NaN  0.2      0.8
2   N2     col3  4.0  2.0  1.0      7.0
3   N2     col4  0.7  0.2  0.1      1.0
4   N3     col3  8.0  NaN  NaN      8.0
5   N3     col4  0.6  NaN  NaN      0.6
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement