I’m trying to get from a
to b
. I got a Pandas data frame similar to the a below.
data={'col1':['N1','N1','N2','N2', 'N2','N3'], 'col2':['DE','NO','DE','NO', 'IT','DE'], 'col3':[7, 5, 4, 1, 2, 8], 'col3_sum':[12, 12, 7, 7, 7, 8], 'col4':[0.6, 0.2, 0.7, 0.1, 0.2, 0.6], 'col4_sum':[0.8, 0.8, 1.0, 1.0, 1.0, 0.6], 'col5':[1,2,3,4,5,6]} a=pd.DataFrame(data) print(a) col1 col2 col3 col3_sum col4 col4_sum col5 0 N1 DE 7 12 0.6 0.8 1 1 N1 NO 5 12 0.2 0.8 2 2 N2 DE 4 7 0.7 1.0 3 3 N2 NO 1 7 0.1 1.0 4 4 N2 IT 2 7 0.2 1.0 5 5 N3 DE 8 8 0.6 0.6 6
I realize I’ve backed myself into a corner by computing sums in a flat file. I’m new to Python. I guess I should create the sums when I’m done pivoting?
What I am stuck in is this wrong b
struggle,
b = df.pivot_table(index=['col1'], values=['col3', 'col3_sum','col4', 'col4_sum'], columns='col2') # or b = pd.pivot_table(a,index=['col1', 'col2'], columns=['col3', 'col4'], aggfunc=len).reset_index() # this makes senst to me, but not to python a.pivot(index=['col1', 'col2'], columns='col2', values=['col3', 'col4']) # print(b) # where I'm stuck at ...
I would like to get to something like this b
,
print(b) # my goal col1 var var_sum DE NO IT N1 col3 12 7 5 N1 col4 0.8 0.6 0.2 N2 col3 7 4 1 2 N2 col4 1.0 0.7 0.1 0.2 N3 col3 8 8 N3 col4 0.6 0.6
I’m not sure what to search for (some of the maybe relevant questions has way too much complexity for my to be able to extract what I need, at least at the moment). I’ve looked a lot at this answer, maybe I should find a way using .groupby()
Advertisement
Answer
Maybe you can compute the sum afterwards:
out = pd.melt(a, ["col1", "col2"], ["col3", "col4"]).pivot( ["col1", "variable"], "col2" ) out["var_sum"] = out.sum(axis=1) out = out.reset_index() out.index.name, out.columns.name = None, None out.columns = [ f"{a}_{b}".replace("value", "").strip("_") for a, b in out.columns ] print(out)
Prints:
col1 variable DE IT NO var_sum 0 N1 col3 7.0 NaN 5.0 12.0 1 N1 col4 0.6 NaN 0.2 0.8 2 N2 col3 4.0 2.0 1.0 7.0 3 N2 col4 0.7 0.2 0.1 1.0 4 N3 col3 8.0 NaN NaN 8.0 5 N3 col4 0.6 NaN NaN 0.6