I have dataframe_a (dfa) and dataframe_b (dfb), each with a variable number of columns but the same number of rows.
I need to subtract each column of dfb from all dfa columns and create a new dataframe containing the subtracted values.
Right now I’m doing this manually:
sub1 = dfa.subtract(dfb[0], axis=0)
sub2 = dfa.subtract(dfb[1], axis=0)
sub3 = dfa.subtract(dfb[2], axis=0)
# etc.
Then I’m using the concat function to concatenate all the columns:
subbed = pd.concat([sub1, sub2, sub3], axis=1, ignore_index=True)
subbed = pd.concat([dfa, subbed], axis=1)
This all seems horribly inefficient and makes me feel quite bad at programming lol. How would you do this without subtracting each column manually, writing the results directly to a new dataframe?
Answer
Setup
import pandas as pd
import numpy as np
from itertools import product

dfa = pd.DataFrame([[8, 7, 6]], range(5), [*'ABC'])
dfb = pd.DataFrame([[1, 2, 3, 4]], range(5), [*'DEFG'])
Pandas’ concat
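I use the operator method rsub with the axis=0 argument. For a column s of dfa, dfb.rsub(s, axis=0) computes s - dfb, aligning s with the row index of dfb, so every column of dfb is subtracted from s. The dict passed to concat then yields one block of results per dfa column. See this Q&A for more information.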
pd.concat({c: dfb.rsub(s, axis=0) for c, s in dfa.items()}, axis=1)
   A           B           C
   D  E  F  G  D  E  F  G  D  E  F  G
0  7  6  5  4  6  5  4  3  5  4  3  2
1  7  6  5  4  6  5  4  3  5  4  3  2
2  7  6  5  4  6  5  4  3  5  4  3  2
3  7  6  5  4  6  5  4  3  5  4  3  2
4  7  6  5  4  6  5  4  3  5  4  3  2
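If you would rather keep the grouping from the question (one block per dfb column, each block being dfa minus that column), the same dict-plus-concat idea works with the roles swapped. This is only a sketch: the sample data is built with `[[...]] * 5` as a stand-in for the setup above, and the flattened "A-D" style labels are my own naming, not something from the question.

```python
import pandas as pd

# Hypothetical sample data with the same values as the setup above.
dfa = pd.DataFrame([[8, 7, 6]] * 5, columns=list("ABC"))
dfb = pd.DataFrame([[1, 2, 3, 4]] * 5, columns=list("DEFG"))

# One block per dfb column: every dfa column minus that dfb column.
# The dict keys become the outer level of the resulting column MultiIndex.
subbed = pd.concat(
    {col: dfa.sub(dfb[col], axis=0) for col in dfb.columns},
    axis=1,
)

# Optional: flatten the two-level labels into plain strings.
# Each column tuple is (dfb label, dfa label), so "A-D" means A minus D.
flat = subbed.copy()
flat.columns = [f"{a}-{b}" for b, a in subbed.columns]
```

Because the loop runs over `dfb.columns`, it keeps working however many columns dfb has. If you also want the original dfa columns in front, as in the question, `pd.concat([dfa, flat], axis=1)` should do it.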
Numpy’s broadcasting
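Adding a new axis to each array lets NumPy broadcast the subtraction: a[..., None] has shape (5, 3, 1) and b[:, None] has shape (5, 1, 4), so their difference has shape (5, 3, 4), one value per (row, dfa column, dfb column). You can play around with it to learn how it works.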
a = dfa.to_numpy()
b = dfb.to_numpy()
c = a[..., None] - b[:, None]
df = pd.DataFrame(dict(zip(
    product(dfa, dfb),
    c.reshape(5, -1).transpose()
)))
df
   A           B           C
   D  E  F  G  D  E  F  G  D  E  F  G
0  7  6  5  4  6  5  4  3  5  4  3  2
1  7  6  5  4  6  5  4  3  5  4  3  2
2  7  6  5  4  6  5  4  3  5  4  3  2
3  7  6  5  4  6  5  4  3  5  4  3  2
4  7  6  5  4  6  5  4  3  5  4  3  2
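A variant of the broadcasting approach that avoids hard-coding the row count might look like the sketch below. It assumes both frames share the same row index. Using `pd.MultiIndex.from_product` in place of `dict(zip(product(...)))` is my substitution, not part of the answer above, and the sample data is again a `[[...]] * 5` stand-in for the setup.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same values as the setup above.
dfa = pd.DataFrame([[8, 7, 6]] * 5, columns=list("ABC"))
dfb = pd.DataFrame([[1, 2, 3, 4]] * 5, columns=list("DEFG"))

a = dfa.to_numpy()  # shape (5, 3)
b = dfb.to_numpy()  # shape (5, 4)

# (5, 3, 1) - (5, 1, 4) broadcasts to (5, 3, 4):
# c[i, j, k] is row i of dfa column j minus row i of dfb column k.
c = a[:, :, None] - b[:, None, :]

# Row-major reshape puts the dfb axis innermost, which matches the
# (dfa label, dfb label) order produced by from_product.
columns = pd.MultiIndex.from_product([dfa.columns, dfb.columns])
result = pd.DataFrame(c.reshape(len(dfa), -1), index=dfa.index, columns=columns)
```

Passing `index=dfa.index` keeps the original row labels, which the plain NumPy arrays would otherwise lose.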