I have dataframe_a and dataframe_b, each with a variable number of columns but the same number of rows.
I need to subtract each column of dfb from all dfa columns and create a new dataframe containing the subtracted values.
Right now I’m doing this manually:
    sub1 = dfa.subtract(dfb[0], axis=0)
    sub2 = dfa.subtract(dfb[1], axis=0)
    sub3 = dfa.subtract(dfb[2], axis=0)
    # etc.
then I’m using the concat function to concatenate all the columns:
    subbed = pd.concat([sub1, sub2, sub3], axis=1, ignore_index=True)
    subbed = pd.concat([dfa, subbed], axis=1)
This all seems horribly inefficient and makes me feel quite bad at programming lol. How would you do this without having to subtract each column manually, writing the results directly to a new dataframe?
Answer
Setup
    import pandas as pd
    import numpy as np
    from itertools import product

    # One row repeated 5 times so the data matches the 5-row index
    dfa = pd.DataFrame([[8, 7, 6]] * 5, range(5), [*'ABC'])
    dfb = pd.DataFrame([[1, 2, 3, 4]] * 5, range(5), [*'DEFG'])
Pandas’ concat
I use the operator method rsub with the axis=0 argument. See this Q&A for more information.
    pd.concat({c: dfb.rsub(s, axis=0) for c, s in dfa.items()}, axis=1)

       A           B           C
       D  E  F  G  D  E  F  G  D  E  F  G
    0  7  6  5  4  6  5  4  3  5  4  3  2
    1  7  6  5  4  6  5  4  3  5  4  3  2
    2  7  6  5  4  6  5  4  3  5  4  3  2
    3  7  6  5  4  6  5  4  3  5  4  3  2
    4  7  6  5  4  6  5  4  3  5  4  3  2
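If you want something closer to the layout in the question (the original dfa columns first, then every difference with a flat column name), a rough sketch could look like the following. The "A-D" naming scheme and the names `flat` and `result` are my own assumptions for illustration, not something the question asks for, and the toy frames are rebuilt here with the row repeated explicitly so the block runs on its own.

```python
import pandas as pd

# Toy frames: one row repeated 5 times, same values as in the Setup above.
dfa = pd.DataFrame([[8, 7, 6]] * 5, index=range(5), columns=list("ABC"))
dfb = pd.DataFrame([[1, 2, 3, 4]] * 5, index=range(5), columns=list("DEFG"))

# dfb.rsub(s, axis=0) computes s - dfb, aligning s on the row index.
# The dict keys (dfa column names) become the outer column level.
subbed = pd.concat({c: dfb.rsub(s, axis=0) for c, s in dfa.items()}, axis=1)

# Hypothetical flat naming: "A-D" means dfa["A"] - dfb["D"].
flat = subbed.copy()
flat.columns = [f"{a}-{b}" for a, b in subbed.columns]

# Put the original dfa columns in front, as in the question.
result = pd.concat([dfa, flat], axis=1)
print(result.columns.tolist())
```

Keeping the two-level columns is usually handier for selecting (for example `subbed["A"]` gives all differences for column A), so flattening is only worth it if you need plain string names.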
Numpy’s broadcasting
You can play around with it and learn how it works
    a = dfa.to_numpy()
    b = dfb.to_numpy()
    c = a[..., None] - b[:, None]

    df = pd.DataFrame(dict(zip(
        product(dfa, dfb),
        c.reshape(5, -1).transpose()
    )))
    df

       A           B           C
       D  E  F  G  D  E  F  G  D  E  F  G
    0  7  6  5  4  6  5  4  3  5  4  3  2
    1  7  6  5  4  6  5  4  3  5  4  3  2
    2  7  6  5  4  6  5  4  3  5  4  3  2
    3  7  6  5  4  6  5  4  3  5  4  3  2
    4  7  6  5  4  6  5  4  3  5  4  3  2
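To see why the broadcasting works, it can help to look at the intermediate shapes. This is only a small sketch with plain NumPy arrays standing in for the two frames (built with `np.tile`, which is my choice here and not part of the answer); the names `a3`, `b3` and `flat` are made up for illustration.

```python
import numpy as np

# Stand-ins for dfa.to_numpy() and dfb.to_numpy(): 5 identical rows each.
a = np.tile([8, 7, 6], (5, 1))       # shape (5, 3)
b = np.tile([1, 2, 3, 4], (5, 1))    # shape (5, 4)

a3 = a[..., None]    # shape (5, 3, 1): new trailing axis
b3 = b[:, None]      # shape (5, 1, 4): new middle axis

# Broadcasting stretches the size-1 axes, so c[i, j, k] == a[i, j] - b[i, k].
c = a3 - b3          # shape (5, 3, 4)

# Collapsing the last two axes puts the (j, k) pairs in the same order
# that itertools.product(dfa, dfb) yields the column-name pairs.
flat = c.reshape(5, -1)   # shape (5, 12)

print(a3.shape, b3.shape, c.shape, flat.shape)
```

Each row of `flat` holds the twelve differences for that row, which is why the answer transposes it before zipping with the column-name pairs: `zip` then sees one length-5 column per (dfa column, dfb column) pair.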