I have dataframe_a (dfa) and dataframe_b (dfb), each with a variable number of columns but the same number of rows.
I need to subtract each column of dfb from all dfa columns and create a new dataframe containing the subtracted values.
Right now I’m doing this manually:
    sub1 = dfa.subtract(dfb[0], axis=0)
    sub2 = dfa.subtract(dfb[1], axis=0)
    sub3 = dfa.subtract(dfb[2], axis=0)
    # etc.
Then I’m using the concat function to concatenate all the columns:
    subbed = pd.concat([sub1, sub2, sub3], axis=1, ignore_index=True)
    subbed = pd.concat([dfa, subbed], axis=1)
This all seems horribly inefficient and makes me feel quite bad at programming lol. How would you do this without having to subtract each column manually, and write the results directly to a new dataframe?
Answer
Setup
    import pandas as pd
    import numpy as np
    from itertools import product

    dfa = pd.DataFrame([[8, 7, 6]], range(5), [*'ABC'])
    dfb = pd.DataFrame([[1, 2, 3, 4]], range(5), [*'DEFG'])
Pandas’ concat
I use the operator method rsub with the axis=0 argument. See this Q&A for more information.
    pd.concat({c: dfb.rsub(s, axis=0) for c, s in dfa.items()}, axis=1)

       A           B           C
       D  E  F  G  D  E  F  G  D  E  F  G
    0  7  6  5  4  6  5  4  3  5  4  3  2
    1  7  6  5  4  6  5  4  3  5  4  3  2
    2  7  6  5  4  6  5  4  3  5  4  3  2
    3  7  6  5  4  6  5  4  3  5  4  3  2
    4  7  6  5  4  6  5  4  3  5  4  3  2
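For anyone unsure what rsub with axis=0 does, here is a small sketch. It assumes frames of the same shape as the setup, built with explicitly repeated rows. The names left, right and by_b are only illustrative. It checks that dfb.rsub(s, axis=0) is the same as s - dfb applied column by column, and it also shows a variant that groups the result by dfb column, which is closer to the layout the question builds by hand.

```python
import pandas as pd

# Same shapes as the setup above, with the rows repeated explicitly.
dfa = pd.DataFrame([[8, 7, 6]] * 5, columns=list("ABC"))
dfb = pd.DataFrame([[1, 2, 3, 4]] * 5, columns=list("DEFG"))

# rsub is "reverse subtract": dfb.rsub(s, axis=0) computes s - dfb,
# aligning the Series s on the row index and applying it to every column.
left = dfb.rsub(dfa["A"], axis=0)
right = -dfb.sub(dfa["A"], axis=0)
assert left.equals(right)

# Variant grouped by dfb column: each block holds dfa minus one dfb column.
by_b = pd.concat({c: dfa.sub(s, axis=0) for c, s in dfb.items()}, axis=1)
print(by_b)
```

Which grouping you want (dfa columns or dfb columns on the top level) only changes which frame you iterate over and whether you call sub or rsub.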
Numpy’s broadcasting
You can play around with it to see how it works.
    a = dfa.to_numpy()
    b = dfb.to_numpy()
    c = a[..., None] - b[:, None]

    df = pd.DataFrame(dict(zip(
        product(dfa, dfb),
        c.reshape(5, -1).transpose()
    )))

    df

       A           B           C
       D  E  F  G  D  E  F  G  D  E  F  G
    0  7  6  5  4  6  5  4  3  5  4  3  2
    1  7  6  5  4  6  5  4  3  5  4  3  2
    2  7  6  5  4  6  5  4  3  5  4  3  2
    3  7  6  5  4  6  5  4  3  5  4  3  2
    4  7  6  5  4  6  5  4  3  5  4  3  2
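If the broadcasting step feels opaque, tracing the shapes may help. This is a minimal sketch using plain arrays as stand-ins for dfa.to_numpy() and dfb.to_numpy(). The intermediate names a3, b3 and flat are made up for illustration.

```python
import numpy as np

# Stand-ins for dfa.to_numpy() and dfb.to_numpy().
a = np.array([[8, 7, 6]] * 5)      # shape (5, 3)
b = np.array([[1, 2, 3, 4]] * 5)   # shape (5, 4)

a3 = a[..., None]   # shape (5, 3, 1): new trailing axis
b3 = b[:, None]     # shape (5, 1, 4): new middle axis
c = a3 - b3         # broadcasts to (5, 3, 4); c[i, j, k] == a[i, j] - b[i, k]

# Flatten the last two axes: one row per original row, and the columns
# ordered (A, D), (A, E), ..., (C, G), matching product(dfa, dfb).
flat = c.reshape(5, -1)  # shape (5, 12)
print(a3.shape, b3.shape, c.shape, flat.shape)
print(flat[0])
```

The transpose in the answer is there only because zip pairs each column label with one column of values, so the (5, 12) array has to be iterated column by column.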