I have a dataset, df, where I would like to create columns that display the output of a subtraction calculation:
Data
count power id p_q122 p_q222 c_q122 c_q222 100 1000 aa 200 300 10 20 100 2000 bb 400 500 5 10
Desired
cnt pwr id p_q122 avail1 p_q222 avail2 c_q122 count1 c_q222 count2 100 1000 aa 200 800 300 700 10 90 20 80 100 2000 bb 400 1600 500 1500 5 95 10 90
Doing
a = df['avail1'] = + df['pwr'] - df['p_q122'] b = df['avail2'] = + df['pwr'] - df['p_q222']
I am looking for a more elegant way that provides the desire output. Any suggestion is appreciated.
Advertisement
Answer
We can perform 2D subtraction with numpy:
pd.DataFrame( df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy() ).rename(columns=lambda i: f'avail{i + 1}') avail1 avail2 0 800 700 1 1600 1500
Benefit here is that, no matter how many p_
columns there are, all will be subtracted from the power
column.
We can concat
all of the computations with df like:
df = pd.concat([ df, # power calculations pd.DataFrame( df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy() ).rename(columns=lambda i: f'avail{i + 1}'), # Count calculations pd.DataFrame( df['count'].to_numpy()[:, None] - df.filter(like='c_').to_numpy() ).rename(columns=lambda i: f'count{i + 1}'), ], axis=1)
which gives df
:
count power id p_q122 p_q222 ... c_q222 avail1 avail2 count1 count2 0 100 1000 aa 200 300 ... 20 800 700 90 80 1 100 2000 bb 400 500 ... 10 1600 1500 95 90 [2 rows x 11 columns]
If we have many column groups to do, we can build the list of DataFrames programmatically as well:
df = pd.concat([df, *( pd.DataFrame( df[col].to_numpy()[:, None] - df.filter(like=filter_prefix).to_numpy() ).rename(columns=lambda i: f'{new_prefix}{i + 1}') for col, filter_prefix, new_prefix in [ ('power', 'p_', 'avail'), ('count', 'c_', 'count') ] )], axis=1)
Setup and imports:
import pandas as pd df = pd.DataFrame({ 'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'], 'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5], 'c_q222': [20, 10] })