I have a dataset, df, where I would like to create columns that display the output of a subtraction calculation:
Data
count power id p_q122 p_q222 c_q122 c_q222 100 1000 aa 200 300 10 20 100 2000 bb 400 500 5 10
Desired
cnt pwr id p_q122 avail1 p_q222 avail2 c_q122 count1 c_q222 count2 100 1000 aa 200 800 300 700 10 90 20 80 100 2000 bb 400 1600 500 1500 5 95 10 90
Doing
a = df['avail1'] = + df['pwr'] - df['p_q122'] b = df['avail2'] = + df['pwr'] - df['p_q222']
I am looking for a more elegant way that provides the desire output. Any suggestion is appreciated.
Advertisement
Answer
We can perform 2D subtraction with numpy:
pd.DataFrame(
    df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
).rename(columns=lambda i: f'avail{i + 1}')
   avail1  avail2
0     800     700
1    1600    1500
Benefit here is that, no matter how many p_ columns there are, all will be subtracted from the power column.
We can concat all of the computations with df like:
df = pd.concat([
    df,
    # power calculations
    pd.DataFrame(
        df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
    ).rename(columns=lambda i: f'avail{i + 1}'),
    # Count calculations
    pd.DataFrame(
        df['count'].to_numpy()[:, None] - df.filter(like='c_').to_numpy()
    ).rename(columns=lambda i: f'count{i + 1}'),
], axis=1)
which gives df:
count power id p_q122 p_q222 ... c_q222 avail1 avail2 count1 count2 0 100 1000 aa 200 300 ... 20 800 700 90 80 1 100 2000 bb 400 500 ... 10 1600 1500 95 90 [2 rows x 11 columns]
If we have many column groups to do, we can build the list of DataFrames programmatically as well:
df = pd.concat([df, *(
    pd.DataFrame(
        df[col].to_numpy()[:, None] - df.filter(like=filter_prefix).to_numpy()
    ).rename(columns=lambda i: f'{new_prefix}{i + 1}')
    for col, filter_prefix, new_prefix in [
        ('power', 'p_', 'avail'),
        ('count', 'c_', 'count')
    ]
)], axis=1)
Setup and imports:
import pandas as pd
df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})