Say I have a pandas DataFrame like so:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'add': [10, 20, 30]})
I would like to perform an operation on each cell in columns ‘a’ and ‘b’ that includes both the cell value and the value of the ‘add’ column for that row. Here’s an example operation:
def add_vals(val, adder):
    if adder % val == 0:
        return val + adder
    else:
        return val + (val / adder)
I know I can do this with df.apply, but I haven’t been able to figure out how to pass the value in the add column to the function. My guess is the syntax is close to this, but I haven’t gotten it to work:
df.apply(lambda x: x.apply(add_vals, args=(x['add'])))
What’s the best way to do this in pandas? It doesn’t have to be the most efficient, but I would like it to be good pandas code.
EDIT:
The output should look like this:
output = pd.DataFrame({'a': [11,22,33], 'b': [4.4,25,36]})
Answer
Vectorize the add_vals function with numpy.where:
import numpy as np

def add_vals(vals, adders):
    return np.where(adders % vals == 0, vals + adders, vals + (vals / adders))
The function transforms a single column if you pass in a or b as the first argument and the add column as the second:
add_vals(df['a'], df['add']) # [11. 22. 33.]
And then you can apply the function to each column (a and b) you want to transform:
df[['a', 'b']].apply(add_vals, adders=df['add'])
#       a     b
# 0  11.0   4.4
# 1  22.0  25.0
# 2  33.0  36.0
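If you would rather keep the scalar function from the question unchanged, the attempt there was close: it needs axis=1 so that each row (not each column) reaches the lambda, and args has to be a one-element tuple, (row['add'],), not a bare value in parentheses. Below is a minimal sketch of that row-wise variant, assuming the df from the question. The names add_vals_scalar and result are mine, chosen only to avoid clashing with the vectorized add_vals above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'add': [10, 20, 30]})

def add_vals_scalar(val, adder):
    # Scalar logic from the question; renamed here (hypothetical name)
    # so it does not shadow the vectorized add_vals.
    if adder % val == 0:
        return val + adder
    return val + (val / adder)

# axis=1 passes each row to the lambda as a Series.
# row[['a', 'b']] picks the cells to transform; row['add'] is that row's adder.
# Note the trailing comma: args must be a tuple.
result = df.apply(
    lambda row: row[['a', 'b']].apply(add_vals_scalar, args=(row['add'],)),
    axis=1,
)
print(result)
```

This should give the same values as the output in the question's EDIT. It calls a Python function once per cell, so on large frames it will likely be much slower than the numpy.where version.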