Let’s say input was
d = {'col1': [1,2,3,4,5,6,7,8,9,10], 'col2': [1,2,3,4,5,6,7,8,9,10], 'col3': [1,2,3,4,5,6,7,8,9,10], 'offset': [1,2,3,1,2,3,1,2,3,1]} df = pd.DataFrame(data=d)
I want to create an additional column that looks like this:
df['output'] = [1, 4, 9, 4, 10, 18, 7, 16, 27, 10]
Basically each number in offset
is telling you the number of columns to sum over (from col1
as ref point).
Is there a vectorized way to do this without iterating through each value in offset
?
Advertisement
Answer
You use np.select. To use it, create each of the column sum
(1, 2, 3 … as needed) as the possible choices, and create a boolean masks for each value in offset column as the possible conditons.
# get all possible values from offset lOffset = df['offset'].unique() # get te result with np.select df['output'] = np.select( # create mask for each values in offset condlist=[df['offset'].eq(i) for i in lOffset], # crerate the sum over the number of columns per offset value choicelist=[df.iloc[:,:i].sum(axis=1) for i in lOffset] ) print(df) # col1 col2 col3 offset output # 0 1 1 1 1 1 # 1 2 2 2 2 4 # 2 3 3 3 3 9 # 3 4 4 4 1 4 # 4 5 5 5 2 10 # 5 6 6 6 3 18 # 6 7 7 7 1 7 # 7 8 8 8 2 16 # 8 9 9 9 3 27 # 9 10 10 10 1 10
Note: this assumes that your offset column is the last one