Let’s say input was
d = {'col1': [1,2,3,4,5,6,7,8,9,10],
'col2': [1,2,3,4,5,6,7,8,9,10],
'col3': [1,2,3,4,5,6,7,8,9,10],
'offset': [1,2,3,1,2,3,1,2,3,1]}
df = pd.DataFrame(data=d)
I want to create an additional column that looks like this:
df['output'] = [1, 4, 9, 4, 10, 18, 7, 16, 27, 10]
Basically each number in offset is telling you the number of columns to sum over (from col1 as ref point).
Is there a vectorized way to do this without iterating through each value in offset?
Advertisement
Answer
You use np.select. To use it, create each of the column sum (1, 2, 3 … as needed) as the possible choices, and create a boolean masks for each value in offset column as the possible conditons.
# get all possible values from offset
lOffset = df['offset'].unique()
# get te result with np.select
df['output'] = np.select(
# create mask for each values in offset
condlist=[df['offset'].eq(i) for i in lOffset],
# crerate the sum over the number of columns per offset value
choicelist=[df.iloc[:,:i].sum(axis=1) for i in lOffset]
)
print(df)
# col1 col2 col3 offset output
# 0 1 1 1 1 1
# 1 2 2 2 2 4
# 2 3 3 3 3 9
# 3 4 4 4 1 4
# 4 5 5 5 2 10
# 5 6 6 6 3 18
# 6 7 7 7 1 7
# 7 8 8 8 2 16
# 8 9 9 9 3 27
# 9 10 10 10 1 10
Note: this assumes that your offset column is the last one