Skip to content
Advertisement

creating pandas function equivalent for EXCEL OFFSET function

Let’s say input was

d = {'col1': [1,2,3,4,5,6,7,8,9,10], 
 'col2': [1,2,3,4,5,6,7,8,9,10],
 'col3': [1,2,3,4,5,6,7,8,9,10],
 'offset': [1,2,3,1,2,3,1,2,3,1]} 

df = pd.DataFrame(data=d)

I want to create an additional column that looks like this:

df['output'] = [1, 4, 9, 4, 10, 18, 7, 16, 27, 10]

Basically each number in offset is telling you the number of columns to sum over (from col1 as ref point).

Is there a vectorized way to do this without iterating through each value in offset?

Advertisement

Answer

You use np.select. To use it, create each of the column sum (1, 2, 3 … as needed) as the possible choices, and create a boolean masks for each value in offset column as the possible conditons.

# get all possible values from offset
lOffset = df['offset'].unique()

# get te result with np.select
df['output'] = np.select(
    # create mask for each values in offset
    condlist=[df['offset'].eq(i) for i in lOffset],
    # crerate the sum over the number of columns per offset value
    choicelist=[df.iloc[:,:i].sum(axis=1) for i in lOffset]
)
print(df)
#    col1  col2  col3  offset  output
# 0     1     1     1       1       1
# 1     2     2     2       2       4
# 2     3     3     3       3       9
# 3     4     4     4       1       4
# 4     5     5     5       2      10
# 5     6     6     6       3      18
# 6     7     7     7       1       7
# 7     8     8     8       2      16
# 8     9     9     9       3      27
# 9    10    10    10       1      10

Note: this assumes that your offset column is the last one

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement