
Python dataframe vectorizing for loop

I would like to vectorize the Python for loop below for speed and efficiency. Each iteration depends on the current state.

The values in df_B are computed from the current state (state) and the corresponding df_A value.

Any ideas would be appreciated.

import pandas as pd

df_A = pd.DataFrame({'a': [0, 1, -1, -1, 1, -1, 0, 0]})
df_B = pd.DataFrame(data=0, index=df_A.index, columns=['b'])
print(df_A)

state = 0  # running sum of every b computed so far
for index in df_A.index:
    a = df_A.loc[index, 'a']
    if a == -1:
        df_B.loc[index, 'b'] = -10 - state
    elif a == 1:
        df_B.loc[index, 'b'] = 10 - state
    elif a == 0:
        df_B.loc[index, 'b'] = 0 - state
    state += df_B.loc[index, 'b']
print(df_B)

Answer

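The loop is overkill. Each iteration sets b = 10*a - state and then adds b to state, so after the update state is exactly 10*a for the current row. When the next row is processed, state is therefore just the previous value of df_A['a']*10 (and 0 for the first row). So we can use shift: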

s = df_A['a'].mul(10) 

df_B['b'] = s - s.shift(fill_value=0)
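
As a sanity check, here is a minimal sketch that compares the shift-based result with a loop equivalent to the one in the question. The helper names loop_version and shift_version are made up for this illustration, and the loop assumes 'a' only ever contains -1, 0, or 1, as in the sample data.

```python
import pandas as pd

df_A = pd.DataFrame({'a': [0, 1, -1, -1, 1, -1, 0, 0]})

def loop_version(a: pd.Series) -> pd.Series:
    """Hypothetical helper: the question's loop, assuming a is in {-1, 0, 1}."""
    state = 0
    out = []
    for value in a:
        b = 10 * value - state  # covers the three if/elif branches
        out.append(b)
        state += b              # state is now 10 * value
    return pd.Series(out, index=a.index, name='b')

def shift_version(a: pd.Series) -> pd.Series:
    """Hypothetical helper: the vectorized form from the answer."""
    s = a.mul(10)
    return (s - s.shift(fill_value=0)).rename('b')

expected = loop_version(df_A['a'])
result = shift_version(df_A['a'])

# Raises AssertionError if values, index, dtype, or name differ.
pd.testing.assert_series_equal(result, expected)
print(result.tolist())
```

If 'a' could hold other values, the original loop would leave b at 0 for those rows and the two versions would no longer agree, so the shortcut relies on that assumption.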
User contributions licensed under: CC BY-SA