I have a data frame named ‘plans_to_csv’ looking like this:
I need to run the following analysis to determine the actual mode, but it takes a very long time to run. Is there a faster way to write this code? Thanks a lot for your help in advance.
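for i in range(0, len(plans_to_csv) - 2):
    if (plans_to_csv['mode'][i + 1] == 'walk'
            and plans_to_csv['type'][i + 2] == 'car interaction'
            and plans_to_csv['person_id'][i] == plans_to_csv['person_id'][i + 2]):
        # .loc writes into the frame itself; the chained form
        # plans_to_csv['actual_mode_car'][i] = 1 may only modify a copy
        plans_to_csv.loc[i, 'actual_mode_car'] = 1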
Answer
You can shift the columns and compare them directly. This replaces the Python-level loop with vectorized pandas operations, which should be much faster.
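To make the shifting idea concrete, here is a tiny illustration on a hypothetical toy column (not the asker's data). `shift(-1)` moves each value up one row, so row i ends up holding what was in row i + 1, and the resulting column can be compared to a scalar in one step.

```python
import pandas as pd

# Hypothetical toy column, only to show how shift() aligns rows.
modes = pd.Series(["car", "walk", "pt", "walk"])

# shift(-1) moves every value up by one row: row i now holds the value
# that was in row i + 1. The last row has no successor, so it becomes NaN.
next_mode = modes.shift(-1)
print(next_mode.tolist())

# Comparing the shifted column to a scalar gives one boolean per row,
# computed without a Python-level loop. The missing value compares as False.
print((next_mode == "walk").tolist())  # → [True, False, True, False]
```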
selection = (
    (plans_to_csv['mode'].shift(-1) == 'walk')                  # mode of row i + 1
    & (plans_to_csv['type'].shift(-2) == 'car interaction')     # type of row i + 2
    & (plans_to_csv['person_id'] == plans_to_csv['person_id'].shift(-2))  # same person at i and i + 2
)
plans_to_csv['actual_mode_car'] = selection.astype(int)
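A minimal, self-contained sketch that checks the shifted comparison against a row-by-row loop. The real `plans_to_csv` data is not shown in the question, so the sample values are invented, and the helper names `loop_version` and `vectorized_version` are hypothetical. It assumes a default `RangeIndex` (0, 1, 2 and so on): the question's loop looks rows up by label while `shift` works by position, and the two only agree under such an index. On any other index, call `reset_index(drop=True)` first.

```python
import pandas as pd

# Hypothetical sample data (invented; the question's table is not shown).
plans_to_csv = pd.DataFrame({
    "person_id": [1, 1, 1, 2, 2, 3, 3],
    "mode":      ["car", "walk", None, "pt", "walk", None, None],
    "type":      ["home", None, "car interaction", "home", None,
                  "car interaction", "work"],
})

def loop_version(df: pd.DataFrame) -> pd.Series:
    """Row-by-row check, as in the question (slow on large frames)."""
    result = pd.Series(0, index=df.index)
    for i in range(len(df) - 2):
        if (df["mode"].iloc[i + 1] == "walk"
                and df["type"].iloc[i + 2] == "car interaction"
                and df["person_id"].iloc[i] == df["person_id"].iloc[i + 2]):
            result.iloc[i] = 1
    return result

def vectorized_version(df: pd.DataFrame) -> pd.Series:
    """Same test using shifted columns; rows lacking i + 1 or i + 2 give False."""
    selection = (
        (df["mode"].shift(-1) == "walk")
        & (df["type"].shift(-2) == "car interaction")
        & (df["person_id"] == df["person_id"].shift(-2))
    )
    return selection.astype(int)

plans_to_csv["actual_mode_car"] = vectorized_version(plans_to_csv)
print(plans_to_csv["actual_mode_car"].tolist())  # → [1, 0, 0, 0, 0, 0, 0]
```

Both functions agree on this sample. Row 3 stays 0 even though row 4 is `walk` and row 5 is `car interaction`, because row 5 belongs to a different `person_id`, which is exactly what the third condition filters out.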
Note that this sets every entry that does not match the condition to 0. If that is not wanted, assign 1 only to the matching rows:
plans_to_csv.loc[selection, 'actual_mode_car'] = 1
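A minimal sketch of assigning through a boolean mask with `.loc`, on a hypothetical frame in which `actual_mode_car` already holds a value (5 in the last row) that should be kept. Only the rows where the mask is True are overwritten.

```python
import pandas as pd

# Hypothetical frame in which 'actual_mode_car' already holds values
# (here 5 in the last row) that should survive the update.
df = pd.DataFrame({"actual_mode_car": [0, 0, 0, 5]})
selection = pd.Series([True, False, True, False])

# .loc with a boolean mask writes only the selected rows in a single step.
# Unlike the chained form df['col'][mask] = 1, it does not depend on whether
# pandas hands back a view or a copy, so the write reliably reaches df.
df.loc[selection, "actual_mode_car"] = 1
print(df["actual_mode_car"].tolist())  # → [1, 0, 1, 5]
```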