
Faster alternative to a for loop with if conditions over a pandas DataFrame

I have a data frame named ‘plans_to_csv’ whose columns include person_id, mode, type and actual_mode_car.

I need to run the following analysis to work out the actual mode, but it takes a very long time to run. Is there a faster way to write this code? Thanks a lot in advance for your help.

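# Row-by-row Python loop (assumes the default 0..n-1 index)
for i in range(len(plans_to_csv) - 2):
    if (plans_to_csv['mode'][i + 1] == 'walk'
            and plans_to_csv['type'][i + 2] == 'car interaction'
            and plans_to_csv['person_id'][i] == plans_to_csv['person_id'][i + 2]):
        # use .loc: chained assignment like df['col'][i] = 1 may not update df
        plans_to_csv.loc[i, 'actual_mode_car'] = 1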
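To make the question reproducible, here is a minimal sketch on a small hypothetical data frame. Only the column names come from the code above; the values are invented for illustration. It runs the loop and shows which rows get flagged.

```python
import pandas as pd

# Hypothetical sample data: column names from the question, values invented
plans_to_csv = pd.DataFrame({
    'person_id': [1, 1, 1, 2, 3, 3, 3],
    'mode': ['car', 'walk', None, 'walk', None, 'walk', None],
    'type': ['home', 'leg', 'car interaction', 'home',
             'car interaction', 'leg', 'car interaction'],
})
plans_to_csv['actual_mode_car'] = 0

# Row i is flagged when row i+1 is a walk, row i+2 is a car interaction,
# and rows i and i+2 belong to the same person
for i in range(len(plans_to_csv) - 2):
    if (plans_to_csv['mode'][i + 1] == 'walk'
            and plans_to_csv['type'][i + 2] == 'car interaction'
            and plans_to_csv['person_id'][i] == plans_to_csv['person_id'][i + 2]):
        plans_to_csv.loc[i, 'actual_mode_car'] = 1

print(plans_to_csv['actual_mode_car'].tolist())  # [1, 0, 0, 0, 1, 0, 0]
```

Row 2 is not flagged even though a walk and a car interaction follow it, because row 4 belongs to a different person.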


Answer

You can shift the columns and compare them directly. This replaces the Python-level loop with vectorized operations, so it should be much faster.
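To see what shifting does, here is a minimal sketch on a toy Series (the values are invented). A negative shift moves values up, so row i can be compared with rows i+1 and i+2 without indexing.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])

# shift(-1): row i now holds the value that was in row i+1
ahead_1 = s.shift(-1)
# shift(-2): row i now holds the value that was in row i+2
ahead_2 = s.shift(-2)

# The last rows have nothing to look ahead to, so they become NaN,
# and an == comparison against NaN evaluates to False
print(pd.DataFrame({'s': s, 'ahead_1': ahead_1, 'ahead_2': ahead_2}))
print((ahead_2 == 'c').tolist())  # [True, False, False, False]
```

This is why the vectorized version needs no equivalent of the loop's len(plans_to_csv) - 2 bound: the trailing rows simply compare as False.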

selection = ((plans_to_csv['mode'].shift(-1) == 'walk')                         # next row is a walk
             & (plans_to_csv['type'].shift(-2) == 'car interaction')            # row after that is a car interaction
             & (plans_to_csv['person_id'] == plans_to_csv['person_id'].shift(-2)))  # same person
plans_to_csv['actual_mode_car'] = selection.astype(int)
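A quick way to check that the vectorized version matches the loop is to run both on the same data. This is a minimal sketch using a small hypothetical data frame (values invented, column names from the question).

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
plans_to_csv = pd.DataFrame({
    'person_id': [1, 1, 1, 2, 3, 3, 3],
    'mode': ['car', 'walk', None, 'walk', None, 'walk', None],
    'type': ['home', 'leg', 'car interaction', 'home',
             'car interaction', 'leg', 'car interaction'],
})

# Loop version (assumes the default 0..n-1 index)
loop_result = [0] * len(plans_to_csv)
for i in range(len(plans_to_csv) - 2):
    if (plans_to_csv['mode'][i + 1] == 'walk'
            and plans_to_csv['type'][i + 2] == 'car interaction'
            and plans_to_csv['person_id'][i] == plans_to_csv['person_id'][i + 2]):
        loop_result[i] = 1

# Vectorized version from the answer
selection = (
    (plans_to_csv['mode'].shift(-1) == 'walk')
    & (plans_to_csv['type'].shift(-2) == 'car interaction')
    & (plans_to_csv['person_id'] == plans_to_csv['person_id'].shift(-2))
)
plans_to_csv['actual_mode_car'] = selection.astype(int)

print(loop_result == plans_to_csv['actual_mode_car'].tolist())  # True
```

Both compare positions rather than labels, so a frame with a non-default index should be reset with reset_index(drop=True) before running the loop version.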

Note that this sets every entry that does not match the condition to 0. If you want to keep the existing values in those rows, assign only to the matching rows instead: plans_to_csv.loc[selection, 'actual_mode_car'] = 1
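The difference between the two assignments can be seen on a small hypothetical frame where actual_mode_car already holds a value worth keeping (all values invented for illustration).

```python
import pandas as pd

# Hypothetical data: row 3 was already set to 1 and should stay 1
plans_to_csv = pd.DataFrame({
    'person_id': [1, 1, 1, 2],
    'mode': ['car', 'walk', None, 'bus'],
    'type': ['home', 'leg', 'car interaction', 'home'],
    'actual_mode_car': [0, 0, 0, 1],
})

selection = (
    (plans_to_csv['mode'].shift(-1) == 'walk')
    & (plans_to_csv['type'].shift(-2) == 'car interaction')
    & (plans_to_csv['person_id'] == plans_to_csv['person_id'].shift(-2))
)

# Only rows where selection is True are written; all other rows keep their values
plans_to_csv.loc[selection, 'actual_mode_car'] = 1
print(plans_to_csv['actual_mode_car'].tolist())  # [1, 0, 0, 1]
```

With selection.astype(int) instead, row 3 would have been overwritten with 0.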

User contributions licensed under: CC BY-SA