
Pandas best way to iterate over rows quickly

I need to compare each value of a list to each value of a df column, and if there is a match take the value of another column.

I have a couple of loops working with iterrows, but the code takes a long time to run. Is there a more efficient way to do this? It seems .loc might be a good answer, but the docs aren’t super clear on how to make it work for this use case.

My code so far is

listy = []
for view in joined_views:
    # iterrows() yields (index, row) pairs; row is a Series
    for _, row in df.iterrows():
        if view == row['other_view']:
            listy.append(row['other_column'])


Answer

Pandas is built to apply operations to a whole column (or group of rows) at once. iterrows is relatively slow, so it is best kept for cases where no such vectorized operation is available. In your case, isin will select the rows you want, and then you can take the other column from them.

This can be written as

import pandas as pd

df = pd.DataFrame({"other_view": [1, 2, 3, 4, 5],
                   "other_column": ["a", "b", "c", "d", "e"]})
joined_views = [1, 4, 100, 900, 1000]
# Boolean mask: True where other_view appears in joined_views
listy = df[df.other_view.isin(joined_views)].other_column
print(listy)
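Since the question mentions .loc: the same selection can also be written as a single .loc call, with the boolean mask as the row selector and the column name as the column selector. A minimal sketch on the same sample data; the trailing .tolist() is an addition here that turns the resulting Series into a plain Python list like listy in the question.

```python
import pandas as pd

df = pd.DataFrame({"other_view": [1, 2, 3, 4, 5],
                   "other_column": ["a", "b", "c", "d", "e"]})
joined_views = [1, 4, 100, 900, 1000]

# Row selector: boolean mask from isin; column selector: the column label
listy = df.loc[df["other_view"].isin(joined_views), "other_column"].tolist()
print(listy)  # ['a', 'd']
```

Using one .loc call also avoids chained indexing (df[...][...]), which matters if you later want to assign to the selection rather than just read it.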

or, if you prefer to refer to the columns by their string names,

df[df["other_view"].isin(joined_views)]["other_column"]

In words, select df rows where other_view is in joined_views, then take the other_column values.
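One difference from the original nested loop is worth checking: isin returns the matches in df's row order, while the loop appends them in the order of joined_views. If that order matters, one option (a swapped-in technique, not part of the answer above) is an inner merge with a one-column frame built from joined_views on the left; the pandas merge docs describe how="inner" as preserving the order of the left keys. A minimal sketch comparing the three on the same sample columns, with joined_views deliberately reordered as an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({"other_view": [1, 2, 3, 4, 5],
                   "other_column": ["a", "b", "c", "d", "e"]})
joined_views = [4, 1, 100]  # hypothetical input, deliberately not in df's order

# Original nested loop: results follow the order of joined_views
loop_result = []
for view in joined_views:
    for _, row in df.iterrows():
        if view == row["other_view"]:
            loop_result.append(row["other_column"])

# isin: results follow the row order of df
isin_result = df.loc[df["other_view"].isin(joined_views), "other_column"].tolist()

# Inner merge with joined_views on the left keeps the left key order
merged = pd.DataFrame({"other_view": joined_views}).merge(df, on="other_view", how="inner")
merge_result = merged["other_column"].tolist()

print(loop_result)   # ['d', 'a']
print(isin_result)   # ['a', 'd']
print(merge_result)  # ['d', 'a']
```

If only membership matters and order does not, the isin version is the simpler choice.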

User contributions licensed under: CC BY-SA