I need to compare each value of a list to each value of a df column and, if there is a match, take the value of another column in that row.
I have a couple of loops working with iterrows, but the code takes a long time to run. Is there a more efficient way to do this? It seems .loc might be a good answer, but the docs aren't super clear on how to make it work for this use case.
My code so far is:
```python
listy = []
for view in joined_views:
    for row in df.iterrows():
        # row is an (index, Series) tuple, so row[1] is the row's data
        if view == row[1]['other_view']:
            listy.append(row[1]['other_column'])
```
Answer
Pandas is built to apply operations to a whole column (or group of rows) at once. iterrows is relatively slow and is best reserved for cases where no such vectorized operation is available. In your case, isin will select the rows you want, and then you can take the other column.
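This can be written as:
```python
import pandas as pd

df = pd.DataFrame({"other_view": [1, 2, 3, 4, 5],
                   "other_column": ["a", "b", "c", "d", "e"]})
joined_views = [1, 4, 100, 900, 1000]

# Keep only the rows whose other_view appears in joined_views,
# then take their other_column values (a pandas Series)
listy = df[df.other_view.isin(joined_views)].other_column
print(listy)
```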
Or, if you prefer to refer to the columns by their string names:
```python
df[df["other_view"].isin(joined_views)]["other_column"]
```
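Since the question mentions .loc, the same selection can also be written with DataFrame.loc, passing the boolean mask from isin as the row indexer and the column name as the column indexer. The trailing .tolist() is an addition here; it turns the resulting Series into a plain Python list like listy in the original loop. A minimal sketch using the same example data:

```python
import pandas as pd

df = pd.DataFrame({"other_view": [1, 2, 3, 4, 5],
                   "other_column": ["a", "b", "c", "d", "e"]})
joined_views = [1, 4, 100, 900, 1000]

# Boolean mask selects the rows, "other_column" selects the column
mask = df["other_view"].isin(joined_views)
listy = df.loc[mask, "other_column"].tolist()

print(listy)  # ['a', 'd']
```

Doing the row and column selection in a single .loc call also avoids the chained indexing (df[...][...]) of the previous variant.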
In words: select the rows of df whose other_view value is in joined_views, then take their other_column values.
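One difference from the original loop: the isin result follows the row order of df and returns each matching row once, while the nested loop follows the order of joined_views and produces a value again every time a view repeats. If that order or repetition matters, a left merge on the view column reproduces the loop's output without iterrows. The sketch below assumes a hypothetical joined_views = [4, 1, 4, 100] chosen to show the difference, and uses indicator=True to drop views that have no match in df.

```python
import pandas as pd

df = pd.DataFrame({"other_view": [1, 2, 3, 4, 5],
                   "other_column": ["a", "b", "c", "d", "e"]})
# Hypothetical input: out of order, with a repeat and a non-matching value
joined_views = [4, 1, 4, 100]

# The original nested loop, written as a comprehension for comparison
loop_result = [row["other_column"]
               for view in joined_views
               for _, row in df.iterrows()
               if view == row["other_view"]]

# A left merge keeps the order and repeats of joined_views;
# indicator=True adds a "_merge" column marking which keys matched in df
merged = pd.DataFrame({"other_view": joined_views}).merge(
    df, on="other_view", how="left", indicator=True)
listy = merged.loc[merged["_merge"] == "both", "other_column"].tolist()

print(listy)                 # ['d', 'a', 'd']
print(listy == loop_result)  # True
```

If only membership matters, the simpler isin version above is enough.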