
Python looping over a list to check if any of the list elements are equal to variable values in pandas dataframe

I have a pandas dataframe and I want to create a new dummy variable based on whether the values of a column in my dataframe appear in a list.

import numpy as np
import pandas as pd

df = pd.DataFrame({'variable1':[1,2,3,4,5,6,7,8],
    'variable2':['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']

How can I create a new dummy variable for the dataframe, called variable3, that equals 1 if the value of variable2 is present in the list and 0 if not?

I tried this using:

df['variable3'] = np.where(
        df['variable2'] in my_list,
        1, 0)

However, this throws a ValueError: The truth value of a Series is ambiguous.

I’ve been searching for an answer for a long time, but none of the ones I found solved this problem.

Do you have any suggestions?


Answer

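You’re almost there. The in operator does not work element-wise: Python compares the whole Series against each list item and then needs a single True/False from the result, which is what raises the ValueError. To check each value of a dataframe column against a list or another dataframe column, use Series.isin, which returns a boolean Series (DataFrame.isin exists as well).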

df['variable3'] = np.where(
        df['variable2'].isin(my_list),
        1, 0)

df
Out[16]: 
   variable1 variable2  variable3
0          1         a          1
1          2         r          0
2          3         b          1
3          4         w          0
4          5         c          1
5          6         p          0
6          7         l          0
7          8         a          1
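
Since isin already returns a boolean Series, np.where is optional here. As a minimal sketch of an alternative (the names mask and via_where are only illustrative and not part of the original question), the booleans can be cast straight to integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "variable1": [1, 2, 3, 4, 5, 6, 7, 8],
    "variable2": ["a", "r", "b", "w", "c", "p", "l", "a"],
})
my_list = ["a", "b", "c", "d", "e"]

# isin returns a boolean Series: True where the value appears in my_list
mask = df["variable2"].isin(my_list)

# Casting the booleans to int gives the 1/0 dummy without np.where
df["variable3"] = mask.astype(int)

# The np.where form from the answer, kept only for comparison
via_where = np.where(mask, 1, 0)
```

Both forms should give the same column. np.where is more useful when the two outcomes are something other than 1 and 0, such as labels.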
User contributions licensed under: CC BY-SA