I have a pandas DataFrame and I want to create a new dummy variable that indicates whether the values of one of its columns appear in a list.
import numpy as np
import pandas as pd
df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']
How can I create a new dummy variable in the DataFrame, called variable3, that equals 1 if the value of variable2 is present in the list and 0 if it is not?
I tried this:
df['variable3'] = np.where(
        df['variable2'] in my_list,
        1, 0)
However, this raises ValueError: The truth value of a Series is ambiguous.
I have been searching for a while, but none of the answers I found solved this problem.
Do you have any suggestions?
Answer
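You’re almost there. The in operator needs a single True or False for the whole Series, which is why it raises the ValueError. To check element by element whether the values of a DataFrame column appear in a list or in another column, use Series.isin, which returns a boolean Series with one entry per row that np.where can then map to 1 and 0.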
df['variable3'] = np.where(
        df['variable2'].isin(my_list),
        1, 0)
df
Out[16]: 
   variable1 variable2  variable3
0          1         a          1
1          2         r          0
2          3         b          1
3          4         w          0
4          5         c          1
5          6         p          0
6          7         l          0
7          8         a          1
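
Since isin already returns a boolean Series, np.where is not strictly needed here. As a minimal sketch of an alternative, using the same df and my_list as in the question (the names mask and via_where are only illustrative), the boolean result can be cast straight to integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']

# isin checks each row separately and returns a boolean Series
mask = df['variable2'].isin(my_list)

# Casting the booleans to int turns True/False into 1/0
df['variable3'] = mask.astype(int)

# The np.where form from the answer, kept for comparison
via_where = np.where(mask, 1, 0)
```

Both forms should give the same column. np.where becomes the more natural choice when the two outcomes are something other than 1 and 0, for example string labels.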