I have a pandas DataFrame, and I want to create a new dummy variable based on whether the values of one of its columns appear in a list.
df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8], 'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']
How can I add a new dummy variable to the DataFrame, called variable3, that equals 1 if the value of variable2 is in the list and 0 if it is not?
I tried this:
df['variable3'] = np.where(df['variable2'] in my_list, 1, 0)
However, this throws a ValueError: The truth value of a Series is ambiguous.
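The error comes from the `in` operator, which tests the whole Series against the list in one go instead of value by value. The minimal sketch below reproduces it with a shortened Series `s` (an illustrative name, not from the original) and contrasts it with a plain-Python loop that checks one value at a time.

```python
import pandas as pd

s = pd.Series(['a', 'r', 'b'])
my_list = ['a', 'b', 'c', 'd', 'e']

# `s in my_list` makes Python compare each list item with the entire Series.
# 'a' == s is evaluated elementwise by pandas and returns a boolean Series,
# and Python then needs a single True/False from it, which pandas refuses to give.
try:
    s in my_list
    raised = False
except ValueError:
    raised = True

# Checking one value at a time works, because each `v` is a plain string.
elementwise = [v in my_list for v in s]

print(raised)       # True
print(elementwise)  # [True, False, True]
```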
I've been searching for an answer for a long time, but none of the ones I found solved this problem.
Do you have any suggestions?
Answer
You're almost there. When you want to check whether the values of a DataFrame column match the items of a list (or the values of another column), use Series.isin, which tests each value individually and returns a boolean Series.
df['variable3'] = np.where(df['variable2'].isin(my_list), 1, 0)

df
Out[16]:
   variable1 variable2  variable3
0          1         a          1
1          2         r          0
2          3         b          1
3          4         w          0
4          5         c          1
5          6         p          0
6          7         l          0
7          8         a          1
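Because isin already yields True/False, np.where is not strictly needed for a 0/1 dummy. A minimal sketch of an alternative, assuming an integer dummy column is what you want, casts the boolean mask with astype(int) and checks that it matches the np.where version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']

# isin returns a boolean Series; astype(int) maps True -> 1 and False -> 0
df['variable3'] = df['variable2'].isin(my_list).astype(int)

# Same values as the np.where(...) approach
expected = np.where(df['variable2'].isin(my_list), 1, 0)
print((df['variable3'].to_numpy() == expected).all())  # True
print(df['variable3'].tolist())  # [1, 0, 1, 0, 1, 0, 0, 1]
```

Both give the same column; np.where is the more flexible choice if you later need values other than 1 and 0.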