I have a pandas dataframe and I want to create a new dummy variable based on whether the values of a column in my dataframe appear in a list.
import numpy as np
import pandas as pd

df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']
How can I create a new dummy variable for the dataframe, called variable3, that equals 1 if the value of variable2 is present in the list and 0 if not?
I tried this using:
df['variable3'] = np.where(
    df['variable2'] in my_list,
    1, 0)
However, this throws a ValueError: The truth value of a Series is ambiguous.
I’ve been looking for an answer to this for a long time, but none of the ones I found solved this problem.
Do you have any suggestions?
Answer
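You’re almost there. When you want to check whether the values of a dataframe column match the entries of a list or of another dataframe column, you can use isin (here Series.isin, called on the column). It tests each value separately and returns a boolean Series, which is what np.where expects as its condition.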
df['variable3'] = np.where(
    df['variable2'].isin(my_list),
    1, 0)

df
Out[16]:
   variable1 variable2  variable3
0          1         a          1
1          2         r          0
2          3         b          1
3          4         w          0
4          5         c          1
5          6         p          0
6          7         l          0
7          8         a          1
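As a side note, here is a minimal, self-contained sketch of the same idea. It rebuilds the dataframe from the question, keeps the boolean mask in a separate variable so you can inspect it, and shows one possible alternative to np.where: casting the mask to integers with astype(int). The names mask and variable3_alt are made up for this illustration and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({'variable1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'variable2': ['a', 'r', 'b', 'w', 'c', 'p', 'l', 'a']})
my_list = ['a', 'b', 'c', 'd', 'e']

# isin checks each value on its own and returns a boolean Series
# ("mask" is a hypothetical name used only in this sketch)
mask = df['variable2'].isin(my_list)

# Variant 1: np.where, as in the answer above
df['variable3'] = np.where(mask, 1, 0)

# Variant 2 (an alternative, not from the original answer):
# cast True/False to 1/0 directly; "variable3_alt" is a hypothetical column name
df['variable3_alt'] = mask.astype(int)

print(df)
```

Both columns should hold the same values. The original attempt failed because Python's in operator has to produce a single True or False, while isin produces one result per row.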