I have a simple column of strings and a list of strings.
```
strings_col
"the cat is on the table"
"the dog is eating"

list1 = ["cat", "table", "dog"]
```
I need to create another column in which each row contains the string from the list if it appears in `strings_col`. If a row contains two or more strings from the list, I'd like to get one row per match. The result should look something like this:
```
strings_col                  string
"the cat is on the table"    cat
"the cat is on the table"    table
"the dog is eating"          dog
```
How can I do that? Thanks.
Answer
You can use `str.findall` and then `explode`:
```python
>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')
                 strings_col string
0  "the cat is on the table"    cat
0  "the cat is on the table"  table
1        "the dog is eating"    dog
```
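Note that the pattern `r'|'.join(list1)` matches substrings, so "cat" would also match inside a word like "category". A minimal sketch, assuming you only want whole-word matches, wraps the alternation in `\b` word boundaries and escapes each term with `re.escape` so any regex metacharacters in the list are matched literally. The `df` built here is just the question's sample data:

```python
import re

import pandas as pd

# Sample data from the question (the surrounding quotes were only for display)
df = pd.DataFrame({"strings_col": ["the cat is on the table", "the dog is eating"]})
list1 = ["cat", "table", "dog"]

# Escape each term and wrap the non-capturing alternation in \b,
# so "cat" matches as a word but not inside "category"
pattern = r"\b(?:" + "|".join(map(re.escape, list1)) + r")\b"

# findall returns a list of matches per row; explode gives each list
# element its own row, repeating the original index
result = df.assign(string=df["strings_col"].str.findall(pattern)).explode("string")
print(result)
```

Using a non-capturing group `(?:...)` matters here: with a capturing group, `findall` would return the group contents rather than the full match.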
If you want, you can call `reset_index` after that:
```python
>>> df.assign(
...     string=df.strings_col.str.findall(r'|'.join(list1))
... ).explode('string').reset_index(drop=True)
                 strings_col string
0  "the cat is on the table"    cat
1  "the cat is on the table"  table
2        "the dog is eating"    dog
```
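One thing to watch for: if a row contains none of the words, `findall` returns an empty list, and `explode` turns an empty list into a single row with `NaN` in the new column. A sketch, assuming you want to drop such rows, adds `dropna` before `reset_index`. The third sentence below is a hypothetical row added only to show this behavior:

```python
import pandas as pd

# The third sentence is a hypothetical row containing no word from list1
df = pd.DataFrame(
    {"strings_col": ["the cat is on the table", "the dog is eating", "a bird is singing"]}
)
list1 = ["cat", "table", "dog"]

result = (
    df.assign(string=df["strings_col"].str.findall("|".join(list1)))
    .explode("string")            # an empty match list becomes a NaN row
    .dropna(subset=["string"])    # drop rows that had no match at all
    .reset_index(drop=True)       # renumber the remaining rows 0..n-1
)
print(result)
```

If you would rather keep unmatched rows (for example, to see which sentences contained nothing from the list), simply leave out the `dropna` step.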