I’ve a simple column of strings, and a list of strings.
strings_col "the cat is on the table" "the dog is eating" list1 = ["cat", "table", "dog"]
I need to create another column in which every row contains the string contained in the list if they are in the string_col, if it contains two or more strings from the list, then I’d like to have more rows. The result should be something like this:
strings_col string "the cat is on the table" cat "the cat is on the table" table "the dog is eating" dog
How can I do that? thanks
Advertisement
Answer
You can use str.findall:
>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')
strings_col string
0 "the cat is on the table" cat
0 "the cat is on the table" table
1 "the dog is eating" dog
If you want you can reset_index after that:
>>> df.assign(
string=df.strings_col.str.findall(r'|'.join(list1))
).explode('string').reset_index(drop=True)
strings_col string
0 "the cat is on the table" cat
1 "the cat is on the table" table
2 "the dog is eating" dog