I’ve a simple column of strings, and a list of strings.
strings_col
"the cat is on the table"
"the dog is eating"

list1 = ["cat", "table", "dog"]
I need to create another column in which each row contains the string from the list that appears in strings_col. If a row contains two or more strings from the list, I'd like it repeated, one row per match. The result should look something like this:
strings_col                  string
"the cat is on the table"    cat
"the cat is on the table"    table
"the dog is eating"          dog
How can I do that? thanks
Answer
You can use str.findall:
>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')
                 strings_col string
0  "the cat is on the table"    cat
0  "the cat is on the table"  table
1        "the dog is eating"    dog
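For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built as shown (the question does not include the construction code) and that the cell values do not contain the literal quote characters.

```python
import pandas as pd

# Assumed reconstruction of the data from the question.
df = pd.DataFrame({"strings_col": ["the cat is on the table", "the dog is eating"]})
list1 = ["cat", "table", "dog"]

# Join the list into one alternation pattern: 'cat|table|dog'.
pattern = "|".join(list1)

# findall returns a list of matches per row; explode turns each list
# element into its own row and repeats the original index label.
matches = df["strings_col"].str.findall(pattern)
result = df.assign(string=matches).explode("string")
print(result)
```

Note that the index label 0 appears twice, because explode keeps the index of the row each match came from.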
If you want, you can call reset_index after that:
>>> df.assign(
...     string=df.strings_col.str.findall(r'|'.join(list1))
... ).explode('string').reset_index(drop=True)
                 strings_col string
0  "the cat is on the table"    cat
1  "the cat is on the table"  table
2        "the dog is eating"    dog
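One caveat: a plain '|'.join treats the list items as regular expressions and matches substrings, so "cat" would also match inside "category", and rows with no match come out of explode as NaN. If that matters for your data, a possible variant is sketched below. The helper name explode_matches and the sample rows are hypothetical, and the whole-word matching and dropping of unmatched rows are assumptions about what you want, not something the question specifies.

```python
import re

import pandas as pd


def explode_matches(df, column, words):
    """Return one row per whole-word match of `words` in df[column] (hypothetical helper)."""
    # re.escape keeps regex metacharacters in the words literal;
    # \b restricts matches to whole words.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    out = df.assign(string=df[column].str.findall(pattern)).explode("string")
    # Rows without any match explode to NaN; drop them (an assumption).
    return out.dropna(subset=["string"]).reset_index(drop=True)


# Hypothetical sample data, including a row that should not match.
df = pd.DataFrame(
    {"strings_col": ["the cat is on the table", "a category of dogs", "the dog is eating"]}
)
list1 = ["cat", "table", "dog"]
result = explode_matches(df, "strings_col", list1)
print(result)
```

If you would rather keep unmatched rows, remove the dropna call and they will remain with NaN in the string column.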