
Get strings from a list if they're contained in another string column

I have a simple column of strings and a list of strings:

strings_col
"the cat is on the table"
"the dog is eating"

list1 = ["cat", "table", "dog"]
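To make the question reproducible, the sample data can be built as a pandas DataFrame. This is a minimal sketch that assumes the quotation marks above are just delimiters and not part of the data; the name df matches the one used in the answer.

```python
import pandas as pd

# One string column, named as in the question
df = pd.DataFrame({"strings_col": ["the cat is on the table", "the dog is eating"]})

# The words to look for inside each string
list1 = ["cat", "table", "dog"]
```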

I need to create another column where each row holds the item from the list that appears in strings_col. If a string contains two or more items from the list, I'd like one row per match. The result should look something like this:

 strings_col                   string
"the cat is on the table"      cat
"the cat is on the table"      table
"the dog is eating"            dog

How can I do that? Thanks.


Answer

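You can use str.findall with a pattern that joins the list items with |, then explode the resulting lists into one row per match: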

>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')

                 strings_col string
0  "the cat is on the table"    cat
0  "the cat is on the table"  table
1        "the dog is eating"    dog
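The one-liner works in two steps. str.findall returns, for each row, a Python list of every non-overlapping match of the pattern cat|table|dog; explode then turns each element of those lists into its own row, repeating the original index label. Here is a self-contained sketch of the same steps, split apart for readability and assuming the sample strings from the question without the surrounding quotation marks:

```python
import pandas as pd

df = pd.DataFrame({"strings_col": ["the cat is on the table", "the dog is eating"]})
list1 = ["cat", "table", "dog"]

# Build an alternation pattern: "cat|table|dog"
pattern = "|".join(list1)

# Step 1: findall gives one Python list of matches per row
# (row 0 -> ["cat", "table"], row 1 -> ["dog"])
matches = df["strings_col"].str.findall(pattern)

# Step 2: explode puts each list element on its own row,
# repeating the original index label (0, 0, 1)
result = df.assign(string=matches).explode("string")
print(result)
```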

If you want, you can call reset_index(drop=True) afterwards to renumber the rows:

>>> df.assign(
...     string=df.strings_col.str.findall(r'|'.join(list1))
... ).explode('string').reset_index(drop=True)
                 strings_col string
0  "the cat is on the table"    cat
1  "the cat is on the table"  table
2        "the dog is eating"    dog
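Two caveats with the plain alternation pattern: it also matches inside longer words (cat in "category"), and any regex metacharacters in list1 would be treated as regex syntax. Rows with no match also survive as NaN, because explode turns an empty list into a single NaN row. If whole-word matching is what you want (an assumption; the question doesn't say), a sketch that escapes each term with re.escape, wraps the alternation in \b word boundaries, and drops unmatched rows could look like this. The third sample row is hypothetical, added only to show the effect:

```python
import re
import pandas as pd

df = pd.DataFrame({"strings_col": ["the cat is on the table",
                                   "the dog is eating",
                                   "a category of birds"]})  # hypothetical extra row
list1 = ["cat", "table", "dog"]

# Escape each term so metacharacters match literally, and require
# word boundaries so "cat" does not match inside "category".
# (?:...) is non-capturing, so findall still returns the whole match.
pattern = r"\b(?:" + "|".join(map(re.escape, list1)) + r")\b"

result = (
    df.assign(string=df["strings_col"].str.findall(pattern))
      .explode("string")            # empty lists become NaN rows
      .dropna(subset=["string"])    # drop rows with no match
      .reset_index(drop=True)
)
print(result)
```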
User contributions licensed under: CC BY-SA