Skip to content
Advertisement

how do I succinctly create a new dataframe column based on matching existing column values with list of values?

I want to create a new column in a dataframe by matching the values in an existing column’s values with a predefined list of values. I have two approaches to this below. Both run but dont give me exactly what I want. I prefer the first approach over the second but not sure where I am going wrong with both. I want the solution to be succinct without having to write out alot of np.where statements.

First approach:

words = [['one man ran','two men ran','three men ran'],['red balloons','white shirt','blue dress']]
df3 = pd.DataFrame(words, columns = ['col1','col2','col3'])

search_words1 = ['three','blue']

def columns(search_words1):
    for i in search_words1:
        return "".join(np.where((df3['col3'].str.contains(i)), i, ""))
        
    
df3['col4'] = df3['col3'].apply(lambda x: columns(x))
df3

incomplete result:

col1    col2    col3    col4
0   one man ran two men ran three men ran   t
1   red balloons    white shirt blue dress  b

Second approach:

search_words1 = ['three','blue']

def my_comments(search_words1):
        return "".join([i for i in search_words1 if any(i in x for x in df3['col3'])])
    

df3['col4'] = df3['col3'].apply(lambda x: my_comments(x))
df3

incomplete result:

col1    col2    col3    col4
0   one man ran two men ran three men ran   three men ran
1   red balloons    white shirt blue dress  blue dress

Desired output for both approaches:

col1    col2    col3    col4
0   one man ran two men ran three men ran   three
1   red balloons    white shirt blue dress  blue

Advertisement

Answer

Use str.extract: create a regex pattern of your search words and try to extract the matched pattern:

pattern = fr"b({'|'.join(search_words1)})b"
df3['col4'] = df3['col3'].str.extract(pattern)

Pattern:

>>> print(pattern)
b(man|red)b

b matches the empty string, but only at the beginning or end of a word. The ( ) is the capture group.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement