
Retrieve all occurrences of selected attributes into a separate column in pandas

I want to extract colors from product descriptions. I tried NER, but it wasn't successful. Now I am trying to define a list of colors and match it against each description.

I have data in dataframe column like this:

Description: Tampered black round grey/natural swing with yellow load-bearing cap

I also defined a list of colors:

attributes =['red','blue','black','violet','grey','natural','beige','silver']

What I did was to create a matcher

def matcher(x):
   for i in attributes:
       if i in x.lower():
           return i
   else:
       return np.nan

And I applied it to the df

df['Colours'] = df['Description pre-work'].apply(matcher)

The result is horrible too. Calling the matcher on the example description returns red:

matcher('Tampered black round grey/natural swing with yellow load-bearing cap')

red

How can I retrieve all the matches into a list and store them in a separate column in pandas? Expected output:

['black','grey','natural','yellow']

How can I avoid getting red as a match when the description contains no red?

I thought I could use the findall function to retrieve the data the way I want, but that doesn't help me either…

Lost. Thanks for help!


Answer

Jezreel’s first answer is very good! However, when using

df['Colours'] = df['Description pre-work'].str.findall('|'.join(attributes), flags=re.I)

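it will still match red inside words such as “Tampered”, because the pattern matches substrings rather than whole words. I suggest a quick fix (not the most robust one): split the description into words and look for exact matches.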

def matcher(desc):
    colors = []
    # split the sentence into words and look for exact matches
    words = desc.lower().replace(';', ' ').replace('-', ' ').replace('/', ' ').split()
    for color in attributes:
        if color in words:
            colors.append(color)
    return colors
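
Another option is to keep the regex approach but wrap the alternatives in word boundaries (\b), so that red no longer matches inside "Tampered". The following is only a sketch: the name find_colours is made up for illustration, and 'yellow' has been added to the list as an assumption, since the expected output in the question contains it but the original attributes list does not.

```python
import re

# 'yellow' is an assumption: it is in the expected output but not in the original list
attributes = ['red', 'blue', 'black', 'violet', 'grey', 'natural', 'beige', 'silver', 'yellow']

# \b anchors every alternative at word boundaries, so "red" inside "Tampered" is skipped;
# re.escape guards against attribute names that contain regex metacharacters
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, attributes)) + r')\b', flags=re.I)

def find_colours(desc):
    # hypothetical helper: returns the matches lowercased, in order of appearance
    return [m.lower() for m in pattern.findall(desc)]

print(find_colours('Tampered black round grey/natural swing with yellow load-bearing cap'))
# ['black', 'grey', 'natural', 'yellow']
```

It can be applied to the column the same way as before, for example df['Colours'] = df['Description pre-work'].apply(find_colours). Unlike the split-based matcher, this returns the colours in the order they appear in the description and keeps duplicates if a colour is mentioned twice.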
 
User contributions licensed under: CC BY-SA