I want to extract colors from product descriptions. I tried to use NER, but it wasn't successful. Now I am trying to define a list and match it against the description.
I have data in a dataframe column like this:
Description: Tampered black round grey/natural swing with yellow load-bearing cap
I also defined the list of colors:
attributes = ['red', 'blue', 'black', 'violet', 'grey', 'natural', 'beige', 'silver']
What I did was to create a matcher
def matcher(x):
    for i in attributes:
        if i in x.lower():
            return i
        else:
            return np.nan
And I applied it to the df
df['Colours'] = df['Description pre-work'].apply(matcher)
The result is horrible too. This is what I get:
matcher('Tampered black round grey/natural swing with yellow load-bearing cap')
'red'
How can I retrieve all the matches into a list and store them in a separate column in pandas? Expected output:
['black','grey','natural','yellow']
How can I prevent getting red as a match where there is no red?
I thought I could use the findall function to retrieve the data the way I want, but that doesn't help me either…
Lost. Thanks for help!
Answer
Jezreel’s first answer is very good! However, when using
df['Colours'] = df['Description pre-work'].str.findall('|'.join(attributes), flags=re.I)
it will always find red in words such as “Tampered”, because the pattern also matches inside longer words. I suggest an easy quick fix (not the most robust one): split the description into words and keep only exact matches.
def matcher(desc):
    colors = []
    # split the sentence into words and look for exact matches
    words = desc.lower().replace(';', ' ').replace('-', ' ').replace('/', ' ').split(" ")
    for color in attributes:
        if color in words:
            colors.append(color)
    return colors
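A somewhat more robust alternative, sketched here as a suggestion rather than as part of the original answer, is to keep the regex approach but wrap the alternatives in word boundaries (`\b`), so that "red" inside "Tampered" no longer matches. The helper name `find_colors` is hypothetical, and 'yellow' is added to the list as an assumption, because the expected output contains it while the original list does not.

```python
import re

# Colour list from the question; 'yellow' is added here as an assumption,
# since the expected output contains it but the original list does not.
attributes = ['red', 'blue', 'black', 'violet', 'grey', 'natural',
              'beige', 'silver', 'yellow']

# \b anchors each alternative at word boundaries, so 'red' inside
# 'Tampered' is not matched; re.escape guards against special characters.
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(a) for a in attributes) + r')\b',
    flags=re.IGNORECASE,
)


def find_colors(desc):
    """Return every colour found in desc, lower-cased, in order of appearance."""
    return [m.lower() for m in pattern.findall(desc)]


result = find_colors(
    'Tampered black round grey/natural swing with yellow load-bearing cap'
)
print(result)
```

It should be possible to apply this to the column with df['Colours'] = df['Description pre-work'].apply(find_colors). Unlike the split-based matcher, this does not require listing every separator character by hand, but it still only finds colours that are in the list.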