Apply string in list according to beginning of the strings in a pandas dataframe column

Question

Let's take an example. I have a list of categories that are identified: The strings in that list can't be a substring of another string in that list. And a dataframe: I would like to add a column Category to this dataframe. If the string in the column Items starts as a string in L_known_categori…

Accepted Answer

You can use regex in pandas.Series.str.extract:

>>> df['Category'] = df['Items'].str.title().str.extract(
        '(^'
        + '|'.join(L_known_categories)
        + ')'
    )[0].fillna(df['Items'])
>>> df
    Items                   Category
0   green apple             Green
1   blue bottle             blue bottle
2   RED APPLE               Red
3   Green paper             Green
4   Black & White glasses   Black & White
5   An orange fruit         An orange fruit
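To see what the pattern does row by row, here is a minimal plain-Python sketch of the same logic using only the standard `re` module. The contents of `L_known_categories` are an assumption inferred from the output above, since the original list is not reproduced here, and the `categorize` helper is a hypothetical name used only for illustration. The sketch also wraps each category in `re.escape`, which the answer above does not do; this is a precaution in case a category contains regex metacharacters such as `+` or `(`.

```python
import re

# Assumed category list, inferred from the output above
# (the original list is not shown in the question text).
L_known_categories = ["Green", "Red", "Black & White"]

# Escape each category so special characters are matched literally,
# then anchor the alternation at the start of the string.
pattern = re.compile(
    "^(" + "|".join(re.escape(c) for c in L_known_categories) + ")"
)


def categorize(item):
    """Return the known category the item starts with, else the item itself."""
    # Title-casing mirrors .str.title() in the answer; it assumes the
    # categories themselves are written in title case.
    match = pattern.match(item.title())
    return match.group(1) if match else item


items = [
    "green apple",
    "blue bottle",
    "RED APPLE",
    "Green paper",
    "Black & White glasses",
    "An orange fruit",
]
categories = [categorize(s) for s in items]
```

The fallback to the original string plays the role of `.fillna(df['Items'])` in the pandas version. If preferred, a helper like this could presumably be applied to the column with `df['Items'].map(categorize)`, although the vectorised `str.extract` form above is the approach the answer actually uses.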


Answer