Python Pandas find rows that match a pattern using column first characters and a set of values to match

Question

I have a DataFrame sorted by company_name. I would like to select the rows that share their first 3 letters and are followed by rows ending with "u" or "w". Ideally, the result would look like this (including the "main" name as an extra column). Assume that the start of the company_name has to contain

Accepted Answer

Let's try:

    # extract the company name by removing a trailing `u` or `w`
    s = df.company_name.str.extract('(.*)[uw]$', expand=False)
    company_names = s.fillna(df.company_name)

    # valid names are those that appear both alone and with a `u`/`w` suffix
    valid_names = s.isna().groupby(company_names).transform('nunique') == 2
    df['main_name'] = company_names.where(valid_names)

Output:

      company_name main_name
    0         abcd      abcd
    1        abcdu      abcd
    2        abcdw      abcd
    3          efg       efg
    4         efgu       efg
    5        zvttu       NaN
    6        zvttw       NaN
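For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. The sample `company_name` values are an assumption, reconstructed from the output shown in the answer, and `add_main_name` is a hypothetical helper name used only to wrap the same steps; it is not part of pandas.

```python
import pandas as pd

# Sample data: an assumption, reconstructed from the output in the answer.
df = pd.DataFrame(
    {"company_name": ["abcd", "abcdu", "abcdw", "efg", "efgu", "zvttu", "zvttw"]}
)


def add_main_name(frame):
    """Hypothetical helper: apply the answer's steps and return a new frame."""
    # Stem is the name without a trailing 'u' or 'w'; NaN when there is no such suffix.
    stem = frame["company_name"].str.extract(r"(.*)[uw]$", expand=False)
    # Fall back to the full name for rows that have no suffix.
    base = stem.fillna(frame["company_name"])
    # Within each base name, stem.isna() takes two distinct values only when
    # the group contains both a bare name and at least one suffixed name.
    has_both = stem.isna().groupby(base).transform("nunique") == 2
    # Keep the base name for valid groups; other rows get NaN.
    return frame.assign(main_name=base.where(has_both))


result = add_main_name(df)
print(result)
```

One caveat with this sketch: a bare name that itself ends in "u" or "w" is also treated as suffixed by the regex, so such names may need separate handling.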

Answer