Python pandas replace based on partial match with list item

Question

I have a large three-column dataframe of this form: And a long list with entries like this: I want to replace the entries in the Shaperef column of the df with a value from the list if the full Shaperef string matches any part of any list item. If there is no match, the entry is not changed. Desired output:

Accepted Answer

If Shaperef and all the entries in li are all strings you can write a function to apply over Shaperef to convert them:def f(row_val, seq):    for item in seq:        if row_val in item:            return item    return row_valThen:# read in your exampleimport pandas as pdfrom io import StringIOs = """Ref    Colourref      Shaperef      5      red 12         square 159      14 blue        (circle14,2)  10     6 orange 12    18 square12     pink1,7        [oval] [40]14     [green]        (rectsq#12,6)"""li = [    "oval 60 [oval] [40]",    "(circle14,2) circ",    "square 20",    "126 18 square 921#",]df = pd.read_csv(StringIO(s), sep=r"ss+", engine="python")# Apply the function here:df["Shaperef"] = df["Shaperef"].apply(lambda v: f(v, li))#    Ref    Colourref             Shaperef# 0    5       red 12            square 15# 1    9      14 blue    (circle14,2) circ# 2   10  6 orange 12   126 18 square 921## 3   12      pink1,7  oval 60 [oval] [40]# 4   14      [green]        (rectsq#12,6)This might not be a very quick way of doing this as it has a worst case run time of len(df) * len(li).

Advertisement

Answer