I have a large three-column dataframe of this form:
Ref Colourref Shaperef 5 red 12 square 15 9 14 blue (circle14,2) 10 6 orange 12 18 square 12 pink1,7 [oval] [40] 14 [green] (rectsq#12,6) ...
And a long list with entries like this:
li = [ 'oval 60 [oval] [40]', '(circle14,2) circ', 'square 20', '126 18 square 921#', ]
I want to replace the entries in the Shaperef column of the df with a value from the list if the full Shaperef string matches any part of any list item. If there is no match, the entry is not changed.
Desired output:
Ref Colourref Shaperef 5 red 12 square 15 9 14 blue (circle14,2) circ 10 6 orange 12 126 18 square 921# 12 pink1,7 oval 60 [oval] [40] 14 [green] (rectsq#12,6) ...
So refs 9, 10, 12 are updated as there is a partial match with a list item. Refs 5, 14 stay as there are.
Advertisement
Answer
If Shaperef
and all the entries in li
are all strings you can write a function to apply over Shaperef
to convert them:
def f(row_val, seq): for item in seq: if row_val in item: return item return row_val
Then:
# read in your example import pandas as pd from io import StringIO s = """Ref Colourref Shaperef 5 red 12 square 15 9 14 blue (circle14,2) 10 6 orange 12 18 square 12 pink1,7 [oval] [40] 14 [green] (rectsq#12,6) """ li = [ "oval 60 [oval] [40]", "(circle14,2) circ", "square 20", "126 18 square 921#", ] df = pd.read_csv(StringIO(s), sep=r"ss+", engine="python") # Apply the function here: df["Shaperef"] = df["Shaperef"].apply(lambda v: f(v, li)) # Ref Colourref Shaperef # 0 5 red 12 square 15 # 1 9 14 blue (circle14,2) circ # 2 10 6 orange 12 126 18 square 921# # 3 12 pink1,7 oval 60 [oval] [40] # 4 14 [green] (rectsq#12,6)
This might not be a very quick way of doing this as it has a worst case run time of len(df) * len(li)
.