I have a large three-column dataframe of this form:
Ref Colourref Shaperef 5 red 12 square 15 9 14 blue (circle14,2) 10 6 orange 12 18 square 12 pink1,7 [oval] [40] 14 [green] (rectsq#12,6) ...
And a long list with entries like this:
li = [
'oval 60 [oval] [40]',
'(circle14,2) circ',
'square 20',
'126 18 square 921#',
]
I want to replace the entries in the Shaperef column of the df with a value from the list if the full Shaperef string matches any part of any list item. If there is no match, the entry is not changed.
Desired output:
Ref Colourref Shaperef 5 red 12 square 15 9 14 blue (circle14,2) circ 10 6 orange 12 126 18 square 921# 12 pink1,7 oval 60 [oval] [40] 14 [green] (rectsq#12,6) ...
So refs 9, 10, 12 are updated as there is a partial match with a list item. Refs 5, 14 stay as there are.
Advertisement
Answer
If Shaperef and all the entries in li are all strings you can write a function to apply over Shaperef to convert them:
def f(row_val, seq):
for item in seq:
if row_val in item:
return item
return row_val
Then:
# read in your example
import pandas as pd
from io import StringIO
s = """Ref Colourref Shaperef
5 red 12 square 15
9 14 blue (circle14,2)
10 6 orange 12 18 square
12 pink1,7 [oval] [40]
14 [green] (rectsq#12,6)
"""
li = [
"oval 60 [oval] [40]",
"(circle14,2) circ",
"square 20",
"126 18 square 921#",
]
df = pd.read_csv(StringIO(s), sep=r"ss+", engine="python")
# Apply the function here:
df["Shaperef"] = df["Shaperef"].apply(lambda v: f(v, li))
# Ref Colourref Shaperef
# 0 5 red 12 square 15
# 1 9 14 blue (circle14,2) circ
# 2 10 6 orange 12 126 18 square 921#
# 3 12 pink1,7 oval 60 [oval] [40]
# 4 14 [green] (rectsq#12,6)
This might not be a very quick way of doing this as it has a worst case run time of len(df) * len(li).