I have a large three-column dataframe of this form:
Ref  Colourref    Shaperef
5    red 12       square 15
9    14 blue      (circle14,2)
10   6 orange 12  18 square
12   pink1,7      [oval] [40]
14   [green]      (rectsq#12,6)
And a long list with entries like this:
li = [
    'oval 60 [oval] [40]',
    '(circle14,2) circ',
    'square 20',
    '126 18 square 921#',
]
I want to replace the entries in the Shaperef column of the df with a value from the list if the full Shaperef string matches any part of any list item. If there is no match, the entry is not changed.
Desired output:
Ref  Colourref    Shaperef
5    red 12       square 15
9    14 blue      (circle14,2) circ
10   6 orange 12  126 18 square 921#
12   pink1,7      oval 60 [oval] [40]
14   [green]      (rectsq#12,6)
So refs 9, 10 and 12 are updated, as there is a partial match with a list item. Refs 5 and 14 stay as they are.
Answer
If Shaperef and all the entries in li are strings, you can write a function to apply over Shaperef to convert them:
def f(row_val, seq):
    # Return the first list item that contains row_val as a substring
    for item in seq:
        if row_val in item:
            return item
    # No match: leave the value unchanged
    return row_val
Then:
# read in your example
import pandas as pd
from io import StringIO

s = """Ref  Colourref    Shaperef
5    red 12       square 15
9    14 blue      (circle14,2)
10   6 orange 12  18 square
12   pink1,7      [oval] [40]
14   [green]      (rectsq#12,6)
"""
li = [
    "oval 60 [oval] [40]",
    "(circle14,2) circ",
    "square 20",
    "126 18 square 921#",
]
# Columns are separated by two or more whitespace characters
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")

# Apply the function here:
df["Shaperef"] = df["Shaperef"].apply(lambda v: f(v, li))
#    Ref    Colourref             Shaperef
# 0    5       red 12            square 15
# 1    9      14 blue    (circle14,2) circ
# 2   10  6 orange 12   126 18 square 921#
# 3   12      pink1,7  oval 60 [oval] [40]
# 4   14      [green]        (rectsq#12,6)
This might not be a very quick way of doing this, as in the worst case it performs len(df) * len(li) substring checks.
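If Shaperef contains many repeated values, one possible way to reduce the work is to run the search once per distinct value and reuse the result. The sketch below is plain Python and assumes every value is a string; first_match and build_lookup are hypothetical helper names, not part of the code above.

```python
def first_match(value, seq):
    """Return the first item in seq that contains value, else value itself."""
    return next((item for item in seq if value in item), value)


def build_lookup(values, seq):
    """Map each distinct value to its replacement (one search per distinct value)."""
    return {value: first_match(value, seq) for value in set(values)}


li = [
    "oval 60 [oval] [40]",
    "(circle14,2) circ",
    "square 20",
    "126 18 square 921#",
]
# Example column values, with one repeated entry
shaperefs = ["square 15", "(circle14,2)", "18 square", "[oval] [40]", "18 square"]

lookup = build_lookup(shaperefs, li)
updated = [lookup[v] for v in shaperefs]
print(updated)
```

With a dataframe, the same lookup could be built from df["Shaperef"].unique() and applied with df["Shaperef"].map(lookup). The worst case is still one pass over li per distinct value, so this only helps when values repeat.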