I’m working on a huge file whose name columns contain extraneous characters (like the “|” character) that I want to remove, but for some reason my str.replace call only seems to apply to some rows in the column.
The column in my dataframe summary looks something like this:
    Labels
    test|test 1
    test 2
    test 3
    test|test 4
    test|test 5
    test 6
As you can see, some rows are already how I want them to be, containing only the name “test #”, but some have “test|” in front, which I want removed.
My function to remove them is like this:
    correction = summary["Labels"].str.replace('test|', '')
It seems to work for most of the values, but when I check for pipes (“|”) in the dataframe (once I have merged correction with summary), it says it finds 9330 of them:
    found = summary[summary['Labels'].str.contains('|', regex=False)]
    print(len(found))
    print(found['Labels'].value_counts())

    Results
    9330
    test|test-667     59
    test|test-765     40
    test|test-1810    39
    test|test-685     36
    test|test-1077    33
    ..
Does anyone know why this is, and how I can fix it?
Answer
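You were on the right track. In a regular expression the pipe is the alternation operator, so the pattern 'test|' means "test or the empty string" and never matches the pipe character itself. Escape the pipe in a raw string and assign the result back to the column:

    summary['Labels'] = summary['Labels'].str.replace(r'test\|', '', regex=True)

       Labels
    0  test 1
    1  test 2
    2  test 4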
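To see how the pipe is handled, here is a minimal sketch that compares three variants on a small, made-up sample. The sample rows and the variable names unescaped, literal and escaped are assumptions for illustration, not taken from the original data. Passing regex explicitly avoids depending on the default, which differs between pandas versions.

```python
import re

import pandas as pd

# Hypothetical sample modelled on the "Labels" column from the question.
summary = pd.DataFrame({"Labels": ["test|test 1", "test 2", "test|test-667"]})

# Unescaped regex: "test|" means "test" OR the empty string,
# so every "test" is removed and the pipe itself is left in place.
unescaped = summary["Labels"].str.replace("test|", "", regex=True)

# Literal match: the five characters "test|" are removed as plain text.
literal = summary["Labels"].str.replace("test|", "", regex=False)

# Escaped regex: re.escape turns "test|" into a pattern that matches it literally.
escaped = summary["Labels"].str.replace(re.escape("test|"), "", regex=True)

print(unescaped.tolist())  # ['| 1', ' 2', '|-667']
print(literal.tolist())    # ['test 1', 'test 2', 'test-667']
print(escaped.tolist())    # ['test 1', 'test 2', 'test-667']
```

If the prefix is always a fixed string, regex=False is probably the simpler choice, since nothing needs escaping.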