Values in my DataFrame look like this:
    id                 val
    big_val_167        80
    renv_100           100
    color_100          200
    color_60/write_10  200
I want to remove everything in the values of the id column starting from the first _ followed by a number (_numeric). The desired result should look like this:
    id       val
    big_val  80
    renv     100
    color    200
    color    200
How can I do that? I know that str.replace() can be used, but I don't understand how to write the regular expression for it.
Answer
You can use a regular expression (re.search) to find the first occurrence of _ followed by a digit, and then keep only the part of the string before that match.
Code:
    import re
    import pandas as pd


    def fix_id(raw_id):
        # Find the first occurrence of "_" followed by a digit in the id
        digit_search = re.search(r"_\d", raw_id)
        # Leave the id unchanged if it contains no such pattern
        if digit_search is None:
            return raw_id
        # Keep only the part before the match
        return raw_id[:digit_search.start()]


    # Your df
    df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                       "val": [80, 100, 200, 200]})

    df["id"] = df["id"].apply(fix_id)
    print(df)
Output:
            id  val
    0  big_val   80
    1     renv  100
    2    color  200
    3    color  200
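Since the question asks about str.replace() specifically, here is a minimal sketch of the same idea written as a single pandas call. It assumes a pandas version where Series.str.replace accepts the regex=True argument, and it reuses the sample data from the question. The pattern `_\d.*` matches the first underscore that is followed by a digit, plus everything after it, and replaces the whole match with an empty string.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
    "val": [80, 100, 200, 200],
})

# "_\d" finds the first underscore followed by a digit;
# ".*" then consumes the rest of the string, so the whole tail is removed
df["id"] = df["id"].str.replace(r"_\d.*", "", regex=True)
print(df)
```

Values that contain no underscore followed by a digit are left unchanged, so no separate check for a missing match is needed with this approach.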