Values in my DataFrame look like this:
id                 val
big_val_167        80
renv_100           100
color_100          200
color_60/write_10  200
I want to remove everything in the values of the id column from the first _ followed by a number onward. The desired result should look like this:
id       val
big_val  80
renv     100
color    200
color    200
How can I do that? I know that str.replace() can be used, but I don’t understand how to write the regular expression part of it.
Answer
You can use a regex (re.search) to find the first occurrence of _ followed by a digit, and then keep only the part of the string before that match.
Code:
import re
import pandas as pd

def fix_id(id):
    # Find the first occurrence of "_" followed by a digit in the id
    digit_search = re.search(r"_\d", id)
    # Leave the id unchanged if it contains no such match
    if digit_search is None:
        return id
    # Keep only the part before the match
    return id[:digit_search.start()]

# Your df
df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                   "val": [80, 100, 200, 200]})
df["id"] = df["id"].apply(fix_id)
print(df)
Output:
        id  val
0  big_val   80
1     renv  100
2    color  200
3    color  200
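Since the question mentions str.replace(), here is a minimal sketch of that route as well, assuming the same sample data as above. The pattern _\d.* is my own choice for illustration: it matches an underscore, one digit, and then everything up to the end of the string, so replacing the match with an empty string drops the suffix. regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                   "val": [80, 100, 200, 200]})

# "_\d.*" = underscore, a digit, then the rest of the string;
# ids without such a match are left unchanged
df["id"] = df["id"].str.replace(r"_\d.*", "", regex=True)
print(df)
```

This avoids a Python-level function call per row, and an id with no _ followed by a digit simply passes through untouched.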