How to rename values in a column that contain a specific separator symbol?

Values in my DataFrame look like this:

id                         val
big_val_167                80
renv_100                   100
color_100                  200
color_60/write_10          200

I want to remove everything in the values of the id column from the first _numeric part onward (the underscore and number included). So the desired result must look like:

id             val
big_val        80
renv           100
color          200
color          200

How can I do that? I know that str.replace() can be used, but I don’t understand how to write the regular expression part of it.

Answer

You can use a regular expression (re.search) to find the first occurrence of an underscore followed by a digit, and then keep only the part of the string before that match.

Code:

import re
import pandas as pd

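def fix_id(id):
    # Find the first occurrence of "_" followed by a digit in the id:
    digit_search = re.search(r"_\d", id)
    # Keep the text before the match; leave the id unchanged if nothing matches
    return id[:digit_search.start()] if digit_search else id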

# Your df
df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                   "val": [80, 100, 200, 200]})

df["id"] = df["id"].apply(fix_id)
print(df)

Output:

        id  val
0  big_val   80
1     renv  100
2    color  200
3    color  200
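
Since the question mentions str.replace(), here is a minimal sketch of the same truncation using pandas' vectorized string method instead of apply. It assumes every id should be cut at the first underscore that is directly followed by a digit, and that ids without such a pattern should stay as they are. The pattern `_\d.*` is my own choice for illustration, not something taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                   "val": [80, 100, 200, 200]})

# "_\d" matches an underscore followed by a digit; ".*" then consumes the rest
# of the string, so everything from the first "_<digit>" onward is removed.
df["id"] = df["id"].str.replace(r"_\d.*", "", regex=True)
print(df)
```

Passing regex=True explicitly matters here, because recent pandas versions treat the pattern as a literal string by default.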
User contributions licensed under: CC BY-SA