I have a dataframe which contains an id column with the following sample values:
16620625 5686
16310427-5502
16501010 4957
16110430 8679
16990624/4174
16230404.1177
16820221/3388
I want to standardise them to XXXXXXXX-XXXX (i.e. 8 and 4 digits separated by a dash). How can I achieve that using Python?
Here’s my code:
df['id'] = df.replace(" ", "-")
Answer
You can use the DataFrame.replace() function with a regular expression, like this:
df = df.replace(regex=r'^(\d{8})\D(\d{4})$', value=r'\1-\2')
Here’s example code with sample data.
import pandas as pd

df = pd.DataFrame({'id': [
    '16620625 5686',
    '16310427-5502',
    '16501010 4957',
    '16110430 8679',
    '16990624/4174',
    '16230404.1177',
    '16820221/3388']})

# normalize matching strings with 8-digits + delimiter + 4-digits
df = df.replace(regex=r'^(\d{8})\D(\d{4})$', value=r'\1-\2')
print(df)
Output:
              id
0  16620625-5686
1  16310427-5502
2  16501010-4957
3  16110430-8679
4  16990624-4174
5  16230404-1177
6  16820221-3388
If a value does not match the regular expression for the expected format, it is left unchanged.
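To see which values were left unchanged, one option is to check each id against the target format after the replacement. The following is only a minimal sketch: the two malformed sample values and the `valid` column name are made up for illustration, and it applies the replacement to the id column alone rather than to the whole dataframe.

```python
import pandas as pd

# Hypothetical sample: the last two values do not fit the 8 + 4 digit shape.
df = pd.DataFrame({'id': [
    '16620625 5686',
    '16990624/4174',
    '1662062 5686',
    'unknown']})

# Same pattern as in the answer, applied to the id column only.
pattern = r'^(\d{8})\D(\d{4})$'
df['id'] = df['id'].replace(regex=pattern, value=r'\1-\2')

# Flag rows that match XXXXXXXX-XXXX after the replacement.
# 'valid' is an illustrative column name, not part of the original question.
df['valid'] = df['id'].str.fullmatch(r'\d{8}-\d{4}')
print(df)
```

Rows where `valid` is False kept their original text, so they can be reviewed or cleaned up separately.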