How to standardize a column in pandas

Question

I have a DataFrame with an id column whose values I want to standardise to XXXXXXXX-XXXX (i.e. 8 and 4 digits separated by a dash). How can I achieve that using Python?

Accepted Answer

You can use the DataFrame.replace() function with a regular expression like this:

    df = df.replace(regex=r'^(\d{8})\D(\d{4})$', value=r'\1-\2')

Here's example code with sample data.

    import pandas as pd

    df = pd.DataFrame({'id': [
        '16620625 5686',
        '16310427-5502',
        '16501010 4957',
        '16110430 8679',
        '16990624/4174',
        '16230404.1177',
        '16820221/3388']})

    # normalize matching strings with 8-digits + delimiter + 4-digits
    df = df.replace(regex=r'^(\d{8})\D(\d{4})$', value=r'\1-\2')
    print(df)

Output:

                  id
    0  16620625-5686
    1  16310427-5502
    2  16501010-4957
    3  16110430-8679
    4  16990624-4174
    5  16230404-1177
    6  16820221-3388

If any value does not match the regular expression for the expected format, then its value will not be changed.
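If only the id column should be touched, a column-level variant may be worth considering, since DataFrame.replace() applies the pattern to every string column in the frame. The following is a minimal sketch and not part of the original answer: it assumes pandas 1.1 or later (for Series.str.fullmatch), and the third sample value and the valid column name are made up for illustration.

```python
import pandas as pd

# Sample data; the last value is a hypothetical malformed id
df = pd.DataFrame({'id': ['16620625 5686', '16990624/4174', '1662-0625']})

# 8 digits, one non-digit delimiter, 4 digits
pattern = r'^(\d{8})\D(\d{4})$'

# Restrict the replacement to the id column only
df['id'] = df['id'].str.replace(pattern, r'\1-\2', regex=True)

# Flag rows that still do not match the target format
# ('valid' is an illustrative column name)
df['valid'] = df['id'].str.fullmatch(r'\d{8}-\d{4}')

print(df)
```

Values that do not match the pattern are left unchanged, so the valid column gives a quick way to find rows that may need manual cleanup.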
