Remove space between abbreviated letters in a string column

Question

I have a pandas DataFrame as follows: I have removed the punctuation and the spaces between abbreviated letters. The output is, e.g., 'I called the cia', but what I would like to get is 'I called the CIA'. So essentially I would like the abbreviations to be upper-cased. I tried the following, but got no results or

Accepted Answer

pandas.Series.str.replace allows its 2nd argument to be a callable that meets the requirements of the 2nd argument of re.sub. Using that, you might first uppercase your abbreviations as follows (regex=True is passed explicitly because recent pandas versions no longer treat the pattern as a regular expression by default):

    import pandas as pd

    def make_upper(m):  # where m is an re.Match object
        return m.group(0).upper()

    d = {'col1': ['I called the c. i. a', 'the house is e. m', 'this is an e. u. call!', 'how is the p. o. r going?']}
    df = pd.DataFrame(data=d)
    df['col1'] = df['col1'].str.replace(r'\b\w\.?\b', make_upper, regex=True)
    print(df)

output

                            col1
    0       I called the C. I. A
    1          the house is E. M
    2     this is an E. U. call!
    3  how is the P. O. R going?

which you can then process further using the code you already had:

    df['col1'] = df['col1'].str.replace(r'[^\w\s]', '', regex=True)
    df['col1'] = df['col1'].str.replace(r'(?<=\b\w)\s*[ &]\s*(?=\w\b)', '', regex=True)
    print(df)

output

                       col1
    0      I called the CIA
    1       the house is EM
    2    this is an EU call
    3  how is the POR going

You might elect to improve the pattern I used (r'\b\w\.?\b') if you encounter cases which it does not cover. I used word boundaries (\b) and a literal dot (\.), so as is it finds any single word character (\w) standing alone as a word, optionally (?) followed by a dot.
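To check the patterns on plain strings before applying them to a column, the same three steps can be sketched with re.sub alone. This is only a minimal sketch: the function name collapse_abbreviations is hypothetical (it is not part of pandas or of the answer above), and it assumes that every single-letter word belongs to an abbreviation.

```python
import re


def make_upper(m):
    # m is an re.Match object; return the matched text upper-cased
    return m.group(0).upper()


def collapse_abbreviations(text):
    # Hypothetical helper mirroring the three str.replace calls above.
    # Step 1: upper-case every single-character word.
    text = re.sub(r'\b\w\.?\b', make_upper, text)
    # Step 2: drop punctuation (anything that is not a word char or whitespace).
    text = re.sub(r'[^\w\s]', '', text)
    # Step 3: remove the spaces between adjacent single-letter words.
    return re.sub(r'(?<=\b\w)\s*[ &]\s*(?=\w\b)', '', text)


print(collapse_abbreviations('I called the c. i. a'))  # I called the CIA
```

Under the stated assumption, an ordinary one-letter word is upper-cased as well: 'this is a test' becomes 'this is A test'. If that matters for your data, the first pattern would need to be tightened, for example by requiring the trailing dot.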

Answer