
Remove space between abbreviated letters in a string column

I have a pandas DataFrame as follows:


I have removed the punctuation and the spaces between abbreviated letters:


The output is, for example, 'I called the cia', but what I would like to get is 'I called the CIA'. So essentially I would like the abbreviations to be upper-cased. I tried the following, but got no results:


or



Answer

pandas.Series.str.replace allows its second argument to be a callable that meets the same requirements as the second argument of re.sub. Using that, you can first uppercase your abbreviations as follows:
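As a minimal sketch of that idea (not the original snippet): the sample sentences, the column name "text" and the helper name upper_match are assumptions made up for illustration, and the pattern is taken to be r'\b\w\.?\b'. Passing regex=True explicitly should be safe across pandas versions, since newer releases default to literal matching and reject a callable replacement in that mode.

```python
import pandas as pd

# Hypothetical sample data; the column name "text" is an assumption.
df = pd.DataFrame(
    {"text": ["I called the c.i.a.", "she works at the f.b.i. now"]}
)


def upper_match(match):
    """Return the matched text upper-cased (same contract as a re.sub callable)."""
    return match.group(0).upper()


# A callable replacement needs regex=True; newer pandas defaults to regex=False.
df["text_upper"] = df["text"].str.replace(r"\b\w\.?\b", upper_match, regex=True)

print(df["text_upper"].tolist())
```

The callable receives a regular re.Match object, so anything you could do inside a re.sub replacement function works here as well.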


output


which you can then process further using the code you already had
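The asker's own clean-up code is not reproduced here, so the following is only a hedged stand-in for that second step, using plain re. The function name strip_and_join and both patterns are assumptions: the first drops punctuation, the second removes the space between two adjacent standalone letters.

```python
import re


def strip_and_join(text):
    """Remove punctuation, then close gaps between adjacent single letters."""
    # Drop every character that is neither a word character nor whitespace.
    no_punct = re.sub(r"[^\w\s]", "", text)
    # Remove the space after a standalone letter if another standalone letter follows.
    return re.sub(r"\b(\w) (?=\w\b)", r"\1", no_punct)


print(strip_and_join("I called the C.I.A."))
print(strip_and_join("I called the C. I. A."))
```

The same two patterns can be passed to Series.str.replace with regex=True. Note that this sketch would also merge genuine one-letter words that happen to sit next to each other, so it may need tightening for real data.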


output


You might want to improve the pattern I used (r'\b\w\.?\b') if you encounter cases it does not cover. I used word boundaries (\b) and a literal dot (\.), so as written it matches any single word character (\w) that stands on its own, optionally (?) followed by a dot. Note that this also matches ordinary one-letter words such as "a".
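To see what the pattern picks up, here is a small check with plain re, reading the pattern as r'\b\w\.?\b'. The sample sentence and the stricter alternative STRICT are assumptions for illustration, not part of the original answer; STRICT only matches runs of two or more letter-plus-dot pairs.

```python
import re

# The pattern from the answer: a standalone word character, optionally followed by a dot.
PATTERN = re.compile(r"\b\w\.?\b")
# Hypothetical stricter variant: two or more "letter + dot" pairs in a row.
STRICT = re.compile(r"\b(?:\w\.){2,}")

sample = "I called the c.i.a. about a case"


def upper_match(match):
    """Upper-case whatever the pattern matched."""
    return match.group(0).upper()


print(PATTERN.findall(sample))  # every standalone letter, including the word "a"
loose = PATTERN.sub(upper_match, sample)
strict = STRICT.sub(upper_match, sample)
print(loose)
print(strict)
```

With the original pattern the article "a" is upper-cased too, while the stricter variant leaves it alone. The trade-off is that the stricter variant needs the dots to be present, so it has to run before punctuation is removed, and it would also match things like "e.g.".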

User contributions licensed under: CC BY-SA