Python, remove all non-alphabet chars from string

Question

I am writing a Python MapReduce word count program. The problem is that there are many non-alphabet chars strewn about in the data. I have found this post, "Stripping everything but alphanumeric chars from a string in Python", which shows a nice solution using regex, but I am not sure how to implement it. I'm af…

Accepted Answer

Use re.sub:

    import re
    regex = re.compile('[^a-zA-Z]')
    # First parameter is the replacement, second parameter is your input string
    regex.sub('', 'ab3d*E')
    # Out: 'abdE'

Alternatively, if you only want to remove a certain set of characters (as an apostrophe might be okay in your input...):

    regex = re.compile('[,.!?]')  # etc.
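Since the question is about a word count, here is a minimal sketch of how the same pattern might slot into the counting step. The names `clean_words` and `count_words` and the sample lines are made up for illustration. Lowercasing the words is an assumption on my part, not something the question asks for. A real MapReduce mapper would emit key/value pairs rather than build a `Counter`.

```python
import re
from collections import Counter

# Same character class as in the answer: anything that is not an ASCII letter.
NON_LETTERS = re.compile(r'[^a-zA-Z]')


def clean_words(line):
    """Yield the words of a line with every non-letter removed."""
    for token in line.split():
        # Lowercasing is an assumption, so "Hello" and "hello" count together.
        word = NON_LETTERS.sub('', token).lower()
        if word:  # skip tokens that had no letters at all, e.g. "2"
            yield word


def count_words(lines):
    """Count cleaned words across an iterable of lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        counts.update(clean_words(line))
    return counts


# Hypothetical sample input, just to exercise the functions.
sample = ["Hello, world! Hello again...", "It's 2 o'clock; world-wide."]
counts = count_words(sample)
print(counts.most_common())
```

Note that this strips apostrophes and hyphens too, so "It's" becomes "its" and "world-wide" becomes "worldwide". If that is not what you want, add those characters to the class, e.g. `[^a-zA-Z'-]`.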

Answer