Skip to content
Advertisement

Regex Python – Keep only ASCII and copyright symbol

I have the following function to keep only ASCII characters:

import re

def remove_special_chars(txt):
    txt = re.sub(r'[^x00-x7F]+', '', txt)
    return txt

But now I also want to keep the copyright symbol (©). What should I add to the pattern?

Advertisement

Answer

Add the copyright symbol’s hex xA9 (source) to your match group:

txt = re.sub(r'[^x00-x7FxA9]+', '', txt)

Regex101

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement