Regex Python – Keep only ASCII and copyright symbol

Tags: ,



I have the following function to keep only ASCII characters:

import re

def remove_special_chars(txt):
    txt = re.sub(r'[^x00-x7F]+', '', txt)
    return txt

But now I also want to keep the copyright symbol (©). What should I add to the pattern?

Answer

Add the copyright symbol’s hex xA9 (source) to your match group:

txt = re.sub(r'[^x00-x7FxA9]+', '', txt)

Regex101



Source: stackoverflow