UTF-8 decoding doesn’t decode special characters in Python

Question

Hi, I have the following data (abstracted) that comes from an API.

I'm using the following code to decode the data bytes:

cleanhtml is a regex function that I've created to remove HTML tags from the returned data (it's working correctly). However, decode("utf-8") is not removing characters like \u00e1.

My expected output is:

I've tried to use replace("\u00e1", "á").

Accepted Answer

\u00e1 is another way of representing the á character when displaying the contents of a Python string.

If you open a Python interactive session and run print({"Product" : "T\u00e1bua 21X40"}) you’ll see output of {'Product': 'Tábua 21X40'}. The \u00e1 doesn’t exist in the string as those individual characters.

The \u escape sequence indicates that the following four hexadecimal digits specify a Unicode character.

Attempting to replace \u00e1 with á won’t achieve anything because that’s what it already is. Written as replace("\u00e1", "á"), the call replaces á with itself. Written with an escaped backslash, replace("\\u00e1", "á"), it searches for the individual characters of a backslash, a u, and so on, and, as mentioned, they don’t actually exist in the string in that way.

If you explain the problem you’re encountering further then we may be able to help more, but currently it sounds like the string has the correct content but is just being displayed differently than you expect.
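To make the distinction concrete, here is a minimal sketch of how one might check whether a string holds the single character á or the six literal characters of the escape. The variable names s and raw are hypothetical, and the raw case (literal backslash text, as it might appear in undecoded JSON) is an assumption about what an API could return, not something confirmed by the question.

```python
import json

# In a normal string literal, \u00e1 is ONE character: á.
s = "T\u00e1bua 21X40"
print(s)                    # Tábua 21X40
print(len(s))               # 11
print(s == "Tábua 21X40")   # True

# Some displays (for example JSON with ASCII-only output) show the
# escaped form, but the underlying string is unchanged.
print(json.dumps(s))        # "T\u00e1bua 21X40"

# Replacing á with á is a no-op, and the literal backslash
# sequence is not present in s at all.
print(s.replace("\u00e1", "á") == s)   # True
print("\\u00e1" in s)                  # False

# Hypothetical contrast: text that really does contain a backslash,
# a "u", and four hex digits (a raw string keeps them literal).
raw = r"T\u00e1bua 21X40"
print(len(raw))             # 16
print("\\u00e1" in raw)     # True

# If the text is JSON, parsing it turns the escape into the character.
decoded = json.loads('"' + raw + '"')
print(decoded == s)         # True
```

If the `in` check on your own data prints False, the string already contains á and only its display differs. If it prints True, the data probably still needs to be parsed (for JSON, with json.loads) rather than patched with replace.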

Answer