Character reading from file in Python

Question

In a text file, there is a string “I don’t like this”. However, when I read it into a string, it becomes “I don\xe2\x80\x98t like this”. I understand that \u2018 is the unicode representation of “‘”. I use command to do the reading. Now, is it possible to read the s…

Accepted Answer

Ref: http://docs.python.org/howto/unicode

Reading Unicode from a file is therefore simple:

    import codecs
    with codecs.open('unicode.rst', encoding='utf-8') as f:
        for line in f:
            print repr(line)

It’s also possible to open files in update mode, allowing both reading and writing:

    with codecs.open('test', encoding='utf-8', mode='w+') as f:
        f.write(u'\u4500 blah blah blah\n')
        f.seek(0)
        print repr(f.readline()[:1])

EDIT: I’m assuming that your intended goal is just to be able to read the file properly into a string in Python. If you’re trying to convert to an ASCII string from Unicode, then there’s really no direct way to do so, since the Unicode characters won’t necessarily exist in ASCII.

If you’re trying to convert to an ASCII string, try one of the following:

- Replace the specific Unicode chars with ASCII equivalents, if you are only looking to handle a few special cases such as this particular example.
- Use the unicodedata module’s normalize() and the string.encode() method to convert as best you can to the next closest ASCII equivalent (Ref https://web.archive.org/web/20090228203858/http://techxplorer.com/2006/07/18/converting-unicode-to-ascii-using-python):

    >>> teststr
    u'I don\xe2\x80\x98t like this'
    >>> unicodedata.normalize('NFKD', teststr).encode('ascii', 'ignore')
    'I donat like this'
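
For anyone on Python 3, here is a rough sketch of the same two ideas: decode the file as UTF-8 when reading, then optionally fall back to ASCII. It assumes the file really is UTF-8 encoded. The helper names read_text and to_ascii, the ASCII_REPLACEMENTS table, and the temporary sample file are made up for illustration and are not part of the answer above.

```python
import os
import tempfile
import unicodedata

# Hypothetical mapping of a few typographic quotes to ASCII stand-ins.
ASCII_REPLACEMENTS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def read_text(path, encoding="utf-8"):
    """Read a whole file, decoding its bytes to str (assumes UTF-8 by default)."""
    with open(path, encoding=encoding) as f:
        return f.read()


def to_ascii(text):
    """Best-effort ASCII conversion: map known characters, then drop the rest."""
    for char, replacement in ASCII_REPLACEMENTS.items():
        text = text.replace(char, replacement)
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding with errors="ignore" then discards anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Write the sample sentence as raw UTF-8 bytes to a throwaway file.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"I don\xe2\x80\x98t like this")

decoded = read_text(sample_path)   # str containing U+2018, not three stray bytes
ascii_only = to_ascii(decoded)     # U+2018 replaced by a plain apostrophe
os.remove(sample_path)

print(repr(decoded))
print(ascii_only)
```

Doing the explicit replacement before normalizing matters here: U+2018 has no ASCII decomposition, so normalize() plus encode('ascii', 'ignore') alone would simply drop the quote.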

Answer