
Tag: unicode

How to completely sanitize a string of illegal characters in python?

I have a feature in my program where the user can upload a CSV file, which my program reads and uses as input. One user is complaining that his input throws an error. The error is caused by an illegal character that is incorrectly encoded. The character is below: Sometimes it
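One common way to keep a single bad byte from aborting the whole upload is to decode leniently. The sketch below is only an illustration: it assumes the upload arrives as raw bytes that are expected to be UTF-8, and the helper name `decode_lossy` is hypothetical, not something from the question.

```python
def decode_lossy(raw, encoding="utf-8"):
    """Decode bytes, replacing undecodable sequences with U+FFFD.

    Assumption: the data is *mostly* valid in `encoding`; bytes that
    are not valid are replaced rather than raising UnicodeDecodeError.
    """
    return raw.decode(encoding, errors="replace")


# 0xFF can never appear in valid UTF-8, so it is replaced.
cleaned = decode_lossy(b"abc\xffdef")
```

Passing `errors="ignore"` instead would drop the offending bytes silently; `"replace"` keeps a visible marker so the damage can be noticed later.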

What is the default content-type/charset?

According to this answer: urllib2 read to Unicode I have to get the content-type in order to change to Unicode. However, some websites don’t have a “charset”. For example, the [‘content-type’] for this page is “text/html”. I can’t convert it to Unicode. Is there a default “encoding” (English, of course)…so that if nothing is found, I can just use that?
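A possible fallback strategy is sketched below. It parses the header value with the standard library and falls back to a default when no `charset` parameter is present. The default of ISO-8859-1 is an assumption taken from the older HTTP/1.1 specification (RFC 2616) for `text/*` types; later HTTP specifications dropped that default, and many real pages are UTF-8, so treat the value as a guess. The function name `charset_from_content_type` is hypothetical.

```python
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type value, or `default`.

    Assumption: `default` is only a best guess for pages that do not
    declare a charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter exists.
    return msg.get_content_charset() or default


declared = charset_from_content_type("text/html; charset=UTF-8")
fallback = charset_from_content_type("text/html")
```

The returned name can then be passed to `bytes.decode`, ideally with `errors="replace"` in case the guess turns out to be wrong.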

Character reading from file in Python

In a text file, there is a string “I don’t like this”. However, when I read it into a string, it becomes “I don\xe2\x80\x98t like this”. I understand that \u2018 is the Unicode representation of “‘”. I use command to do the reading. Now, is it possible to read the string in such a way that when it is read
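The bytes `\xe2\x80\x98` are the UTF-8 encoding of U+2018, so one way to get the character back is to open the file with an explicit encoding. The sketch below assumes the file is UTF-8; the helper `read_text` and the temporary file are hypothetical and exist only for the demonstration.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file as text, decoding with `encoding` (assumed UTF-8)."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demonstration: write the raw bytes from the question, then read them back.
raw = b"I don\xe2\x80\x98t like this"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
text = read_text(tmp.name)
os.remove(tmp.name)
```

After this runs, `text` holds the single character U+2018 where the three raw bytes were.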

Python, Unicode, and the Windows console

When I try to print a Unicode string in a Windows console, I get an error: UnicodeEncodeError: ‘charmap’ codec can’t encode character …. I assume this is because the Windows console does not accept Unicode-only characters. What’s the best way around this? Is there any way I can make Python automatically print a ? instead of failing in this
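One way to get a `?` instead of an exception is to round-trip the text through the console's encoding with `errors="replace"` before printing. This is a minimal sketch: the helper name `console_safe` is hypothetical, and falling back to ASCII when the stream reports no encoding is an assumption.

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the console cannot encode replaced by '?'.

    Assumption: when no encoding is given and sys.stdout reports none,
    ASCII is used as the most conservative choice.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


# Forcing ASCII here so the result is the same on every console.
print(console_safe(u"caf\u00e9", encoding="ascii"))  # prints "caf?"
```

Called without the `encoding` argument, the helper uses whatever the current console reports, so characters the console can display are left untouched.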
