
How to encode a webscraped image link in UTF-8 to ASCII but still have a functional link?

I’m trying to webscrape a link to an image to use it in my Kivy app. The problem is that the image address contains Polish characters (ę, ł, ó, ą), and I get this error:

JavaScript

Full error traceback:

JavaScript

Here is an example that shows what I mean. One picture loads normally, without errors; the second one raises the UnicodeEncodeError and is displayed as a black rectangle.

JavaScript

Output of the code above:

Output of the code

Is there a way to avoid this error and still have a functional link?


Answer

A URL should already be ASCII-compatible. That is how traffic on the Internet (that is, HTTP) works: only ASCII URLs are allowed, with some additional restrictions. Browsers now tend to display URLs unescaped, which hides the %20 and other %xx sequences you sometimes see in part of a URL. Note that there are two layers of encoding to keep in mind: the text is first encoded as UTF-8, and URL escaping is then applied on top of those bytes.
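To make the two layers concrete, here is a minimal sketch that takes the single character ł from the question's URL through both steps by hand. The variable names are only illustrative; in practice a library function does this for you.

```python
# 'ł' (U+0142) is one of the Polish characters in the question's URL.
char = "ł"

# Layer 1: encode the text as UTF-8, which gives two bytes for this character.
utf8_bytes = char.encode("utf-8")

# Layer 2: percent-escape each byte as %XX so the result is plain ASCII.
escaped = "".join(f"%{byte:02X}" for byte in utf8_bytes)

print(utf8_bytes)  # b'\xc5\x82'
print(escaped)     # %C5%82
```

This is the same %C5%82 sequence that appears in the escaped form of the URL.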

You should escape the URL; see the URL quoting functions in urllib.parse. I would use quote() and unquote(). The comments mention quote_plus(), but that also changes spaces (into +), which is sometimes useful but changes the meaning of the original data.
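As a small illustration of the difference, the sketch below runs quote(), quote_plus() and unquote() from urllib.parse on a made-up file name containing a space (the file name is an assumption for the example, not taken from the question).

```python
from urllib.parse import quote, quote_plus, unquote

# Hypothetical file name with a Polish character and spaces.
name = "Szkoła do hymnu.png"

quoted = quote(name)            # spaces become %20
plus_quoted = quote_plus(name)  # spaces become +

print(quoted)       # Szko%C5%82a%20do%20hymnu.png
print(plus_quoted)  # Szko%C5%82a+do+hymnu.png

# unquote() reverses quote(), but it leaves '+' alone, so the
# quote_plus() result does not round-trip through it.
print(unquote(quoted) == name)       # True
print(unquote(plus_quoted) == name)  # False
```

If you do use quote_plus(), its counterpart for decoding is unquote_plus().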

EDIT:

OK, I see the problem. There seems to be something strange in how Kivy handles URLs. quote() is meant only for the path part, not for the first part of the URL.

As a hack (it doesn’t work if the URL has an explicit port, because it will quote the : in front of the port):

JavaScript

So you get the wanted URL, 'https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szko%C5%82a-do-hymnu.png', as used by browsers.

You may want to wrap it in your own function (and maybe check whether there is a port number, to exclude it from quoting).
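One possible way to write such a function, as a sketch rather than a Kivy-specific fix: split the URL with urllib.parse.urlsplit(), quote only the path, and put the pieces back together, so the scheme and host:port are never touched. The function name quote_url and the example.com address are assumptions made for illustration. Keeping % in the safe characters is a design choice so that an already escaped URL is not escaped twice; non-ASCII host names and query strings are not handled here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-escape only the path of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep '/' as the path separator and '%' so existing escapes survive.
    safe_path = quote(parts.path, safe="/%")
    # Scheme, host:port, query and fragment are passed through unchanged.
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


url = "https://nowa.1lo.gorzow.pl/wp-content/uploads/2020/11/Szkoła-do-hymnu.png"
print(quote_url(url))

# Hypothetical URL with an explicit port: the ':' before 8080 is left alone.
print(quote_url("https://example.com:8080/zdjęcie.png"))
```

The first call prints the same escaped URL as above; the second keeps example.com:8080 intact and only escapes the ę in the path.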

But wait, maybe someone has the true solution for Kivy. I never use fully qualified URLs (with protocol and domain), so for me plain quote() is enough.
