
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)

As the title of the question says, I have a problem with encoding and decoding strings.

I am using: python 2.7 | django 1.11 | jinja2 2.8

Basically, I retrieve some data from the database, serialize it, store it in the cache, then read it back from the cache, deserialize it, and render it in the template.

Problem:

Some people's first and last names contain characters such as “ă”. I serialize with json.dumps.

The query behind one of the dictionaries I serialize looks like this (I have 10 like it):

active_agents = User.region_objects.get_active_agents()
agents_by_commission_last_month = active_agents.values(....
                                                          "first_name", "last_name").order_by(
        '-total_paid_transaction_value_last_month')

Then, when I set the cache, I do this:

for key, value in context.items():
   ......
   value = json.dumps(list(value), default=str, ensure_ascii=False).encode('utf-8')

Here, value is the list of dictionaries returned by .values() in the code above, and key is region_agents_by_commission_last_month (like the variable in the code above).

Now I have to read the cache back, so I run the same process in reverse.
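
The round trip described above can be sketched in isolation. This is only a minimal sketch, not the project's code: the sample row is made up, the cache itself is left out, and default=str is omitted because the sample contains only strings.

```python
import json

# Hypothetical sample row; u"\u0103" is the character "ă".
rows = [{u"first_name": u"C\u0103t\u0103lin", u"last_name": u"Pintea"}]

# Serialize to text, then encode once to UTF-8 bytes for the cache.
payload = json.dumps(rows, ensure_ascii=False).encode("utf-8")

# Undo the steps in the opposite order: decode the bytes, then parse.
restored = json.loads(payload.decode("utf-8"))
```

If the values stay as text after json.loads, restored compares equal to rows and no further encode or decode step is needed.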

    serialized_keys = ['agencies_by_commission_last_month',
                       'region_agents_by_commission_last_month',
                       'region_agents_by_commission_last_12_months',
                       'region_agents_by_commission_last_30_days',
                       'agencies_by_commission_last_year',
                       'agencies_by_commission_last_12_months',
                       'agencies_by_commission_last_30_days',
                       'region_agents_by_commission_last_year',
                       'agency',
                       'for_agent']
    context = {}

    for key, value in region_ranking_cache.items():
        if key in serialized_keys:
            objects = json.loads(value, object_hook=_decode_dict)
            for serilized_dict in objects:
                ....
                d['full_name'] = d['first_name'] + " " + d['last_name']
                full_name = d['full_name'].decode('utf-8').encode('utf-8')
                d['full_name'] = full_name
                print(d['full_name'])
                ....

The result from print is Cătălin Pintea, which is correct. But in the dictionary I render, the value is 'full_name': 'C\xc4\x83t\xc4\x83lin Pintea'.

The _decode_dict function passed as object_hook looks like this:

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv


def _decode_dict(data):
    rv = {}
    for key, value in data.items():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

Basically, I use this object hook to encode all keys and values to UTF-8 during json.loads.

This is how I avoided the error being thrown in views.py.
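
To make visible what the hook does to a single value, here is a small sketch. It assumes a one-key JSON document invented for illustration and applies by hand the same .encode('utf-8') call the hook uses.

```python
import json

# json.loads returns text (unicode on Python 2, str on Python 3).
parsed = json.loads(u'{"first_name": "C\\u0103t\\u0103lin"}')
as_text = parsed["first_name"]

# This is the conversion the hook applies to every key and value:
# text becomes a UTF-8 encoded byte string.
as_bytes = as_text.encode("utf-8")
```

Each "ă" becomes the two bytes \xc4\x83, which is where the \xc4 in the rendered dictionary comes from.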

Error

Somewhere in the template, I use:

<td>{{ agent.full_name }}</td>

And agent.full_name comes from: 'full_name': 'C\xc4\x83t\xc4\x83lin Pintea',

Traceback

Traceback:

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/exception.py" in inner
  41.             response = get_response(request)

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in _legacy_get_response
  249.             response = self._get_response(request)

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in _get_response
  187.                 response = self.process_exception_by_middleware(e, request)

File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py" in _get_response
  185.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/lib/python2.7/dist-packages/django/utils/decorators.py" in inner
  185.                     return func(*args, **kwargs)

File "/usr/local/lib/python2.7/dist-packages/django/contrib/auth/decorators.py" in _wrapped_view
  23.                 return view_func(request, *args, **kwargs)

File "/app/crmrebs/utils/__init__.py" in wrapper
  255.             return http_response_class(t.render(output, request))

File "/usr/local/lib/python2.7/dist-packages/django_jinja/backend.py" in render
  106.         return mark_safe(self.template.render(context))

File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py" in render
  989.         return self.environment.handle_exception(exc_info, True)

File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py" in handle_exception
  754.         reraise(exc_type, exc_value, tb)

File "/app/crmrebs/jinja2/ranking/dashboard_ranking.html" in top-level template code
  1. {% extends "base.html" %}

File "/app/crmrebs/jinja2/base.html" in top-level template code
  1. {% extends "base_stripped.html" %}

File "/app/crmrebs/jinja2/base_stripped.html" in top-level template code
  94.           {% block content %}

File "/app/crmrebs/jinja2/ranking/dashboard_ranking.html" in block "content"
  83.           {% include "dashboard/region_ranking.html" %}

File "/app/crmrebs/jinja2/dashboard/region_ranking.html" in top-level template code
  41.         {% include "dashboard/_agent_ranking_row_month.html" %}

File "/app/crmrebs/jinja2/dashboard/_agent_ranking_row_month.html" in top-level template code
  2.   <td>{{ agent.full_name }}</td>

Exception Type: UnicodeDecodeError at /ranking
Exception Value: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)

This is where the error comes from. I tried other things, but I guess it is a limitation of Python 2.7. I usually use Python 3.9, but this project requires 2.7. I tried other answers around here, but nothing really helped.

Can anybody help me serialize this dictionary properly and avoid this mess?
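
For reference, the same failure can be reproduced outside the template. This is a minimal sketch, assuming the value that reaches Jinja2 is a UTF-8 encoded byte string and that Python 2 then tries to turn it into text with its default ascii codec. The explicit decode("ascii") call stands in for that implicit step.

```python
# UTF-8 bytes, as stored in the rendered dictionary.
raw = b"C\xc4\x83t\xc4\x83lin Pintea"

try:
    # Stand-in for the implicit conversion; ascii cannot represent byte 0xc4.
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc
```

The exception reports the ascii codec and the byte at index 1, matching the message in the traceback.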

I hope I made myself clear.

Have a nice day everyone !


Answer

So, I managed to solve my issue.

  1. I figured out that active_agents.values(...."first_name", "last_name").order_by('-total_paid_transaction_value_last_month') returned dictionaries whose keys and values were already unicode (because of the way models.py was configured, with Django 1.11 and Python 2.7). So the serialization itself was fine. It is true that the final result passed to the template looked like 'C\xc4\x83t\xc4\x83lin'. The error came from the \xc4 byte.
  2. To fix it in the template, I did this: {{ agent.full_name.decode("utf-8") }}, which gave me the right result: Cătălin Pintea
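
The same decode can also be done in the view before the context reaches the template. The sketch below is only an illustration: to_text is a hypothetical helper, not part of Django or Jinja2, and the agent dictionary is made-up sample data.

```python
def to_text(value, encoding="utf-8"):
    """Return text for a value that may be encoded bytes (hypothetical helper)."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


# Sample context entry holding UTF-8 bytes, as in the question.
agent = {"full_name": b"C\xc4\x83t\xc4\x83lin Pintea"}
agent["full_name"] = to_text(agent["full_name"])
```

With the value already text, the template can go back to plain {{ agent.full_name }}.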

Thanks @BoarGules. It was true that d['last_name'] and d['first_name'] were in unicode. So when I did the concatenation, I had to add u" ".
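
A minimal sketch of that concatenation, assuming the names are left as unicode (so without the encoding object hook) and using a made-up sample dictionary:

```python
# Sample row with text values, as json.loads returns them without a hook.
d = {u"first_name": u"C\u0103t\u0103lin", u"last_name": u"Pintea"}

# Joining text with a text separator keeps the result as text.
d[u"full_name"] = d[u"first_name"] + u" " + d[u"last_name"]
```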

User contributions licensed under: CC BY-SA