
How to fix a pandas encoding issue in Python?

I import a CSV table into a Jupyter Notebook, and something goes wrong when I try to select the video views column (К-ть переглядів) with iloc.

I need to convert this column to int (using .astype()), but I get an error:

ValueError: invalid literal for int() with base 10: '380\xa0891\xa0555'

Can anyone please tell me what is wrong?


Answer

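The values contain non-breaking spaces (chr(160), shown as \xa0 in the error message), used here as thousands separators. int() cannot parse them inside a number, so use str.replace to remove them first.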

>>> df['A']
0    380 891 555
Name: A, dtype: object

>>> df['A'].dtype.name
'object'

>>> df['A'].astype(int)
ValueError: invalid literal for int() with base 10: '380\xa0891\xa0555'

>>> df['A'].str.replace(chr(160), '').astype(int)
0    380891555
Name: A, dtype: int64
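
If the column might also contain regular spaces or tabs, a slightly more general variant is to strip every whitespace character with a regex, since \s in Python's re also matches the non-breaking space. This is only a minimal sketch: the sample frame below is hypothetical (the second and third rows are made up for illustration), and it assumes every value is a whole number once the separators are gone.

```python
import pandas as pd

# Hypothetical sample data; the first value mirrors the one from the question.
df = pd.DataFrame({"A": ["380\xa0891\xa0555", "1 024", "77"]})

# Remove all whitespace (regular spaces, tabs, and \xa0) in one pass.
cleaned = df["A"].str.replace(r"\s+", "", regex=True)

# Convert the cleaned strings to integers.
views = cleaned.astype("int64")
print(views.tolist())
```

If some cells may hold non-numeric text, pd.to_numeric(cleaned, errors="coerce") could be used in place of astype so that those cells become NaN rather than raising an error.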
User contributions licensed under: CC BY-SA