
How to recognise different csv encodings?

I am not sure whether the problem is the encoding itself, but this is what happens:

import csv

with open('vocabulary.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)

I would expect it to print this:

[screenshot of the expected output]

However, it does not display any of the Japanese characters and instead prints:

['1', '\x1b$B0l$D\x1b(B', '\x1b$B$R$H$D\x1b(B', 'one']
['2', '\x1b$BFs$D\x1b(B', '\x1b$B$U$?$D\x1b(B', 'two']
['3', '\x1b$B1_\x1b(B', '\x1b$B$($s\x1b(B', 'yen']
['4', '\x1b$B6b\x1b(B', '\x1b$B$+$M\x1b(B', 'money']
['5', '\x1b$B$3$l\x1b(B', '\x1b$B$3$l\x1b(B', 'this']
['6', '\x1b$B?eMKF|\x1b(B', '\x1b$B$9$$$h$&$S\x1b(B', 'Wednesday']
['7', '\x1b$B$"$l\x1b(B', '\x1b$B$"$l\x1b(B', 'that']
['8', '\x1b$B@h\x1b(B', '\x1b$B$5$-\x1b(B', 'ahead']

I saved the CSV file in the ISO-2022 encoding. Is there a way to make the Japanese text display properly?
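Why the output looks like this: ISO-2022-JP is a 7-bit encoding that switches character sets with escape sequences (`\x1b$B` into JIS X 0208, `\x1b(B` back to ASCII). Every byte is valid ASCII, so reading the file with a default codec such as UTF-8 "succeeds" but leaves the escapes in the text. That also gives a cheap way to recognise the encoding. The sketch below is a heuristic only, and the function name `guess_encoding` and the Shift_JIS fallback are assumptions for illustration, not a standard detection API:

```python
# Heuristic sketch: spot ISO-2022-JP by its escape sequences before parsing.
ISO2022_JP_ESCAPES = (b"\x1b$@", b"\x1b$B", b"\x1b(B", b"\x1b(J")


def guess_encoding(raw: bytes) -> str:
    """Return a best-guess codec name for raw CSV bytes (heuristic only)."""
    # ISO-2022-JP text is pure 7-bit, so look for its mode-switch escapes.
    if any(esc in raw for esc in ISO2022_JP_ESCAPES):
        return "iso2022_jp"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: fall back to Shift_JIS, another common Japanese codec.
        return "shift_jis"


sample = "1,一つ,ひとつ,one\n".encode("iso2022_jp")
print(guess_encoding(sample))  # iso2022_jp
```

Reading the first few kilobytes with `open(path, 'rb')` and passing them to a check like this is usually enough, since the escapes appear wherever Japanese text starts.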


Answer

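Python's codec for Japanese ISO-2022 is named 'iso2022_jp' (a bare 'ISO2022' raises LookupError: unknown encoding), so pass that to open():

csv_file = open('vocabulary.csv', 'r', encoding='iso2022_jp', newline='')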
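A self-contained sketch of the fix applied to the question's reading loop. Since the original `vocabulary.csv` isn't available, it first writes a small hypothetical sample file in ISO-2022-JP to a temporary directory; `read_vocabulary` is an illustrative helper name, not part of the `csv` module:

```python
import csv
import os
import tempfile

# Hypothetical sample rows mirroring the question's vocabulary.csv.
rows = [["1", "一つ", "ひとつ", "one"], ["2", "二つ", "ふたつ", "two"]]
path = os.path.join(tempfile.mkdtemp(), "vocabulary.csv")

# Write the sample in ISO-2022-JP so the read below has something to decode.
with open(path, "w", encoding="iso2022_jp", newline="") as f:
    csv.writer(f).writerows(rows)


def read_vocabulary(path: str) -> list[list[str]]:
    # encoding= tells Python how to turn the file's bytes into str;
    # newline="" is what the csv module docs recommend for csv files.
    with open(path, "r", encoding="iso2022_jp", newline="") as csv_file:
        return list(csv.reader(csv_file))


for line in read_vocabulary(path):
    print(line)
```

The only change from the question's code is the `encoding=` (plus `newline=''`) argument; `csv.reader` itself never deals with encodings, it just splits the already-decoded text.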

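or, if you have raw bytes (for example, from opening the file with 'rb'), decode them explicitly. Note that the lists yielded by csv.reader contain str, which has no .decode() method in Python 3:

text = raw_bytes.decode('iso2022_jp')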
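If the rows have already been read with the wrong codec, as in the question's output, the fields are `str` values that still contain the raw escapes. Because those characters are all 7-bit, encoding them back to ASCII recovers the original ISO-2022-JP bytes, which can then be decoded properly. A minimal sketch, assuming every garbled field is pure ASCII (`fix_field` is a hypothetical helper name):

```python
def fix_field(field: str) -> str:
    # Re-encode the mis-decoded text to its original 7-bit bytes,
    # then decode those bytes with the correct codec.
    return field.encode("ascii").decode("iso2022_jp")


# First row from the question's output.
garbled = ["1", "\x1b$B0l$D\x1b(B", "\x1b$B$R$H$D\x1b(B", "one"]
print([fix_field(f) for f in garbled])  # ['1', '一つ', 'ひとつ', 'one']
```

Opening the file with the right encoding in the first place is cleaner; this is mainly useful when the data has already been loaded elsewhere.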
User contributions licensed under: CC BY-SA