I wanted to convert my text file into a CSV file; however, my output is very different from what I expected. Below are the examples:
text.txt (Encoding is “UTF-8”)
text =
-0.00010712468871868001 gram_0:Coll:0::ん -0.00010712468871868001 gram-1:Coll:-1::止まる -0.00010712468871868001 gram-3:Coll:-3::帰る -0.00010712468871868001 gram1:Coll:0::ん -0.00010712468871868001 gram2:Coll:2::いく -0.00010712468871868001 gram3:Coll:3::く
My code:
import csv

with open('text.txt', 'r', encoding="utf-8") as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(",") for line in stripped if line)
    with open('log.csv', 'w', encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('title', 'intro'))
        writer.writerows(lines)
Output:
My expected output:
It seems like I am getting a lot of "……" in place of the Japanese characters. Could anyone please assist me with this?
Answer
Windows uses the BOM to determine the encoding of a text file, but Python does not write a BOM automatically, so Windows may treat the output file as ANSI. Try adding out_file.write('\ufeff') immediately after the inner with.
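Equivalently, you can let Python write the BOM for you by opening the output file with encoding="utf-8-sig". A minimal sketch (the sample rows here are made up for illustration; substitute your own parsed lines):

```python
import csv

# encoding="utf-8-sig" prepends the UTF-8 BOM (EF BB BF) when writing,
# so Excel/Notepad on Windows detect the file as UTF-8 and render
# Japanese text correctly. newline='' is the csv module's recommended
# setting to avoid blank rows on Windows.
rows = [('title', 'intro'),
        ('gram_0:Coll:0::ん', '-0.00010712468871868001')]

with open('log.csv', 'w', encoding='utf-8-sig', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerows(rows)

# Verify the file starts with the UTF-8 BOM:
with open('log.csv', 'rb') as f:
    assert f.read(3) == b'\xef\xbb\xbf'
```

Note that utf-8-sig also strips the BOM transparently when reading the file back, so round-tripping in Python is unaffected.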
Source: Adding BOM (unicode signature) while saving file in python