I was trying to save my dataset in a CSV file with the following script:
```python
with open(data_path+'Furough.csv', 'w', encoding="utf-8") as f0:
    df = pd.DataFrame(columns=['title', 'poem', 'year'])
    for f in onlyfiles:
        poem = []
        title = ""
        year = 0
        with open(mypath+f, "r", encoding="utf-8") as f1:
            for line in f1:
                if line.__contains__("TIMESTAMP"):
                    year = int(line[12:15])
                    continue
                if line.__contains__('TITLE'):
                    title = line[7:]
                if line != "":
                    poem.append(line)
        df = df.append({
            'title': title,
            'poem': poem,
            'year': int(float(year))
        }, ignore_index=True)

    df.to_csv(f0, index=False, encoding='utf-8-sig')
```
but the result is confusing: it writes unknown characters to the CSV file instead of the Farsi characters. Can anyone help me?
I want to write all of these files into one CSV. Here is an example of what I have in one of the files and want to write:
```text
[V_START] بر پرده•های درهم امیال سرکشم [HEM]
نقش عجیب چهرۀ یک ناشناس بود [V_END]
[V_START] نقشی ز چهره•ای که چو می•جستمش به شوق [HEM]
پیوسته می•رمید و بمن رخ نمی•نمود [V_END]

[V_START] یک شب نگاه خستۀ مردی به روی من [HEM]
لغزید و سست گشت و همان •جا خموش ماند [V_END]
[V_START] تا خواستم که بگسلم این رشتۀ نگاه [HEM]
قلبم تپید و باز مرا سوی او کشاند [V_END]
```
but the result in the CSV file is unreadable.
Answer
To add to Cimbali’s answer, another way to add a UTF-8 BOM is to use the encoding “utf-8-sig” instead of “utf-8”; it writes the BOM for you automatically.
Further information is in this question: Unable to Save Arabic Decoded Unicode to CSV File Using Python
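For example, here is a minimal sketch of two ways to get the BOM written, assuming pandas is installed; the DataFrame below is a made-up stand-in for the one built in the question:

```python
import pandas as pd

# Made-up stand-in for the DataFrame built in the question.
df = pd.DataFrame({
    "title": ["ناشناس"],
    "poem": ["نقش عجیب چهرۀ یک ناشناس بود"],
    "year": [133],
})

# Option 1: pass a path so to_csv opens the file itself and applies the encoding.
# "utf-8-sig" prepends the UTF-8 BOM, which lets Excel detect the Farsi text.
df.to_csv("Furough.csv", index=False, encoding="utf-8-sig")

# Option 2: open the file yourself with the BOM-writing codec and hand over the handle.
# When to_csv receives an already-open file object, that file object's encoding is
# what counts, so it has to be "utf-8-sig" here rather than "utf-8".
with open("Furough.csv", "w", encoding="utf-8-sig", newline="") as f0:
    df.to_csv(f0, index=False)
```

Either way, the file content is still valid UTF-8; the BOM only matters for programs like Excel that otherwise guess the encoding wrong.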