
How to solve "UnicodeDecodeError" while using pandas to combine 100+ .csv files?

I have been trying to write a for loop that combines all of the CSV files in a directory into one. The files in that directory were produced by another pandas program I wrote, which saved them with group.to_csv(f'data3/{station}.csv', index=False, encoding='utf-8') to make sure the encoding is UTF-8.

The combining code I used is as follows:

import os
import pandas as pd

master_df = pd.DataFrame()
directory_path = '/pandas learning/data3'

for file in os.listdir (directory_path):
    pd.read_csv(f'data3/{file}')
    if file.endswith('.csv'):
        master_df = master_df.append(pd.read_csv(f'data3/{file}'))

master_df.to_csv("final.csv" )

When I run the program, it raises a UnicodeDecodeError. Since the code has to handle about 100 files, I can't change their encoding one by one.


Answer

Since pandas.DataFrame.append is deprecated (and was removed in pandas 2.0), use pandas.concat instead, and pass the files' encoding to pandas.read_csv:

import os
import pandas as pd

directory_path = '/pandas learning/data3'

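data = []
for file in os.listdir(directory_path):
    # Only read .csv files; skip anything else in the directory
    if file.endswith('.csv'):
        # Replace 'xxxx' with the encoding your files actually use
        temp = pd.read_csv(os.path.join(directory_path, file), encoding='xxxx')
        data.append(temp)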

master_df = pd.concat(data)

master_df.to_csv("final.csv" )

Note: Make sure to pass the encoding that corresponds to your .csv files (e.g. utf-8, utf-8-sig, cp1252). The name ansi is only recognized by Python on Windows.
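If you are not sure which encoding each file uses, a rough way to find out is to try decoding the raw bytes with a short list of candidates. The sketch below is only an illustration: the helper names detect_encoding and csv_encodings and the candidate list are assumptions, not part of pandas. A successful decode does not prove the guess is right (cp1252 accepts almost any byte sequence), so treat the result as a hint.

```python
import os

def detect_encoding(path, candidates=("utf-8-sig", "cp1252")):
    """Return the first candidate that decodes the whole file, else None.

    "utf-8-sig" is tried first because it accepts UTF-8 with or without a BOM.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

def csv_encodings(directory_path):
    """Map each .csv file name in directory_path to its detected encoding."""
    return {
        name: detect_encoding(os.path.join(directory_path, name))
        for name in sorted(os.listdir(directory_path))
        if name.endswith(".csv")
    }
```

The detected name can then be passed on per file, for example pd.read_csv(path, encoding=detect_encoding(path)), and any file that maps to None needs a closer look by hand.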

User contributions licensed under: CC BY-SA