
pandas csv UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 162: invalid start byte

I’m trying to read one column from a CSV (the one with the header 'Peptide Sequence'). However, the code below raises the error in the title. I know this probably has something to do with the encoding, which I know very little about. Is there a quick workaround?

import pandas as pd
file = r'C:...thpdb.csv'
df = pd.read_csv(file, usecols=['Peptide Sequence'])
print(df)


Answer

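read_csv takes an encoding argument for files that are not UTF-8, which is the default it assumes. "ISO-8859-1" should work for you. ISO-8859-1 maps every possible byte to a character, so the read will not fail, though some characters may come out wrong if the file was actually written in a different encoding: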

import pandas as pd
file = r'C:...thpdb.csv'
df = pd.read_csv(file, usecols=['Peptide Sequence'], encoding="ISO-8859-1")
print(df)
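
If you are not sure which encoding the file uses, one option is to try a few candidates in order and keep the first one that decodes cleanly. The following is only a minimal sketch: the helper name first_working_encoding and the candidate list are made up for illustration and are not part of pandas, and the sample bytes are invented to contain the 0x81 byte from the error message.

```python
from typing import Iterable, Optional


def first_working_encoding(
    raw: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1252", "ISO-8859-1"),
) -> Optional[str]:
    """Return the first candidate that decodes `raw` without an error.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
        return name
    return None


# Invented sample containing byte 0x81, which is not valid UTF-8 here.
sample = b"Peptide Sequence\nAB\x81CD\n"
print(first_working_encoding(sample))
```

For a real file you could read the bytes with open(file, "rb").read() and pass the returned name as encoding= to pd.read_csv. Keep in mind that a clean decode does not prove the guess is right: ISO-8859-1 accepts any byte sequence, so it is best kept last in the list and the resulting text checked by eye.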
User contributions licensed under: CC BY-SA