Python

I read a csv file and I want to remove duplicate entries. When I run the commands to do that, it creates a new first row that contains column numbers and a new column that contains row numbers. See Why does it do that and how should I fix this?

def remove_duplicates(file):
    df = pd.read_csv(file, encoding="latin-1", header = None)
    Helper.printline(f"Rows in file {file}: {df.shape[0]}")
    df.drop_duplicates(keep='first', inplace=True)
    Helper.printline(f"Rows in file {file} with duplicates removed: {df.shape[0]}")
    df.to_csv(file)

Answer

Use df.to_csv(file, header=False, index=False) to save the .csv file without headers and indexes.

Python Pandas drop_duplicates – adds a new column and row to my data frame

Advertisement

Answer