I read a CSV file and I want to remove duplicate entries.
When I run the commands to do that, it creates a new first row that contains column numbers and a new column that contains row numbers.
Why does it do that and how should I fix this?
    def remove_duplicates(file):
        df = pd.read_csv(file, encoding="latin-1", header=None)
        Helper.printline(f"Rows in file {file}: {df.shape[0]}")
        df.drop_duplicates(keep='first', inplace=True)
        Helper.printline(f"Rows in file {file} with duplicates removed: {df.shape[0]}")
        df.to_csv(file)
Answer
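Because the file is read with header=None, pandas labels the columns 0, 1, 2, ... and gives every row an integer index. By default, df.to_csv(file) writes both the column labels (as a header row) and the index (as the first column), which is where the extra row and column come from.
Use df.to_csv(file, header=False, index=False) to save the .csv file without the header row and the index column.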
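
For illustration, here is a minimal sketch of the function from the question with that change applied. It assumes a small headerless CSV; the sample data and the temporary file path are made up for the demonstration, and the Helper.printline calls are left out because Helper is the asker's own module.

```python
import os
import tempfile

import pandas as pd


def remove_duplicates(path):
    """Drop duplicate rows from a headerless CSV file and rewrite it in place."""
    # header=None treats the first line as data; pandas labels the columns 0, 1, 2, ...
    df = pd.read_csv(path, encoding="latin-1", header=None)
    df = df.drop_duplicates(keep="first")
    # header=False and index=False keep those generated labels out of the file.
    df.to_csv(path, header=False, index=False, encoding="latin-1")
    return df


# Hypothetical sample data, used only to demonstrate the round trip.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="latin-1", newline="") as f:
        f.write("a,1\nb,2\na,1\n")

    remove_duplicates(sample_path)

    with open(sample_path, encoding="latin-1", newline="") as f:
        result = f.read()
```

After the call, the file should contain only the two unique rows, with no added header row or index column.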