I read a csv file and I want to remove duplicate entries. When I run the commands to do that, it creates a new first row that contains column numbers and a new column that contains row numbers. See Why does it do that and how should I fix this?
def remove_duplicates(file): df = pd.read_csv(file, encoding="latin-1", header = None) Helper.printline(f"Rows in file {file}: {df.shape[0]}") df.drop_duplicates(keep='first', inplace=True) Helper.printline(f"Rows in file {file} with duplicates removed: {df.shape[0]}") df.to_csv(file)
Advertisement
Answer
Use df.to_csv(file, header=False, index=False)
to save the .csv file without headers and indexes.