Skip to content
Advertisement

How to remove empty rows in csv using pandas?

I got csv dataset like this and i need to remove any empty rows inside of it i already tried following code but still it fails when it reads empty row, and return

pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 11, saw 8

    def feed_db():
        try:
            employees = pd.read_csv(
                'employee.csv', delimiter=',',  sep='t', encoding="utf-8", header=None, skipinitialspace=True, skip_blank_lines=True)
            employees.columns = [c.strip().lower().replace(' ', '_')
                                 for c in employees.columns]
            employees.to_sql('employees', conn, if_exists='replace', index=False)
    
        except Error as exc:
            raise Error('Database initialization failed', exc)

sample dataset

Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0001, Aditi Musunur, Male, 1990-03-24, 2011-07-05, 1500, Sri Lanka
EN_0002, Advitiya Sujeet, Male, 1986-07-28, 2010-03-24, 1600, Sri Lanka
EN_0003, Alagesan Poduri, Male, 1982-05-25, 2016-06-24, 1800, Sri Lanka
EN_0004, Amrish Ilyas, Female, 1987-10-24, 2013-12-17, 2000, India
EN_0005, Aprativirya Seshan, Female, 1981-12-16, 2012-03-14, 1750, India
EN_0006, Asvathama Ponnada, Male, 1986-01-09, 2014-06-18, 2300, Pakistan
EN_0007, Avantas Ghosal, Female, 1981-10-05, 2016-08-26, 4200, Pakistan
EN_0008, Avidosa Vaisakhi, Male, 1980-08-09, 2018-03-05, 3100, Bangladesh
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh

how can i remove those empty rows using pandas ?

Advertisement

Answer

As mentioned, there is a redundant empty cell in line 11. You can open and fix the csv, write it to StringIO, then open it with pd.read_csv:

import pandas as pd
import io
s = io.StringIO()

with open('employee.csv') as file:
    for line in file:
        s.write(str(line).replace(", ,", ","))
s.seek(0)

df = pd.read_csv(s)

result:

Employee Id Full Name Gender Date of Birth Joined Date Salary (USD) Branch
0 EN_0001 Aditi Musunur Male 1990-03-24 2011-07-05 1500 Sri Lanka
1 EN_0002 Advitiya Sujeet Male 1986-07-28 2010-03-24 1600 Sri Lanka
2 EN_0003 Alagesan Poduri Male 1982-05-25 2016-06-24 1800 Sri Lanka
3 EN_0004 Amrish Ilyas Female 1987-10-24 2013-12-17 2000 India
4 EN_0005 Aprativirya Seshan Female 1981-12-16 2012-03-14 1750 India
5 EN_0006 Asvathama Ponnada Male 1986-01-09 2014-06-18 2300 Pakistan
6 EN_0007 Avantas Ghosal Female 1981-10-05 2016-08-26 4200 Pakistan
7 EN_0008 Avidosa Vaisakhi Male 1980-08-09 2018-03-05 3100 Bangladesh
8 EN_0009 Barsati Sandipa Male 1988-04-09 2011-05-03 2925 Bangladesh
9 EN_0010 Debasis Sundhararajan Female 1990-03-26 2015-05-18 2800 Bangladesh
10 EN_0011 Debas Sundhar Female 1990-03-26 2015-05-18 2800 Bangladesh
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement