I have a CSV dataset like the sample below, and I need to remove any empty rows from it. I already tried the following code, but it still fails when it reads an empty row and returns:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 11, saw 8
def feed_db():
    try:
        employees = pd.read_csv(
            'employee.csv', delimiter=',', sep='t', encoding="utf-8", header=None, skipinitialspace=True, skip_blank_lines=True)
        employees.columns = [c.strip().lower().replace(' ', '_')
                             for c in employees.columns]
        employees.to_sql('employees', conn, if_exists='replace', index=False)
    except Error as exc:
        raise Error('Database initialization failed', exc)
Sample dataset:
Employee Id, Full Name, Gender, Date of Birth, Joined Date, Salary (USD), Branch
EN_0001, Aditi Musunur, Male, 1990-03-24, 2011-07-05, 1500, Sri Lanka
EN_0002, Advitiya Sujeet, Male, 1986-07-28, 2010-03-24, 1600, Sri Lanka
EN_0003, Alagesan Poduri, Male, 1982-05-25, 2016-06-24, 1800, Sri Lanka
EN_0004, Amrish Ilyas, Female, 1987-10-24, 2013-12-17, 2000, India
EN_0005, Aprativirya Seshan, Female, 1981-12-16, 2012-03-14, 1750, India
EN_0006, Asvathama Ponnada, Male, 1986-01-09, 2014-06-18, 2300, Pakistan
EN_0007, Avantas Ghosal, Female, 1981-10-05, 2016-08-26, 4200, Pakistan
EN_0008, Avidosa Vaisakhi, Male, 1980-08-09, 2018-03-05, 3100, Bangladesh
EN_0009, Barsati Sandipa, Male, 1988-04-09, 2011-05-03, 2925, Bangladesh
EN_0010, Debasis Sundhararajan, Female, 1990-03-26, 2015-05-18, , 2800, Bangladesh
EN_0011, Debas Sundhar, Female, 1990-03-26, 2015-05-18, 2800, Bangladesh
How can I remove those empty rows using pandas?
Answer
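The error is not caused by an empty row. There is a redundant empty cell in line 11 of the file (the EN_0010 row has a stray `, ,` before the salary), so that row has 8 fields instead of the expected 7. You can read the file line by line, fix that cell, write the result to a StringIO buffer, and then open the buffer with pd.read_csv: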
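import pandas as pd
import io

s = io.StringIO()
with open('employee.csv') as file:
    for line in file:
        # collapse the stray empty cell (", ,") into a single delimiter
        s.write(line.replace(", ,", ","))
s.seek(0)  # rewind the buffer before handing it to pandas
# skipinitialspace drops the blank that follows each comma
df = pd.read_csv(s, skipinitialspace=True)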
Result:
| | Employee Id | Full Name | Gender | Date of Birth | Joined Date | Salary (USD) | Branch |
|---|---|---|---|---|---|---|---|
| 0 | EN_0001 | Aditi Musunur | Male | 1990-03-24 | 2011-07-05 | 1500 | Sri Lanka |
| 1 | EN_0002 | Advitiya Sujeet | Male | 1986-07-28 | 2010-03-24 | 1600 | Sri Lanka |
| 2 | EN_0003 | Alagesan Poduri | Male | 1982-05-25 | 2016-06-24 | 1800 | Sri Lanka |
| 3 | EN_0004 | Amrish Ilyas | Female | 1987-10-24 | 2013-12-17 | 2000 | India |
| 4 | EN_0005 | Aprativirya Seshan | Female | 1981-12-16 | 2012-03-14 | 1750 | India |
| 5 | EN_0006 | Asvathama Ponnada | Male | 1986-01-09 | 2014-06-18 | 2300 | Pakistan |
| 6 | EN_0007 | Avantas Ghosal | Female | 1981-10-05 | 2016-08-26 | 4200 | Pakistan |
| 7 | EN_0008 | Avidosa Vaisakhi | Male | 1980-08-09 | 2018-03-05 | 3100 | Bangladesh |
| 8 | EN_0009 | Barsati Sandipa | Male | 1988-04-09 | 2011-05-03 | 2925 | Bangladesh |
| 9 | EN_0010 | Debasis Sundhararajan | Female | 1990-03-26 | 2015-05-18 | 2800 | Bangladesh |
| 10 | EN_0011 | Debas Sundhar | Female | 1990-03-26 | 2015-05-18 | 2800 | Bangladesh |
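The string replacement above only matches the exact text `, ,`. A slightly more general cleanup can be sketched with the standard `csv` module, assuming the first non-blank row is the header and that any row wider than the header is wide only because of stray empty cells. The function name `clean_csv` and the shortened sample are hypothetical and only for illustration; this is one possible approach, not the only way to do it.

```python
import csv
import io


def clean_csv(source):
    """Return a StringIO copy of `source` without blank rows or stray empty cells.

    Assumption: the first non-blank row is the header and defines the width.
    """
    reader = csv.reader(source, skipinitialspace=True)
    cleaned = io.StringIO()
    writer = csv.writer(cleaned, lineterminator="\n")
    width = None
    for row in reader:
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip rows that are empty or contain only blank cells
        if width is None:
            width = len(fields)  # header row sets the expected field count
        if len(fields) > width:
            # only over-long rows are touched: drop their empty cells
            fields = [field for field in fields if field]
        if len(fields) > width:
            raise ValueError(f"row still has too many fields: {fields}")
        writer.writerow(fields)
    cleaned.seek(0)  # rewind so the caller can read from the start
    return cleaned


# Shortened, hypothetical sample with one blank row and one stray empty cell.
sample = (
    "Employee Id, Full Name, Salary (USD), Branch\n"
    "EN_0009, Barsati Sandipa, 2925, Bangladesh\n"
    "\n"
    "EN_0010, Debasis Sundhararajan, , 2800, Bangladesh\n"
)
print(clean_csv(io.StringIO(sample)).getvalue())
```

The returned buffer can be passed to pd.read_csv in the same way as `s` in the answer. Note the limitation: if an over-long row also contains a legitimately empty value, that value is dropped too, so the row should be checked by hand. Recent pandas versions (1.4 and later) also accept a callable for `on_bad_lines` when `engine='python'` is used, which may be an alternative if the file should be repaired while parsing.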