
Pandas skips lines in read_csv: can I record these to a variable/log file?

I’ve seen similar questions on here but nothing that is quite what I want to do.

I’m reading in a tsv/csv file using pandas read_csv with error_bad_lines=False.

The file has clearly defined headers, but sometimes it contains unexpected additional columns and I get messages like the following in the console:

Skipping line 251643: Expected 20 fields in line 251643, saw 21

This is fine for my process. I would just like a way to record these messages or lines to either a dataframe or a log file, so that I know which lines have been skipped. Because the files can be submitted by anyone and the problem is their formatting, I’m not interested in fixing the message, only in recording the line numbers that fail.

Massive thanks in advance :)

Edit: include try except clause


Answer

To reproduce the issue, I used a small CSV file (dummy.csv) in which some lines have extra fields.
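Independently of pandas, one way to see which lines of such a file carry extra fields is to scan it with the standard csv module. This is only a sketch and not part of the original answer: find_bad_lines is a hypothetical helper, and it assumes the first row is the header and that only rows with more fields than the header are of interest.

```python
import csv


def find_bad_lines(path, sep=","):
    """Return (line_number, field_count) for rows with more fields than the header.

    Hypothetical helper for illustration; line numbers are 1-based file lines.
    """
    bad = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=sep)
        header = next(reader, None)
        if header is None:
            return bad  # empty file, nothing to check
        expected = len(header)
        for row in reader:
            if len(row) > expected:
                # reader.line_num is the number of physical lines read so far,
                # i.e. the last line of the current record.
                bad.append((reader.line_num, len(row)))
    return bad
```

For a tab-separated file, pass sep="\t". The line numbers reported here are file line numbers as counted by the csv module; they are not guaranteed to match the numbers in the pandas messages in every case (for example, with quoted multi-line fields).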

Since we are using error_bad_lines=False, no exception is raised for the malformed lines, so try-except is not the way forward. The "Skipping line" messages are written to stderr, so we need to redirect stderr to a log file while read_csv runs. The messages then end up in the log file instead of the console.
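A minimal sketch of the stderr redirection, using contextlib.redirect_stderr from the standard library. The helper name run_with_stderr_log and the file name bad_lines.log are assumptions made for illustration, and the approach only captures messages that are written through Python's sys.stderr.

```python
import contextlib


def run_with_stderr_log(func, log_path):
    """Call func() while anything written to sys.stderr is appended to log_path.

    Hypothetical helper for illustration; returns whatever func() returns.
    """
    with open(log_path, "a") as log, contextlib.redirect_stderr(log):
        return func()


# Intended use (assumes a pandas version that still accepts error_bad_lines):
#
#   import pandas as pd
#   df = run_with_stderr_log(
#       lambda: pd.read_csv("dummy.csv", error_bad_lines=False),
#       "bad_lines.log",
#   )
```

Newer pandas releases replace error_bad_lines with the on_bad_lines parameter and may report skipped lines differently, so it is worth checking on your installed version that the messages really go to stderr.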

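The log file then contains one message per skipped line, in the same form as the console message quoted in the question (Skipping line 251643: Expected 20 fields in line 251643, saw 21).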
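Once the messages are captured, the skipped line numbers can be extracted with a regular expression. The sketch below assumes the messages follow the format quoted in the question; parse_skipped_lines and SKIP_PATTERN are hypothetical names, and the pattern may need adjusting if your pandas version words the message differently.

```python
import re

# Matches messages such as:
#   Skipping line 251643: Expected 20 fields in line 251643, saw 21
SKIP_PATTERN = re.compile(
    r"Skipping line (\d+): [Ee]xpected (\d+) fields.*?saw (\d+)"
)


def parse_skipped_lines(log_text):
    """Return one dict per 'Skipping line' message found in log_text."""
    return [
        {
            "line": int(match.group(1)),
            "expected": int(match.group(2)),
            "saw": int(match.group(3)),
        }
        for match in SKIP_PATTERN.finditer(log_text)
    ]


message = "Skipping line 251643: Expected 20 fields in line 251643, saw 21"
print(parse_skipped_lines(message))
# prints [{'line': 251643, 'expected': 20, 'saw': 21}]
```

The resulting list of dicts can be passed to pandas.DataFrame if the skipped lines are wanted in a dataframe rather than a log file.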

Update

Modified the code based on a suggestion (in comments below)

User contributions licensed under: CC BY-SA