I’ve seen similar questions on here but nothing that is quite what I want to do.
I’m reading in a tsv/csv file using
import csv
import pandas as pd

try:
    dataframe = pd.read_csv(
        filepath_or_buffer=filename_or_obj,
        sep='\t',  # tab-separated input
        encoding='utf-8',
        skip_blank_lines=True,
        error_bad_lines=False,
        warn_bad_lines=True,
        dtype=data_type_dict,
        engine='python',
        quoting=csv.QUOTE_NONE
    )
except UnicodeDecodeError:
    # Retry with latin-1 if the file is not valid UTF-8
    dataframe = pd.read_csv(
        filepath_or_buffer=exception_filename_or_obj,
        sep='\t',
        encoding='latin-1',
        skip_blank_lines=True,
        error_bad_lines=False,
        warn_bad_lines=True,
        dtype=data_type_dict,
        engine='python',
        quoting=csv.QUOTE_NONE
    )

The file has clearly defined headers, but sometimes it contains unexpected additional columns, and I get messages like the following in the console:
Skipping line 251643: Expected 20 fields in line 251643, saw 21
This is fine for my process. I would just like a way to record these messages or lines to either a dataframe or a log file, so that I know which lines have been skipped. Because the files can be submitted by anyone and this is a formatting issue, I'm not interested in fixing the problem, just in recording the line numbers that fail.
Massive thanks in advance :)
Edit: include try except clause
Answer
To reproduce the issue, I used the following CSV file (dummy.csv):
F1,F2,F3
11,A,10.54
18,B,0.12,low
24,A,19.00
10,C,7.01,low
22,D,39.11,high
49,E,12.12
Note that some lines have an extra field.
Since we are using error_bad_lines=False, no error or exception is raised for the bad lines, so try-except will not catch them. The "Skipping line" messages are written to stderr instead, so we need to redirect stderr:
from contextlib import redirect_stderr
import pandas as pd
# import io

with open('error_messages.log', 'w') as h:
    # f = io.StringIO()
    # with redirect_stderr(f):
    with redirect_stderr(h):
        df = pd.read_csv(filepath_or_buffer='dummy.csv',
                         sep=',',  # change it for your data
                         encoding='latin-1',
                         skip_blank_lines=True,
                         error_bad_lines=False,
                         # dtype=data_type_dict,
                         engine='python',
                         # quoting=csv.QUOTE_NONE
                         )
    # h.write(f.getvalue())  # Write the error messages to log file
print(df)
The above code will write the messages to a log file!
Here is a sample output from the log file:
Skipping line 3: Expected 3 fields in line 3, saw 4
Skipping line 5: Expected 3 fields in line 5, saw 4
Skipping line 6: Expected 3 fields in line 6, saw 4
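If the line numbers are needed as data rather than as text, the log can be parsed afterwards. The following is a minimal sketch, assuming the messages keep the "Skipping line N: Expected X fields in line N, saw Y" shape shown in the sample output. The name parse_skipped_lines and the regular expression are illustrative and not part of pandas.

```python
import re

# Pattern for the message shape shown in the sample log (assumed to be stable).
SKIP_PATTERN = re.compile(
    r"Skipping line (?P<line>\d+): "
    r"Expected (?P<expected>\d+) fields in line \d+, saw (?P<saw>\d+)"
)


def parse_skipped_lines(log_text):
    """Return one dict per 'Skipping line' message found in log_text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in SKIP_PATTERN.finditer(log_text)
    ]


# Sample text copied from the log output above; in practice this would be
# the contents of error_messages.log.
sample_log = (
    "Skipping line 3: Expected 3 fields in line 3, saw 4\n"
    "Skipping line 5: Expected 3 fields in line 5, saw 4\n"
    "Skipping line 6: Expected 3 fields in line 6, saw 4\n"
)
records = parse_skipped_lines(sample_log)
print([record["line"] for record in records])  # [3, 5, 6]
```

Since records is a list of dicts with the same keys, it should be possible to pass it straight to pd.DataFrame(records) to get the skipped lines as a dataframe.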
Update
Modified the code based on a suggestion (in comments below)
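The commented-out io.StringIO lines in the code point at an in-memory variant, which may be handy when the messages should be processed right away instead of being written to disk. Here is a small sketch of that pattern using only the standard library. The function noisy_reader is a hypothetical stand-in for the pd.read_csv call: it is assumed to write its warning through sys.stderr, as the messages in this answer are.

```python
import io
import sys
from contextlib import redirect_stderr


def noisy_reader():
    """Hypothetical stand-in for pd.read_csv: warns on stderr, returns rows."""
    sys.stderr.write("Skipping line 3: Expected 3 fields in line 3, saw 4\n")
    return [["11", "A", "10.54"]]


buffer = io.StringIO()
with redirect_stderr(buffer):
    # Anything written to sys.stderr inside this block lands in the buffer.
    rows = noisy_reader()

# One list entry per captured message, ready to log or parse.
messages = buffer.getvalue().splitlines()
print(messages)
```

Outside the with block, sys.stderr is restored, so later errors still show up in the console as usual.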