I have a log file (Text.TXT in this case):
# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006
To read this log file into pandas and ignore all the header info, I would use skiprows up to line 16, like so:
pd.read_csv('test.TXT',skiprows=16,sep=';')
But this produces EmptyDataError, as it is skipping past where the data starts. To make this work I’ve had to use skiprows=11 instead:
pd.read_csv('test.TXT',skiprows=11,sep=';')
      Timestamp  T  ID  P
0  16T122109957  0   6  6
My question is: if the data doesn’t start until row 17 in this case, why do I need to request skiprows up to row 11?
Answer
One workaround is to use the comment parameter of pd.read_csv:
from io import StringIO
import pandas as pd

text='''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''

# Lines starting with '#' are treated as comments and dropped
df = pd.read_csv(StringIO(text),comment='#',sep=';')
df
      Timestamp  T  ID  P
0  16T122109957  0   6  6
Or
df = pd.read_csv(StringIO(text),header=0,comment='#',sep=';')
From the docs, under the header parameter:
Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
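If you would rather not depend on how the parser treats the header at all, another option is to drop the leading # lines yourself before handing the rest to pandas. This is only a rough sketch: drop_comment_header is a made-up helper name, and it assumes every header line starts with # and that the header is one contiguous run at the top of the file.

```python
def drop_comment_header(text, marker='#'):
    """Strip the leading run of lines that start with `marker`.

    Hypothetical helper (not part of pandas). Returns the remaining
    text and the number of physical lines that were dropped.
    """
    lines = text.splitlines(keepends=True)
    start = 0
    # Advance past every leading line that begins with the marker
    while start < len(lines) and lines[start].startswith(marker):
        start += 1
    return ''.join(lines[start:]), start


text = '''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''

body, n_header = drop_comment_header(text)
print(n_header)  # 16
print(body)
```

The remaining body is plain semicolon-separated data, so it can be wrapped in StringIO and passed to pd.read_csv with sep=';' and no skiprows at all.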
Not sure about skiprows’s weird behaviour here.
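A likely explanation, though I haven't checked it against the pandas source, so treat it as an assumption: the header contains double quotes, and a quote that directly follows the ; separator (as in # Value: ";") opens a quoted field that can run across line breaks. If skiprows counts parsed rows rather than physical lines, the six lines from # Value through # Time separator would collapse into one row, leaving 11 header rows instead of 16. Python's standard csv module follows the same quoting convention, so the sketch below uses it to illustrate the count. It shows csv's behaviour, not pandas' own tokenizer.

```python
import csv
from io import StringIO

text = '''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''

# Physical lines in the file: 16 header lines + 2 data lines
physical_lines = text.count('\n') + 1

# Parsed records: a quoted field may swallow embedded newlines
records = list(csv.reader(StringIO(text), delimiter=';'))

# Index of the record holding the column names
header_index = next(i for i, r in enumerate(records) if r and r[0] == 'Timestamp')

print(physical_lines)  # 18
print(len(records))    # 13
print(header_index)    # 11
```

If this is indeed what happens inside read_csv, it would also explain why comment='#' works: the commented lines are discarded as comments, so their quotes never get a chance to merge lines.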