I have a log file (test.TXT in this case):
# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006
To read this log file into pandas and ignore all the header info, I would use skiprows to skip the first 16 lines, like so:
pd.read_csv('test.TXT',skiprows=16,sep=';')
But this raises an EmptyDataError because it skips past the point where the data starts. To make it work, I have had to use skiprows=11 instead:
pd.read_csv('test.TXT',skiprows=11,sep=';')
      Timestamp  T  ID  P
0  16T122109957  0   6  6
My question is: if the data doesn’t start until row 17 in this case, why do I need to set skiprows to 11?
Answer
One workaround is to use the comment parameter of pd.read_csv:
import pandas as pd
from io import StringIO
text='''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''
df = pd.read_csv(StringIO(text),comment='#',sep=';')
df
      Timestamp  T  ID  P
0  16T122109957  0   6  6
Or
df = pd.read_csv(StringIO(text),header=0,comment='#',sep=';')
From the docs, under the header parameter:
Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
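To illustrate that note, here is a minimal sketch. The shortened sample text (including the blank line) is made up for the example and is not the original log. With comment='#' and the default skip_blank_lines=True, header=0 should pick up the first line that is neither commented nor blank:

```python
from io import StringIO

import pandas as pd

# Hypothetical shortened sample: three '#' lines and one blank line
# come before the real header row.
sample = """# ID: 001
# No.: 2

# Silent: false
Timestamp;T;ID;P
16T122109957;0;6;0006
"""

# header=0 refers to the first line that survives comment/blank-line
# filtering, not to the first physical line of the file.
df = pd.read_csv(StringIO(sample), header=0, comment="#", sep=";")
print(df.columns.tolist())  # ['Timestamp', 'T', 'ID', 'P']
```

So there is no need to count header lines at all when every header line starts with the comment character.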
I'm not sure about skiprows's odd behaviour here.
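If you would rather not depend on skiprows or comment at all, another option is to drop the header lines yourself before pandas sees them. One thing to keep in mind with comment='#' is that it also cuts off anything after a # inside a data row, which a pre-filter avoids. This is only a sketch: read_without_header is a hypothetical helper name, and it assumes every header line starts with '#'.

```python
from io import StringIO

import pandas as pd


def read_without_header(lines, sep=";"):
    """Parse delimited text after dropping lines that start with '#'.

    `lines` is any iterable of text lines, e.g. an open file object.
    (Hypothetical helper, not part of pandas.)
    """
    # Keep only lines whose first non-blank character is not '#'.
    data = "".join(line for line in lines if not line.lstrip().startswith("#"))
    return pd.read_csv(StringIO(data), sep=sep)


# Shortened sample in the same shape as the log above.
sample = StringIO(
    '# ID: 001\n'
    '# Value: ";"\n'
    '# Date: ""\n'
    'Timestamp;T;ID;P\n'
    '16T122109957;0;6;0006\n'
)

df = read_without_header(sample)
print(df)
```

For a file on disk you would pass the open file instead, for example `with open('test.TXT') as f: df = read_without_header(f)`.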