I have a log file (Text.TXT in this case):
 1  # 1: 5
 2  # 3: x
 3  # F: 5.
 4  # ID: 001
 5  # No.: 2
 6  # No.: 4
 7  # Time: 20191216T122109
 8  # Value: ";"
 9  # Time: 4
10  # Time: ""
11  # Time ms: ""
12  # Date: ""
13  # Time separator: "T"
14  # J: 1000000
15  # Silent: false
16  # mode: true
17  Timestamp;T;ID;P
18  16T122109957;0;6;0006
To read this log file into pandas and ignore all the header info, I would use skiprows to skip everything up to line 16, like so:
import pandas as pd
pd.read_csv('test.TXT', skiprows=16, sep=';')
But this raises an EmptyDataError, because it skips past where the data starts. To make this work, I've had to use skiprows=11 instead:
pd.read_csv('test.TXT', skiprows=11, sep=';')
      Timestamp  T  ID  P
0  16T122109957  0   6  6
My question is: if the data doesn't start until row 17 in this case, why do I need to set skiprows to 11?
Answer
One workaround is to use the comment parameter of pd.read_csv:
from io import StringIO
import pandas as pd

text='''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''

df = pd.read_csv(StringIO(text), comment='#', sep=';')
df
      Timestamp  T  ID  P
0  16T122109957  0   6  6
Or
df = pd.read_csv(StringIO(text), header=0, comment='#', sep=';')
From the docs, under the header parameter:
Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
Not sure about skiprows's weird behaviour here.
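One possible explanation, offered as a guess rather than a confirmed account of pandas internals: the quote after the ; delimiter in the line # Value: ";" may open a quoted field that swallows the following line breaks until the quoting closes on the "T" line, so several physical lines count as a single row. The sketch below only illustrates that idea with Python's standard csv module, which is not the pandas parser but follows similar quoting rules. The names LOG_TEXT and header_positions are made up for this example.

```python
import csv
import io

# Same log content as in the question (hypothetical constant name).
LOG_TEXT = '''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''


def header_positions(text, first_field="Timestamp", sep=";"):
    """Return (physical_index, record_index) of the header row, both 0-based.

    physical_index counts plain text lines; record_index counts rows as
    seen by a quote-aware csv.reader, where a quoted field may span
    several physical lines.
    """
    physical = next(
        i for i, line in enumerate(text.splitlines())
        if line.startswith(first_field)
    )
    records = csv.reader(io.StringIO(text, newline=""), delimiter=sep)
    logical = next(
        i for i, row in enumerate(records)
        if row and row[0] == first_field
    )
    return physical, logical


physical, logical = header_positions(LOG_TEXT)
print(physical, logical)  # 16 11
```

With the csv module, physical lines 8 to 13 collapse into one record, which leaves 11 records before the header row instead of 16. That matches the skiprows=11 that worked in the question. If the same thing is happening in pandas, passing quoting=csv.QUOTE_NONE together with skiprows=16 might be worth trying, though I have not verified that here.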