I want to convert the lines of LogCat text files into a structured pandas DataFrame, but I can't work out how to approach it. Here's my basic pseudo-code:
    dateTime = []
    processID = []
    threadID = []
    priority = []
    application = []
    tag = []
    text = []

    logFile = "xxxxxx.log"

    for line in logfile:
        split the string according to the basic structure
        dateTime = [0]
        processID = [1]
        threadID = [2]
        priority = [3]
        application = [4]
        tag = [5]
        text = [6]
        append each to the empty list above

    write the lists to pandas dataframe & add column names
The problem is that I do not know how to properly define the delimiter for lines with this structure:
08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip–ssid
Answer
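    import re
    import pandas as pd

    # One capture group per column; fields are separated by single spaces.
    ROW_PATTERN = re.compile(r"""(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\d+) (\d+) ([A-Z]) (\S+) (\S+) (\S+)""")

    # logFile is the path defined in the question.
    with open(logFile) as f:
        s = pd.Series(f.readlines())

    # Series.str.extract returns one column per capture group.
    df = s.str.extract(ROW_PATTERN)
    df.columns = ['dateTime', 'processID', 'threadID', 'priority', 'application', 'tag', 'text']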
This reads each line of logFile into a row of a Series, which is then expanded into a DataFrame with one column per capture group in the regular expression. It assumes that a timestamp such as 08-01 14:28:35.947 is the first value in each row and that subsequent values are separated by whitespace.
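If the message part of a line can itself contain spaces, a final group that matches only non-whitespace will keep just its first word. The following is a minimal sketch of one way around that, not part of the original answer: it keeps the same seven fields but uses named groups and lets the last group take the rest of the line. The helper name parse_lines and the second sample line are made up for illustration.

```python
import re

# Same field layout as in the answer, but with named groups and a final
# (.*) so the text field keeps any spaces in the message (an assumption
# about the log format, not something stated in the question).
ROW_PATTERN = re.compile(
    r"(?P<dateTime>\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<processID>\d+)\s+"
    r"(?P<threadID>\d+)\s+"
    r"(?P<priority>[A-Z])\s+"
    r"(?P<application>\S+)\s+"
    r"(?P<tag>\S+)\s+"
    r"(?P<text>.*)"
)


def parse_lines(lines):
    """Return one dict per line that matches ROW_PATTERN; skip the rest."""
    rows = []
    for line in lines:
        match = ROW_PATTERN.match(line.rstrip("\n"))
        if match:
            rows.append(match.groupdict())
    return rows


# The first line comes from the question; the second is a hypothetical
# line added only to show a message that contains spaces.
sample = [
    "08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip–ssid\n",
    "08-01 14:28:36.102 1320 1321 I wpa_xxxx: wlan1: scan started ok\n",
]
rows = parse_lines(sample)
```

Passing rows to pd.DataFrame(rows) should then give a DataFrame whose columns are named after the groups, so no separate column assignment is needed. Lines that do not match the pattern are dropped here; whether that is acceptable depends on the log.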