I want to convert lines of LogCat text files into a structured pandas DataFrame, but I cannot seem to properly conceptualize how to do this. Here's my basic pseudo-code:
dateTime = []
processID = []
threadID = []
priority = []
application = []
tag = []
text = []

logFile = "xxxxxx.log"

for line in logFile:
    split the string according to the basic structure:
        dateTime = [0], processID = [1], threadID = [2], priority = [3],
        application = [4], tag = [5], text = [6]
    append each to the empty lists above

write the lists to a pandas DataFrame & add column names
The problem is that I do not know how to properly define the delimiter for a structure like this:
08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip–ssid
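For reference, the whitespace itself can serve as the delimiter if the split is capped so the message stays in one piece. Below is a minimal sketch of the list-based approach from the pseudo-code above, assuming every line follows the seven-field layout of the sample line; the field names simply mirror the pseudo-code, and whether the sixth token really is the application depends on the logcat output format in use.

import pandas as pd

logFile = "xxxxxx.log"          # placeholder name from the question
rows = []

with open(logFile) as f:
    for line in f:
        # Whitespace is the delimiter; cap the split at 7 so the message
        # (which may itself contain spaces) survives as the last piece.
        parts = line.strip().split(None, 7)
        if len(parts) < 8:
            continue            # skip blank or malformed lines
        date, time, processID, threadID, priority, application, tag, text = parts
        rows.append([f"{date} {time}", processID, threadID, priority,
                     application, tag, text])

df = pd.DataFrame(rows, columns=['dateTime', 'processID', 'threadID',
                                 'priority', 'application', 'tag', 'text'])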
Answer
import re
import pandas as pd

ROW_PATTERN = re.compile(
    r"(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(\d+)\s+(\d+)\s+([A-Z])\s+(\S+)\s+(\S+)\s+(.*)"
)

with open(logFile) as f:                  # logFile = "xxxxxx.log"
    s = pd.Series(f.readlines())

# One DataFrame column per capture group; the final group takes the rest
# of the line so the message text may contain spaces.
df = s.str.extract(ROW_PATTERN)
df.columns = ['dateTime', 'processID', 'threadID', 'priority',
              'application', 'tag', 'text']
This will read each line of logFile into a row in a Series, which can then be expanded into a DataFrame via the capture groups in the regular expression. It assumes that 08-01 14:28:35.947 is the first value in each row, that subsequent values are separated by whitespace, and that the final group captures the rest of the line as the message text.
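As a variant, if the capture groups are named, str.extract produces the column names directly and the separate df.columns assignment can be dropped. A sketch under the same whitespace assumption; the group names here are just the column names from the question:

import re
import pandas as pd

NAMED_PATTERN = re.compile(
    r"(?P<dateTime>\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<processID>\d+)\s+(?P<threadID>\d+)\s+(?P<priority>[A-Z])\s+"
    r"(?P<application>\S+)\s+(?P<tag>\S+)\s+(?P<text>.*)"
)

with open(logFile) as f:                  # logFile = "xxxxxx.log" from the question
    df = pd.Series(f.readlines()).str.extract(NAMED_PATTERN)

# Lines that do not match the pattern become all-NaN rows and can be dropped.
df = df.dropna(how="all")

This also gives a simple way to discard headers or wrapped continuation lines that do not fit the pattern, since they show up as all-NaN rows.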