Skip to content
Advertisement

How to parse the log data which is in form of nested [key=value] format using python pandas

I have huge Sensor log data which is in form of [key=value] pair I need to parse the data column wise i found this code for my problem

import pandas as pd

lines = []
with open('/path/to/test.txt', 'r') as infile:
    for line in infile:
        if "," not in line:
            continue
        else:
            lines.append(line.strip().split(","))

row_names = []
column_data = {}

max_length = max(*[len(line) for line in lines])

for line in lines:
    while(len(line) < max_length):
        line.append(f'{len(line)-1}=NaN')

for line in lines:
    row_names.append(" ".join(line[:2]))
    for info in line[2:]:
        (k,v) = info.split("=")
        if k in column_data:
            column_data[k].append(v)
        else:
            column_data[k] = [v]

df = pd.DataFrame(column_data)
df.index = row_names
print(df)

df.to_csv('/path/to/test.csv')

the above code is suitable when the data is in form of “Priority=0, X=776517049” but my data is something like this [Priority=0][X=776517049] and there is no separator in between two columns how can i do it in python and i am sharing the link of sample data here raw data and bilow that expected parsed data which i done manually https://docs.google.com/spreadsheets/d/1EVTVL8RAkrSHhZO48xV1uEGqOzChQVf4xt7mHkTcqzs/edit?usp=sharing kindly check this link

Advertisement

Answer

I’ve downloaded as csv.

Since your file has multiple tables on one sheet, I’ve limited to 100 rows, you can remove that parameter.

raw = pd.read_csv(
    "logdata - Sheet1.csv",  # filename 
    skiprows=1,  # skip the first row
    nrows=100,  # use 100 rows, remove in your example
    usecols=[0],  # only use the first column
    header=None  # your dataset has no column names
)

Then you can use a regex to extract the values:

df = raw[0].str.extract(r"[Priority=(d*)] [GPS element=[X=(d*)] [Y=(d*)] [Speed=(d*)]")

and set column names:

df.columns = ["Priority", "X", "Y", "Speed"]

result:

  Priority          X          Y Speed
0         0  776517049  128887449     4
1         0  776516816  128887733     0
2         0  776516816  128887733     0
3         0  776516833  128887166     0
4         0  776517200  128886133     0
5         0  776516883  128885933     8
..................................... 
99        0  776494483  128908783     0
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement