I have huge Sensor log data which is in form of [key=value] pair I need to parse the data column wise i found this code for my problem
import pandas as pd lines = [] with open('/path/to/test.txt', 'r') as infile: for line in infile: if "," not in line: continue else: lines.append(line.strip().split(",")) row_names = [] column_data = {} max_length = max(*[len(line) for line in lines]) for line in lines: while(len(line) < max_length): line.append(f'{len(line)-1}=NaN') for line in lines: row_names.append(" ".join(line[:2])) for info in line[2:]: (k,v) = info.split("=") if k in column_data: column_data[k].append(v) else: column_data[k] = [v] df = pd.DataFrame(column_data) df.index = row_names print(df) df.to_csv('/path/to/test.csv')
the above code is suitable when the data is in form of “Priority=0, X=776517049” but my data is something like this [Priority=0][X=776517049] and there is no separator in between two columns how can i do it in python and i am sharing the link of sample data here raw data and bilow that expected parsed data which i done manually https://docs.google.com/spreadsheets/d/1EVTVL8RAkrSHhZO48xV1uEGqOzChQVf4xt7mHkTcqzs/edit?usp=sharing kindly check this link
Advertisement
Answer
I’ve downloaded as csv.
Since your file has multiple tables on one sheet, I’ve limited to 100 rows, you can remove that parameter.
raw = pd.read_csv( "logdata - Sheet1.csv", # filename skiprows=1, # skip the first row nrows=100, # use 100 rows, remove in your example usecols=[0], # only use the first column header=None # your dataset has no column names )
Then you can use a regex to extract the values:
df = raw[0].str.extract(r"[Priority=(d*)] [GPS element=[X=(d*)] [Y=(d*)] [Speed=(d*)]")
and set column names:
df.columns = ["Priority", "X", "Y", "Speed"]
result:
Priority X Y Speed 0 0 776517049 128887449 4 1 0 776516816 128887733 0 2 0 776516816 128887733 0 3 0 776516833 128887166 0 4 0 776517200 128886133 0 5 0 776516883 128885933 8 ..................................... 99 0 776494483 128908783 0