I’m interested to know how to elegantly go about splitting a single-columned file of the following format into a more classic tabular layout using Pandas.
(The file is received as an output from an eye tracker.)
Current Format:
TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight 00000000.11111111111111.22222222222222.33333333333333.4444444444444 00000000.11111111111111.22222222222222.33333333333333.4444444444444 00000000.11111111111111.22222222222222.33333333333333.4444444444444
Desired Format:
TimeStamp GazePointXLeft GazePointYLeft GazePointXRight GazePointYRight 000000000 11111111111111 22222222222222 333333333333333 444444444444444 000000000 11111111111111 22222222222222 333333333333333 444444444444444 000000000 11111111111111 22222222222222 333333333333333 444444444444444
Where I’m stuck:
I imagine the solution will involve Pandas’ split method but I’m having trouble figuring out how to get there. I imagine I’ll have to “manually” add the respective columns while somehow splitting period-delimited rows of data…
df = pd.DataFrame('data.csv')
headers = ["TimeStamp", ..., "GazePointYRight"]
for header in headers:
df[header] = df[1:].split(".")[headers.index(header)] <--- # Splitting rows by period and taking data based on header index in list
I’d really appreciate some direction. Thanks in advance.
Advertisement
Answer
pandas.read_... has several usefull parameters to play with.
I believe you want something like this?
import pandas as pd
columns_names = [
'TimeStamp',
'GazePointXLeft',
'GazePointYLeft',
'GazePointXRight',
'GazePointYRight',
]
df = pd.read_csv("lixo.csv", sep='.', skiprows=1, names=columns_names)