I’m interested to know how to elegantly go about splitting a single-columned file of the following format into a more classic tabular layout using Pandas.
(The file is received as an output from an eye tracker.)
Current Format:
TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight 00000000.11111111111111.22222222222222.33333333333333.4444444444444 00000000.11111111111111.22222222222222.33333333333333.4444444444444 00000000.11111111111111.22222222222222.33333333333333.4444444444444
Desired Format:
TimeStamp GazePointXLeft GazePointYLeft GazePointXRight GazePointYRight 000000000 11111111111111 22222222222222 333333333333333 444444444444444 000000000 11111111111111 22222222222222 333333333333333 444444444444444 000000000 11111111111111 22222222222222 333333333333333 444444444444444
Where I’m stuck:
I imagine the solution will involve Pandas’ split
method but I’m having trouble figuring out how to get there. I imagine I’ll have to “manually” add the respective columns while somehow splitting period-delimited rows of data…
df = pd.DataFrame('data.csv') headers = ["TimeStamp", ..., "GazePointYRight"] for header in headers: df[header] = df[1:].split(".")[headers.index(header)] <--- # Splitting rows by period and taking data based on header index in list
I’d really appreciate some direction. Thanks in advance.
Advertisement
Answer
pandas.read_...
has several usefull parameters to play with.
I believe you want something like this?
import pandas as pd columns_names = [ 'TimeStamp', 'GazePointXLeft', 'GazePointYLeft', 'GazePointXRight', 'GazePointYRight', ] df = pd.read_csv("lixo.csv", sep='.', skiprows=1, names=columns_names)