Skip to content
Advertisement

Splitting single-columned .CSV into multiple columns with Pandas

I’m interested to know how to elegantly go about splitting a single-columned file of the following format into a more classic tabular layout using Pandas.

(The file is received as an output from an eye tracker.)

Current Format:

TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444

Desired Format:

TimeStamp GazePointXLeft GazePointYLeft GazePointXRight GazePointYRight
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444

Where I’m stuck: I imagine the solution will involve Pandas’ split method but I’m having trouble figuring out how to get there. I imagine I’ll have to “manually” add the respective columns while somehow splitting period-delimited rows of data…

df = pd.DataFrame('data.csv')

headers = ["TimeStamp", ...,  "GazePointYRight"]

for header in headers:
    df[header] = df[1:].split(".")[headers.index(header)] <--- # Splitting rows by period and taking data based on header index in list

I’d really appreciate some direction. Thanks in advance.

Advertisement

Answer

pandas.read_... has several usefull parameters to play with.

I believe you want something like this?

import pandas as pd

columns_names = [
    'TimeStamp',
    'GazePointXLeft',
    'GazePointYLeft',
    'GazePointXRight',
    'GazePointYRight',
    ]

df = pd.read_csv("lixo.csv", sep='.', skiprows=1, names=columns_names)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement