Splitting single-columned .CSV into multiple columns with Pandas

I’m interested to know how to elegantly go about splitting a single-columned file of the following format into a more classic tabular layout using Pandas.

(The file is received as an output from an eye tracker.)

Current Format:

TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444

JavaScript
​x
 
TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444
​

Desired Format:

TimeStamp GazePointXLeft GazePointYLeft GazePointXRight GazePointYRight
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444

JavaScript
 
TimeStamp GazePointXLeft GazePointYLeft GazePointXRight GazePointYRight
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444
​

Where I’m stuck: I imagine the solution will involve Pandas’ split method but I’m having trouble figuring out how to get there. I imagine I’ll have to “manually” add the respective columns while somehow splitting period-delimited rows of data…

df = pd.DataFrame('data.csv')

headers = ["TimeStamp", ...,  "GazePointYRight"]

for header in headers:
    df[header] = df[1:].split(".")[headers.index(header)] <--- # Splitting rows by period and taking data based on header index in list

JavaScript
 
df = pd.DataFrame('data.csv')
​
headers = ["TimeStamp", ...,  "GazePointYRight"]
​
for header in headers:
    df[header] = df[1:].split(".")[headers.index(header)] <--- # Splitting rows by period and taking data based on header index in list
​

I’d really appreciate some direction. Thanks in advance.

Answer

pandas.read_... has several usefull parameters to play with.

I believe you want something like this?

import pandas as pd

columns_names = [
    'TimeStamp',
    'GazePointXLeft',
    'GazePointYLeft',
    'GazePointXRight',
    'GazePointYRight',
    ]

df = pd.read_csv("lixo.csv", sep='.', skiprows=1, names=columns_names)

JavaScript
 
import pandas as pd
​
columns_names = [
    'TimeStamp',
    'GazePointXLeft',
    'GazePointYLeft',
    'GazePointXRight',
    'GazePointYRight',
    ]
​
df = pd.read_csv("lixo.csv", sep='.', skiprows=1, names=columns_names)
​

Advertisement

Answer