I have CSV file with Vector3 values exported from a C# program.
I would like to use vector operations (like calculating the distance etc.) in pandas.
As far as I have seen, there is no Vector3 type in pandas. np.array
offers this kind of operations but it is not available in pandas.
What is the easiest way to accomplish vector calculations in a dataframe like data structure?
I would appreciate a detailed description starting with how to import the records from the CSV file as a vector type and ending with a calculation example.
The csv file has the following format:
aBin, bBin1, bBin2, bBin3, bBin4, ... 1, "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)", "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)" ... 2, "(-5.6848290, 0.0000000, 2.7744440)", "(0.6555564, 0.0000000, 7.2209800)", "(-3.6818280, 0.0000000, 2.5663330)", "(0.6445564, 0.0000000, 2.9509810)" ... ...
Edit
This CSV contains measurements of a program. There is a similar CSV (same shape) of another program and I want to calculate the distances between those two CSV (e.g. the distance between the value of [aBin1][bBin1]
of the first CSV with [aBin1][bBin1]
of the second CSV). Finally I want to sum this distances to a single value.
Advertisement
Answer
# vector1.txt aBin, bBin1, bBin2 1, "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)" 2, "(-5.6848290, 0.0000000, 2.7744440)", "(0.6555564, 0.0000000, 7.2209800)"
# vector2.txt aBin, bBin1, bBin2 1, "(-1.6831280, 1.0000000, 2.4093440)", "(0.9445564, 2.0000000, 1.9509810)" 2, "(-5.6848290, 3.0000000, 2.7744440)", "(0.6555564, 4.0000000, 7.2209800)"
First, I loaded two files with file_to_dataframe
function.
import numpy as np import pandas as pd def file_to_dataframe(fpath): # Function to change the format of file -> DataFrame # You can skip it if you can load the file as DataFrame with open(fpath, "r") as f: columns = f.readline().rstrip().split(', ')[1:] df = pd.DataFrame(columns=columns) for line in f: row = [x.replace('"', '') for x in line.rstrip().split(', "')[1:]] df = df.append(pd.Series(row, index=columns), ignore_index=True) return df.applymap(lambda x: np.array(eval(x))) # Read file df1 = file_to_dataframe('data/vector1.txt') df2 = file_to_dataframe('data/vector2.txt')
>>df1 bBin1 bBin2 0 [-1.683128, 0.0, 2.409344] [0.9445564, 0.0, 1.950981] 1 [-5.684829, 0.0, 2.774444] [0.6555564, 0.0, 7.22098] >>df2 bBin1 bBin2 0 [-1.683128, 1.0, 2.409344] [0.9445564, 2.0, 1.950981] 1 [-5.684829, 3.0, 2.774444] [0.6555564, 4.0, 7.22098]
And I got dist with np.linalg.norm
function with flatten data from dataframe.
and I made DataFrame with the result.
def dist(x, y): # https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy return np.linalg.norm(x-y) new_vals = [dist(x, y) for x, y in zip(df1.values.flat, df2.values.flat)] df_dist = pd.DataFrame(np.array(new_vals).reshape(df1.shape), columns=df1.columns, )
>>df_dist bBin1 bBin2 0 1.0 2.0 1 3.0 4.0