Vector Calculations in Pandas

Question

I have a CSV file with Vector3 values exported from a C# program. I would like to use vector operations (such as calculating distances) in pandas. As far as I can see, there is no Vector3 type in pandas. np.array offers these kinds of operations, but it is not available in pandas. What is the easiest way to accomplish vector

Accepted Answer

    # vector1.txt
    aBin, bBin1, bBin2
    1, "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)"
    2, "(-5.6848290, 0.0000000, 2.7744440)", "(0.6555564, 0.0000000, 7.2209800)"

    # vector2.txt
    aBin, bBin1, bBin2
    1, "(-1.6831280, 1.0000000, 2.4093440)", "(0.9445564, 2.0000000, 1.9509810)"
    2, "(-5.6848290, 3.0000000, 2.7744440)", "(0.6555564, 4.0000000, 7.2209800)"

First, I loaded the two files with a file_to_dataframe function.

    import ast

    import numpy as np
    import pandas as pd

    def file_to_dataframe(fpath):
        # Convert the quoted-tuple text file into a DataFrame of NumPy arrays.
        # You can skip this if you can already load the file as a DataFrame.
        with open(fpath, "r") as f:
            # Header line: drop the name of the first (index) column.
            columns = f.readline().rstrip().split(', ')[1:]
            rows = []
            for line in f:
                # Split on ', "' so the commas inside each tuple are preserved.
                cells = [x.replace('"', '') for x in line.rstrip().split(', "')[1:]]
                # ast.literal_eval parses "(x, y, z)" without executing arbitrary code.
                rows.append([np.array(ast.literal_eval(c)) for c in cells])
        # DataFrame.append was removed in pandas 2.0, so build the frame in one step.
        return pd.DataFrame({col: [r[i] for r in rows] for i, col in enumerate(columns)})

    # Read files
    df1 = file_to_dataframe('data/vector1.txt')
    df2 = file_to_dataframe('data/vector2.txt')

    >>> df1
                            bBin1                       bBin2
    0  [-1.683128, 0.0, 2.409344]  [0.9445564, 0.0, 1.950981]
    1  [-5.684829, 0.0, 2.774444]   [0.6555564, 0.0, 7.22098]

    >>> df2
                            bBin1                       bBin2
    0  [-1.683128, 1.0, 2.409344]  [0.9445564, 2.0, 1.950981]
    1  [-5.684829, 3.0, 2.774444]   [0.6555564, 4.0, 7.22098]

Then I computed the distances with np.linalg.norm over the flattened values of the two DataFrames, and built a new DataFrame from the result.

    def dist(x, y):
        # Euclidean distance between two vectors, see
        # https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy
        return np.linalg.norm(x - y)

    # Pair up the cells of both frames in row-major order.
    new_vals = [dist(x, y) for x, y in zip(df1.values.flat, df2.values.flat)]
    df_dist = pd.DataFrame(np.array(new_vals).reshape(df1.shape), columns=df1.columns)

    >>> df_dist
       bBin1  bBin2
    0    1.0    2.0
    1    3.0    4.0
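
The distance step above loops over the cells in Python. As a possible alternative (a minimal sketch, not part of the original answer), the cells can be stacked into one 3-D NumPy array so that np.linalg.norm does the work in a single call. The helper names to_vector_array and pairwise_dist are hypothetical, and the sketch assumes that both frames have the same shape and that every cell holds a length-3 array. The two frames are built inline here, with the same values as the example files, so the block runs on its own.

```python
import numpy as np
import pandas as pd


def to_vector_array(df):
    """Stack a DataFrame whose cells are length-3 arrays into shape (rows, cols, 3).

    Hypothetical helper; assumes every cell has the same length.
    """
    return np.array(df.to_numpy().tolist(), dtype=float)


def pairwise_dist(df_a, df_b):
    """Cell-wise Euclidean distance between two equally shaped frames of vectors.

    Hypothetical helper; assumes df_a and df_b share shape, index and columns.
    """
    diff = to_vector_array(df_a) - to_vector_array(df_b)
    # Take the norm over the last axis (the x, y, z components).
    dist = np.linalg.norm(diff, axis=-1)
    return pd.DataFrame(dist, index=df_a.index, columns=df_a.columns)


# Frames with the same values as vector1.txt and vector2.txt.
df1 = pd.DataFrame({
    "bBin1": [np.array([-1.683128, 0.0, 2.409344]),
              np.array([-5.684829, 0.0, 2.774444])],
    "bBin2": [np.array([0.9445564, 0.0, 1.950981]),
              np.array([0.6555564, 0.0, 7.22098])],
})
df2 = pd.DataFrame({
    "bBin1": [np.array([-1.683128, 1.0, 2.409344]),
              np.array([-5.684829, 3.0, 2.774444])],
    "bBin2": [np.array([0.9445564, 2.0, 1.950981]),
              np.array([0.6555564, 4.0, 7.22098])],
})

df_dist = pairwise_dist(df1, df2)
print(df_dist)
```

Since the two frames differ only in the middle component, this should give the same 1.0, 2.0, 3.0, 4.0 values as the loop-based version. Keeping the vectors in a plain float array also makes other operations (sums, dot products, normalization) available through ordinary NumPy broadcasting.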

Answer