Vector Calculations in Pandas

Question

I have a CSV file with Vector3 values exported from a C# program. I would like to use vector operations (such as calculating the distance) in pandas. As far as I can tell, there is no Vector3 type in pandas. np.array offers this kind of operation, but it is not available in pandas. What is the easiest way to…

Accepted Answer

    # vector1.txt
    aBin, bBin1, bBin2
    1, "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)"
    2, "(-5.6848290, 0.0000000, 2.7744440)", "(0.6555564, 0.0000000, 7.2209800)"

    # vector2.txt
    aBin, bBin1, bBin2
    1, "(-1.6831280, 1.0000000, 2.4093440)", "(0.9445564, 2.0000000, 1.9509810)"
    2, "(-5.6848290, 3.0000000, 2.7744440)", "(0.6555564, 4.0000000, 7.2209800)"

First, I loaded the two files with a file_to_dataframe function.

    import ast

    import numpy as np
    import pandas as pd

    def file_to_dataframe(fpath):
        # Convert the file into a DataFrame whose cells are NumPy arrays.
        # You can skip this if you can already load the file as a DataFrame.
        with open(fpath, "r") as f:
            # Header line: drop the name of the first (index) column.
            columns = f.readline().rstrip().split(', ')[1:]
            rows = []
            for line in f:
                # Split on ', "' to separate the quoted tuples, then strip the quotes.
                cells = [x.replace('"', '') for x in line.rstrip().split(', "')[1:]]
                # ast.literal_eval parses "(x, y, z)" without executing arbitrary code.
                rows.append([np.array(ast.literal_eval(x)) for x in cells])
        # Build the DataFrame once; DataFrame.append was removed in pandas 2.0.
        return pd.DataFrame(rows, columns=columns)

    # Read files
    df1 = file_to_dataframe('data/vector1.txt')
    df2 = file_to_dataframe('data/vector2.txt')

    >>df1
                            bBin1                       bBin2
    0  [-1.683128, 0.0, 2.409344]  [0.9445564, 0.0, 1.950981]
    1  [-5.684829, 0.0, 2.774444]   [0.6555564, 0.0, 7.22098]

    >>df2
                            bBin1                       bBin2
    0  [-1.683128, 1.0, 2.409344]  [0.9445564, 2.0, 1.950981]
    1  [-5.684829, 3.0, 2.774444]   [0.6555564, 4.0, 7.22098]

Then I computed each distance with np.linalg.norm over the flattened cells of both DataFrames, and built a new DataFrame from the results.

    def dist(x, y):
        # https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy
        return np.linalg.norm(x - y)

    # Pair up the cells of both DataFrames in row-major order.
    new_vals = [dist(x, y) for x, y in zip(df1.values.flat, df2.values.flat)]
    # Reshape the flat list back to the original (rows, columns) layout.
    df_dist = pd.DataFrame(np.array(new_vals).reshape(df1.shape), columns=df1.columns)

    >>df_dist
       bBin1  bBin2
    0    1.0    2.0
    1    3.0    4.0
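
As a possible alternative to the per-cell loop, the same distances can be computed in one NumPy call if the vectors are held in a single array of shape (rows, columns, 3). The sketch below is not part of the original answer. The inline raw1/raw2 dictionaries and the to_vector_block helper are hypothetical stand-ins for the parsed file contents, and the data is copied from the sample files above.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two files: column name -> list of "(x, y, z)" strings.
raw1 = {
    "bBin1": ["(-1.6831280, 0.0000000, 2.4093440)", "(-5.6848290, 0.0000000, 2.7744440)"],
    "bBin2": ["(0.9445564, 0.0000000, 1.9509810)", "(0.6555564, 0.0000000, 7.2209800)"],
}
raw2 = {
    "bBin1": ["(-1.6831280, 1.0000000, 2.4093440)", "(-5.6848290, 3.0000000, 2.7744440)"],
    "bBin2": ["(0.9445564, 2.0000000, 1.9509810)", "(0.6555564, 4.0000000, 7.2209800)"],
}


def to_vector_block(raw):
    """Parse a dict of string columns into a float array of shape (rows, columns, 3)."""
    per_column = [[ast.literal_eval(cell) for cell in cells] for cells in raw.values()]
    # per_column is laid out as (columns, rows, 3); swap the first two axes.
    return np.array(per_column, dtype=float).transpose(1, 0, 2)


a = to_vector_block(raw1)
b = to_vector_block(raw2)

# Taking the norm along the last axis yields one distance per (row, column) cell.
distances = np.linalg.norm(a - b, axis=-1)
df_dist = pd.DataFrame(distances, columns=list(raw1))
print(df_dist)
```

Keeping the components in a numeric array rather than in object-typed cells should also let other vector operations (for example np.cross) run without Python-level loops.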

Answer