I have a dataframe where the columns with suffix 1 are starting points and the columns with suffix 2 are end points. I want to find the distance in kilometers between them. I tried the following code but got an error:
import mpu
import pandas as pd
import numpy as np
data = {'lat1': [39.92123, 39.93883, 39.93883, 39.91034, 39.91248],
        'lon1': [116.51172, 116.51135, 116.51135, 116.51627, 116.47186],
        'lat2': [np.nan, 39.92123, 39.93883, 39.93883, 39.91034],
        'lon2': [np.nan, 116.51172, 116.51135, 116.51135, 116.51627]}
# Create DataFrame
df = pd.DataFrame(data)
df['distance'] = mpu.haversine_distance((df.lat1, df.lon1), (df.lat2, df.lon2))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer
mpu.haversine_distance expects scalar coordinates, but the original call passes four whole pandas Series to it. Try using .apply() with a lambda function and axis=1 so that the function receives the scalar values of one row at a time:
df['distance'] = df.apply(lambda x: mpu.haversine_distance((x.lat1, x.lon1), (x.lat2, x.lon2)), axis=1)
You can also use list(map(...)) with zip(), which avoids the per-row overhead of .apply(axis=1) and usually runs faster:
df['distance'] = list(map(mpu.haversine_distance, zip(df.lat1, df.lon1), zip(df.lat2, df.lon2)))
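To see what the map/zip pattern actually hands to the function, here is a minimal sketch on plain Python lists. It uses a hand-written haversine, haversine_km, which is a hypothetical stand-in and not mpu's implementation, and it assumes a mean Earth radius of 6371 km. Each zip() yields one (lat, lon) tuple per row, so the function only ever sees scalars.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from mpu


def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) pairs in degrees.

    Hypothetical helper for illustration; mpu.haversine_distance may
    validate or handle its inputs differently.
    """
    lat1, lon1 = origin
    lat2, lon2 = destination
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# First three rows of the question's data, as plain lists
lat1 = [39.92123, 39.93883, 39.93883]
lon1 = [116.51172, 116.51135, 116.51135]
lat2 = [float("nan"), 39.92123, 39.93883]
lon2 = [float("nan"), 116.51172, 116.51135]

# zip() pairs the columns row by row; map() calls the function once per row
distances = list(map(haversine_km, zip(lat1, lon1), zip(lat2, lon2)))
print(distances)
```

In this sketch a missing end point simply propagates as NaN through the math functions, and identical start and end points give 0.0. Whether mpu treats NaN inputs the same way is not confirmed here, so check the first row of your own result.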