I have a dataframe where columns with subscript 1 are starting points and columns with subscript 2 are end points. I want to find the distance in kilometers between them. I tried the following code but got an error:
import mpu
import pandas as pd
import numpy as np

data = {'lat1': [39.92123, 39.93883, 39.93883, 39.91034, 39.91248],
        'lon1': [116.51172, 116.51135, 116.51135, 116.51627, 116.47186],
        'lat2': [np.nan, 39.92123, 39.93883, 39.93883, 39.91034],
        'lon2': [np.nan, 116.51172, 116.51135, 116.51135, 116.51627]}

# Create DataFrame
df = pd.DataFrame(data)

df['distance'] = mpu.haversine_distance((df.lat1, df.lon1), (df.lat2, df.lon2))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer
Try using .apply() with a lambda function so that the coordinates are passed as scalar values, one row at a time, instead of passing four pandas Series to the function as the current code does:
df['distance'] = df.apply(lambda x: mpu.haversine_distance((x.lat1, x.lon1), (x.lat2, x.lon2)), axis=1)
You can also use list(map(...)), which avoids the per-row overhead of .apply() and usually runs faster:
df['distance'] = list(map(mpu.haversine_distance, zip(df.lat1, df.lon1), zip(df.lat2, df.lon2)))
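If the rows with missing end points cause trouble, or the dataframe is large, another option is to compute the haversine formula directly with NumPy on whole columns. The following is only a sketch, not part of mpu: the function name haversine_km and the Earth radius of 6371 km are my own assumptions, and the short lists stand in for the first three rows of the question's columns.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km for scalars, lists, arrays, or pandas Series."""
    # Convert every input to a float array in radians.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula, evaluated element-wise.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# First three rows of the question's data; the first end point is missing.
lat1 = [39.92123, 39.93883, 39.93883]
lon1 = [116.51172, 116.51135, 116.51135]
lat2 = [np.nan, 39.92123, 39.93883]
lon2 = [np.nan, 116.51172, 116.51135]

distance = haversine_km(lat1, lon1, lat2, lon2)
print(distance)  # the first entry is NaN because NaN propagates through the arithmetic
```

With the dataframe from the question this would be called as df['distance'] = haversine_km(df.lat1, df.lon1, df.lat2, df.lon2). Results may differ slightly from mpu if it uses a different Earth radius.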