How to find lines in pandas columns with close values?

Question

I need to find the 'user_id' of users standing close by to each other. In the sample dataset, these would be the users with id '101' and '302'. But the real dataset has millions of lines in it. Are there any built-in functions in pandas or Python to solve this?

Accepted Answer

Assuming the workers need to share the same location to be considered standing close by, a groupby by location can match workers efficiently:

    from itertools import combinations
    import pandas as pd

    d = {'user_id': [11, 24, 101, 214, 302, 335],
         'worker_latitude': [-34.6209, -2.7572, 55.6621,
                             55.114462, 55.6621, -34.6209],
         'worker_longitude': [-58.3742, 52.3879, 56.6621, 38.927156,
                              56.6621, 39.018]}
    df = pd.DataFrame(data=d)

    # For each distinct (latitude, longitude), list every pair of user ids
    matched_workers = df.groupby(['worker_latitude', 'worker_longitude']).apply(
        lambda rows: list(combinations(rows['user_id'], r=2)))
    # Keep only locations with at least one pair (non-empty list)
    matched_workers = matched_workers.loc[matched_workers.apply(bool)]

Which outputs:

    worker_latitude  worker_longitude
    55.6621          56.6621             [(101, 302)]
    dtype: object
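The same exact-location matching can also be sketched as a self-merge, which avoids calling a Python lambda per group. This is only an illustrative variant, not part of the original answer: the helper name `pairs_at_same_location` and its `decimals` parameter are hypothetical. Passing `decimals` rounds the coordinates first, which is a rough way to treat nearly equal coordinates as the same location, under the assumption that a fixed grid is acceptable.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [11, 24, 101, 214, 302, 335],
    'worker_latitude': [-34.6209, -2.7572, 55.6621,
                        55.114462, 55.6621, -34.6209],
    'worker_longitude': [-58.3742, 52.3879, 56.6621, 38.927156,
                         56.6621, 39.018],
})


def pairs_at_same_location(frame, decimals=None):
    """Hypothetical helper: return (user_a, user_b) pairs sharing a location.

    If `decimals` is given, coordinates are rounded to that many decimal
    places before matching (an assumption, not something the answer does).
    """
    coords = ['worker_latitude', 'worker_longitude']
    work = frame[['user_id'] + coords].copy()
    if decimals is not None:
        work[coords] = work[coords].round(decimals)
    # Join the table to itself on the coordinates: every row is paired
    # with every row at the same location, including itself.
    merged = work.merge(work, on=coords, suffixes=('_a', '_b'))
    # Keep each unordered pair once and drop self-pairs.
    merged = merged[merged['user_id_a'] < merged['user_id_b']]
    return list(zip(merged['user_id_a'].tolist(),
                    merged['user_id_b'].tolist()))


pairs = pairs_at_same_location(df)
print(pairs)
```

Two caveats apply. A location shared by k users produces k * k rows in the intermediate merge, so very crowded locations can use a lot of memory. Rounding also misses users who are close but fall on opposite sides of a rounding boundary.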

Answer