I am trying to calculate the sum of sales for stores in the same neighborhood based on their geographic coordinates. I have sample data:
data={'ID':['1','2','3','4'],'SALE':[100,120,110,95],'X':[23,22,21,24],'Y':[44,45,41,46],'X_MIN':[22,21,20,23],'Y_MIN':[43,44,40,45],'X_MAX':[24,23,22,25],'Y_MAX':[45,46,42,47]}
ID | SALE | X | Y | X_MIN | Y_MIN | X_MAX | Y_MAX |
---|---|---|---|---|---|---|---|
1 | 100 | 23 | 44 | 22 | 43 | 24 | 45 |
2 | 120 | 22 | 45 | 21 | 44 | 23 | 46 |
3 | 110 | 21 | 41 | 20 | 40 | 22 | 42 |
4 | 95 | 24 | 46 | 23 | 45 | 25 | 47 |
X and Y are the coordinates of the store. X and Y with MIN and MAX are the area they cover. For each row, I want to sum sales for all stores that are within the boundaries of the single store. I expect results similar to the table below where SUM for ID 1 is equal 220 because the coordinates (X and Y) are within the MIN and MAX limits of this store for ID 1 and ID 2 while for ID 4 only this one store is between his coordinates so the sum of sales is equal 95.
final={'ID':['1','2','3','4'],'SUM':[220,220,110,95]}
ID | SUM |
---|---|
1 | 220 |
2 | 220 |
3 | 110 |
4 | 95 |
What I’ve tried:
data['SUM'] = data.apply(lambda x: data['SALE'].sum(data[(data['X'] >= x['X_MIN'])&(data['X'] <= x['X_MAX'])&(data['Y'] >= x['Y_MIN'])&(data['Y'] <= x['Y_MAX'])]),axis=1)
Unfortunately the code does not work and I am getting the following error:
TypeError: unhashable type: ‘DataFrame’
I am asking for help in solving this problem.
Advertisement
Answer
If you put the summation at the end, your solution works:
data['SUM'] = data.apply(lambda x: (data['SALE'][(data['X'] >= x['X_MIN'])&(data['X'] <= x['X_MAX'])&(data['Y'] >= x['Y_MIN'])&(data['Y'] <= x['Y_MAX'])]).sum(),axis=1) ###output of data['SUM']: ###0 220 ###1 220 ###2 110 ###3 95