Skip to content
Advertisement

Sum of the column values if the rows meet the conditions

I am trying to calculate the sum of sales for stores in the same neighborhood based on their geographic coordinates. I have sample data:

data={'ID':['1','2','3','4'],'SALE':[100,120,110,95],'X':[23,22,21,24],'Y':[44,45,41,46],'X_MIN':[22,21,20,23],'Y_MIN':[43,44,40,45],'X_MAX':[24,23,22,25],'Y_MAX':[45,46,42,47]}
ID SALE X Y X_MIN Y_MIN X_MAX Y_MAX
1 100 23 44 22 43 24 45
2 120 22 45 21 44 23 46
3 110 21 41 20 40 22 42
4 95 24 46 23 45 25 47

X and Y are the coordinates of the store. X and Y with MIN and MAX are the area they cover. For each row, I want to sum sales for all stores that are within the boundaries of the single store. I expect results similar to the table below where SUM for ID 1 is equal 220 because the coordinates (X and Y) are within the MIN and MAX limits of this store for ID 1 and ID 2 while for ID 4 only this one store is between his coordinates so the sum of sales is equal 95.

final={'ID':['1','2','3','4'],'SUM':[220,220,110,95]}
ID SUM
1 220
2 220
3 110
4 95

What I’ve tried:

data['SUM'] = data.apply(lambda x: data['SALE'].sum(data[(data['X'] >= x['X_MIN'])&(data['X'] <= x['X_MAX'])&(data['Y'] >= x['Y_MIN'])&(data['Y'] <= x['Y_MAX'])]),axis=1)

Unfortunately the code does not work and I am getting the following error:

TypeError: unhashable type: ‘DataFrame’

I am asking for help in solving this problem.

Advertisement

Answer

If you put the summation at the end, your solution works:

data['SUM'] = data.apply(lambda x: (data['SALE'][(data['X'] >= x['X_MIN'])&(data['X'] <= x['X_MAX'])&(data['Y'] >= x['Y_MIN'])&(data['Y'] <= x['Y_MAX'])]).sum(),axis=1)

###output of data['SUM']:
###0    220
###1    220
###2    110
###3     95
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement