I’ve asked this on math overflow and used comments to clarify/overstate my question. I hope it has the intended effect and doesn’t come off as jarring.
I’m attempting to find what subset of numbers reach a known average.
I have a list of known values, negative and possible decimals. They look something like this {-.32,-.64,-.12,.08,-.54,-.43, …}
It’s around 50 numbers in some cases, though this problem would be tested for other cases as well.
The set mostly contains negative decimal numbers, while in rare cases, has a few positive decimals – it never has whole numbers.
I also have a known value, which I know to be the average of some subset of the above list.
The known value is similar to -.03.
I’m uncertain of the grouping mechanism used, but seemed to reach stack overflow trying to solve this problem when not grouping.
I’ve attempted a few ways of going about solving this problem. I’m using Python 3.6 and imported numpy as np.
I’m wondering if the “subset-avg” code I’ve adapted from another solution for subset-sum here (I’ll give due credit when i can find that question again) is not the most efficient way/if there’s any huge mistake in my even attempting to resolve this that I haven’t seen.
Thanks in advance for any thoughts.
def subset_avg(numbers, target, partial=[],depth=1): # create AVG function # set average of partial a = np.mean(partial) # check if the partial sum is equals to target if a != target: print("Currently Testing the Following Subset's " " " + "Average(%s) = %snn" % (partial, round(a,2))) print(depth) if a == target or round(a,2) == target: print('nn') print("************") print("************") print('nn') print("Found Subset AVG " + "Average(%s) = %s" % (partial, target)) print('nn') print("************") print("************") print('nn') print(depth) # for each number in range of list for i in range(len(numbers)): # set n = current iteration in list n = numbers[i] # remaining values is current iteration + 1 through end of list remaining = numbers[i+1:] # calculate mean of partial, set partial = partial plus n subset_avg(remaining, target, partial + [n],depth+1) # Example of use x = [-.32,-.64,-.12,.08,-.54,-.43] subset_avg(x,-.03)
Advertisement
Answer
Here is a solution I adapted from a subSet sum algorithm I posted for another question (here). Since the algorithm loops through potential solution sizes, it was easy to adapt it to search for an average.
The iSubSum()
function takes 3 parameters: The target average, the list of values and an optional rounding precision parameter. It is a generator so it will produce all possible solutions when used in a loop. You can also get the first solution quickly using the next()
function. This should produce results much faster than a brute force approach, especially for large lists.
The function is based on a modified version of a subset-sum algorithm that returns solutions as lists of indices. This is intended to distinguish combinations that have duplicate values coming from different indices in the original list.
from bisect import bisect_right from itertools import accumulate def iSubAverage(M,A,P=0): smallSize = 20 smallSums = set() def subSumForSize(S,A,size,failedSums=None): nextSum = A[size-2][2] if size>1 else 0 index = bisect_right([a for a,_,_ in A],S-nextSum) # max element for target A = A[:index] if len(A) < size: return # not enough elements for size if A[size-1][2] > S: return # minimum sum > target maxSum = A[-1][2] if len(A) > size: maxSum -= A[-size-1][2] if maxSum < S: return # maximum sum < target if len(A) <= smallSize and S not in smallSums: return if failedSums is None: failedSums = set() while index >= size: index -= 1 a,i,ca = A[index] if size == 1: if a == S: yield [i] continue c0 = A[index-size][2] if index>size else 0 if ca-c0 < S: break subS = S-a if subS in failedSums: continue # known unreachable sum failed = True for result in subSumForSize(subS,A[:index],size-1,failedSums): yield result+[i] failed = False if failed: failedSums.add(subS) if not A: return if M < 0: M,A = -M,[-a for a in A] # must have positive target offset = max(0,-min(A)) # circumvent negatives (requires loop on sizes) A = sorted([(round(a+offset,P),i) for i,a in enumerate(A)]) cumA = accumulate(a for a,i in A) A = [(a,i,ca) for (a,i),ca in zip(A,cumA)] for a,_,_ in A[:smallSize]: newSums = [a+s for s in smallSums] + [a] smallSums.update(newSums) for size in range(1,len(A)+1): subS = round(M*size,P) if subS != round(M*size,P*2): continue # fractional numerator subS += round(offset*size,P) for result in subSumForSize(subS,A,size): yield result
To get the actual values, the iSubAvg()
function maps indices to the corresponding values in the list:
def iSubAvg(M,A,P=0): for iA in iSubAverage(M,A,P): yield sorted([A[i] for i in iA]) L = [-.32,-.64,-.12,.08,-.54,-.43] targetL = -0.02 for solution in iSubAvg(targetL,L,2): print(solution) # [-0.12, 0.08] (there isn't a solution for -0.03) K = [0.72, 0.69, 0.81, -0.28, 0.6, 0.59, 0.77, 0.46, 0.36, 0.66, 0.88, 0.88, 0.9, -0.24, 0.5, -0.5, 0.46, 0.96, -0.22, -0.8, -0.13, 0.87, 0.78, 0.2] targetK = -0.02 for solution in iSubAvg(targetK,K,2): print(solution) # [-0.5, 0.46] # [-0.5, 0.46] # [-0.8, -0.22, 0.96] # [-0.5, -0.28, 0.72] # [-0.28, -0.24, 0.46] # [-0.28, -0.24, 0.46] # [-0.5, -0.24, 0.2, 0.46] # [-0.5, -0.24, 0.2, 0.46] # [-0.8, -0.28, -0.24, -0.22, 0.46, 0.96] # [-0.8, -0.28, -0.24, -0.22, 0.46, 0.96] next(iSubAvg(0.165,K,2)) # [-0.8, -0.28, -0.24, 0.66, 0.69, 0.96]
note that the function returns all combinations including repetitions for duplicate values in the source list. You can filter out these duplicates if you don’t need them