
Finding elements in list with highest weightage compared to other elements

I have a list of values that add up to 100 (percentage). I need to find the values that constitute the highest percentages as compared to others. How do I decide the criteria for filtering the data? Help me with the logic.

Below are a few samples and expected output:

input1 = [46.34, 42.42, 5.11, 2.16, 1.23, 1.19, 0.48, 0.4, 0.22, 0.22, 0.09, 0.04, 0.04, 0.04] 
output1 = [46.34, 42.42]

input2 = [32.98, 31.82, 9.76, 3.21, 1.18, 0.43, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11] 
output2 = [32.98, 31.82]

input3 = [37.72, 30.66, 30.66, 0.72, 0.24] 
output3 = [37.72, 30.66, 30.66]

The list is already sorted. This is not a 'top n elements' problem: I cannot just select a fixed number of elements (e.g., the top 2 or top 3) from the list.

P.S.: I am doing this in pandas (groupby), so a pandas-based solution is preferable. Thanks a lot.


Answer

I think you can apply outlier-detection logic to your use case. Calculate the interquartile range (IQR = q3 - q1) of the input list and treat a value as an outlier if it falls outside the usual fences: outlier = (value < q1 - 1.5*IQR) | (value > q3 + 1.5*IQR)

The code for the same:

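import pandas as pd

# Wrap the list in a Series so the comparisons below work element-wise
s = pd.Series(input1)
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)

IQR = q3 - q1
# Keep only the values outside the lower/upper fences
output = list(s[(s < (q1 - 1.5 * IQR)) | (s > (q3 + 1.5 * IQR))])
output
[46.34, 42.42, 5.11]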

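Note that with these settings 5.11 is also flagged, which is one value more than the expected output1. You can change the quantiles (or the 1.5 multiplier) to your liking and check which settings give the best results for your data.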
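Since the question mentions groupby, here is a minimal sketch of the same IQR rule wrapped in a function and applied per group. The function name iqr_outliers, its parameters lower_q, upper_q and k, and the column names "sample" and "pct" are all made up for illustration; they are not from the question, so adapt them to your own frame.

```python
import pandas as pd

def iqr_outliers(values, lower_q=0.25, upper_q=0.75, k=1.5):
    """Return the values lying outside the IQR fences (hypothetical helper)."""
    s = pd.Series(values, dtype="float64")
    q_lo = s.quantile(lower_q)
    q_hi = s.quantile(upper_q)
    spread = q_hi - q_lo
    # Element-wise mask: below the lower fence or above the upper fence
    mask = (s < q_lo - k * spread) | (s > q_hi + k * spread)
    return s[mask].tolist()

input1 = [46.34, 42.42, 5.11, 2.16, 1.23, 1.19, 0.48, 0.4,
          0.22, 0.22, 0.09, 0.04, 0.04, 0.04]
input2 = [32.98, 31.82, 9.76, 3.21, 1.18, 0.43,
          0.11, 0.11, 0.11, 0.11, 0.11, 0.11]
input3 = [37.72, 30.66, 30.66, 0.72, 0.24]

# Assumed layout: one row per value, with a column identifying the group
df = pd.DataFrame({
    "sample": ["s1"] * len(input1) + ["s2"] * len(input2) + ["s3"] * len(input3),
    "pct": input1 + input2 + input3,
})

# One list of flagged values per group
per_group = df.groupby("sample")["pct"].apply(iqr_outliers)
print(per_group)
```

With the defaults, the third sample comes back empty: its three large values pull q3 up to 30.66, which puts the upper fence at about 75.57, above every value in the list. So the quantiles and the multiplier k will likely need tuning per data set rather than once for all of them.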

User contributions licensed under: CC BY-SA