I have a list of values that add up to 100 (percentage). I need to find the values that constitute the highest percentages as compared to others. How do I decide the criteria for filtering the data? Help me with the logic.
Below are a few samples and expected output:
input1 = [46.34, 42.42, 5.11, 2.16, 1.23, 1.19, 0.48, 0.4, 0.22, 0.22, 0.09, 0.04, 0.04, 0.04]
output1 = [46.34, 42.42]
input2 = [32.98, 31.82, 9.76, 3.21, 1.18, 0.43, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11]
output2 = [32.98, 31.82]
input3 = [37.72, 30.66, 30.66, 0.72, 0.24]
output3 = [37.72, 30.66, 30.66]
The list is already sorted. This is not a 'top n elements' problem: I cannot just select a fixed number of elements (e.g. the top 2 or top 3) from the list.
P.S. I am doing this in pandas (groupby), so a pandas-based approach is preferable. Thanks a lot.
Answer
I think you can apply outlier detection logic to your use case. Calculate the IQR (interquartile range, q3 - q1) of the input list and apply the formula:
outlier = (input1 < q1 - 1.5*IQR) | (input1 > q3 + 1.5*IQR)
The code for the same:
import pandas as pd

s = pd.Series(input1)
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
IQR = q3 - q1
# Keep only the values that fall outside the fences q1 - 1.5*IQR and q3 + 1.5*IQR
output = s[(s < (q1 - 1.5 * IQR)) | (s > (q3 + 1.5 * IQR))].tolist()
print(output)
# [46.34, 42.42, 5.11]
Note that with these quantiles 5.11 is also selected for input1. You can change the quantiles to your liking and check which setting gives the best results on your data.
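Since the question mentions groupby, here is a minimal sketch of how the same IQR rule might be applied per group. The helper name `iqr_outliers`, its parameters `lower_q`, `upper_q` and `k`, and the column names `group` and `pct` are assumptions made for illustration; they are not from the question.

```python
import pandas as pd


def iqr_outliers(values, lower_q=0.25, upper_q=0.75, k=1.5):
    """Return the values lying outside [q_lo - k*IQR, q_hi + k*IQR].

    lower_q, upper_q and k are tunable assumptions, defaulting to the
    usual 25%/75% quantiles and the 1.5 multiplier used above.
    """
    s = pd.Series(values, dtype="float64")
    q_lo, q_hi = s.quantile(lower_q), s.quantile(upper_q)
    iqr = q_hi - q_lo
    mask = (s < q_lo - k * iqr) | (s > q_hi + k * iqr)
    return s[mask].tolist()


# Hypothetical long-format data: one row per percentage, labelled by group.
samples = {
    "a": [46.34, 42.42, 5.11, 2.16, 1.23, 1.19, 0.48, 0.4,
          0.22, 0.22, 0.09, 0.04, 0.04, 0.04],
    "b": [32.98, 31.82, 9.76, 3.21, 1.18, 0.43,
          0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
    "c": [37.72, 30.66, 30.66, 0.72, 0.24],
}
df = pd.DataFrame(
    [(name, pct) for name, values in samples.items() for pct in values],
    columns=["group", "pct"],
)

# Iterate over the groups and apply the rule to each one separately.
selected = {name: iqr_outliers(pcts) for name, pcts in df.groupby("group")["pct"]}
print(selected)
```

With the default settings this selects 46.34, 42.42 and 5.11 for the first sample and 32.98 and 31.82 for the second, but nothing for the third, because there the three large values make up most of the list and so are not outliers. The quantiles or the multiplier would need tuning, and the IQR rule may not reproduce every expected output exactly.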