Find median of interval data in python

Question

I am exploring different python libraries and I wondering how to find approximate median value of the grouped data set. Here is a table for a reference. Age frequency 1-10 123 10-20 350 20-30 200 30-40 1700 40-50 360 50-60 60 Is there any function or do I need to hard code the formula and then have a loop going

Accepted Answer

If you want to approximate median for discrete grouped data, you can approximate the median of the entire data set by interpolation using the formula:median = L + interval * (N / 2 &#8211; CF) / FL = lower limit of the median intervalN = total number of data pointsCF = number of data points below the median intervalF = number of data points in the median interval# Approximating median by pure python and pandas functionsimport pandas as pddf = pd.DataFrame.from_dict({'low_range':[1,11,21,31,41,51], 'high_range':[10,20,30,40,50,60], 'frequency':[123,350,200,1700,360,60]})N = df['frequency'].sum()# calulating L1index = abs(df['frequency'].cumsum() - N/2).idxmin()L1 = df['low_range'][index + 1]cumsum_before = df['frequency'].cumsum()[index]freq_medain = df['frequency'][index + 1]width = df['high_range'][index + 1] - df['low_range'][index + 1]  + 1median = L1 + (N/2 - cumsum_before ) / freq_medain * widthprint("L1 = {} , cumsum_before = {}, freq_medain = {}, width = {}".format(L1, cumsum_before, freq_medain, width ) )print("Approximated median = ", median)L1 = 31 , cumsum_before = 673, freq_medain = 1700, width = 10Approximated median =  35.25588235294118If you have continuous data, you can use median_grouped function in statistics package.# Approximating median by statistics grouped_median for continuous values and fixed intervalsimport statistics as stimport pandas as pddf = pd.DataFrame.from_dict({'low_range':[1,10,21,31,41,51], 'high_range':[10,21,31,41,51,60], 'frequency':[123,350,200,1700,360,60]})X = ((df['low_range'] + df['high_range'])/2).tolist()f = df['frequency'].tolist()# repeating values based on their frequenciesY = [item for i,item in enumerate(X)              for count in range(f[i])]width = df['high_range'][0] - df['low_range'][0] + 1 median = st.median_grouped(Y, width)print("Approximated median = ", median)Approximated median =  35.25588235294118

Advertisement

Answer