
How does SelectKBest (chi2) calculate the score?

I am trying to find the most valuable features by applying feature selection methods to my dataset. I’m using the SelectKBest function for now. I can generate the score values and sort them as I want, but I don’t understand exactly how this score value is calculated. I know that, in theory, a higher score means a more valuable feature, but I need a mathematical formula or a worked example of the calculation to understand it properly.

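For context, the workflow described above (score every feature, then sort by score) might look like the sketch below. It is hypothetical: the iris dataset, `k=2` and the variable names are stand-ins chosen for illustration, not the asker's actual data or code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in dataset: iris has 4 non-negative features and 3 classes.
data = load_iris()
X, y = data.data, data.target

# Score every feature with chi2 and keep the 2 best (k=2 is an arbitrary choice).
selector = SelectKBest(score_func=chi2, k=2)
selector.fit(X, y)

# Feature indices sorted by chi2 score, highest first.
order = np.argsort(selector.scores_)[::-1]
for i in order:
    print(f"{data.feature_names[i]}: score={selector.scores_[i]:.3f}, "
          f"p-value={selector.pvalues_[i]:.3g}")
```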

Thank you in advance


Answer

Say you have one feature and a target with 3 possible values:

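A minimal sketch of such a setup, assuming hypothetical toy data: the values in `X` and `y` below are made up for illustration. The feature is kept non-negative because chi2 treats feature values like counts.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 1 non-negative feature, 3 classes (0, 1, 2).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # shape (n_samples, n_features)
y = np.array([0, 0, 1, 1, 2, 2])                          # shape (n_samples,)

print(X.ravel())     # the single feature column
print(np.unique(y))  # the 3 possible target values
```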

First, we binarize the target, turning it into one 0/1 column per class:

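A sketch of the binarization step with scikit-learn's `LabelBinarizer`, assuming the same hypothetical toy target `y = [0, 0, 1, 1, 2, 2]`:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

y = np.array([0, 0, 1, 1, 2, 2])  # hypothetical toy target

# One column per class; row i has a 1 in the column of y[i]'s class.
Y = LabelBinarizer().fit_transform(y)
print(Y)
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]
#  [0 0 1]]
```

One detail worth knowing: with exactly two classes `LabelBinarizer` returns a single column, and scikit-learn's `chi2` then appends the complementary column (`1 - Y`) so that each class still gets its own column.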

Then we take the dot product of the (transposed) binarized target and the feature, i.e. we sum the feature values within each class. These per-class sums are the observed values:

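A sketch of the observed-values step, again on hypothetical toy data (`X`, `y` are illustrative, not from the original post). The matrix product groups and sums the feature by class in one operation:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data: one feature, three classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Y = LabelBinarizer().fit_transform(y)

# (n_classes, n_samples) @ (n_samples, n_features) -> (n_classes, n_features):
# entry [c, j] is the sum of feature j over the samples that belong to class c.
observed = Y.T @ X
print(observed.ravel())  # class 0: 1+2, class 1: 3+4, class 2: 5+6

# Same numbers computed the "obvious" way, for comparison.
per_class = np.array([X[y == c].sum() for c in range(Y.shape[1])])
print(per_class)
```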

Next, take the sum of all feature values and calculate the frequency of each class:

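A sketch of this step on the same hypothetical toy data. Both results are kept as row vectors so they can be combined with a matrix product afterwards:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data: one feature, three classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Y = LabelBinarizer().fit_transform(y)

# Total of each feature over all samples, as a (1, n_features) row.
feature_count = X.sum(axis=0).reshape(1, -1)
# Fraction of samples in each class, as a (1, n_classes) row.
class_prob = Y.mean(axis=0).reshape(1, -1)

print(feature_count)  # [[21.]]
print(class_prob)     # each class holds 2 of the 6 samples, so 1/3 each
```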

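Now, just as we did for the observed values, we take a dot product, this time between the class frequencies and the feature sum, to get the expected matrix. We now have both the expected and the observed matrices: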

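A sketch of the expected-values step on the same hypothetical toy data. If the feature were unrelated to the class, each class $c$ would receive its share of the feature total, $E_c = \frac{n_c}{n}\sum_{i=1}^{n} x_i$, which is what the outer product below computes:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data: one feature, three classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Y = LabelBinarizer().fit_transform(y)

observed = Y.T @ X                            # per-class feature sums
feature_count = X.sum(axis=0).reshape(1, -1)  # (1, n_features)
class_prob = Y.mean(axis=0).reshape(1, -1)    # (1, n_classes)

# Outer product (n_classes, 1) @ (1, n_features) -> (n_classes, n_features).
expected = class_prob.T @ feature_count

print(observed.ravel())  # 3, 7, 11
print(expected.ravel())  # 7, 7, 7 (up to floating-point rounding)
```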

Finally, we calculate the chi^2 value:

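Written as a formula, with $O_c$ the sum of the feature over class $c$, $E_c$ the expected sum and $K$ the number of classes, the score for one feature is

$$\chi^2 = \sum_{c=1}^{K} \frac{(O_c - E_c)^2}{E_c}$$

A sketch of that sum on the same hypothetical toy data (the values are illustrative, not from the original post):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data: one feature, three classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Y = LabelBinarizer().fit_transform(y)

observed = Y.T @ X
expected = Y.mean(axis=0).reshape(-1, 1) @ X.sum(axis=0).reshape(1, -1)

# Sum over classes (axis 0) of (O - E)^2 / E, giving one value per feature.
chi2_stat = ((observed - expected) ** 2 / expected).sum(axis=0)
print(chi2_stat)  # (3-7)^2/7 + 0 + (11-7)^2/7 = 32/7, about 4.571
```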

We have a chi^2 value; now we need to judge how extreme it is. For that we use a chi^2 distribution with (number of classes - 1) degrees of freedom and calculate the area from our chi^2 value to infinity. This area is the probability of getting a chi^2 value at least as extreme as the one we got, i.e. the p-value. (We use the chi-square survival function from scipy for this.)

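A sketch of the p-value step using `scipy.stats.chi2.sf` on the same hypothetical toy data. scikit-learn's own `chi2` computes the same upper-tail probability with `scipy.special.chdtrc`, so both are shown:

```python
import numpy as np
from scipy.special import chdtrc
from scipy.stats import chi2 as chi2_dist
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data: one feature, three classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Y = LabelBinarizer().fit_transform(y)

observed = Y.T @ X
expected = Y.mean(axis=0).reshape(-1, 1) @ X.sum(axis=0).reshape(1, -1)
chi2_stat = ((observed - expected) ** 2 / expected).sum(axis=0)

n_classes = Y.shape[1]
# Survival function: P(statistic >= chi2_stat) with n_classes - 1 degrees of freedom.
p_value = chi2_dist.sf(chi2_stat, df=n_classes - 1)
# The same tail probability through scipy.special.
p_value_special = chdtrc(n_classes - 1, chi2_stat)

print(p_value)  # with 2 degrees of freedom sf(x) = exp(-x / 2), about 0.102 here
print(p_value_special)
```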

Compare with SelectKBest:

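A sketch of the comparison, assuming the same hypothetical toy data. The manual steps are condensed into a few lines and checked against `SelectKBest`, whose `scores_` attribute holds the chi^2 statistics and `pvalues_` the p-values:

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy data: one feature, three classes.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 2, 2])

# Manual computation: observed vs expected per-class sums, then the tail probability.
Y = LabelBinarizer().fit_transform(y)
observed = Y.T @ X
expected = Y.mean(axis=0).reshape(-1, 1) @ X.sum(axis=0).reshape(1, -1)
manual_score = ((observed - expected) ** 2 / expected).sum(axis=0)
manual_p = chi2_dist.sf(manual_score, df=Y.shape[1] - 1)

# scikit-learn's version.
sel = SelectKBest(score_func=chi2, k=1).fit(X, y)
print(manual_score, sel.scores_)
print(manual_p, sel.pvalues_)
```

`SelectKBest` keeps the `k` features with the highest `scores_`, so for chi2 the ranking follows the statistic itself. Because feature values are treated like counts, `chi2` raises a `ValueError` if `X` contains negative values.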
User contributions licensed under: CC BY-SA