
Determine whether the Columns of a Dataset are invariant under any given Scikit-Learn Transformer

Given an sklearn transformer t, is there a way to determine whether t changes the columns or the column order of any given input dataset X, without applying it to the data?

For example with t = sklearn.preprocessing.StandardScaler there is a 1-to-1 mapping between the columns of X and t.transform(X), namely X[:, i] -> t.transform(X)[:, i], whereas this is obviously not the case for sklearn.decomposition.PCA.

A corollary of that would be: can we know how the columns of the input will change when t is applied, e.g. which columns an already fitted sklearn.feature_selection.SelectKBest chooses?

I am not looking for solutions for specific transformers, but for a solution applicable to all or at least a wide selection of transformers.

Feel free to implement your own Pipeline class or wrapper if necessary.


Answer

Not all of your "transformers" will have the .get_feature_names_out method. Its implementation is discussed on the sklearn GitHub. In the same discussion you can see that there is, to quote @thomasjpfan, a _OneToOneFeatureMixin class used by transformers with a simple one-to-one correspondence between input and output features.

Restricted to sklearn, we can check whether the transformer or estimator is a subclass of _OneToOneFeatureMixin, for example:

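from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest

# The mixin is private here; newer scikit-learn releases expose it
# publicly as OneToOneFeatureMixin, so fall back to that name if needed.
try:
    from sklearn.base import _OneToOneFeatureMixin
except ImportError:
    from sklearn.base import OneToOneFeatureMixin as _OneToOneFeatureMixin

tf = {'pca': PCA(), 'standardscaler': StandardScaler(), 'kbest': SelectKBest()}

# True only for transformers whose output columns map 1-to-1 onto the input columns.
[f"{name}:{isinstance(t, _OneToOneFeatureMixin)}" for name, t in tf.items()]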

['pca:False', 'standardscaler:True', 'kbest:False']

For details, see the source code for _OneToOneFeatureMixin.

User contributions licensed under: CC BY-SA