
Determine whether the Columns of a Dataset are invariant under any given Scikit-Learn Transformer

Given an sklearn transformer t, is there a way to determine whether t changes the columns or column order of any given input dataset X, without applying it to the data?

For example with t = sklearn.preprocessing.StandardScaler there is a 1-to-1 mapping between the columns of X and t.transform(X), namely X[:, i] -> t.transform(X)[:, i], whereas this is obviously not the case for sklearn.decomposition.PCA.

A corollary would be: can we know how the columns of the input will change when t is applied, e.g. which columns an already fitted sklearn.feature_selection.SelectKBest chooses?

I am not looking for solutions to specific transformers, but a solution applicable to all or at least a wide selection of transformers.

Feel free to implement your own Pipeline class or wrapper if necessary.


Answer

Not every transformer has a .get_feature_names_out method. Its implementation is discussed on the sklearn GitHub. In the same discussion you can see there is, to quote @thomasjpfan, a _OneToOneFeatureMixin class used by transformers with a simple one-to-one correspondence between input and output features.
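For transformers that do implement get_feature_names_out, comparing the output names with the input names gives a rough answer to the corollary in the question. The sketch below is only an illustration: column_mapping and columns_unchanged are hypothetical helper names, the transformer is assumed to be already fitted, and the input names are assumed to be unique. Note that matching names only tells you a column was passed through under the same name, not that its values are untouched.

```python
from typing import List, Optional, Sequence, Tuple


def column_mapping(transformer, input_names: Sequence[str]) -> List[Tuple[str, Optional[int]]]:
    """Pair each output column name with the index of the input column of the
    same name, or None if no input column carries that name."""
    input_names = list(input_names)
    position = {name: i for i, name in enumerate(input_names)}
    # Requires a fitted transformer; raises AttributeError if the
    # transformer does not implement get_feature_names_out at all.
    output_names = transformer.get_feature_names_out(input_names)
    return [(str(name), position.get(str(name))) for name in output_names]


def columns_unchanged(transformer, input_names: Sequence[str]) -> bool:
    """True if the output has exactly the input column names, in the same order."""
    indices = [index for _, index in column_mapping(transformer, input_names)]
    return indices == list(range(len(input_names)))
```

For a fitted selector such as SelectKBest, the mapping should list only the surviving columns together with their original positions, while a decomposition such as PCA produces new names, so every index comes back as None.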

Restricted to sklearn, we can check whether the transformer or estimator is a subclass of _OneToOneFeatureMixin.
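A minimal sketch of such a check follows. It walks the class hierarchy and matches the mixin by name instead of importing it, because the mixin is private and its spelling may differ between releases. The public spelling OneToOneFeatureMixin is an assumption about newer versions, and is_one_to_one is a hypothetical helper name.

```python
# Names under which the mixin may be defined in sklearn.base. The private
# spelling is the one mentioned above; the public spelling is an assumption
# about newer releases.
_MIXIN_NAMES = {"_OneToOneFeatureMixin", "OneToOneFeatureMixin"}


def is_one_to_one(transformer) -> bool:
    """True if the transformer (instance or class) inherits from the
    one-to-one feature mixin defined in sklearn.base."""
    cls = transformer if isinstance(transformer, type) else type(transformer)
    # Walk the method resolution order so indirect inheritance is found too.
    return any(
        base.__name__ in _MIXIN_NAMES and base.__module__ == "sklearn.base"
        for base in cls.__mro__
    )
```

With scikit-learn installed, is_one_to_one(StandardScaler()) should come out True and is_one_to_one(PCA()) False. The check works without fitting and without data, but it only sees what the class declares: a Pipeline or a custom transformer that does not inherit the mixin is reported as not one-to-one even if it happens to preserve the columns.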


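The source code for _OneToOneFeatureMixin is short: its get_feature_names_out method simply returns the validated input feature names, which is what makes the correspondence one-to-one.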

User contributions licensed under: CC BY-SA