Given an sklearn transformer t, is there a way to determine whether t changes the columns or column order of any given input dataset X, without applying it to the data?
For example, with t = sklearn.preprocessing.StandardScaler there is a 1-to-1 mapping between the columns of X and t.transform(X), namely X[:, i] -> t.transform(X)[:, i], whereas this is obviously not the case for sklearn.decomposition.PCA.
A corollary of that would be: can we know how the columns of the input will change by applying t, e.g. which columns an already fitted sklearn.feature_selection.SelectKBest chooses?
I am not looking for solutions to specific transformers, but a solution applicable to all or at least a wide selection of transformers.
Feel free to implement your own Pipeline class or wrapper if necessary.
Answer
Not all of your “transformers” will have the .get_feature_names_out method. Its implementation is discussed on the sklearn GitHub. In the same discussion you can see that there is, to quote @thomasjpfan, a _OneToOneFeatureMixin class "used by transformers with a simple one-to-one correspondence between input and output features".
Restricted to sklearn, we can check whether the transformer or estimator is a subclass of _OneToOneFeatureMixin, for example:
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest
    from sklearn.base import _OneToOneFeatureMixin

    tf = {'pca': PCA(), 'standardscaler': StandardScaler(), 'kbest': SelectKBest()}

    [i + ":" + str(issubclass(type(tf[i]), _OneToOneFeatureMixin)) for i in tf.keys()]
    # ['pca:False', 'standardscaler:True', 'kbest:False']
The source code for _OneToOneFeatureMixin can be found in sklearn.base.