The following code snippet illustrates the issue:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

(nrows, ncolumns) = (1912392, 131)
X = np.random.random((nrows, ncolumns))

pca = PCA(n_components=28, random_state=0)
transformed_X1 = pca.fit_transform(X)
pca1 = pca.fit(X)
transformed_X2 = pca1.transform(X)
print((transformed_X1 != transformed_X2).sum())
# Gives output as 53546976

scaler = StandardScaler()
scaled_X1 = scaler.fit_transform(X)
scaler2 = scaler.fit(X)
scaled_X2 = scaler2.transform(X)
print((scaled_X1 != scaled_X2).sum())
# Gives output as 0
Can someone explain why the first output is not zero while the second one is?
Answer
Using svd_solver='full' works:
pca = PCA(n_components=28, svd_solver='full')
transformed_X1 = pca.fit_transform(X)
pca1 = pca.fit(X)
transformed_X2 = pca1.transform(X)
print(np.allclose(transformed_X1, transformed_X2))
# True
Apparently svd_solver='randomized' (which is what 'auto' falls back to for input this large) has enough process difference between .fit(X).transform(X) and fit_transform(X) to give different results even with the same seed. Also remember that floating point errors make == and != unreliable judges of whether two different processes computed "the same" result, so use np.allclose().
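As a minimal sketch of the floating point point, independent of scikit-learn, here are two mathematically equal values computed by different processes:

import numpy as np

a = 0.1 + 0.2
b = 0.3
print(a == b)             # False: a is actually 0.30000000000000004
print(np.allclose(a, b))  # True: equal within a small tolerance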
It seems like StandardScaler.fit_transform() just directly uses .fit(X).transform(X) under the hood, so there were no floating point errors there to trip you up.
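You can check that equivalence yourself; this is a quick sketch on a smaller array (size reduced here only for speed), assuming StandardScaler.fit_transform() is the generic mixin implementation that literally calls fit(X).transform(X):

from sklearn.preprocessing import StandardScaler
import numpy as np

rng = np.random.default_rng(0)
X_small = rng.random((1000, 10))

# Both code paths should produce bit-identical results for StandardScaler:
out1 = StandardScaler().fit_transform(X_small)
out2 = StandardScaler().fit(X_small).transform(X_small)
print((out1 != out2).sum())  # 0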