Fit/transform separate sklearn transformers to partitions of a single column

Question

Use case: I have time series data for multiple assets (e.g. AAPL, MSFT) and multiple features (e.g. MACD, Volatility). I am building an ML model to make classification predictions on a subset of this data. Problem: for each asset and feature, I want to fit and apply a transformation. For example: fo…

Accepted Answer

I don't think this is doable in an "elegant" way with scikit-learn's built-in functionality, simply because the transformers are applied to the whole column. However, you can use FunctionTransformer (as you correctly point out) to work around this limitation.

I am using the following example (I added another column, OtherCol, just to demonstrate):

```python
print(df)
```

```
  Ticker  Volatility  OtherCol
0   AAPL           0         1
1   AAPL           1         1
2   AAPL           2         1
3   AAPL           3         1
4   AAPL           4         1
5   GOOG           5         1
6   GOOG           6         1
7   GOOG           7         1
8   GOOG           8         1
9   GOOG           9         1
```

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# Example data matching the printout above.
df = pd.DataFrame({
    'Ticker': ['AAPL'] * 5 + ['GOOG'] * 5,
    'Volatility': range(10),
    'OtherCol': [1] * 10,
})

# The index should dictate the groups along the column.
df = df.set_index('Ticker')

def A(x):
    return x * x

def B(x):
    return 2 * x

def C(x):
    return 10 * x

# Map groups to functions: a dict for each column, keyed by each group in the index.
f_dict = {'Volatility': {'AAPL': A, 'GOOG': B},
          'OtherCol': {'AAPL': A, 'GOOG': C}}

def pick_transform(df):
    # df holds a single column; look up the function by column name and group label.
    return df.groupby(df.index).apply(lambda g: f_dict[g.columns[0]][g.index[0]](g))

ct = ColumnTransformer(
    [(f'transformed_{col}', FunctionTransformer(func=pick_transform), [col])
     for col in f_dict]
)

df[[f'transformed_{col}' for col in f_dict]] = ct.fit_transform(df)
print(df)
```

Which results in:

```
        Volatility  OtherCol  transformed_Volatility  transformed_OtherCol
Ticker
AAPL             0         1                       0                     1
AAPL             1         1                       1                     1
AAPL             2         1                       4                     1
AAPL             3         1                       9                     1
AAPL             4         1                      16                     1
GOOG             5         1                      10                    10
GOOG             6         1                      12                    10
GOOG             7         1                      14                    10
GOOG             8         1                      16                    10
GOOG             9         1                      18                    10
```

To handle more columns, add them to f_dict; the list comprehension then creates one transformer per column.
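One caveat with pick_transform: groupby(...).apply returns the rows grouped by ticker, and ColumnTransformer stacks the results positionally, so the output only lines up with the input when the frame is already sorted by ticker, as in the example above. In date-sorted data the tickers are usually interleaved. A minimal sketch of an order-preserving variant, assuming the same f_dict mapping and a ticker index (pick_transform_ordered is a hypothetical name): it uses GroupBy.indices to get each ticker's integer row positions and writes each result back into exactly those positions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# Per-ticker functions, mirroring A, B, C from the answer.
def A(x):
    return x * x

def B(x):
    return 2 * x

def C(x):
    return 10 * x

f_dict = {'Volatility': {'AAPL': A, 'GOOG': B},
          'OtherCol': {'AAPL': A, 'GOOG': C}}

def pick_transform_ordered(X):
    """Hypothetical helper: apply f_dict[column][ticker] per ticker, keeping row order."""
    col = X.columns[0]
    values = X[col].to_numpy()
    out = np.empty(len(X), dtype=float)
    # .indices maps each ticker label to the integer positions of its rows.
    for ticker, pos in X.groupby(level=0).indices.items():
        out[pos] = f_dict[col][ticker](values[pos])
    return out.reshape(-1, 1)

# Tickers interleaved, as they would be in date-sorted data.
df = pd.DataFrame({
    'Ticker': ['AAPL', 'GOOG'] * 3,
    'Volatility': [0, 5, 1, 6, 2, 7],
    'OtherCol': [1] * 6,
}).set_index('Ticker')

ct = ColumnTransformer(
    [(f'transformed_{col}', FunctionTransformer(func=pick_transform_ordered), [col])
     for col in f_dict]
)
out = ct.fit_transform(df)
df['transformed_Volatility'] = out[:, 0]
df['transformed_OtherCol'] = out[:, 1]
print(df)
```

Each output row stays next to its input row, e.g. the second row (GOOG, Volatility 5) gets B(5) = 10.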

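Target layout, with each ticker's Volatility passed through its own function (A for AAPL, B for GOOG):

Date	Ticker	Volatility	transformed_vol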
01/01/18	AAPL	X	A(X)
01/02/18	AAPL	X	A(X)
…	AAPL	X	A(X)
12/30/22	AAPL	X	A(X)
12/31/22	AAPL	X	A(X)
01/01/18	GOOG	X	B(X)
01/02/18	GOOG	X	B(X)
…	GOOG	X	B(X)
12/30/22	GOOG	X	B(X)
12/31/22	GOOG	X	B(X)
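FunctionTransformer only applies fixed functions, so nothing is learned per ticker. When each ticker's transformation has to be fitted (for example a scaler whose mean and variance come from that ticker's training rows), one option is a small custom transformer that fits a separate clone of a base transformer for each index group. This is a hypothetical sketch, not a scikit-learn API: GroupwiseTransformer is an illustrative name, it assumes the ticker is the DataFrame index, and it raises KeyError for tickers not seen during fit.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler


class GroupwiseTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: fit one clone of `transformer` per index group.

    Assumes X is a DataFrame whose index labels (e.g. tickers) define the groups.
    Output rows keep the input row order.
    """

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        # One independently fitted copy of the base transformer per group.
        self.fitted_ = {
            key: clone(self.transformer).fit(X.iloc[pos])
            for key, pos in X.groupby(level=0).indices.items()
        }
        return self

    def transform(self, X):
        out = np.empty(X.shape, dtype=float)
        for key, pos in X.groupby(level=0).indices.items():
            # Raises KeyError for a group that was not seen during fit.
            out[pos] = self.fitted_[key].transform(X.iloc[pos])
        return out


# Illustrative data with interleaved tickers.
df = pd.DataFrame({
    'Ticker': ['AAPL', 'GOOG'] * 3,
    'Volatility': [1.0, 10.0, 2.0, 20.0, 3.0, 30.0],
    'MACD': [0.5, -1.0, 1.5, 0.0, 2.5, 1.0],
}).set_index('Ticker')

ct = ColumnTransformer([
    ('scaled_Volatility', GroupwiseTransformer(StandardScaler()), ['Volatility']),
    ('scaled_MACD', GroupwiseTransformer(MinMaxScaler()), ['MACD']),
])
out = ct.fit_transform(df)
print(out)
```

Because the per-ticker parameters are stored in fitted_, calling ct.fit on training rows and ct.transform on later rows reuses the training-period statistics for each ticker instead of refitting on test data.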


Answer