I have a table with four columns: CustomerID, Recency, Frequency and Revenue.
I need to standardize (scale) the columns Recency, Frequency and Revenue while keeping the column CustomerID.
I used this code:
from sklearn.preprocessing import normalize, StandardScaler

df.set_index('CustomerID', inplace = True)
standard_scaler = StandardScaler()
df = standard_scaler.fit_transform(df)
df = pd.DataFrame(data = df, columns = ['Recency', 'Frequency','Revenue'])
But the result is a table without the column CustomerID. Is there any way to get a table with the corresponding CustomerID and the scaled columns?
Answer
fit_transform returns a NumPy ndarray, which has no index, so you lose the index you set with df.set_index('CustomerID', inplace = True).
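To see the loss of the index concretely, here is a minimal sketch. The sample values are made up for illustration and are not from the question. It also shows that, if you do want CustomerID as the index, you can pass the original index back when rebuilding the DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the values are assumptions for illustration only.
df = pd.DataFrame({
    "CustomerID": [101, 102, 103],
    "Recency": [10.0, 20.0, 30.0],
    "Frequency": [1.0, 2.0, 3.0],
    "Revenue": [100.0, 200.0, 300.0],
}).set_index("CustomerID")

scaled = StandardScaler().fit_transform(df)

# The result is a plain NumPy array: no index and no column names.
print(type(scaled))
print(scaled.shape)

# Reattach the original index and column names when rebuilding the DataFrame.
restored = pd.DataFrame(scaled, index=df.index, columns=df.columns)
print(restored.index.tolist())
```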
Instead of setting CustomerID as the index, you can simply take the subset of columns you need to transform, pass them to a StandardScaler instance, and overwrite the original columns.
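from sklearn.preprocessing import StandardScaler

# Subset of columns to transform
cols = ['Recency','Frequency','Revenue']

# Overwrite old columns with transformed columns
# (StandardScaler must be instantiated before calling fit_transform)
df[cols] = StandardScaler().fit_transform(df[cols])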
This way, you leave CustomerID completely unchanged.
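As a quick check of this approach, the sketch below runs it end to end on a small DataFrame. The sample values are hypothetical. It then checks that CustomerID is untouched and that each scaled column has mean close to 0 and population standard deviation close to 1, which is what StandardScaler produces by default.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the values are assumptions for illustration only.
df = pd.DataFrame({
    "CustomerID": [101, 102, 103],
    "Recency": [10.0, 20.0, 30.0],
    "Frequency": [1.0, 2.0, 3.0],
    "Revenue": [100.0, 200.0, 300.0],
})

# Scale only the numeric feature columns; CustomerID stays a regular column.
cols = ["Recency", "Frequency", "Revenue"]
df[cols] = StandardScaler().fit_transform(df[cols])

print(df)

# StandardScaler divides by the population standard deviation (ddof=0).
print(df[cols].mean())
print(df[cols].std(ddof=0))
```

Note that pandas' own std() defaults to ddof=1, so calling df[cols].std() without arguments will not give exactly 1 on the scaled columns.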