OneHotEncoder categorical_features deprecated, how to transform specific column

Question

I need to convert an independent field from strings to a numerical representation, and I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which are: I have to encode the Country column like: I succeeded in getting the desired transformation using OneHotEncoder as: Now I…

Accepted Answer

There are actually two warnings. The first:

    FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.

and the second:

    The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
      "use the ColumnTransformer instead.", DeprecationWarning)

In the future, you should not define the columns in the OneHotEncoder directly, unless you want to use "categories='auto'". The first message also tells you to use OneHotEncoder directly, without running a LabelEncoder first.

Finally, the second message tells you to use ColumnTransformer, which is like a Pipeline for column transformations.

Here is the equivalent code for your case:

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder

    # The last element of the tuple ([0]) is the list of columns to transform in this step
    ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
    ct.fit_transform(X)

See also: ColumnTransformer documentation

For the above example, encoding categorical data (basically changing text to numerical data, i.e. the country name):

    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    from sklearn.compose import ColumnTransformer

    # Encode the Country column as integers first
    # (per the first warning, this LabelEncoder step is no longer required)
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])

    # One-hot encode column 0 and pass the remaining columns through unchanged
    ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder="passthrough")
    X = ct.fit_transform(X)
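
To illustrate the point about skipping the LabelEncoder, here is a minimal sketch that applies OneHotEncoder to the string column directly through a ColumnTransformer. The sample data, the step name "country", and the variable names are made up for illustration and are not from the question. Setting sparse_threshold=0 is my own choice so that the result comes back as a dense array that is easy to inspect.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data: column 0 holds the country name,
# the other two columns are numeric features.
X = np.array(
    [
        ["France", 44, 72000],
        ["Spain", 27, 48000],
        ["Germany", 30, 54000],
        ["Spain", 38, 61000],
    ],
    dtype=object,
)

ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],  # one-hot encode column 0 only
    remainder="passthrough",              # keep the other columns unchanged
    sparse_threshold=0,                   # assumption: always return a dense array
)

# The stacked result has object dtype here, so cast it to float explicitly.
encoded = ct.fit_transform(X).astype(float)

# Categories learned for the encoded column, in output column order.
categories = ct.named_transformers_["country"].categories_[0]

print(categories)
print(encoded)
```

The one-hot columns come first in the output, followed by the passthrough columns, so each row here has three indicator columns (one per country, in sorted order) and then the two numeric values.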

Answer