I'm trying to learn scikit-learn but I'm stuck on code that fails with "Encoders require their input to be uniformly strings or numbers"

Question

I have been learning Python from YouTube videos. I'm new to Python, just a beginner. I saw this code in a video and tried it, but I'm getting an error that I don't know how to solve. The following is the code where I'm having trouble. I didn't write out the entire code because it is too long. Please help me fix my

Accepted Answer

I checked the Wine Quality dataset. Running

    wine['quality'].unique()

gave the following output:

    array([6, 5, 7, 8, 4, 3, 9], dtype=int64)

One of these values (9) exceeds the upper bound of the bins you passed to pd.cut(), and out-of-bounds values are replaced by NaN. I ran your preprocessing step to confirm:

    #Preprocessing
    bins=(2,6.5,8)
    group_names=['bad','good']
    wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
    wine['quality'].unique()

The result I get for wine['quality'].unique() is:

    ['bad', 'good', NaN]
    Categories (2, object): ['bad' < 'good']

This happens because all values that exceed 8 (the upper bound you provided) are changed to NaN. The documentation for pd.cut() mentions this too:

    Out of bounds values will be NA in the resulting Series or Categorical object.

The column then holds a mix of string labels and NaN (a float), which is what the Label Encoder rejects. I would therefore suggest increasing the upper bound of your bins to 9. I tried that and the function works without any issues:

    #Preprocessing
    bins=(2,6.5,9)
    group_names=['bad','good']
    wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
    wine['quality'].unique()

The output for wine['quality'].unique() is now:

    ['bad', 'good']
    Categories (2, object): ['bad' < 'good']

So we no longer have NaN values, and your Label Encoder should now work fine.
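If you want to reproduce this without loading the whole dataset, here is a minimal sketch. It assumes a small stand-in Series built from the seven quality scores listed above rather than the real wine DataFrame, and the variable names (quality, narrow, wide) are only for illustration. It counts how many values pd.cut() turns into NaN for each choice of upper bound.

```python
import pandas as pd

# Stand-in for wine['quality']: the seven distinct scores reported above.
# The real column has many more rows; this is only for illustration.
quality = pd.Series([6, 5, 7, 8, 4, 3, 9], name="quality")

group_names = ["bad", "good"]

# Upper edge 8: the score 9 lies outside every bin, so it becomes NaN.
narrow = pd.cut(quality, bins=(2, 6.5, 8), labels=group_names)
n_missing_narrow = int(narrow.isna().sum())

# Upper edge 9: every score falls inside a bin, so nothing is lost.
wide = pd.cut(quality, bins=(2, 6.5, 9), labels=group_names)
n_missing_wide = int(wide.isna().sum())

print("NaN count with upper bound 8:", n_missing_narrow)
print("NaN count with upper bound 9:", n_missing_wide)
print(wide.tolist())
```

Checking isna().sum() on the binned column before passing it to the encoder is a cheap way to catch this kind of problem early. If you would rather not hard-code the upper edge, one option is to use the column's own maximum (for example quality.max()) as the last bin edge, though that is a design choice and not something the original code does.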


Answer