I have been learning Python from YouTube videos. I'm new to Python, just a beginner. I saw this code in a video and tried it, but I'm getting an error that I don't know how to solve. This is the code where I'm having trouble. I didn't include the entire code as it's too long.
```
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn import svm
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
%matplotlib inline

wine = pd.read_csv('wine_quality.csv')
wine.head()
wine.info()
wine.isnull().sum()

#Preprocessing
bins=(2,6.5,8)
group_names=['bad','good']
wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
wine['quality'].unique()

label_quality=LabelEncoder()
wine['quality']=label_quality.fit_transform(wine['quality'])

#after this im getting that error
'''TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode(values, uniques, encode, check_unknown)
    112         try:
--> 113             res = _encode_python(values, uniques, encode)
    114         except TypeError:

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode_python(values, uniques, encode)
     60     if uniques is None:
---> 61         uniques = sorted(set(values))
     62         uniques = np.array(uniques, dtype=values.dtype)

TypeError: '<' not supported between instances of 'float' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-14-8e211b2c4bf8> in <module>
----> 1 wine['quality'] = label_quality.fit_transform(wine['quality'])

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in fit_transform(self, y)
    254         """
    255         y = column_or_1d(y, warn=True)
--> 256         self.classes_, y = _encode(y, encode=True)
    257         return y
    258

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode(values, uniques, encode, check_unknown)
    115             types = sorted(t.__qualname__
    116                            for t in set(type(v) for v in values))
--> 117             raise TypeError("Encoders require their input to be uniformly "
    118                             f"strings or numbers. Got {types}")
    119         return res

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['float', 'str']'''
```
Please help me fix this error. It would be great if you could tell me exactly what I should do.
Answer
I checked the Wine Quality dataset, and upon running:
wine['quality'].unique()
I got the following output:
array([6, 5, 7, 8, 4, 3, 9], dtype=int64)
Since the dataset contains values that exceed the upper bound you provided in your bins for the pd.cut() function, those out-of-range values are replaced by NaN. I checked this on my machine too. After running your preprocessing:
#Preprocessing
bins=(2,6.5,8)
group_names=['bad','good']
wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
wine['quality'].unique()
The result I get for wine['quality'].unique() is:
['bad', 'good', NaN]
Categories (2, object): ['bad' < 'good']
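The same effect can be reproduced without the CSV file. Below is a minimal sketch on a toy Series, which is only an assumption standing in for wine['quality'] and is built from the unique values listed earlier. It counts the NaN entries and shows which original values fell outside the bins:

```python
import pandas as pd

# Toy stand-in for wine['quality'] (hypothetical data, one row per unique value).
quality = pd.Series([6, 5, 7, 8, 4, 3, 9])

bins = (2, 6.5, 8)
group_names = ['bad', 'good']
binned = pd.cut(quality, bins=bins, labels=group_names)

# Count the NaN entries and list the original values that no bin covered.
n_missing = int(binned.isna().sum())
out_of_range = quality[binned.isna()].tolist()

print(n_missing)     # 1
print(out_of_range)  # [9]
```

Running a check like this on the real column before encoding should point directly at the values that the bins miss.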
The NaN appears because every value above 8 (the upper bound you provided) falls outside the bins. NaN is a float, which is why LabelEncoder then reports that it got both 'float' and 'str'. This behaviour is mentioned in the documentation for the pd.cut() function too:
"Out of bounds values will be NA in the resulting Series or Categorical object."
Therefore I would suggest increasing the upper bound in your bins to 9. I tried that and the function works without any issues.
#Preprocessing
bins=(2,6.5,9)
group_names=['bad','good']
wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
wine['quality'].unique()
And the output for wine['quality'].unique() now was:
['bad', 'good']
Categories (2, object): ['bad' < 'good']
So we no longer have NaN values, and your LabelEncoder should now work fine.
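If you would rather not hard-code 9, one possible variation is to take the top bin edge from the data itself. This is my own suggestion rather than something from the video, and the toy Series below is again only a hypothetical stand-in for wine['quality']:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for wine['quality'] (hypothetical data).
quality = pd.Series([6, 5, 7, 8, 4, 3, 9])

# Use the observed maximum as the top edge so no value lies above the bins.
bins = (2, 6.5, quality.max())
binned = pd.cut(quality, bins=bins, labels=['bad', 'good'])

# Fail early if any value is still unbinned (for example, a value of 2 or less).
assert binned.isna().sum() == 0

encoder = LabelEncoder()
encoded = encoder.fit_transform(binned)

print(list(encoder.classes_))  # ['bad', 'good']
print(encoded.tolist())        # [0, 0, 1, 1, 0, 0, 1]
```

Note that pd.cut() bins are open on the left by default, so a value equal to the lowest edge (2 here) would also become NaN. That does not occur in this dataset, but the assert would catch it.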