I'm not sure how to fix this. Any help is much appreciated. I saw the question "Vectorization: Not a valid collection", but I'm not sure I understood it.
train = df1.iloc[:,[4,6]]
target = df1.iloc[:,[0]]

def train(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print ("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', MultinomialNB()),
])

train(trial1, train, target)
The error is below:
----> 6 train(trial1, train, target)

<ipython-input-140-ac0e8d32795e> in train(classifier, X, y)
      1 def train(classifier, X, y):
----> 2     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
      3
      4     classifier.fit(X_train, y_train)
      5     print ("Accuracy: %s" % classifier.score(X_test, y_test))

/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   1687         test_size = 0.25
   1688
-> 1689     arrays = indexable(*arrays)
   1690
   1691     if stratify is not None:

/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    204         else:
    205             result.append(np.array(X))
--> 206     check_consistent_length(*result)
    207     return result
    208

/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    175     """
    176
--> 177     lengths = [_num_samples(X) for X in arrays if X is not None]
    178     uniques = np.unique(lengths)
    179     if len(uniques) > 1:

/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in <listcomp>(.0)
    175     """
    176
--> 177     lengths = [_num_samples(X) for X in arrays if X is not None]
    178     uniques = np.unique(lengths)
    179     if len(uniques) > 1:

/home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in _num_samples(x)
    124         if len(x.shape) == 0:
    125             raise TypeError("Singleton array %r cannot be considered"
--> 126                             " a valid collection." % x)
    127         return x.shape[0]
    128     else:

TypeError: Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection.
Answer
This error arises because your function train overwrites your variable train, so the function ends up being passed to itself as X.
Explanation:
You define a variable train like this:
train = df1.iloc[:,[4,6]]
Then, a few lines later, you define a function with the same name, train, like this:
def train(classifier, X, y):
So what actually happens is that the name train is rebound to the new definition. It no longer points to the DataFrame object as you wanted, but to the function you defined. The error message makes this clear:
array(<function train at 0x7f3a311320d0>, dtype=object)
Note the function train inside the error message: it is the function object, not your data, that reached train_test_split.
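To see the rebinding in isolation, here is a minimal sketch. The list is only a hypothetical stand-in for your DataFrame slice, and the function body is a dummy. The np.array call mirrors the result.append(np.array(X)) line shown in your traceback.

```python
import numpy as np

# Hypothetical stand-in for train = df1.iloc[:,[4,6]]
train = [1, 2, 3]

# Defining a function with the same name rebinds `train`;
# the list above is no longer reachable through this name.
def train(classifier, X, y):
    return classifier

is_function = callable(train)   # True: `train` is now the function

# Wrapping a function object in np.array gives a zero-dimensional
# object array, which is the "singleton array" the error complains about.
wrapped = np.array(train)
shape = wrapped.shape           # empty tuple, so len(shape) == 0
```

Because the shape is empty, the len(x.shape) == 0 check in _num_samples fires and raises the TypeError you saw.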
Solution:
Rename one of them (the variable or the function).
Suggestion: rename the function to something else, such as training or training_func.
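As a rough sketch of the renamed version: train_and_score is just a hypothetical name, and the texts and labels lists are made-up toy data standing in for the columns of df1, which I cannot see. It also assumes a single sequence of raw text strings as input, since TfidfVectorizer expects an iterable of strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data (not from the question) in place of df1.
texts = [
    "cheap pills buy now", "meeting at noon tomorrow",
    "win money fast", "lunch with the team",
    "free prize claim now", "project status report",
    "buy cheap watches", "notes from the meeting",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Renamed from `train`, so it no longer collides with a data variable.
def train_and_score(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("classifier", MultinomialNB()),
])

# The data and the function now have distinct names.
model = train_and_score(trial1, texts, labels)
```

With distinct names, train_test_split receives the actual data instead of the function object.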