How can I access clf.best_params_ after running a pipeline? With the code below, I get: AttributeError: ‘GridSearchCV’ object has no attribute ‘best_params_’. Here is my code: Answer: Your clf is never fitted. You probably meant clf.fit(X_train, y_train). Als…
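A minimal sketch of the fix the answer describes: `best_params_` is only set once `GridSearchCV.fit` has run. The iris data, the step names `scale` and `rf`, and the grid values are illustrative assumptions, not the asker's code. Inside a pipeline, grid keys take the form `<step name>__<parameter>`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical step names; grid keys must use "<step>__<param>".
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=0)),
])
param_grid = {"rf__n_estimators": [10, 50], "rf__max_depth": [None, 3]}
clf = GridSearchCV(pipe, param_grid, cv=3)

# best_params_ does not exist until fit() has been called.
clf.fit(X_train, y_train)
print(clf.best_params_)
print(clf.score(X_test, y_test))
```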
Tag: random-forest
Suspected overfitting in a binary classification toy problem with scikit-learn RandomForestClassifier
I’m trying to train a Random Forest to classify the species of flowers in the iris dataset. However, the validation results look suspicious to me: they appear to be perfect, which I would not expect. Since I would like to perform a binary classification, I exclude f…
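Perfect scores on iris are not necessarily a sign of overfitting: setosa is linearly separable from the other two species, so any pair that includes setosa is an easy problem. The excerpt does not show which class was excluded, so this sketch, as an assumption, compares cross-validated accuracy for setosa vs versicolor against versicolor vs virginica (the overlapping pair). The helper `binary_cv_scores` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

def binary_cv_scores(keep_a, keep_b):
    """5-fold CV accuracy of a random forest on two iris classes only."""
    mask = (y == keep_a) | (y == keep_b)
    X_bin, y_bin = X[mask], (y[mask] == keep_b).astype(int)
    model = RandomForestClassifier(random_state=0)
    return cross_val_score(model, X_bin, y_bin, cv=cv)

easy = binary_cv_scores(0, 1)  # setosa vs versicolor: linearly separable
hard = binary_cv_scores(1, 2)  # versicolor vs virginica: classes overlap
print("setosa vs versicolor:", easy.mean())
print("versicolor vs virginica:", hard.mean())
```

If the scores stay near perfect on held-out folds for a separable pair, that reflects the data rather than leakage or overfitting.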
Do Machine Learning Algorithms read data top-down or bottom up?
I’m new to machine learning and a bit confused about how data is read during training and testing. Suppose my data is indexed by date and I want the model to read the later dates first before getting to the newer dates; the data is saved with the earliest date on line 1 and line…
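Estimators such as RandomForestClassifier receive the whole training matrix in a single `fit` call; they do not read rows top-down or bottom-up, and row order carries no temporal meaning for them (with a fixed `random_state` it only changes which rows random choices such as bootstrap sampling pick). Order matters when you split the data. A minimal sketch, assuming daily data with a `date` column (hypothetical) stored newest-first: sort oldest-first, then use `TimeSeriesSplit` so every fold trains on the past and tests on the future.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical daily data, deliberately stored newest-first.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="D")[::-1],
    "x": np.arange(10),
})
df = df.sort_values("date").reset_index(drop=True)  # oldest row first

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(df):
    # Every training date is strictly earlier than every test date.
    print(df["date"].iloc[train_idx].max().date(), "<",
          df["date"].iloc[test_idx].min().date())
```

By contrast, a shuffled `train_test_split` ignores row order entirely.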
Outlier removal with Isolation Forest
I’ve been trying to remove outliers from my database using Isolation Forest, but I can’t figure out how. I’ve seen the credit card fraud and salary examples, but I can’t work out how to apply them to each column, since my database has 3,862,900 rows and 19 columns. I’ve up…
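Isolation Forest does not need to be applied column by column: one model fitted on all 19 columns scores whole rows, and `fit_predict` returns 1 for inliers and -1 for outliers. A minimal sketch on a small synthetic stand-in (the DataFrame, column names, and injected outliers are hypothetical). Because each tree is built on a subsample of `max_samples` rows, fitting stays cheap even for millions of rows; scoring is linear in the number of rows.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real table: 1,000 rows x 19 columns.
df = pd.DataFrame(rng.normal(size=(1000, 19)),
                  columns=[f"col{i}" for i in range(19)])
df.iloc[:10] += 8  # inject a few obvious outlier rows

# One model over all columns at once: it scores rows, not single columns.
iso = IsolationForest(n_estimators=100, max_samples=256,
                      contamination="auto", random_state=0)
labels = iso.fit_predict(df)  # 1 = inlier, -1 = outlier
df_clean = df[labels == 1]
print(len(df), "->", len(df_clean))
```

If you really want per-column screening, you could fit one model per column on `df[[col]]`, but that ignores interactions between columns.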
Random Forest tuning with RandomizedSearchCV
I have a few questions about randomized grid search in a random forest regression model. My parameter grid looks like this: … and my code for the RandomizedSearchCV looks like this: … Is there any way to calculate the root mean squared error for each parameter set? That would be more useful to me than the R^2 score. If…
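RandomizedSearchCV can record RMSE for every sampled parameter set via the built-in scorer `"neg_root_mean_squared_error"` (scikit-learn 0.22 and later). A minimal sketch, assuming synthetic regression data and a hypothetical parameter distribution in place of the asker's grid: passing a dict of scorers keeps both RMSE and R^2 in `cv_results_`, and `refit="rmse"` picks `best_params_` by RMSE.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hypothetical parameter distributions; substitute your own grid.
param_dist = {
    "n_estimators": randint(20, 100),
    "max_depth": [None, 5, 10],
    "min_samples_leaf": randint(1, 5),
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_dist,
    n_iter=5,
    cv=3,
    scoring={"rmse": "neg_root_mean_squared_error", "r2": "r2"},
    refit="rmse",  # choose best_params_ by RMSE rather than R^2
    random_state=0,
)
search.fit(X, y)

# Scorers follow "greater is better", so RMSE is stored negated.
rmse = -search.cv_results_["mean_test_rmse"]
for params, r in zip(search.cv_results_["params"], rmse):
    print(params, round(r, 2))
print("best:", search.best_params_)
```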
Pipeline for RandomOverSampler, RandomForestClassifier & GridSearchCV
I am working on a binary text classification problem. As the classes are highly imbalanced, I am using sampling techniques like RandomOverSampler(). Then for classification I would use RandomForestClassifier(), whose parameters need to be tuned using GridSearchCV(). I am trying to create a pipeline to do these…
What does the value of ‘leaf’ in the following xgboost model tree diagram mean?
My guess is that it is a conditional probability given that the conditions on the branch above hold, but I am not sure. For more about the data used and how this diagram is produced, see: http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-pyth…
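For the `binary:logistic` objective, a leaf value is not a probability. It is an additive contribution on the log-odds (margin) scale: a sample's prediction is the sum of the leaf values it reaches in every tree plus a base margin, passed through the sigmoid. A minimal sketch of that arithmetic; the function name and the leaf values are hypothetical, and `base_margin=0.0` assumes `base_score=0.5`, the long-standing default (newer xgboost versions may estimate `base_score` from the data).

```python
import math

def prob_from_leaves(leaf_values, base_margin=0.0):
    """Combine one sample's per-tree leaf values under binary:logistic.

    base_margin=0.0 corresponds to base_score=0.5, since logit(0.5) == 0.
    """
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical leaf values reached by one sample in three trees.
print(prob_from_leaves([0.2, -0.05, 0.1]))
```

A single leaf on its own therefore only says how far that tree pushes the log-odds up or down.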
Retrieve list of training features names from classifier
Is there a way to retrieve the list of feature names used for training of a classifier, once it has been trained with the fit method? I would like to get this information before applying to unseen data. The data used for training is a pandas DataFrame and in my case, the classifier is a RandomForestClassifier…
RandomForestClassifier import
I’ve installed the Anaconda Python distribution with scikit-learn. While importing RandomForestClassifier: from sklearn.ensemble import RandomForestClassifier I get the following error: File “C:\Anaconda\lib\site-packages\sklearn\tree\tree.py”, line 36, in <module> from . import _tree ImportErr…
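`_tree` is a compiled extension module, so an ImportError at that line often means the interpreter is loading a scikit-learn copy whose compiled parts are missing or were built for a different Python (for example, a second install on the path). A hedged diagnostic sketch: it prints which interpreter and which scikit-learn installation are actually used. If the path is not the Anaconda one, or the import still fails, reinstalling scikit-learn into that environment is a common remedy.

```python
import sys

import sklearn

# Which interpreter and which scikit-learn copy is actually imported?
print(sys.executable)
print(sklearn.__version__, sklearn.__file__)

# Should import cleanly once the installation is consistent.
from sklearn.ensemble import RandomForestClassifier
print(RandomForestClassifier.__module__)
```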