Tag: scikit-learn

Sklearn RFE, pipeline and cross validation

I’m trying to figure out how to use RFE for regression problems, and I was reading some tutorials. I found an example on how to use RFECV to automatically select the ideal number of features, and it goes something like: which I find pretty straightforward. However, I was checking how to do the same thin…
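The excerpt above is cut off, so the following is only a minimal sketch of how RFECV might be placed inside a Pipeline for a regression problem. The synthetic dataset, the choice of LinearRegression, the step names "select" and "model", and the scoring metric are all assumptions made for illustration and do not come from the original question.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Synthetic regression data (assumption: stands in for the asker's dataset).
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# RFECV picks the number of features by cross-validation,
# then the final model is fitted on the selected features only.
pipeline = Pipeline([
    ("select", RFECV(
        estimator=LinearRegression(),
        step=1,
        cv=5,
        scoring="neg_mean_squared_error",
    )),
    ("model", LinearRegression()),
])
pipeline.fit(X, y)

selector = pipeline.named_steps["select"]
n_selected = selector.n_features_   # number of features kept
support_mask = selector.support_    # boolean mask over the 10 input features
```

Keeping the selector inside the pipeline means that any outer cross-validation repeats the feature selection on each training fold, which avoids leaking information from the held-out fold.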

Pipeline with count and tfidf vectorizer produces TypeError: expected string or bytes-like object

gridsearchcv pipeline python scikit-learn tf-idf

I have a corpus like the following ‘C C C 0 0 0 X 0 1 0 0 0 0’, ‘C C C 0 0 0 X 0 1 0 0 0 0’, ‘C C C 0 0 0 X 0 1 0 0 0 0’, ‘X X X’, ‘X X X’, ‘X X X’, I would like to use

How can I give Gaussian noise to my moons dataset with a deviation value of 0.2 in Python?

python scikit-learn

I have a make_moons dataset generated by scikit-learn: X, y = make_moons(n_samples=120). How can I give Gaussian noise to my moons dataset with a deviation value of 0.2 in Python? Answer: You can just pass that value to the make_moons function as noise. noise : double or None (default=None) Standard deviation of …
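A short sketch of what the answer describes, passing the standard deviation through the noise parameter. The random_state value is an assumption added only to make the output reproducible.

```python
from sklearn.datasets import make_moons

# noise=0.2 adds Gaussian noise with standard deviation 0.2 to the points.
# random_state is an assumption here, used only for reproducibility.
X, y = make_moons(n_samples=120, noise=0.2, random_state=0)
```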

SGDRegressor() constantly not increasing validation performance

gradient-descent linear-regression machine-learning python scikit-learn

The model fit of my SGDRegressor won’t increase or decrease its performance on the validation set (test) after around 20,000 training records. Even if I switch penalty or early_stopping (True/False), or set alpha and eta0 to extremely high or low levels, there is no change in the behavior of the “stuck…

Why does sklearn MinMaxScaler() return an out-of-range value instead of an error?

machine-learning python scikit-learn

When I use sklearn MinMaxScaler(), I noticed some interesting behavior, which is shown in the following code. I noticed that when I transform the test_data with a fitted MinMaxScaler(), it returns values beyond the defined range (0 – 1). Now, I intentionally made the test_data fall outside the value range of…
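The behavior described here follows from how the scaler works: it learns the minimum and maximum from the data it was fitted on and then applies the same linear map to anything it transforms, so values outside the fitted range land outside (0, 1). A minimal sketch with made-up numbers (the arrays below are assumptions, not the asker's data); the clip option shown at the end is available in recent scikit-learn releases (0.24 and later, as far as I know).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: training values span 1..10, test values fall outside that span.
train = np.array([[1.0], [5.0], [10.0]])
test = np.array([[0.0], [20.0]])

# The scaler maps x to (x - 1) / 9 using the min and max learned from train.
scaler = MinMaxScaler().fit(train)
scaled = scaler.transform(test)  # values below 0 and above 1

# With clip=True, transformed values are clipped to the feature range.
clipped = MinMaxScaler(clip=True).fit(train).transform(test)
```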

Plotting a 3D graph of a regressor made with sklearn

3d matplotlib python scikit-learn

I have been using this tutorial to learn decision tree learning, and am now trying to understand how it works with higher-dimensional datasets. Currently my regressor predicts a Z value for an (x, y) pair that you pass to it. I want to use a 3D graph to visualise it, but I have struggled with the way regressor…

I can't find why `.read_csv` cannot make a dataframe for `.shape` to recognize

matplotlib pandas python python-3.x scikit-learn

I am following a machine learning guide here: https://www.pluralsight.com/guides/scikit-machine-learning/ I am running Python 3.8 and have a hunch that I need to run it in IPython, but I think that opens up a new can of worms. I also have all of the imported libraries installed. I left %matplotlib inline as a comment be…

Fix parameters of Gaussian mixture model, instead of learning

python scikit-learn

Let us say I have a dataset data that I use to fit a Gaussian mixture model: I now store the learnt covariances fit_model.covariances_, means fit_model.means_ and weights fit_model.weights_. From a different script, I want to read in the learnt parameters and define a Gaussian mixture model using them. How do…
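The excerpt does not show an answer, so this is only a sketch of one commonly used approach: create an unfitted GaussianMixture and assign the stored parameters to it directly. It assumes covariance_type="full" and relies on the fitted attribute precisions_cholesky_, which scikit-learn uses internally when scoring; the synthetic data and the name restored are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic two-cluster data (assumption: stands in for the asker's `data`).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(-3.0, 1.0, size=(100, 2)),
    rng.normal(3.0, 1.0, size=(100, 2)),
])

fit_model = GaussianMixture(
    n_components=2, covariance_type="full", random_state=0
).fit(data)
weights = fit_model.weights_
means = fit_model.means_
covariances = fit_model.covariances_

# "Different script": rebuild a model from the stored parameters without fitting.
restored = GaussianMixture(n_components=len(weights), covariance_type="full")
restored.weights_ = weights
restored.means_ = means
restored.covariances_ = covariances
# For full covariances, precisions_cholesky_ is the transposed inverse of the
# lower Cholesky factor of each covariance matrix.
cov_chol = np.linalg.cholesky(covariances)  # shape (k, d, d)
restored.precisions_cholesky_ = np.transpose(np.linalg.inv(cov_chol), (0, 2, 1))

probabilities = restored.predict_proba(data)
```

Setting fitted attributes by hand is not an officially documented workflow, so it is worth checking that the rebuilt model reproduces the original model's predictions on a few samples.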

How many epochs does scikit learn use when cross validating?

deep-learning python scikit-learn tensorflow

I’m doing some model cross validation with scikit learn on time series data where a Multi Layer Perceptron is trained with Keras. (We are able to use cross_val_score from scikit learn thanks to the Keras wrapper.) Basically using: The issue is I don’t understand how many epochs it’s using on each t…

Does it make sense? If yes then how to handle in MSE?

data-analysis data-science linear-regression python scikit-learn

Can we apply a log transform to one variable and a square root to another for LinearRegression? If yes, then what should we do for the MSE? Should I exp or square the y_test and prediction? Answer: If you transform variables in training and test sets you don’t need to care about your evaluation metric. In case you transform yo…
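The answer excerpt is truncated, so the following is a sketch under one assumption: that the target itself is log-transformed and the MSE is wanted in the original units. In that case the predictions need to be mapped back before scoring, and TransformedTargetRegressor can do this automatically. The synthetic data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up data with a strictly positive, roughly log-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(200, 2))
y = np.exp(0.3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.1, size=200))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fits LinearRegression on log(y) and applies exp to its predictions,
# so predict() returns values on the original scale of y.
model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)  # MSE in the original units of y
```

Transforming only the input features (log of one, square root of another) leaves the target on its original scale, so no back-transformation is needed for the metric in that case.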