
Tag: scikit-learn

Add features to the “numeric” dataset whose categorical value must be mapped using a conversion formula

I have this dataset. This is the request: “Add the Mjob and Fjob attributes to the “numeric” dataset, whose categorical values must be mapped using a conversion formula of your choice.” Does anyone know how to do it? For example: if the ‘at_home’ value becomes ‘1’ in Mjob, I want the same result in the Fjob column. Same categorical values must …
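A minimal sketch of one way to do this with pandas: build a single mapping dictionary and apply it to both columns, so identical categories always receive identical codes. The category names come from the usual student-performance dataset, and the numeric codes are an arbitrary illustrative choice of “conversion formula”.

```python
import pandas as pd

# Small illustrative frame; in practice this would be the original dataset.
df = pd.DataFrame({
    "Mjob": ["at_home", "teacher", "services"],
    "Fjob": ["teacher", "at_home", "other"],
})

# One shared dictionary guarantees the same category gets the same code
# in both columns, e.g. "at_home" -> 1 in Mjob and in Fjob alike.
# The codes themselves are an arbitrary choice.
job_mapping = {"at_home": 1, "health": 2, "other": 3, "services": 4, "teacher": 5}
for col in ("Mjob", "Fjob"):
    df[col] = df[col].map(job_mapping)

print(df)
```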

Missing categorical data should be encoded with an all-zero one-hot vector

I am working on a machine learning project with very sparsely labeled data. There are several categorical features, resulting in roughly one hundred different classes across the features. For example: after I put these through scikit-learn’s OneHotEncoder, I am expecting the missing data to be encoded as 00, since the docs state that handle_unknown='ignore' causes the encoder to return an …
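A minimal sketch of the documented behaviour: with handle_unknown='ignore', any category not seen during fit is transformed to an all-zero row. Note this only covers missing values if the missing marker was absent from the training data the encoder was fitted on.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit on two known categories, "a" and "b".
X_train = np.array([["a"], ["b"]])
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(X_train)

# "c" was never seen during fit, so it maps to the all-zero vector [0, 0].
print(enc.transform(np.array([["a"], ["c"]])).toarray())
# [[1. 0.]
#  [0. 0.]]
```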

How to specify Search Space in Auto-Sklearn

I know how to specify feature-selection methods and the list of algorithms used in Auto-Sklearn 2.0. I know that Auto-Sklearn uses Bayesian optimisation (SMAC), but I would like to specify the hyperparameters in Auto-Sklearn myself. For example, I want to specify random_forest with Estimator = 1000 only, or MLP with HiddenLayerSize = 100 only. How do I do that?
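A hedged sketch of the part of this that Auto-Sklearn supports directly: the include argument restricts the search space to chosen algorithms and preprocessors (this dict form assumes a reasonably recent auto-sklearn release; older ones used include_estimators/include_preprocessors instead). Pinning an individual hyperparameter such as the number of estimators generally requires registering a custom component and is not shown here.

```python
from autosklearn.classification import AutoSklearnClassifier

# Restrict the search space to random forests with no feature preprocessing.
# SMAC still tunes the remaining hyperparameters within that space.
automl = AutoSklearnClassifier(
    time_left_for_this_task=300,  # seconds for the whole search
    include={
        "classifier": ["random_forest"],
        "feature_preprocessor": ["no_preprocessing"],
    },
)
# automl.fit(X_train, y_train)
```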

Gaussian Process Regression: tune hyperparameters based on validation set

In the standard scikit-learn implementation of Gaussian process regression (GPR), the hyperparameters (of the kernel) are chosen based on the training set. Is there an easy-to-use implementation of GPR (in Python) where the kernel hyperparameters are chosen based on a separate validation set? Cross-validation would also be a nice alternative to find suitable hyperparameters (that are …
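A sketch of one way to get this with plain scikit-learn: pass optimizer=None so the GPR does not maximise the marginal likelihood on the training set, then select the kernel hyperparameters by grid search. The kernels and grid values below are illustrative. For a single fixed validation set rather than cross-validation, PredefinedSplit can replace cv=5.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Toy data; in practice this would be the real training set.
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = np.sin(6 * X).ravel() + 0.1 * rng.standard_normal(100)

# optimizer=None keeps each candidate kernel fixed, so the search below
# is the only place hyperparameters get tuned.
gpr = GaussianProcessRegressor(optimizer=None)
param_grid = {
    "kernel": [RBF(length_scale=l) for l in (0.1, 1.0, 10.0)],
    "alpha": [1e-10, 1e-2],
}

# cv=5 gives plain cross-validation; to score on one fixed validation set
# instead, pass cv=PredefinedSplit(test_fold) with -1 marking training rows.
search = GridSearchCV(gpr, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```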
