
Tag: scikit-learn

Machine Learning Classifier using past predictions as features

I want to build a binary classifier machine learning model. I want to use the model’s previous predictions as features for future predictions, to take into account that my training samples are not independent. Is there a framework to achieve this with scikit-learn, or any other Python ML library? I know this problem could be solved with memory-based Neural …
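One possible pattern, sketched below with synthetic data, is to add a lagged-prediction column to the feature matrix and feed each prediction back in at inference time. Nothing here is from the original post; the lag-1 feedback scheme and all names are illustrative.

```python
# A minimal lag-1 feedback sketch with synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # base features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

# Train with the previous *true* label standing in for the past prediction.
prev = np.zeros((len(X), 1))
prev[1:, 0] = y[:-1]
model = LogisticRegression().fit(np.hstack([X, prev]), y)

# At inference, feed each prediction back in as the next sample's extra feature.
preds, last = [], 0.0
for x in X:
    p = model.predict(np.hstack([x, [last]]).reshape(1, -1))[0]
    preds.append(p)
    last = p
```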

TypeError during resampling

I am trying to apply resampling to my dataset, which has unbalanced classes. What I have done is the following: … Unfortunately, I am having some problems at this step: X = pd.concat([X_train, y_train], axis=1), i.e. … You can think of the Text column as … I hope you can help me handle it. Answer You have to convert X_train to a …
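The truncated answer points at the type of X_train. A minimal sketch of the fix, assuming X_train came out of an earlier step as a NumPy array (the Text and label column names are hypothetical):

```python
import numpy as np
import pandas as pd

X_train = np.array([["foo"], ["bar"], ["baz"]])  # e.g. the output of an earlier step
y_train = np.array([0, 1, 0])

# pd.concat only accepts pandas objects, so wrap the arrays first.
train = pd.concat(
    [pd.DataFrame(X_train, columns=["Text"]), pd.Series(y_train, name="label")],
    axis=1,
)
```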

sklearn.compose.make_column_transformer(): using SimpleImputer() and OneHotEncoder() in one step on one dataframe column

I have a dataframe containing a column with categorical variables, which also includes NaNs. I’d like to use sklearn.compose.make_column_transformer() to prepare the df in a clean way. I tried to impute the NaN values and one-hot encode the column with the following code: … Running the transformer on my training data raises ValueError: Input contains NaN. The desired output would be something …
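A common way to run both steps on the same column is to chain them in a Pipeline and hand that pipeline to the column transformer. A minimal sketch (the colour column is hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"colour": ["red", "blue", np.nan, "red"]})

cat_pipe = make_pipeline(
    SimpleImputer(strategy="most_frequent"),  # fill the NaNs first...
    OneHotEncoder(handle_unknown="ignore"),   # ...then one-hot encode
)
ct = make_column_transformer((cat_pipe, ["colour"]), remainder="passthrough")
encoded = ct.fit_transform(df)                # no more "Input contains NaN"
```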

OneHotEncoding Protein Sequences

I have an original dataframe of sequences, listed below, and am trying to one-hot encode them and store the result in a new dataframe. I am trying to do it with the following code, but get an error and am not able to store the result: … Answer You get that strange array because it treats …
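A likely cause of that strange array is passing whole sequence strings to the encoder, which then treats each full sequence as a single category. A minimal sketch of per-residue encoding, on toy sequences rather than the poster's data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"sequence": ["MKV", "MAV"]})

# Split each sequence into one column per position before encoding.
chars = df["sequence"].apply(list).apply(pd.Series)

enc = OneHotEncoder()
encoded = pd.DataFrame(
    enc.fit_transform(chars).toarray(),        # dense, so it stores cleanly in a df
    columns=enc.get_feature_names_out(),
)
```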

Decision tree with a probability target

I’m currently working on a model to predict the probability of fatality once a person is infected with the coronavirus. I’m using a Dutch dataset with categorical variables: date of infection, fatality or cured, gender, age group, etc. It was suggested to use a decision tree, which I’ve already built. Since I’m new to decision trees, I would like some …
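For reference, a classification tree trained on the binary fatality label can still output a probability: predict_proba returns the fraction of training samples of each class in a leaf. A minimal sketch with synthetic, pre-encoded data (not the Dutch dataset):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "age_group": [0, 1, 2, 2, 1, 0],   # categorical variables, already encoded
    "gender":    [0, 0, 1, 1, 0, 1],
    "fatality":  [0, 0, 1, 1, 0, 0],
})
X, y = df[["age_group", "gender"]], df["fatality"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
proba = tree.predict_proba(X)[:, 1]    # per-leaf fraction of fatal cases
```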

I keep getting ValueError: Shapes (10, 1) and (10, 3) are incompatible when training my model

Turning the number of inputs from 3 to 1 when I call makeModel allows the program to run without errors, but no training actually happens and the accuracy doesn’t change. Answer LabelEncoder transforms the input into an array of encoded values, i.e. if your input is [“paris”, “paris”, “tokyo”, “amsterdam”] it is encoded as [1, 1, 2, 0], since classes are sorted alphabetically.
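In other words, integer-encoded labels have shape (n, 1) while a 3-unit softmax head expects (n, 3). A minimal sketch of the two usual fixes, assuming a TensorFlow/Keras model as the error message suggests:

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

labels = ["paris", "paris", "tokyo", "amsterdam"]
encoded = LabelEncoder().fit_transform(labels)    # -> [1, 1, 2, 0], shape (4,)

# Option 1: one-hot the targets to match the (n, 3) output layer.
onehot = to_categorical(encoded, num_classes=3)   # -> shape (4, 3)

# Option 2: keep integer targets and compile the model with
# loss="sparse_categorical_crossentropy" instead.
```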

Isolation Forest vs Robust Random Cut Forest in outlier detection

I am examining different methods of outlier detection. I came across sklearn’s implementation of Isolation Forest and Amazon SageMaker’s implementation of RRCF (Robust Random Cut Forest). Both are ensemble methods based on decision trees, aiming to isolate every single point. The more isolation steps a point requires, the more likely it is to be an inlier, and the opposite is …
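For the scikit-learn half of the comparison, a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])  # one obvious outlier

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
labels = iso.predict(X)         # -1 = outlier, 1 = inlier
scores = iso.score_samples(X)   # lower = more anomalous (isolated sooner)
```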

How to get the centroids in DBSCAN sklearn?

I am using DBSCAN for clustering. Now I want to pick a point from each cluster that represents it, but I realized that DBSCAN does not have centroids as k-means does. However, I observed that DBSCAN has something called core points. I am wondering whether it is possible to use these core points, or some other alternative, to obtain …
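One workable recipe, sketched below on synthetic blobs: for each cluster, pick the core sample closest to the cluster's mean. This is an illustrative approach, not the accepted answer.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
db = DBSCAN(eps=0.8, min_samples=5).fit(X)

is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

# Representative per cluster: the core point nearest the cluster mean.
representatives = {}
for label in set(db.labels_) - {-1}:               # skip the noise label (-1)
    pts = X[(db.labels_ == label) & is_core]
    centroid = pts.mean(axis=0)
    representatives[label] = pts[np.argmin(np.linalg.norm(pts - centroid, axis=1))]
```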
