Tag: scikit-learn

How to build a custom scaler based on StandardScaler?

I am trying to build a custom scaler that scales only the continuous variables of a dataset (the US Adult Income: https://www.kaggle.com/uciml/adult-census-income), using StandardScaler as a base. Here is the Python code I used: However, when I tried to run the scaler, I ran into this problem: So what is the error in the way I built the scaler? And
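Since the question's code is not shown here, the following is only a minimal sketch of one way to standardize selected columns while leaving the rest untouched. The class name `ContinuousScaler`, its `columns` parameter, and the toy column names are hypothetical; only `StandardScaler`, `BaseEstimator`, and `TransformerMixin` come from scikit-learn.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class ContinuousScaler(BaseEstimator, TransformerMixin):
    """Hypothetical helper: standardize only the listed DataFrame columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Fit a plain StandardScaler on the chosen columns only.
        self.scaler_ = StandardScaler().fit(X[self.columns])
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left unchanged.
        X_out = X.copy()
        X_out[self.columns] = self.scaler_.transform(X[self.columns])
        return X_out


# Toy data (made up): two continuous columns and one binary flag.
df = pd.DataFrame(
    {"age": [25.0, 35.0, 45.0], "hours": [20.0, 40.0, 60.0], "sex": [0, 1, 0]}
)
scaled = ContinuousScaler(columns=["age", "hours"]).fit_transform(df)
print(scaled)
```

An alternative that avoids a custom class is `sklearn.compose.ColumnTransformer` with `remainder="passthrough"`, which may be simpler when the result does not need to stay a DataFrame.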

TypeError: fit() missing 1 required positional argument: ‘y’

machine-learning python python-3.x regression scikit-learn

I want to try out all regressors in the library. Since I know that some of the regressors require more input, I built a try/except block. This returns the following snippet many times: In my opinion there are two problems here. First, except never gets called. Second, the y input is not recognized. I am grateful

Cross-validation with time series data in sklearn

machine-learning python scikit-learn validation

I have a question about cross-validation of time series data in general. The problem is macro forecasting, e.g. forecasting the 1-month-ahead price of the S&P500 using different monthly macro variables. Now I read about the following approach: one should/could use a rolling cross-validation approach, i.e. always drop an old monthly value and add a new one (=
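As an illustration of the rolling idea, scikit-learn's `TimeSeriesSplit` can produce a fixed-size training window that moves forward in time when `max_train_size` is set. This is a sketch on ten made-up observations, not the approach the question's source describes in full:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten consecutive (made-up) monthly observations, oldest first.
X = np.arange(10).reshape(-1, 1)

# max_train_size caps the training window, so old months drop out
# as newer ones are added.
cv = TimeSeriesSplit(n_splits=3, max_train_size=4)
splits = [(train.tolist(), test.tolist()) for train, test in cv.split(X)]

for train, test in splits:
    print("train:", train, "test:", test)
# train: [0, 1, 2, 3] test: [4, 5]
# train: [2, 3, 4, 5] test: [6, 7]
# train: [4, 5, 6, 7] test: [8, 9]
```

Without `max_train_size`, the training set would instead grow with each split (an expanding window).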

Clustering on Python and Bokeh; select widget which allows user to change clustering algorithm

bokeh cluster-analysis numpy python scikit-learn

I am trying to build a feature in a Bokeh dashboard which allows the user to cluster data. I am using the following example as a template: Clustering in Bokeh example. Here is the code from this example: The example allows the user to cluster data. Within the code, you can specify which algorithm to use;

__init__() got an unexpected keyword argument ‘handle_unknown’

encoding machine-learning python scikit-learn

I’m trying to ordinal-encode my categorical features using sklearn, but I get the error __init__() got an unexpected keyword argument ‘handle_unknown’ when I run the code below: Sample data to reproduce the error: Could someone please tell me what’s wrong in my code? Answer: You are most likely not using an appropriate version of scikit-learn. handle_unknown and unknown_value
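For reference, here is a small sketch of how `handle_unknown` and `unknown_value` are used with `OrdinalEncoder`. To my knowledge these parameters were added in scikit-learn 0.24, so printing the installed version is a quick first check; the colour data is made up.

```python
import numpy as np
import sklearn
from sklearn.preprocessing import OrdinalEncoder

# The parameters below are assumed to require scikit-learn >= 0.24.
print("scikit-learn version:", sklearn.__version__)

enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(np.array([["red"], ["green"], ["blue"]]))

# Categories are sorted, so blue -> 0, green -> 1, red -> 2.
# "purple" was not seen during fit and is mapped to unknown_value.
codes = enc.transform(np.array([["green"], ["purple"]]))
print(codes.ravel())
```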

Micro metrics vs macro metrics

classification python scikit-learn statistics

To test the results of my multi-label classification model, I measured the Precision, Recall and F1 scores. I wanted to compare two different results, Micro and Macro. I have a dataset with few rows, but my label count is around 1700. Why is the macro score so low even though I get a high micro score? Which one would be
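A tiny made-up example can show how the two averages diverge when most labels are rare. Micro averaging pools true/false positives over all labels, while macro averaging takes the unweighted mean of per-label scores, so labels that are never predicted pull the macro score down. The data below is illustrative and unrelated to the question's dataset.

```python
import numpy as np
from sklearn.metrics import f1_score

# Three labels: label 0 is frequent and always predicted correctly,
# labels 1 and 2 are rare and never predicted.
y_true = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]])

# Micro: pooled counts are TP=5, FP=0, FN=2, giving F1 = 10/12.
micro = f1_score(y_true, y_pred, average="micro", zero_division=0)

# Macro: per-label F1 scores are 1, 0 and 0, giving a mean of 1/3.
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

With around 1700 labels and few rows, many labels are likely to have little or no support, which is the situation where this gap tends to appear.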

How to properly cluster with HDBSCAN for 1D dataset?

hdbscan hierarchical-clustering machine-learning python scikit-learn

My dataset below shows product sales per price (link to download dataset csv): What I want to achieve is to cluster the dense regions (rectangles below) using HDBSCAN and sklearn. We have four regions, but regions 3 and 4 could also be grouped into a big region, which would lead to only 3 regions on the entire dataset by changing the

scikit preprocessing across entire dataframe

machine-learning pandas python scikit-learn

I have a dataframe: The data is an average response to the same question asked across 4 quarters. I am trying to create a benchmark index from this data. To do so, I want to preprocess it first using either standardization or normalization. How would I standardize/normalize across the entire dataframe? What is the best way to go about this?
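One possible reading of "across the entire dataframe" is to use a single mean and standard deviation (or a single min and max) computed over all cells, rather than per column as scikit-learn's scalers do. A minimal sketch under that assumption, with made-up column names and values:

```python
import pandas as pd

# Hypothetical quarterly averages; the real dataframe is not shown.
df = pd.DataFrame(
    {
        "Q1": [3.2, 4.1, 2.8],
        "Q2": [3.5, 4.0, 3.0],
        "Q3": [3.9, 4.4, 2.9],
        "Q4": [4.0, 4.6, 3.1],
    }
)

values = df.to_numpy()

# Standardize with one global mean and one global (population) std.
standardized = (df - values.mean()) / values.std()

# Min-max normalize with one global min and max.
normalized = (df - values.min()) / (values.max() - values.min())

print(standardized.round(2))
print(normalized.round(2))
```

Using global statistics keeps the quarters comparable with each other, which per-column scaling would not.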

Installing scipy and scikit-learn on Apple M1

apple-m1 python scikit-learn scipy

Installing the following packages on the M1 chip works fine for me: NumPy 1.21.1, pandas 1.3.0, torch 1.9.0 and a few others. They also seem to work properly when I test them. However, when I try to install scipy or scikit-learn via pip, this error appears: ERROR: Failed building wheel for numpy Failed to build numpy ERROR: Could

cannot import name ‘stop_words’ from ‘sklearn.feature_extraction’

python scikit-learn

I’ve been trying to follow an NLP notebook, and they use: However, this is throwing the following error: My guess is that stop_words is not (or maybe no longer) part of the ‘feature_extraction’ part of sklearn, but I might be wrong. I have seen some articles that used sklearn.feature_extraction.stop_words, but at the same time I see places which have used
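The notebook's import is not shown here, so this is only a hedged suggestion: as far as I know, the `sklearn.feature_extraction.stop_words` module was made private in later releases, while the built-in English list remains importable from `sklearn.feature_extraction.text`. A minimal sketch:

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# ENGLISH_STOP_WORDS is a frozenset of lowercase English stop words.
tokens = ["the", "model", "is", "trained", "on", "text"]
content_tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
print(content_tokens)
```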