
Tag: scikit-learn

Using PyTorch tensors with scikit-learn

Can I use PyTorch tensors instead of NumPy arrays while working with scikit-learn? I tried some methods from scikit-learn like train_test_split and StandardScaler, and they seem to work just fine, but is there anything I should know when I’m using PyTorch tensors instead of NumPy arrays? According to this question in the FAQ (https://scikit-learn.org/stable/faq.html#how-can-i-load-my-own-datasets-into-a-format-usable-by-scikit-learn): numpy arrays or scipy sparse matrices. Other

Mismatch between manual computation of evaluation metrics and sklearn functions

I wanted to compare my manual computation of precision and recall with scikit-learn’s functions. However, scikit-learn’s recall_score() and precision_score() gave me different results, and I am not sure why. Could you please advise why I am getting different results? Thanks! My confusion matrix: Answer: It should be (check the ordering of the return value): Please refer: here
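The ordering issue mentioned in the answer can be illustrated with a minimal, pure-Python sketch. It assumes binary labels 0/1 and mirrors scikit-learn's documented layout for a binary confusion matrix (rows are true labels, columns are predicted labels, so the flattened order is tn, fp, fn, tp). The data and the helper name `confusion_counts` are hypothetical and are not taken from the original question.

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary 0/1 labels.

    Hypothetical helper; the return order matches what flattening
    scikit-learn's binary confusion matrix row by row would give.
    """
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp


# Made-up labels, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

# Matrix in the rows=true, columns=predicted layout: [[tn, fp], [fn, tp]]
matrix = [[tn, fp], [fn, tp]]

# Correct reading of the matrix.
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Common mistake: reading the same matrix as [[tp, fn], [fp, tn]].
wrong_tp, wrong_fn = matrix[0]
wrong_fp, wrong_tn = matrix[1]
wrong_precision = wrong_tp / (wrong_tp + wrong_fp)
wrong_recall = wrong_tp / (wrong_tp + wrong_fn)

print(f"correct:  precision={precision:.3f} recall={recall:.3f}")
print(f"misread:  precision={wrong_precision:.3f} recall={wrong_recall:.3f}")
```

Reading the same four numbers in the wrong order gives different precision and recall values, which is one plausible way a manual calculation can disagree with precision_score() and recall_score().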

Cache only a single step in sklearn’s Pipeline

I want to use UMAP in my sklearn Pipeline, and I would like to cache that step to speed things up. However, since I have a custom transformer, the suggested method doesn’t work. Example code: If you run this, you will get a PicklingError saying it cannot pickle the custom transformer. But I only need to cache the UMAP step. Any

RandomizedSearchCV: All estimators failed to fit

I am currently working on the “French Motor Claims Datasets freMTPL2freq” Kaggle competition (https://www.kaggle.com/floser/french-motor-claims-datasets-fremtpl2freq). Unfortunately I get a “NotFittedError: All estimators failed to fit” error whenever I am using RandomizedSearchCV and I cannot figure out why that is. Any help is much appreciated. The first five rows of the original dataframe data_freq look like this: The error I get is

GaussianProcessRegressor ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size

I am running the following code: The shape of my input is (19142, 21), and the dtypes are each float64. Added in edit: X and y are pandas DataFrames; after .values they are each NumPy arrays. And I get the error: ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size. I can’t imagine a dataset of 20000
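A rough back-of-the-envelope sketch may help put the error message in context. It assumes the large array is a dense n × n float64 kernel matrix, which Gaussian process regression builds over the training samples; the question's code is not shown here, so this is an illustration of the arithmetic and not a diagnosis. The hyperparameter count used for the gradient estimate is a hypothetical value.

```python
# Numbers taken from the question: 19142 samples, float64 values.
n_samples = 19142
itemsize = 8  # bytes per float64

# A dense kernel matrix over the training set has n_samples x n_samples entries.
kernel_bytes = n_samples * n_samples * itemsize
kernel_gib = kernel_bytes / 2**30

# Largest value of a signed 32-bit integer, a size limit relevant on 32-bit builds.
int32_max = 2**31 - 1

# Assumption: if a kernel gradient is also computed, it adds one n x n slice
# per hyperparameter. The count below is hypothetical.
n_hyperparams = 2
gradient_bytes = kernel_bytes * n_hyperparams

print(f"kernel matrix:   {kernel_bytes} bytes (~{kernel_gib:.2f} GiB)")
print(f"kernel gradient: {gradient_bytes} bytes with {n_hyperparams} hyperparameters")
print(f"exceeds int32 max: {kernel_bytes > int32_max}")
```

So although the input itself is small (19142 × 21 values), an array that grows with the square of the number of samples is a few gigabytes under these assumptions.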
