Tag: pipeline

Pipeline with count and tfidf vectorizer produces TypeError: expected string or bytes-like object

gridsearchcv pipeline python scikit-learn tf-idf

I have a corpus like the following: ‘C C C 0 0 0 X 0 1 0 0 0 0’, ‘C C C 0 0 0 X 0 1 0 0 0 0’, ‘C C C 0 0 0 X 0 1 0 0 0 0’, ‘X X X’, ‘X X X’, ‘X X X’. I would like to use
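The excerpt is cut off, so the asker's actual pipeline is unknown. One common cause of this TypeError, assumed here, is chaining CountVectorizer with TfidfVectorizer: the second step then receives a sparse count matrix instead of raw strings. A minimal sketch of the usual alternative, CountVectorizer followed by TfidfTransformer, using the corpus from the question (the step names and the token_pattern are illustrative choices, not taken from the question):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Corpus as quoted in the question.
corpus = ["C C C 0 0 0 X 0 1 0 0 0 0"] * 3 + ["X X X"] * 3

pipe = Pipeline([
    # The default token_pattern drops single-character tokens, which would
    # leave this corpus with an empty vocabulary, so it is widened here.
    ("count", CountVectorizer(token_pattern=r"(?u)\b\w+\b", lowercase=False)),
    # TfidfTransformer reweights a count matrix; TfidfVectorizer would
    # instead expect raw strings as input.
    ("tfidf", TfidfTransformer()),
])

weights = pipe.fit_transform(corpus)
print(sorted(pipe.named_steps["count"].vocabulary_))
print(weights.shape)
```

If only tf-idf weights are needed, a single TfidfVectorizer applied directly to the raw strings is an equivalent option.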

Including Scaling and PCA as parameter of GridSearchCV

grid-search pipeline python regression scikit-learn

I want to run a logistic regression using GridSearchCV, but I want to compare the performance with and without Scaling and PCA, so I don’t want to use them in all cases. I basically would like to include PCA and Scaling as “parameters” of the GridSearchCV. I am aware I can make a pipeline like this: The thing is that,
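One way to treat Scaling and PCA as searchable options is to put the pipeline steps themselves in the parameter grid and offer the string "passthrough" as an alternative, which scikit-learn interprets as skipping that step. The sketch below assumes synthetic data from make_classification and illustrative step names and values; none of these come from the question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each step can be swapped for "passthrough", so the grid covers every
# combination of scaling on/off and PCA on/off.
param_grid = {
    "scaler": [StandardScaler(), "passthrough"],
    "pca": [PCA(n_components=5), "passthrough"],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

This grid evaluates 2 x 2 x 3 = 12 candidates, and search.best_params_ reports whether the best one kept or skipped each preprocessing step.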

sklearn.compose.make_column_transformer(): using SimpleImputer() and OneHotEncoder() in one step on one dataframe column

imputation one-hot-encoding pipeline python scikit-learn

I have a dataframe containing a column with categorical variables, which also includes NaNs. I’d like to use sklearn.compose.make_column_transformer() to prepare the df in a clean way. I tried to impute NaN values and OneHotEncode the column with the following code: Running the transformer on my training data raises ValueError: Input contains NaN. The desired output would be something
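The asker's code is not shown in the excerpt. A likely cause, assumed here, is that the imputer and the encoder were passed as two separate transformers on the same column: a column transformer applies them in parallel, so the encoder still sees the NaNs. A minimal sketch of chaining them with make_pipeline inside a single transformer, using a hypothetical "color" column:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical column containing a missing value.
df = pd.DataFrame(
    {"color": pd.Series(["red", "blue", np.nan, "red"], dtype=object)}
)

# Impute first, then encode, as one sequential transformer.
impute_then_encode = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

ct = make_column_transformer((impute_then_encode, ["color"]))
encoded = ct.fit_transform(df)

# The result may be sparse or dense depending on its density.
dense = encoded.toarray() if hasattr(encoded, "toarray") else np.asarray(encoded)
print(dense)
```

With strategy="constant" the missing values become their own category; strategy="most_frequent" would instead fold them into the most common existing category.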

Singleton array array(, dtype=object) cannot be considered a valid collection

pandas pipeline python scikit-learn train-test-split

Not sure how to fix this. Any help is much appreciated. I saw this: Vectorization: Not a valid collection, but I am not sure I understood it. Error below: Answer This error arises because your function