I want to build a deep learning classifier for Kickstarter campaign prediction, but I am stuck on one part of the model and cannot solve it.
My code:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from keras.models import Sequential
from keras import layers

df = pd.read_csv('../input/kickstarter-campaigns-dataset/kickstarter_data_full.csv')

df_X = []  # for x class
df_y = []  # for labels
for i in range(len(df)):
    tmp = str(df['blurb'][i]) + " " + str(df['goal'][i]) + " " + str(df['pledged'][i]) + " " + str(df['country'][i]) + " " + str(df['currency'][i]) + " " + str(df['category'][i]) + " " + str(df['spotlight'][i])
    df_X.append(tmp)
    df_y.append(str(df['SuccessfulBool'][i]))

X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.25, random_state=1000)

vectorizer = CountVectorizer()
vectorizer.fit(X_train)
X_train = vectorizer.transform(X_train)
X_test = vectorizer.transform(X_test)

input_dim = X_train.shape[1]
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

history = model.fit(X_train, y_train, epochs=100, verbose=False, validation_data=(X_test, y_test), batch_size=10)
```
At this point, I am getting: ValueError: Failed to find data adapter that can handle input: <class 'scipy.sparse.csr.csr_matrix'>, (<class 'list'> containing values of types {"<class 'str'>"})
I tried np.asarray to fix it:
```python
X_train = np.asarray(X_train)
y_train = np.asarray(y_train)
X_test = np.asarray(X_test)
y_test = np.asarray(y_test)
```
I get this ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type csr_matrix).
Therefore, I tried this:
```python
np.asarray(X_train).astype(np.float32)
np.asarray(y_train).astype(np.float32)
np.asarray(X_test).astype(np.float32)
np.asarray(y_test).astype(np.float32)
```
But I get ValueError: setting an array element with a sequence.
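For context on why these attempts fail: np.asarray does not densify a scipy sparse matrix; it wraps the whole matrix object in a zero-dimensional object array, so a subsequent .astype(np.float32) finds a matrix where it expects a scalar. A minimal sketch of this behavior on a toy 3x3 matrix (not the Kickstarter data):

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.eye(3, dtype=np.float32))

# np.asarray does not convert the sparse matrix to a dense array:
# it produces a 0-dimensional array of dtype=object holding the matrix
wrapped = np.asarray(X)
print(wrapped.shape)  # ()
print(wrapped.dtype)  # object

# The conversion Keras actually needs is .toarray()
dense = X.toarray()
print(dense.shape)  # (3, 3)
print(dense.dtype)  # float32
```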
I try this:
```python
X_train = np.expand_dims(X_train, -1)
y_train = np.expand_dims(y_train, -1)
X_test = np.expand_dims(X_test, -1)
y_test = np.expand_dims(y_test, -1)
```
But I keep getting the same error at the history = model.fit(...) line: ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type csr_matrix).
I am working with the Kickstarter Campaigns Dataset on Kaggle: https://www.kaggle.com/sripaadsrinivasan/kickstarter-campaigns-dataset
I don't have much NLP experience. I have searched and tried solutions but cannot solve this. This is my homework. Can you help me with this problem?
df_X and df_y have equal lengths.
Answer
You need to add an Embedding layer at the top of your network to vectorize the words. Something like this:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras import layers

df = pd.read_csv('../input/kickstarter-campaigns-dataset/kickstarter_data_full.csv')

df_X = []  # for x class
df_y = []  # for labels
for i in range(len(df)):
    tmp = str(df['blurb'][i]) + " " + str(df['goal'][i]) + " " + str(df['pledged'][i]) + " " + str(df['country'][i]) + " " + str(df['currency'][i]) + " " + str(df['category'][i]) + " " + str(df['spotlight'][i])
    df_X.append(tmp)
    df_y.append(str(df['SuccessfulBool'][i]))

vocab_size = 1000
encoded_docs = [one_hot(d, vocab_size) for d in df_X]
max_length = 20
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')

X_train, X_test, y_train, y_test = train_test_split(padded_docs, np.array(df_y)[:, None].astype(int), test_size=0.25, random_state=1000)

model = Sequential()
model.add(layers.Embedding(vocab_size, 100, input_length=max_length))
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

model.fit(X_train, y_train, epochs=50, verbose=1, validation_data=(X_test, y_test), batch_size=10)
```
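For completeness, the original bag-of-words pipeline can also be made to work without an Embedding layer: the two errors come from feeding Keras a scipy sparse matrix and a list of string labels, so it is enough to densify the matrix and cast the labels to numbers before model.fit. A minimal sketch on hypothetical toy data standing in for df_X/df_y (the Dense/sigmoid model from the question is unchanged):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the df_X / df_y lists built in the question
df_X = ["cool gadget 5000 US USD technology True",
        "indie film 20000 GB GBP film False",
        "board game 1000 US USD games True",
        "music album 3000 DE EUR music False"]
df_y = ["1", "0", "1", "0"]

X_train, X_test, y_train, y_test = train_test_split(
    df_X, df_y, test_size=0.25, random_state=1000)

vectorizer = CountVectorizer()
vectorizer.fit(X_train)
# transform() returns a scipy.sparse.csr_matrix, which Keras has no
# data adapter for -- densify it and cast to float32
X_train = vectorizer.transform(X_train).toarray().astype(np.float32)
X_test = vectorizer.transform(X_test).toarray().astype(np.float32)
# The labels were strings; Keras needs a numeric array
y_train = np.asarray(y_train).astype(np.float32)
y_test = np.asarray(y_test).astype(np.float32)
# model.fit(X_train, y_train, ...) now works with the original model
```

Note that densifying a large bag-of-words matrix costs memory; the Embedding approach above avoids that, which is one reason to prefer it on the full dataset.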