How do I load a dataframe in Python sklearn?

Question

I did some computations in an IPython Notebook and ended up with a dataframe df which isn't saved anywhere yet. In the same IPython Notebook, I want to work with this dataframe using sklearn. df is a dataframe with 4 columns: id (string), value (int), rated (bool), score (float). I am trying to determine what influences the score the most just like in

Accepted Answer

OK, so some clarifications first: in your example, it is unclear what the load_boston() function does; they just import it. Whatever that function returns has an attribute called "data". They use this line:

    X = pd.DataFrame(boston.data, columns=boston.feature_names)

to create a dataframe. Your situation is different because you already have a dataframe, and dataframes don't have a ".data" attribute. Hence the error you're getting: 'DataFrame' object has no attribute 'data'.

What you need is simply:

    from sklearn.model_selection import train_test_split

    X = df
    y = df['score']

    # Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)

Or, if you need only some of the columns from your dataframe:

    # set data
    list_of_columns = ['id', 'value']
    X = df[list_of_columns]

    # set target
    target_column = 'score'
    y = df[target_column]

    # Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)
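For reference, here is a minimal end-to-end sketch of the column-selection approach. The sample rows are made up purely for illustration, and the choice of value and rated as features is an assumption, not something stated in the question. Note that X = df keeps the score column among the features, and that most sklearn estimators cannot fit on a string column such as id, so this sketch selects the feature columns explicitly and casts them to float.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the df described in the question
# (same four columns; the values are invented for illustration).
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
    "rated": [True, False, True, True, False, False, True, False],
    "score": [0.1, 0.4, 0.35, 0.8, 0.5, 0.65, 0.9, 0.7],
})

# Assumption: use the numeric and boolean columns as features and leave
# out the string id and the target itself.
feature_columns = ["value", "rated"]
target_column = "score"

# Cast to float so the boolean column becomes 0.0 / 1.0.
X = df[feature_columns].astype(float)
y = df[target_column]

# Split the dataset: 75% for training, 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)
```

X_train and y_train can then be passed to whichever estimator the original example uses.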

Answer