Skip to content
Advertisement

Building a basic prediction model with the output being the sum of the two inputs but accuracy score is significantly low

I have a csv of size 12500 X 3. The first two columns (A and B) are inputs and the the final column (C) is the sum of the two columns.

I wanted to build a prediction model to get the value of C for a given A and B. This is just a basic model to imporve my understanding of machine learning.

The accuracy score is almost zero (0.00032) and the model is way to simple to get the predictions wrong. The code is below:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv('Dataset.csv') #importing dataset
X = data.drop(columns=['C'])
y = data['C']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = DecisionTreeClassifier()
model.fit(X_train,y_train)
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
score

I did not even include outlier into the data and I create the csv using excel formulae. I used jupyter notebook to build this prediction model. Can someone please point out if/what I’m doing wrong?

Advertisement

Answer

Before you build your model, you should understand the behavior of the model and its main function. Decision Tree is used to classify data based on the criterias extracted from data. For this purpose, you should just choose the simple Linear Regression model, not the Decision Tree.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement