I have a data frame that looks like this.
    Month  Day  Deadline_Changes  Test
    3      19   2                 English
    5      3    8                 Math
    3      8    34                Science
    10     2    17                Science
    5      9    21                Social
    4      12   3                 Math
    8      29   1                 Music
    12     31   9                 English
And a second dataframe that looks like this.
    Month  Day  Test
    5      30   Math
    9      2    Social
    12     9    Science
    11     30   Music
    8      24   Music
    2      2    English
    6      12   Music
    4      9    English
My desired output is
    Month  Day  Test     Predicted_Deadline_Changes
    5      30   Math     4
    9      2    Social   23
    12     9    Science  6
    11     30   Music    18
    8      24   Music    4
    2      2    English  2
    6      12   Music    1
    4      9    English  10
Basically, I want to use my first data frame as training data to predict the deadline changes for my second data frame.
The desired output is the second data frame with an additional column called Predicted_Deadline_Changes, whose values are predicted from the training data.
Using Python, what would be the best approach/method to do this?
Answer
This is a simple regression model for predicting deadline changes.
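    import pandas as pd
    from sklearn import preprocessing
    from sklearn.linear_model import LinearRegression

    # Copy each table to the clipboard before running the matching line
    train = pd.read_clipboard()
    predict = pd.read_clipboard()

    # Target column and feature columns (Month, Day, Test)
    y = train['Deadline_Changes']
    x = train.drop(columns='Deadline_Changes')

    # Encode the Test names as integers, fitted on the training data
    le = preprocessing.LabelEncoder()
    x['Test'] = le.fit_transform(x['Test'])

    model = LinearRegression()
    model.fit(x, y)

    # Apply the same encoding to the new rows, keeping the training column order
    x_new = predict[x.columns].copy()
    x_new['Test'] = le.transform(x_new['Test'])

    # remove .round() if you want exact values
    predict['Predicted_Deadline_Changes'] = model.predict(x_new).round()
    print(predict)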
Results (illustrative; the exact values depend on which rows are passed to model.predict):
       Month  Day  Test     Predicted_Deadline_Changes
    0  5      30   Math     3.0
    1  9      2    Social   10.0
    2  12     9    Science  19.0
    3  11     30   Music    20.0
    4  8      24   Music    23.0
    5  2      2    English  9.0
    6  6      12   Music    10.0
    7  4      9    English  0.0
There are many different modeling techniques for predicting values, each with its own advantages and disadvantages.
Linear regression is probably the most basic of them: it assumes a linear relationship between your independent variables (Month, Day, Test) and your dependent variable (Deadline_Changes).
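As one illustration of a different trade-off, here is a minimal sketch that one-hot encodes Test instead of label-encoding it. LabelEncoder assigns integers in alphabetical order (English=0, Math=1, and so on), and a linear model then treats those integers as an ordered quantity; one-hot encoding avoids that. The data frames are typed in from the question rather than read from the clipboard, and the names `features`, `encode`, and `result` are my own choices for this example. With only eight training rows, any model here is illustrative at best, so treat the predictions as a demonstration of the workflow rather than meaningful estimates.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Training data, copied from the question
train = pd.DataFrame({
    "Month": [3, 5, 3, 10, 5, 4, 8, 12],
    "Day": [19, 3, 8, 2, 9, 12, 29, 31],
    "Deadline_Changes": [2, 8, 34, 17, 21, 3, 1, 9],
    "Test": ["English", "Math", "Science", "Science",
             "Social", "Math", "Music", "English"],
})

# Rows to predict, copied from the question
predict = pd.DataFrame({
    "Month": [5, 9, 12, 11, 8, 2, 6, 4],
    "Day": [30, 2, 9, 30, 24, 2, 12, 9],
    "Test": ["Math", "Social", "Science", "Music",
             "Music", "English", "Music", "English"],
})

features = ["Month", "Day", "Test"]

# One-hot encode Test; Month and Day pass through unchanged.
# handle_unknown="ignore" keeps prediction from failing on an unseen subject.
encode = ColumnTransformer(
    [("test", OneHotEncoder(handle_unknown="ignore"), ["Test"])],
    remainder="passthrough",
)

# The pipeline applies the same encoding at fit time and at predict time
model = make_pipeline(encode, LinearRegression())
model.fit(train[features], train["Deadline_Changes"])

# Leave the input frame untouched and attach predictions to a copy
result = predict.copy()
result["Predicted_Deadline_Changes"] = model.predict(predict[features])
print(result)
```

Wrapping the encoder and the regressor in one pipeline means the new rows always go through exactly the same preprocessing as the training rows, which removes the chance of predicting on the wrong frame by accident. Swapping LinearRegression for another scikit-learn regressor only requires changing the last step of the pipeline.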