Skip to content
Advertisement

Python – Prediction of Variable Based On Data

I have a data frame that looks like this.

Month Day Deadline_Changes Test  
3     19  2                English 
5     3   8                Math
3     8   34               Science 
10    2   17               Science 
5     9   21               Social
4     12  3                Math
8     29  1                Music 
12    31  9                English

And a second dataframe that looks like this.

    Month Day Test  
    5     30  Math 
    9     2   Social 
    12    9   Science 
    11    30  Music  
    8     24  Music 
    2     2   English 
    6     12  Music 
    4     9   English

My desired output is

        Month Day Test     Predicted_Deadline_Changes  
        5     30  Math     4
        9     2   Social   23
        12    9   Science  6
        11    30  Music    18
        8     24  Music    4
        2     2   English  2
        6     12  Music    1
        4     9   English  10

Basically, I want to use my first data frame as my training data to predicted what the deadlines changes are for my second data frame.

I want my desired output to be the second data frame with an additional variable called predicted_deadline_change. I need the predicted_deadline_change variable to be based on the training data.

Using python, what would be the best approach/method to do this?

Advertisement

Answer

This is a simple regression model for predicting deadline changes.

train = pd.read_clipboard()
predict = pd.read_clipboard()
y = train['Deadline_Changes']
x = train.drop('Deadline_Changes',1)
le = preprocessing.LabelEncoder()
x['Test'] = le.fit_transform(x['Test'])
model = LinearRegression()
model.fit(x,y)
# remove .round() if you want exact values
predict['Predicted_Deadline_Changes'] = model.predict(x).round()
print(predict)

Results:

 Month  Day Test    Predicted_Deadline_Changes
0   5   30  Math    3.0
1   9   2   Social  10.0
2   12  9   Science 19.0
3   11  30  Music   20.0
4   8   24  Music   23.0
5   2   2   English 9.0
6   6   12  Music   10.0
7   4   9   English 0.0

There are a lot of different modeling techniques for predicting values, all having different advantages and disadvantages.

This would probably be your most basic model that assumes a linear relationship between your independent and dependent variables.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement