I am trying to plot the accuracy of my train and test sets from a decision tree model. Since I am new to Python, I wasn't sure which graphing package to use. I have used a simple for loop to print the results, but I'm not sure how I can plot them.
Thanks!
My code:
for x in max_depth_list:
    dtc = DecisionTreeClassifier(max_depth=x)
    dtc.fit(train_x, train_y)
    train_z = dtc.predict(train_x)
    train_z_prob = dtc.predict_proba(train_x)[:, 1]
    test_z = dtc.predict(test_x)
    test_z_prob = dtc.predict_proba(test_x)[:, 1]
    print("split: {}".format(x))
    print("model accuracy: {}".format(accuracy_score(test_y, test_z)))
Desired graph: [image]
Answer
The plot in the image you posted was most likely created with the matplotlib.pyplot module. You can plot a similar graph with something like this, assuming your data is already split into train_x, train_y, test_x, and test_y:
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

max_depth_list = [1, 2, 3, 4]
train_accuracies = []  # Log training accuracy for each model
test_accuracies = []   # Log testing accuracy for each model

for depth in max_depth_list:
    dtc = DecisionTreeClassifier(max_depth=depth)
    dtc.fit(train_x, train_y)
    train_z = dtc.predict(train_x)
    test_z = dtc.predict(test_x)
    # Compare predictions against the true labels (train_y, not train_x)
    train_accuracies.append(accuracy_score(train_y, train_z))
    test_accuracies.append(accuracy_score(test_y, test_z))

# Use the depth values themselves as the x-axis
plt.plot(max_depth_list, train_accuracies, label='Training Accuracy')
plt.plot(max_depth_list, test_accuracies, label='Testing Accuracy')
plt.xlabel('Maximum Depth')  # Label x-axis
plt.ylabel('Accuracy')       # Label y-axis
plt.legend()                 # Show plot labels as legend
plt.show()                   # Show graph
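If you want something you can run end to end without your own data, here is a rough, self-contained sketch of the same idea. Everything about the data is an assumption on my part: it uses scikit-learn's make_classification to generate a synthetic dataset in place of your train_x/train_y/test_x/test_y, and the helper name accuracy_by_depth is just something I made up for illustration. It also writes the plot to a file instead of opening a window, which you may or may not want.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def accuracy_by_depth(train_x, train_y, test_x, test_y, depths):
    """Return (train_acc, test_acc) lists with one entry per max_depth value."""
    train_acc, test_acc = [], []
    for depth in depths:
        dtc = DecisionTreeClassifier(max_depth=depth, random_state=0)
        dtc.fit(train_x, train_y)
        train_acc.append(accuracy_score(train_y, dtc.predict(train_x)))
        test_acc.append(accuracy_score(test_y, dtc.predict(test_x)))
    return train_acc, test_acc


# Synthetic stand-in data (an assumption; substitute your own split here).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

depths = [1, 2, 3, 4, 5, 6]
train_acc, test_acc = accuracy_by_depth(train_x, train_y, test_x, test_y, depths)

# One line per data set, with the depth values on the x-axis.
fig, ax = plt.subplots()
ax.plot(depths, train_acc, marker="o", label="Training accuracy")
ax.plot(depths, test_acc, marker="o", label="Testing accuracy")
ax.set_xlabel("Maximum depth")
ax.set_ylabel("Accuracy")
ax.legend()
fig.savefig("accuracy_by_depth.png")  # hypothetical output filename
```

Keeping the accuracy loop in its own function means you can print the numbers, plot them, or both, without refitting the models each time.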
I’m new to this community as well, so I am in no position to give advice to other users. However, it’s probably a good idea to format your source code for better readability and presentation. Just a heads up.
I hope this helps. Let me know if anything is unclear.