Decision tree with a probability target

Question

I'm currently working on a model to predict a probability of fatality once a person is infected with the Corona virus. I'm using a Dutch dataset with categorical variables: date of infection, fatality or cured, gender, age-group etc. It was suggested to use a decision tree, which I've alread…

Accepted Answer

You can use the predict_proba method of the DecisionTreeClassifier to compute class probabilities instead of the binary classification values.

To test an individual sample that you create by hand, build an array with the same number of columns as your X_test data but only one row. You can then pass it to model.predict(array) or model.predict_proba(array).

By the way, your tree is currently not useful for retrieving probabilities. A tree that is grown without limits keeps splitting until its leaves are pure, so predict_proba typically returns only 0 or 1. There is an article that explains the problem very well: https://rpmcruz.github.io/machine%20learning/2018/02/09/probabilities-trees.html

So you can fix your code by defining the max_depth of your tree:

    from pandas import read_csv
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    import pandas as pd
    import random as rnd

    filename = 'pima-indians-diabetes.csv'
    names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
    dataframe = read_csv(filename, names=names)
    array = dataframe.values
    X = array[:, 0:8]  # the eight feature columns
    Y = array[:, 8]    # the class label
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)

    # max_depth stops the tree early, so leaves stay mixed and give fractional probabilities.
    # The min_impurity_split and presort arguments no longer exist in recent scikit-learn releases.
    model = DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
                max_features=None, max_leaf_nodes=None,
                min_impurity_decrease=0.0,
                min_samples_leaf=1, min_samples_split=2,
                min_weight_fraction_leaf=0.0, random_state=None,
                splitter='best')
    model.fit(X_train, Y_train)

    # Pick one random row and reshape it to (1, 8): a single sample with eight features.
    rnd.seed(123458)
    X_new = X[rnd.randrange(X.shape[0])]
    X_new = X_new.reshape(1, 8)
    YHat = model.predict_proba(X_new)  # one row: [P(class 0), P(class 1)]
    df = pd.DataFrame(X_new, columns=names[:-1])
    df["predicted"] = list(YHat)
    print(df)
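To see the effect of limiting the tree without needing the CSV file, here is a minimal sketch on synthetic data. The dataset from make_classification, the variable names, and the particular max_depth and min_samples_leaf values are assumptions chosen for illustration; they are not the asker's data or tuned settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (an assumption, not the Dutch dataset).
X, y = make_classification(n_samples=500, n_features=8, flip_y=0.2, random_state=0)

# An unrestricted tree keeps splitting until every training leaf is pure.
deep = DecisionTreeClassifier(random_state=0).fit(X, y)

# A restricted tree keeps mixed leaves, so class frequencies act as probabilities.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0).fit(X, y)

deep_proba = deep.predict_proba(X)
shallow_proba = shallow.predict_proba(X)

print("distinct P(class 1), unrestricted tree:", np.unique(deep_proba[:, 1]))
print("distinct P(class 1), restricted tree:  ", np.unique(shallow_proba[:, 1]))

# A hand-made sample must be 2-D: one row, same number of columns as X.
x_new = np.zeros((1, X.shape[1]))
print("hand-made sample:", shallow.predict_proba(x_new))
```

On the training data the unrestricted tree should report only 0 and 1, while the restricted tree reports the class fractions of each leaf. Which depth or leaf size works best for a real dataset would need to be checked with held-out data.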

Answer