I'm trying to classify a dataset using Python with pandas.
The iris flower dataset consists of 50 samples from each of three species of Iris and contains four features. The goal is to distinguish between the species of irises based on these features.
Question:
Generate a scatter plot with the sepal_length feature on the x-axis and the petal_width feature on the y-axis, showing the points of different classes in different colors. You can use the matplotlib library here.
What I have done:
    setosa_df = df[df['name'] == 'setosa']
    virginica_df = df[df['name'] == 'virginica']
    versicolor_df = df[df['name'] == 'versicolor']
    plt.plot(setosa_df['sepal_length'], setosa_df['petal_width'], color='green', linestyle='none', marker='o')
    plt.plot(virginica_df['sepal_length'], setosa_df['petal_width'], color='blue', linestyle='none', marker='o')
    plt.plot(versicolor_df['sepal_length'], setosa_df['petal_width'], color='red', linestyle='none', marker='o')
    plt.xlabel(r'$x$')
    plt.xlabel('sepal_length')
    plt.ylabel('petal_width')
    plt.ylabel(r'$y$')
    plt.show()
The problem is that it only displays part of the data (the region [4.0, 7.0] × [0, 0.65]). What should I do to show all of it?
Thanks in advance!
Answer
I think you made a mistake: in the second and third plots you use setosa_df['petal_width'] for the Y axis instead of each species' own data.
Here is the fixed version:
    import matplotlib.pyplot as plt
    import seaborn as sns

    # seaborn's copy of the iris data names the class column 'species'
    # (the question's DataFrame calls it 'name').
    df = sns.load_dataset('iris')

    setosa_df = df[df['species'] == 'setosa']
    virginica_df = df[df['species'] == 'virginica']
    versicolor_df = df[df['species'] == 'versicolor']

    # Each call now uses the x and y columns of the same species.
    plt.plot(setosa_df['sepal_length'], setosa_df['petal_width'],
             color='green', linestyle='none', marker='o')
    plt.plot(virginica_df['sepal_length'], virginica_df['petal_width'],
             color='blue', linestyle='none', marker='o')
    plt.plot(versicolor_df['sepal_length'], versicolor_df['petal_width'],
             color='red', linestyle='none', marker='o')

    # One label per axis; a second xlabel/ylabel call would overwrite the first.
    plt.xlabel('sepal_length')
    plt.ylabel('petal_width')
    plt.show()
If you wonder why your code behaves this way, run setosa_df.describe(). You will see that setosa_df['petal_width'] ranges from 0.1 to 0.6, which is exactly the Y range that the plot shows you, because all three calls plotted the setosa values on the Y axis.
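To see the effect without opening a plot, a minimal sketch like the one below compares the petal_width range of each species with the range of the whole dataset. The small DataFrame is a toy stand-in with illustrative values, not the real iris data, and the column name 'species' is assumed to match the seaborn version of the dataset.

```python
import pandas as pd

# Toy stand-in for the iris data (illustrative values only, not the real dataset).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "petal_width": [0.1, 0.6, 1.0, 1.8, 1.4, 2.5],
})

# Min and max of petal_width for each species separately.
ranges = df.groupby("species")["petal_width"].agg(["min", "max"])
print(ranges)

# Range over the whole dataset, for comparison.
overall = (df["petal_width"].min(), df["petal_width"].max())
print(overall)
```

If only the setosa values are ever passed as Y data, the axis has no reason to extend beyond the setosa row of this table.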
By default, plot adjusts the axis range to fit the data it is given. If you need to set the range explicitly in future work, you can do it like this:
    plt.xlim([df['sepal_length'].min() - 0.1, df['sepal_length'].max() + 0.1])
    plt.ylim([df['petal_width'].min() - 0.1, df['petal_width'].max() + 0.1])
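As an alternative to filtering each species by hand, the same kind of plot can be sketched with a loop over groupby, which avoids the copy-paste mistake from the question. This is only one possible approach. The DataFrame here is a small toy stand-in, and the color mapping, the 0.1 margin and the output file name are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use a window
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the iris data (a handful of rows, for illustration only).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Hypothetical color choice, matching the colors used in the question.
colors = {"setosa": "green", "virginica": "blue", "versicolor": "red"}

fig, ax = plt.subplots()

# One scatter call per species; x and y always come from the same group.
for species, group in df.groupby("species"):
    ax.scatter(group["sepal_length"], group["petal_width"],
               color=colors[species], label=species)

# Explicit axis limits computed from the whole dataset, with a small margin.
margin = 0.1
ax.set_xlim(df["sepal_length"].min() - margin, df["sepal_length"].max() + margin)
ax.set_ylim(df["petal_width"].min() - margin, df["petal_width"].max() + margin)

ax.set_xlabel("sepal_length")
ax.set_ylabel("petal_width")
ax.legend()
fig.savefig("iris_scatter.png")
```

Because the limits are computed from df rather than from a single species, the axes cover every point regardless of which group is drawn first.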