
How to visualize classification using pandas and matplotlib?

I’m trying to classify a dataset using Python with pandas.

The iris flower dataset consists of 50 samples from each of three species of Iris and contains four features. The goal is to distinguish between the iris species based on these features.

Question: Generate a scatter plot with the sepal_length feature on the x-axis and the petal_width feature on the y-axis, showing the points of each class in a different color. You can use the <matplotlib> library here.

What I have done:

setosa_df = df[df['name']=='setosa']

virginica_df = df[df['name']=='virginica']

versicolor_df = df[df['name']=='versicolor']

plt.plot(setosa_df['sepal_length'], setosa_df['petal_width'], color='green', linestyle='none', marker='o')

plt.plot(virginica_df['sepal_length'],setosa_df['petal_width'],color= 'blue', linestyle = 'none',marker = 'o')

plt.plot(versicolor_df['sepal_length'],setosa_df['petal_width'],color= 'red', linestyle = 'none',marker = 'o')

plt.xlabel(r'$x$')

plt.xlabel('sepal_length')

plt.ylabel('petal_width')

plt.ylabel(r'$y$')

plt.show()

The problem is that it only displays part of the data (the region [4.0, 7.0] × [0, 0.65]). What should I do to show all of it?

Thanks in advance!


Answer

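I think you made a mistake: in the second and third plots you pass setosa_df['petal_width'] as the y values instead of each species' own petal_width column.

Here is the fixed version. It loads the data with seaborn, where the class column is called species rather than name: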

import matplotlib.pyplot as plt
import seaborn as sns

# load_dataset fetches the iris data as a pandas DataFrame
df = sns.load_dataset('iris')

setosa_df = df[df['species']=='setosa']

virginica_df = df[df['species']=='virginica']

versicolor_df = df[df['species']=='versicolor']

# Each series takes both x and y from the same species' subset
plt.plot(setosa_df['sepal_length'], setosa_df['petal_width'], color='green', linestyle='none', marker='o')
plt.plot(virginica_df['sepal_length'], virginica_df['petal_width'], color='blue', linestyle='none', marker='o')
plt.plot(versicolor_df['sepal_length'], versicolor_df['petal_width'], color='red', linestyle='none', marker='o')



# A later xlabel/ylabel call overwrites an earlier one, so set each label once
plt.xlabel('sepal_length')
plt.ylabel('petal_width')

plt.show()
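
Since the original bug came from copying a line and forgetting to change one of the DataFrame names, a loop over the classes may be a safer pattern. The following is only a sketch: plot_by_species and sample_df are hypothetical names, and the nine hand-typed rows merely stand in for the full dataset so the snippet runs without downloading anything.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed stand-in rows shaped like the iris data (three per species).
# Replace this with the full DataFrame when using it for real.
sample_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})


def plot_by_species(df, x="sepal_length", y="petal_width", label_col="species"):
    """Draw one scatter series per class and return the Axes."""
    fig, ax = plt.subplots()
    # groupby yields (class name, sub-DataFrame) pairs, so x and y
    # always come from the same subset.
    for name, group in df.groupby(label_col):
        ax.scatter(group[x], group[y], label=name)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=label_col)
    return ax


ax = plot_by_species(sample_df)
```

Call plt.show() afterwards to display the figure. With the DataFrame from the question, where the class column is called name, the call would be plot_by_species(df, label_col="name").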

If you wonder why your code behaves this way, run setosa_df.describe(). You will see that setosa_df['petal_width'] ranges from 0.1 to 0.6, which is exactly the y-range the plot shows, because all three series were drawn with the setosa y values.
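
A quick way to compare the ranges of all classes at once is to aggregate per group. This is a minimal sketch on a hand-typed stand-in sample (sample_df is a hypothetical name, and the nine rows are not the full dataset); with the real data you would run the same groupby on df.

```python
import pandas as pd

# Hand-typed stand-in rows shaped like the iris data (three per species).
sample_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Minimum and maximum petal_width within each species
ranges = sample_df.groupby("species")["petal_width"].agg(["min", "max"])
print(ranges)

# Range of the whole column, which is what the y-axis should cover
overall = (sample_df["petal_width"].min(), sample_df["petal_width"].max())
print(overall)
```

If the y-range of the plot matches a single row of ranges rather than overall, only one class's values were plotted on that axis.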

By default, matplotlib adjusts the axis ranges to fit the plotted data. If you ever need to set them explicitly, you can do it like this:

plt.xlim([df['sepal_length'].min()-0.1, df['sepal_length'].max()+0.1])
plt.ylim([df['petal_width'].min()-0.1, df['petal_width'].max()+0.1])
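
The same idea can be wrapped in a small helper and applied through the Axes object. This is a sketch only: padded_limits and sample_df are hypothetical names, the 0.1 padding simply mirrors the lines above, and the hand-typed rows stand in for the full dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed stand-in rows shaped like the iris data.
sample_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
})


def padded_limits(values, pad=0.1):
    """Return (min - pad, max + pad) for a numeric column."""
    return float(values.min()) - pad, float(values.max()) + pad


fig, ax = plt.subplots()
ax.scatter(sample_df["sepal_length"], sample_df["petal_width"])
# Limits are computed from the whole column, not from one class
ax.set_xlim(*padded_limits(sample_df["sepal_length"]))
ax.set_ylim(*padded_limits(sample_df["petal_width"]))
```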
User contributions licensed under: CC BY-SA