I am trying to plot a line graph comparing the Murder Rates of particular States through the years 1960-1962 using Pandas in a Jupyter Notebook.
A little context about where I am now, and how I arrived here:
I’m using a crime csv file, which looks like this:
I’m only interested in 3 columns for the time being: State, Year, and Murder Rate. Specifically I was interested in only 5 states – Alaska, Michigan, Minnesota, Maine, Wisconsin.
So to produce the desired table, I did this (only showing top 5 row entries):
al_mi_mn_me_wi = crimes[(crimes['State'] == 'Alaska') | (crimes['State'] =='Michigan') | (crimes['State'] =='Minnesota') | (crimes['State'] =='Maine') | (crimes['State'] =='Wisconsin')] control_df = al_mi_mn_me_wi[['State', 'Year', 'Murder Rate']]
From here I used the pivot function
df = control_1960_to_1962.pivot(index = 'Year', columns = 'State',values= 'Murder Rate' )
And this is where I get stuck. I received KeyError when doing (KeyError was Year):
df.plot(x='Year', y='Murder Rate', kind='line')
and when attempting just
df.plot()
I get this wonky graph.
How do I get my desired graph?
Advertisement
Answer
Given a dataframe in a long (tidy) format, pandas.DataFrame.pivot
is used to transform to a wide format, which can be plotted directly with pandas.DataFrame.plot
Tested in python 3.8.11
, pandas 1.3.3
, matplotlib 3.4.3
import numpy as np import pandas as pd control_1960_to_1962 = pd.DataFrame({ 'State': np.repeat(['Alaska', 'Maine', 'Michigan', 'Minnesota', 'Wisconsin'], 3), 'Year': [1960, 1961, 1962]*5, 'Murder Rate': [10.2, 11.5, 4.5, 1.7, 1.6, 1.4, 4.5, 4.1, 3.4, 1.2, 1.0, .9, 1.3, 1.6, .9] }) df = control_1960_to_1962.pivot(index='Year', columns='State', values='Murder Rate') # display(df) State Alaska Maine Michigan Minnesota Wisconsin Year 1960 10.2 1.7 4.5 1.2 1.3 1961 11.5 1.6 4.1 1.0 1.6 1962 4.5 1.4 3.4 0.9 0.9
The plots
You can tell Pandas (and through it the matplotlib
package that actually does the plotting) what xticks you want explicitly:
ax = df.plot(xticks=df.index, ylabel='Murder Rate')
Output:
ax
is a matplotlib.axes.Axes
object, and there are many, many customizations you can make to your plot through it.
Here’s how to plot with the States
on the x axis:
ax = df.T.plot(kind='bar', ylabel='Murder Rate')
Output: