
How can I plot a stacked bar chart of median of a column in pandas dataframe?

I am a newbie learning data visualization with pandas (Python). My task is to create a stacked bar chart of the median WeekHrs and CodeRevHrs for the age group 30 to 35.

Below is my code, where I extract the data by filtering on the age column, followed by the first five rows of the filtered result:

age_filter = agework[(agework["age"] >= 30) & (agework["age"] <= 35)]
median_weekhrs = age_filter["Weekhrs"].median()
median_coderev = age_filter["CodeRevHrs"].median()

age_filter.head()

    CodeRevHrs  Weekhrs age
5   3.0          8.0    31.0
11  2.0         40.0    34.0
12  2.0         40.0    32.0
18  15.0        42.0    34.0
22  2.0         40.0    33.0

How can I plot a stacked bar chart of these medians?

Please help


Answer

First, to filter for age (and also convert age to int as it makes for cleaner labels):

# .copy() avoids a SettingWithCopyWarning on the assignment below
df = agework.query('30 <= age <= 35').copy()
df['age'] = df['age'].astype(int)

Then, you could plot a bar chart of the median of the two quantities in each age group:

import matplotlib.pyplot as plt

df.groupby('age').median().plot.bar(stacked=True)
plt.title('Median hours, by age')
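
To make the per-age version easy to try, here is a minimal, self-contained sketch that rebuilds only the five sample rows shown in the question (so the numbers are illustrative, not the real dataset). The `Agg` backend, the y-axis label, and the output file name `median_by_age.png` are assumptions added for the example; in a notebook you can drop the backend line and the `savefig` call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question, not the full dataset
agework = pd.DataFrame(
    {
        "CodeRevHrs": [3.0, 2.0, 2.0, 15.0, 2.0],
        "Weekhrs": [8.0, 40.0, 40.0, 42.0, 40.0],
        "age": [31.0, 34.0, 32.0, 34.0, 33.0],
    },
    index=[5, 11, 12, 18, 22],
)

# Filter the age range and take a copy so the next assignment is safe
df = agework.query("30 <= age <= 35").copy()
df["age"] = df["age"].astype(int)

# One row per age, one column per quantity
medians = df.groupby("age").median()
print(medians)

# Each column becomes one segment of the stacked bar for that age
ax = medians.plot.bar(stacked=True)
ax.set_title("Median hours, by age")
ax.set_ylabel("Hours")
plt.tight_layout()
plt.savefig("median_by_age.png")  # hypothetical output file name
```

With these five rows, age 34 is the only group with two observations, so its medians are the midpoints of the two values.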

BTW, you can impose an arbitrary order in how the values are stacked. For example, if you’d rather have 'Weekhrs' at the bottom, you can say:

order = ['Weekhrs', 'CodeRevHrs']
df.groupby('age')[order].median().plot.bar(stacked=True)
plt.title('Median hours, by age')

Now, if you’d rather plot the overall median of these quantities for the entire filtered age range (as you say: a single number for each quantity), then one way (among many) would be:

label = f"{df['age'].min()}-{df['age'].max()}"
df.median().drop('age').to_frame(label).T.plot.bar(stacked=True)
plt.title(f'Median hours for age {label}')
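
If you already have the two scalar medians from the question's code, another option is to put them into a one-row DataFrame and plot that. The sketch below assumes only the five sample rows from the question, and it labels the bar with the filter bounds ("30-35") rather than the observed minimum and maximum age; the `Agg` backend and the file name are likewise assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question, not the full dataset
agework = pd.DataFrame(
    {
        "CodeRevHrs": [3.0, 2.0, 2.0, 15.0, 2.0],
        "Weekhrs": [8.0, 40.0, 40.0, 42.0, 40.0],
        "age": [31.0, 34.0, 32.0, 34.0, 33.0],
    },
    index=[5, 11, 12, 18, 22],
)

# Same filter and scalar medians as in the question
age_filter = agework[(agework["age"] >= 30) & (agework["age"] <= 35)]
median_weekhrs = age_filter["Weekhrs"].median()
median_coderev = age_filter["CodeRevHrs"].median()

# One row (the age group), one column per quantity; column order sets stacking order
overall = pd.DataFrame(
    {"Weekhrs": [median_weekhrs], "CodeRevHrs": [median_coderev]},
    index=["30-35"],
)

ax = overall.plot.bar(stacked=True, rot=0)
ax.set_title("Median hours for age 30-35")
ax.set_ylabel("Hours")
plt.tight_layout()
plt.savefig("median_overall.png")  # hypothetical output file name
```

The key point is that `plot.bar(stacked=True)` stacks columns within each row, so a single-row frame gives a single stacked bar.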
User contributions licensed under: CC BY-SA