I am new to data visualization with pandas (Python). My task is to create a stacked bar chart of the median WeekHrs and CodeRevHrs for the 30 to 35 age group.
Below is my code, which filters the data on the age column, followed by the first five rows of the filtered dataset:
age_filter = agework[(agework["age"] >= 30) & (agework["age"] <= 35)]
median_weekhrs = age_filter["Weekhrs"].median()
median_coderev = age_filter["CodeRevHrs"].median()
age_filter.head()
    CodeRevHrs  Weekhrs   age
5          3.0      8.0  31.0
11         2.0     40.0  34.0
12         2.0     40.0  32.0
18        15.0     42.0  34.0
22         2.0     40.0  33.0
How can I plot a stacked bar chart of these medians?
Please help.
Answer
First, filter by age (and also convert age to int, as it makes for cleaner labels):
# .copy() avoids a SettingWithCopyWarning when the age column is reassigned
df = agework.query('30 <= age <= 35').copy()
df['age'] = df['age'].astype(int)
Then, you could plot a bar chart of the median of the two quantities in each age group:
import matplotlib.pyplot as plt
df.groupby('age').median().plot.bar(stacked=True)
plt.title('Median hours, by age')
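
To make the per-age version easy to try end to end, here is a minimal, self-contained sketch. It assumes a DataFrame built only from the five rows shown in the question (the real agework is presumably much larger), and the output file name is just a placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: stand-in for agework, built from the five sample rows in the question.
agework = pd.DataFrame(
    {
        "CodeRevHrs": [3.0, 2.0, 2.0, 15.0, 2.0],
        "Weekhrs": [8.0, 40.0, 40.0, 42.0, 40.0],
        "age": [31.0, 34.0, 32.0, 34.0, 33.0],
    },
    index=[5, 11, 12, 18, 22],
)

# Filter the age range; .copy() so the column assignment below is safe.
df = agework.query("30 <= age <= 35").copy()
df["age"] = df["age"].astype(int)

# One row per age, one column per quantity, each cell a median.
medians = df.groupby("age").median()
print(medians)

# Each column becomes one layer of the stacked bars.
ax = medians.plot.bar(stacked=True)
ax.set_title("Median hours, by age")
ax.set_ylabel("Hours")
plt.tight_layout()
plt.savefig("median_hours_by_age.png")  # hypothetical file name; use plt.show() interactively
```

With only these five rows, age 34 is the one group with two entries, so its medians are the midpoints of the two values (8.5 for CodeRevHrs and 41.0 for Weekhrs).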

BTW, you can impose an arbitrary order in how the values are stacked. For example, if you’d rather have 'Weekhrs' at the bottom, you can say:
order = ['Weekhrs', 'CodeRevHrs']
df.groupby('age')[order].median().plot.bar(stacked=True)
plt.title('Median hours, by age')

Now, if you’d rather plot the overall median of these quantities for the entire filtered age range (as you say: a single number for each quantity), then one way (among many) would be:
label = f"{df['age'].min()}-{df['age'].max()}"
df.median().drop('age').to_frame(label).T.plot.bar(stacked=True)
plt.title(f'Median hours for age {label}')
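
As a quick check of the single-bar version, here is the same idea as a self-contained sketch. Again it assumes a DataFrame made only from the five sample rows in the question, so the label and the medians reflect that small sample rather than the full dataset. It selects the two columns explicitly instead of dropping 'age', which also fixes the stacking order.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: stand-in for agework, built from the five sample rows in the question.
agework = pd.DataFrame(
    {
        "CodeRevHrs": [3.0, 2.0, 2.0, 15.0, 2.0],
        "Weekhrs": [8.0, 40.0, 40.0, 42.0, 40.0],
        "age": [31.0, 34.0, 32.0, 34.0, 33.0],
    },
    index=[5, 11, 12, 18, 22],
)

df = agework.query("30 <= age <= 35").copy()
df["age"] = df["age"].astype(int)

# Label from the ages actually present after filtering.
label = f"{df['age'].min()}-{df['age'].max()}"

# median() gives a Series (one value per column); to_frame(label).T turns it
# into a one-row DataFrame so that plot.bar draws a single stacked bar.
overall = df[["Weekhrs", "CodeRevHrs"]].median().to_frame(label).T
print(overall)

ax = overall.plot.bar(stacked=True, rot=0)
ax.set_title(f"Median hours for age {label}")
ax.set_ylabel("Hours")
plt.tight_layout()
plt.savefig("median_hours_overall.png")  # hypothetical file name; use plt.show() interactively
```

The two values in overall match what median_weekhrs and median_coderev from the question would hold for the same rows.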
