I’m trying to compute the median of one column in a dataframe, grouped by the values in another column.
I have a dataframe with ‘zipcode’ data and ‘price’ data. I want to find the median ‘price’ for each ‘zipcode’ and report it in a new column. When I run the program as is, I get a column that reports the median of the whole dataset, but I want the new column to report the median for each zip code. What am I missing?
```python
import pandas as pd

d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
     'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)

medians = df.groupby(['zipcode','price'])['price'].transform('median')

df['median'] = df['price'].median()
df
```
Answer
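You should group by zip code only. Grouping by both `zipcode` and `price` means every row in a group has the same price, so each group's median is just that price, and `df['price'].median()` returns a single median for the whole column.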
```python
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')
```
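For reference, here is a minimal end-to-end sketch using the sample data from the question. The `per_zip` name is only illustrative and is not part of the original code; it shows the one-row-per-zip-code summary you get from `.median()`, in contrast to `transform`, which returns a value for every original row.

```python
import pandas as pd

d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
     'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)

# Group by zipcode only; transform broadcasts each group's median
# back onto every row of that group, so the result aligns with df.
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')

# Illustrative alternative: one median per zipcode (a Series indexed by zipcode).
per_zip = df.groupby('zipcode')['price'].median()

print(df)
print(per_zip)
```

With this data, every 99516 row gets 14500.0 and every 89507 row gets 3000.0 in `median_cal`, whereas the median of the whole `price` column is 13000.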