I’m trying to find the median of one column in a dataframe, grouped by the values in another column.
I have a dataframe with ‘zipcode’ data and ‘price’ data. I want to find the median ‘price’ for each ‘zipcode’ and report it in a new column. When I run the program as is, the new column contains the median of the whole dataset on every row, but I want each row to show the median for its own zip code. What piece am I missing?
```
import pandas as pd

d = {'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507], 'price': [15000, 14000, 13000, 78000, 3000, 4000, 500]}
df = pd.DataFrame(data=d)
medians = df.groupby(['zipcode','price'])['price'].transform('median')
df['median'] = df['price'].median()
df
```
Answer
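You should group by zip code only. Grouping by both 'zipcode' and 'price' puts every distinct (zipcode, price) pair in its own group, so each group's median is just the price itself. Also, the line df['median'] = df['price'].median() never uses the grouped result: it computes a single median for the whole column and assigns that one value to every row. Use transform on the grouped 'price' column and assign its result directly: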
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')
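For reference, here is a minimal end-to-end sketch using the data from the question. The variable names `overall_median` and `per_zip` are only illustrative and not part of the original code; they are there to contrast the whole-column median with the per-group one.

```python
import pandas as pd

d = {
    'zipcode': [99516, 99516, 99516, 99516, 89507, 89507, 89507],
    'price': [15000, 14000, 13000, 78000, 3000, 4000, 500],
}
df = pd.DataFrame(data=d)

# A single scalar for the whole column; assigning it to a new column
# would repeat the same value on every row (what the question observed).
overall_median = df['price'].median()

# Per-zipcode median, returned with the same index as df so it can be
# assigned directly as a new column.
df['median_cal'] = df.groupby('zipcode')['price'].transform('median')

# For comparison: one row per zipcode (illustrative, not row-aligned with df).
per_zip = df.groupby('zipcode')['price'].median()

print(df)
print(per_zip)
```

With this data, every 99516 row gets 14500.0 and every 89507 row gets 3000.0 in median_cal, while the whole-column median is 13000.0. The difference between the last two calls is that transform keeps one value per original row, whereas calling median() on the groupby collapses the result to one value per group.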