How to vectorize a pandas operation

Question

I have a dataset of house sales with timestamped Periods (per quarter). I want to adjust each price according to the change in the house pricing index for its region. I have a separate dataframe with 3 columns: the Quarter, the Region and the % change in price. I am currently achieving this by iterating over both dataframes. Is there a better way? Minimal

Accepted Answer

Use DataFrame.merge with left_on and right_on. The output then contains all 4 key columns:

    df = houses_df.merge(HPindex_df,
                         left_on=['Period','Region'],
                         right_on=['Periods','Regions'],
                         how='left')
    df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
    print (df)
       HousePrice  Period   Region Periods  Regions  PriceIndex
    0    110000.0  2020Q1  NY-West  2020Q1  NY-West        1.10
    1    302500.0  2020Q2  NY-East  2020Q2  NY-East        1.21
    2    137500.0  2020Q1  NY-West  2020Q1  NY-West        1.10
    3    355200.0  2020Q3  NY-East  2020Q3  NY-East        1.11

To avoid the duplicated key columns, rename them before merging:

    d = {'Periods':'Period','Regions':'Region'}
    df = houses_df.merge(HPindex_df.rename(columns=d), on=['Period','Region'], how='left')
    df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
    print (df)
       HousePrice  Period   Region  PriceIndex
    0    110000.0  2020Q1  NY-West        1.10
    1    302500.0  2020Q2  NY-East        1.21
    2    137500.0  2020Q1  NY-West        1.10
    3    355200.0  2020Q3  NY-East        1.11

Or use DataFrame.join with DataFrame.set_index:

    df = houses_df.join(HPindex_df.set_index(['Periods','Regions']), on=['Period','Region'])
    df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
    print (df)
       HousePrice  Period   Region  PriceIndex
    0    110000.0  2020Q1  NY-West        1.10
    1    302500.0  2020Q2  NY-East        1.21
    2    137500.0  2020Q1  NY-West        1.10
    3    355200.0  2020Q3  NY-East        1.11
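For anyone who wants to run the rename variant end to end, here is a minimal self-contained sketch. The input frames are an assumption: the original sample data is not shown, so the prices and index rows below are inferred from the printed results and the keys are plain strings rather than pandas Period objects. The `validate` and `indicator` arguments and the `AdjustedPrice` column are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical input data, inferred from the printed results in the answer.
houses_df = pd.DataFrame({
    'HousePrice': [100000.0, 250000.0, 125000.0, 320000.0],
    'Period': ['2020Q1', '2020Q2', '2020Q1', '2020Q3'],
    'Region': ['NY-West', 'NY-East', 'NY-West', 'NY-East'],
})
HPindex_df = pd.DataFrame({
    'Periods': ['2020Q1', '2020Q2', '2020Q3'],
    'Regions': ['NY-West', 'NY-East', 'NY-East'],
    'PriceIndex': [1.10, 1.21, 1.11],
})

d = {'Periods': 'Period', 'Regions': 'Region'}
df = houses_df.merge(
    HPindex_df.rename(columns=d),
    on=['Period', 'Region'],
    how='left',
    validate='many_to_one',  # raises MergeError if the index repeats a (Period, Region) key
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Houses without a matching index row would have NaN in PriceIndex.
unmatched = df[df['_merge'] == 'left_only']

# Written to a new column here so the original price stays available.
df['AdjustedPrice'] = df['HousePrice'] * df['PriceIndex']
df = df.drop(columns='_merge')
print(df)
```

With `how='left'`, every house row is kept even when no index row matches, so checking `unmatched` before multiplying is a cheap way to catch missing quarters or misspelled regions.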

Answer