I want to divide a dataframe column by a specific cell in the same dataframe.
I have a dataframe like this:
date type score 20201101 experiment1 30 20201101 experiment2 20 20201101 baseline 10 20201102 experiment1 60 20201102 experiment2 50 20201102 baseline 10
I want to compute the score_ratio by dividing the score by the ‘baseline’ score of that date.
date type score score_ratio 20201101 experiment1 30 3 20201101 experiment2 20 2 20201101 baseline 10 1 20201102 experiment1 60 6 20201102 experiment2 50 5 20201102 baseline 10 1
The score_ratio for (date, type) = (20201101, experiment1)
should be obtained by dividing its score by the score of (20201101, baseline)
. In this case, it should be 30 / 10 = 3. Similarly. for (20201101, experiment2)
, we should divide the score by the same thing, (20201101, baseline)
. For a different date, say (20201102, experiment1)
, it should be divided by the baseline of that date, (20201102, baseline)
.
How do I add this column with dataframe operations?
So far, I have this but am unsure of what expression I should be dividing by: df['score_ratio'] = df['score'].div(...)
Edit:
I get the error for the last line ValueError: Length of values does not match length of index
ID date type room score 0 id1 20201120 baseline 1 450.25 0 id2 20201120 experiment1 1 -3637.24 0 id3 20201121 baseline 1 200.00 1 id4 20201121 experiment1 1 300.00 2 id5 20201120 baseline 2 600.00 3 id6 20201120 experiment1 2 800.00 _df = df.set_index('date', 'room') d = _df.query('type=="baseline"') print(_df['score'].div(d['score']).values) df['score_ratio'] = _df['score'].div(d['score']).values
Advertisement
Answer
#Mask all whose type is baseline into a new datframe and merge to the main df g=pd.merge(df, df[df.type.eq('baseline')].drop(columns='type'),how='left', on='date', suffixes=('', '_right')) #Calculate the score_ratio and drop the extra column acquired during merge df=g.assign(score_ratio=g.score.div(g.score_right).astype(int)).drop(columns=['score_right']) print(df) date type score score_ratio 0 20201101 experiment1 30 3 1 20201101 experiment2 20 2 2 20201101 baseline 10 1 3 20201102 experiment1 60 6 4 20201102 experiment2 50 5 5 20201102 baseline 10 1
How it works
#New dataframe with baselines only df1=df[df.type.eq('baseline')].drop(columns='type') #Modified original dataframe with baselines added g=pd.merge(df, df1,how='left', on='date', suffixes=('', '_right')) #new column called score_ratio g=g.assign(score_ratio=g.score.div(g.score_right).astype(int)) #drop column called score_right which was acquired during merge g=g.drop(columns=['score_right'])