Skip to content
Advertisement

Divide dataframe column by a specific cell

I want to divide a dataframe column by a specific cell in the same dataframe.

I have a dataframe like this:

 date      type          score   
 20201101  experiment1   30      
 20201101  experiment2   20      
 20201101  baseline      10      
 20201102  experiment1   60      
 20201102  experiment2   50      
 20201102  baseline      10      

I want to compute the score_ratio by dividing the score by the ‘baseline’ score of that date.

 date      type          score   score_ratio
 20201101  experiment1   30      3
 20201101  experiment2   20      2
 20201101  baseline      10      1
 20201102  experiment1   60      6
 20201102  experiment2   50      5
 20201102  baseline      10      1

The score_ratio for (date, type) = (20201101, experiment1) should be obtained by dividing its score by the score of (20201101, baseline). In this case, it should be 30 / 10 = 3. Similarly. for (20201101, experiment2), we should divide the score by the same thing, (20201101, baseline). For a different date, say (20201102, experiment1), it should be divided by the baseline of that date, (20201102, baseline).

How do I add this column with dataframe operations?

So far, I have this but am unsure of what expression I should be dividing by: df['score_ratio'] = df['score'].div(...)

Edit:

I get the error for the last line ValueError: Length of values does not match length of index

     ID    date        type          room    score         
0  id1     20201120    baseline      1     450.25    
0  id2     20201120    experiment1   1     -3637.24   
0  id3     20201121    baseline      1     200.00   
1  id4     20201121    experiment1   1     300.00   
2  id5     20201120    baseline      2     600.00   
3  id6     20201120    experiment1   2     800.00   


    _df = df.set_index('date', 'room')
    d = _df.query('type=="baseline"')
    print(_df['score'].div(d['score']).values)
    df['score_ratio'] = _df['score'].div(d['score']).values
            

Advertisement

Answer

 #Mask all whose type is baseline into a new datframe and merge to the main df
 g=pd.merge(df, df[df.type.eq('baseline')].drop(columns='type'),how='left', on='date', suffixes=('', '_right'))

 #Calculate the score_ratio and drop the extra column acquired during merge
    df=g.assign(score_ratio=g.score.div(g.score_right).astype(int)).drop(columns=['score_right'])

print(df)

     date         type  score    score_ratio
0  20201101  experiment1     30            3
1  20201101  experiment2     20            2
2  20201101     baseline     10            1
3  20201102  experiment1     60            6
4  20201102  experiment2     50            5
5  20201102     baseline     10            1

How it works

#New dataframe with baselines only
df1=df[df.type.eq('baseline')].drop(columns='type')

#Modified original dataframe with baselines added
g=pd.merge(df, df1,how='left', on='date', suffixes=('', '_right'))

#new column called score_ratio
g=g.assign(score_ratio=g.score.div(g.score_right).astype(int))

#drop column called score_right which was acquired during merge
g=g.drop(columns=['score_right'])
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement