
Pandas: Conditionally replace values based on another column’s values

I have a dataframe (df) that looks like this:

                    environment     event   
time                    
2017-04-28 13:08:22     NaN         add_rd  
2017-04-28 08:58:40     NaN         add_rd  
2017-05-03 07:59:35     test        add_env
2017-05-03 08:05:14     prod        add_env
...

Now my goal is for each add_rd in the event column, the associated NaN-value in the environment column should be replaced with a string RD.

                    environment     event   
time                    
2017-04-28 13:08:22     RD          add_rd  
2017-04-28 08:58:40     RD          add_rd  
2017-05-03 07:59:35     test        add_env
2017-05-03 08:05:14     prod        add_env
...

What I did so far

I stumbled across df['environment'] = df['environment'].fillna('RD'), which replaces every NaN (not what I am looking for); pd.isnull(df['environment']), which detects missing values; and np.where(df['environment'], x, y), which seems to be what I want but isn’t working. I also tried this:

import pandas as pd

for env in df['environment']:
    if pd.isnull(env) and df['event'] == 'add_rd':
        env = 'RD'

This loop has no index or iterator for reaching the corresponding value in the event column.
I also tried this:

df['environment'] = np.where(pd.isnull(df['environment']), df['environment'] = 'RD', df['environment'])

SyntaxError: keyword can't be an expression

which obviously didn’t work.

I took a look at several questions but couldn’t build on the suggestions in the answers: Black’s question, Simon’s question, szli’s question, and Jan Willems Tulp’s question.

So, how do I replace a value in a column based on another column’s values?


Answer

Now my goal is for each add_rd in the event column, the associated NaN-value in the environment column should be replaced with a string RD.

As per @Zero’s comment, use pd.DataFrame.loc and Boolean indexing:

df.loc[df['event'].eq('add_rd') & df['environment'].isnull(), 'environment'] = 'RD'
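
To try this end to end, here is a minimal sketch. It assumes the frame contains only the four rows visible in the question (the elided rows are not reconstructed), and the names mask and via_where are illustrative, not part of the original code. It also shows how the np.where attempt from the question could be repaired: the second argument should be the replacement value itself, not an assignment.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the four rows shown in the question (assumption:
# the real frame has more rows, which are not reproduced here).
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, "test", "prod"],
        "event": ["add_rd", "add_rd", "add_env", "add_env"],
    },
    index=pd.to_datetime(
        [
            "2017-04-28 13:08:22",
            "2017-04-28 08:58:40",
            "2017-05-03 07:59:35",
            "2017-05-03 08:05:14",
        ]
    ),
)
df.index.name = "time"

# True only where the event is add_rd AND the environment is missing.
mask = df["event"].eq("add_rd") & df["environment"].isnull()

# np.where variant: pick 'RD' where the mask is True, else keep the old value.
# Computed before the in-place assignment below so both start from the same data.
via_where = np.where(mask, "RD", df["environment"])

# .loc variant from the answer: assign only to the masked rows of one column.
df.loc[mask, "environment"] = "RD"

print(df)
```

Both variants should give the same environment column here. The .loc form modifies the frame in place, while np.where returns a new array that would still need to be assigned back with df['environment'] = via_where.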
User contributions licensed under: CC BY-SA