
How can I replace consecutive duplicates in a DataFrame column in Python?

Say my column looks like this:

trade_signal
buy
buy
buy
buy
sell
sell
sell
sell
buy
buy
buy
sell
sell
buy
sell
buy

I would like to replace the consecutive duplicate elements in the column with NaN or 0, so that it ends up looking like this:

trade_signal
buy
nan
nan
nan
sell
nan
nan
nan
buy
nan
nan
sell
nan
buy
sell
buy

I am completely unsure of the logic I can use to do this. I think I would somehow fill with NaN values up until the next change in signal?


Answer

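Try Series.mask with Series.shift. shift() lines each row up against the one before it, and mask() replaces the rows where the two are equal with NaN, so only the first value of each consecutive run survives: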

# True where a row has the same value as the row directly above it
is_repeat = df['trade_signal'].eq(df['trade_signal'].shift())
df['trade_signal'] = df['trade_signal'].mask(is_repeat)

   trade_signal
0           buy
1           NaN
2           NaN
3           NaN
4          sell
5           NaN
6           NaN
7           NaN
8           buy
9           NaN
10          NaN
11         sell
12          NaN
13          buy
14         sell
15          buy
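
For reference, here is a self-contained sketch of the same idea that rebuilds the sample column from the question and also covers the "or 0" case. The names signals, is_repeat, signal_nan and signal_zero are made up for illustration, and the cast to object is an assumption made so that the column can hold both strings and the integer 0 regardless of the pandas version's default string dtype.

```python
import pandas as pd

# Sample data copied from the question.
signals = ["buy", "buy", "buy", "buy", "sell", "sell", "sell", "sell",
           "buy", "buy", "buy", "sell", "sell", "buy", "sell", "buy"]
df = pd.DataFrame({"trade_signal": signals})

# True where a row repeats the value of the row directly above it.
# The first row is compared with NaN, so it is never flagged.
is_repeat = df["trade_signal"].eq(df["trade_signal"].shift())

# Repeats become NaN (the default replacement value of mask).
df["signal_nan"] = df["trade_signal"].mask(is_repeat)

# Repeats become 0 instead; cast to object first so strings and
# integers can share the column (illustrative choice, see above).
df["signal_zero"] = df["trade_signal"].astype(object).mask(is_repeat, 0)

print(df)
```

Note that this only blanks consecutive repeats, which is what the expected output in the question shows. A later "buy" that follows a "sell" is kept.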
User contributions licensed under: CC BY-SA