How to replace all the non-first values of columns with NaN based on date?

I have this dataframe:

For each day, I want to keep only the first non-NaN value in each column and replace all the other values with NaN.

This is how the dataframe should look:

This is what I tried:

import pandas as pd

tbl = {"date": ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27",
                "2022-02-28", "2022-02-28", "2022-02-28", "2022-02-28"],
       "value1": ["NaN", 0.1, 0.1, "NaN", "NaN", "NaN", "NaN", 0.3, "NaN"],
       "value2": ["NaN", "NaN", 0.2, 0.3, "NaN", 0.3, 0.4, "NaN", "NaN"]}

df = pd.DataFrame(tbl)
# Turn the "NaN" placeholder strings into real missing values
df = df.replace('NaN', float('nan'))
# Assign the result back; pd.to_datetime does not modify the column in place
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# I'm trying to use replace, but it does not take the date into account

DataFrame.replace(to_replace=None, value=NoDefault.no_default, inplace=False, limit=None, 
regex=False, method=NoDefault.no_default)

Answer

`groupby` + `rank`

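First, create a boolean mask with isna. Then use groupby + rank with method='first' to assign numerical ranks within each date. Because False sorts before True, the first non-NaN value in each group gets rank 1. Finally, keep only the values in the original dataframe where the rank is 1; everything else becomes NaN.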

df = df.set_index('date')
df[df.isna().groupby('date').rank(method='first').eq(1)]

Result

            value1  value2
date                      
2022-02-27     NaN     NaN
2022-02-27     0.1     NaN
2022-02-27     NaN     0.2
2022-02-27     NaN     NaN
2022-02-27     NaN     NaN
2022-02-28     NaN     0.3
2022-02-28     NaN     NaN
2022-02-28     0.3     NaN
2022-02-28     NaN     NaN
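
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same groupby + rank idea. It makes a few assumptions that are not in the answer above: the date stays a regular column instead of becoming the index, the mask is cast to integers before ranking, and the names `value_cols`, `first_mask` and `result` are purely illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4,
    "value1": [nan, 0.1, 0.1, nan, nan, nan, nan, 0.3, nan],
    "value2": [nan, nan, 0.2, 0.3, nan, 0.3, 0.4, nan, nan],
})

# Illustrative name: the columns to process (everything except the date)
value_cols = ["value1", "value2"]

# 1 where the value is missing, 0 where it is present
is_missing = df[value_cols].isna().astype(int)

# Rank within each date; method="first" breaks ties by order of appearance,
# so the first present value (a 0) in each group receives rank 1
ranks = is_missing.groupby(df["date"]).rank(method="first")
first_mask = ranks.eq(1)

# Keep values where the mask is True, set everything else to NaN
result = df.copy()
result[value_cols] = df[value_cols].where(first_mask)
print(result)
```

If a column has no values at all on a given day, rank 1 lands on a NaN, so that day simply stays all-NaN in that column.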
