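I have this dataframe:

         date value1 value2
0  2022-02-27    NaN    NaN
1  2022-02-27    0.1    NaN
2  2022-02-27    0.1    0.2
3  2022-02-27    NaN    0.3
4  2022-02-27    NaN    NaN
5  2022-02-28    NaN    0.3
6  2022-02-28    NaN    0.4
7  2022-02-28    0.3    NaN
8  2022-02-28    NaN    NaN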
For each day, I want to keep only the first non-NaN value in each column and replace every other value with NaN.
This is how the dataframe should look:
This is what I tried:
import pandas as pd

tbl = {"date": ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27",
                "2022-02-28", "2022-02-28", "2022-02-28", "2022-02-28"],
       "value1": ["NaN", 0.1, 0.1, "NaN", "NaN", "NaN", "NaN", 0.3, "NaN"],
       "value2": ["NaN", "NaN", 0.2, 0.3, "NaN", 0.3, 0.4, "NaN", "NaN"]}
df = pd.DataFrame(tbl)
# turn the "NaN" strings into real missing values
df = df.replace('NaN', float('nan'))
# assign the parsed dates back to the column (the bare call discarded its result)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# I'm trying to use replace, but it does not take the date into account:
DataFrame.replace(to_replace=None, value=NoDefault.no_default, inplace=False, limit=None, regex=False, method=NoDefault.no_default)
Answer
groupby + rank
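First create a boolean mask with isna (True where a cell is NaN), then use groupby + rank with method='first' to assign numerical ranks within each day. Because False sorts before True, the non-NaN cells of each column receive the lowest ranks, and method='first' breaks ties by row order, so rank 1 lands on the first non-NaN value of that column on that day. Finally, index the original dataframe with the mask rank == 1: those values are kept and every other cell becomes NaN.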
df = df.set_index('date')
# keep the first non-NaN cell of each column per date; every other cell becomes NaN
df[df.isna().groupby('date').rank(method='first').eq(1)]
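To see why rank 1 picks the first non-NaN value, the intermediate rank frame can be inspected. This is a minimal sketch that assumes the question's data with the "2022-02- 28" typo corrected and np.nan written directly instead of the "NaN" strings; the names is_missing, ranks and result are illustrative only.

```python
import numpy as np
import pandas as pd

# The question's data, with np.nan used directly and the date typo fixed.
df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4,
    "value1": [np.nan, 0.1, 0.1, np.nan, np.nan, np.nan, np.nan, 0.3, np.nan],
    "value2": [np.nan, np.nan, 0.2, 0.3, np.nan, 0.3, 0.4, np.nan, np.nan],
}).set_index("date")

# Step 1: True where the cell is NaN, False where it holds a value.
is_missing = df.isna()

# Step 2: rank within each day. False sorts before True, and method="first"
# breaks ties by row order, so the first non-NaN cell of each column gets 1.
# If a column is all NaN on a day, rank 1 falls on a NaN cell, which stays NaN.
ranks = is_missing.groupby(level="date").rank(method="first")

# Step 3: boolean indexing keeps rank-1 cells and turns the rest into NaN.
result = df[ranks.eq(1)]

print(ranks)
print(result)
```

For value1 on 2022-02-27 the mask is [True, False, False, True, True], so the two 0.1 cells get ranks 1 and 2 and only the first of them survives.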
Result
            value1  value2
date                      
2022-02-27     NaN     NaN
2022-02-27     0.1     NaN
2022-02-27     NaN     0.2
2022-02-27     NaN     NaN
2022-02-27     NaN     NaN
2022-02-28     NaN     0.3
2022-02-28     NaN     NaN
2022-02-28     0.3     NaN
2022-02-28     NaN     NaN
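If rank feels indirect, a different technique (a running count instead of a rank) gives the same mask: count the non-NaN cells per column within each day and keep a cell only when it holds a value and the count has just reached 1. This is a sketch under the same assumptions as the data above (np.nan instead of "NaN" strings, date typo fixed); present, seen and first_only are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4,
    "value1": [np.nan, 0.1, 0.1, np.nan, np.nan, np.nan, np.nan, 0.3, np.nan],
    "value2": [np.nan, np.nan, 0.2, 0.3, np.nan, 0.3, 0.4, np.nan, np.nan],
}).set_index("date")

# True where the cell holds a value.
present = df.notna()

# Running count of non-NaN cells per column within each day
# (cast to int so the cumulative sum works on plain integers).
seen = present.astype(int).groupby(level="date").cumsum()

# A cell is the first non-NaN one of its day exactly when it holds a value
# and the running count equals 1; where() turns everything else into NaN.
first_only = df.where(present & seen.eq(1))

print(first_only)
```

Both masks select the same cells; the running count avoids relying on how rank orders booleans.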