I have this dataframe:
For each day, I want to keep only the first non-NaN value in each column and replace the remaining values with NaN.
This is how the dataframe should look:
This is what I tried:
    import pandas as pd
    from datetime import datetime

    tbl = {"date": ["2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27", "2022-02-27",
                    "2022-02-28", "2022-02-28", "2022-02-28", "2022-02-28"],
           "value1": ["NaN", 0.1, 0.1, "NaN", "NaN", "NaN", "NaN", 0.3, "NaN"],
           "value2": ["NaN", "NaN", 0.2, 0.3, "NaN", 0.3, 0.4, "NaN", "NaN"]}

    df = pd.DataFrame(tbl)
    df = df.replace('NaN', float('nan'))
    # pd.to_datetime returns a new Series, so the result has to be assigned back
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
    # I'm trying to use replace, but it does not take the date into account
    DataFrame.replace(to_replace=None, value=NoDefault.no_default, inplace=False, limit=None,
                      regex=False, method=NoDefault.no_default)
Answer
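groupby + rank
First create a boolean mask with isna, then use groupby + rank with method='first' to assign numerical ranks within each date. Because False (not NaN) sorts before True (NaN) and ties are broken by row order, the first non-NaN value of each column in each group gets rank 1. Finally, keep only the values in the original dataframe where the rank is 1; every other value becomes NaN.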
    df = df.set_index('date')
    df[df.isna().groupby('date').rank(method='first').eq(1)]
Result
                value1  value2
    date
    2022-02-27     NaN     NaN
    2022-02-27     0.1     NaN
    2022-02-27     NaN     0.2
    2022-02-27     NaN     NaN
    2022-02-27     NaN     NaN
    2022-02-28     NaN     0.3
    2022-02-28     NaN     NaN
    2022-02-28     0.3     NaN
    2022-02-28     NaN     NaN
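To make the one-liner easier to follow, here is a rough step-by-step sketch of the same isna + groupby + rank idea. The intermediate names (`is_missing`, `ranks`, `keep`, `result`) are mine, not from the answer, and the `astype(int)` cast is an added assumption so that the ranking step works on plain 0/1 numbers; `df.where(keep)` is used in place of `df[keep]`, which should behave the same way for a boolean frame.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN values instead of "NaN" strings.
df = pd.DataFrame({
    "date": ["2022-02-27"] * 5 + ["2022-02-28"] * 4,
    "value1": [np.nan, 0.1, 0.1, np.nan, np.nan, np.nan, np.nan, 0.3, np.nan],
    "value2": [np.nan, np.nan, 0.2, 0.3, np.nan, 0.3, 0.4, np.nan, np.nan],
}).set_index("date")

# Step 1: 1 where a value is missing, 0 where it is present.
# (The int cast is an assumption added here, not part of the answer.)
is_missing = df.isna().astype(int)

# Step 2: rank within each date. 0 sorts before 1, and method="first"
# breaks ties by row order, so the first present value of each column
# in each group receives rank 1.
ranks = is_missing.groupby("date").rank(method="first")

# Step 3: keep only the cells ranked 1; all other cells become NaN.
keep = ranks.eq(1)
result = df.where(keep)

print(result)
```

If a column has no values at all for a given day, rank 1 lands on a NaN cell, so that cell simply stays NaN and the result is unaffected.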