Title may be confusing: I have a dataframe that displays user_id sign in’s during the week. My goal is to display the de-duped ID along with the de-duped dates per employee, in order to get a count of # days the user uniquely signed in for the week. So I’ve been trying to enforce a rule to make sure I’m only getting the distinct sign ins per day, per employee.
JavaScript
x
14
14
1
print(tempdf)
2
ID date
3
10301 1/4/2021
4
10301 1/4/2021
5
10301 1/5/2021
6
10301 1/6/2021
7
10301 1/7/2021
8
10301 1/8/2021
9
10302 1/4/2021
10
10302 1/5/2021
11
10302 1/6/2021
12
10302 1/7/2021
13
10302 1/8/2021
14
The only thing I can think of is trying to pass some form of .nunique(), but this instance would apply that count to my entire dataframe, as well as filter out all of my ID’s before I can de-dupe them.
JavaScript
1
5
1
tempdf1=tempdf.groupby(['date'], as_index = False).nunique()
2
tempdf1['# days signed in for week'] = tempdf1.groupby('ID')['ID'].transform('count')
3
df=tempdf1.drop_duplicates(['ID']).copy()
4
5
This is the result I’m going for:
JavaScript
1
5
1
print(df)
2
ID date # days signed in for week
3
10301 1/4/2021 5
4
10302 1/4/2021 5
5
Advertisement
Answer
- calculate start of week
- then it’s a simple use of
count()
JavaScript
1
18
18
1
df = pd.read_csv(io.StringIO("""ID date # days signed in for week
2
10301 1/4/2021 6
3
10301 1/4/2021 6
4
10301 1/5/2021 6
5
10301 1/6/2021 6
6
10301 1/7/2021 6
7
10301 1/8/2021 6
8
10302 1/4/2021 5
9
10302 1/5/2021 5
10
10302 1/6/2021 5
11
10302 1/7/2021 5
12
10302 1/8/2021 5"""), sep="ss+", engine="python")
13
14
df.date = pd.to_datetime(df.date)
15
df["weekStart"] = df['date'] - pd.to_timedelta(df['date'].dt.dayofweek, unit='d')
16
df.groupby(["ID","weekStart"])["date"].count().reset_index()
17
.rename(columns={"weekStart":"date","date":"# days signed in for week"})
18