Dataset is something like this (there will be duplicate rows in the original):
Code:
JavaScript
x
43
43
1
import pandas as pd
2
3
df_in = pd.DataFrame({'email_ID': {0: 'sachinlaltaprayoohoo',
4
1: 'sachinlaltaprayoohoo',
5
2: 'sachinlaltaprayoohoo',
6
3: 'sachinlaltaprayoohoo',
7
4: 'sachinlaltaprayoohoo',
8
5: 'sachinlaltaprayoohoo',
9
6: 'sheldon.yokoohoo',
10
7: 'sheldon.yokoohoo',
11
8: 'sheldon.yokoohoo',
12
9: 'sheldon.yokoohoo',
13
10: 'sheldon.yokoohoo',
14
11: 'sheldon.yokoohoo'},
15
'time_stamp': {0: '2021-09-10 09:01:56.340259',
16
1: '2021-09-10 09:01:56.672814',
17
2: '2021-09-10 09:01:57.471423',
18
3: '2021-09-10 09:01:57.480891',
19
4: '2021-09-10 09:01:57.484644',
20
5: '2021-09-10 09:01:57.984644',
21
6: '2021-09-10 09:01:56.340259',
22
7: '2021-09-10 09:01:56.672814',
23
8: '2021-09-10 09:01:57.471423',
24
9: '2021-09-10 09:01:57.480891',
25
10: '2021-09-10 09:01:57.484644',
26
11: '2021-09-10 09:01:57.984644'},
27
'screen': {0: 'rewardapp.SplashActivity',
28
1: 'i1',
29
2: 'rewardapp.Signup_in',
30
3: 'rewardapp.PaymentFinalConfirmationActivity',
31
4: 'rewardapp.Signup_in',
32
5: 'i1',
33
6: 'rewardapp.SplashActivity',
34
7: 'i1',
35
8: 'rewardapp.Signup_in',
36
9: 'i1',
37
10: 'rewardapp.Signup_in',
38
11: 'rewardapp.PaymentFinalConfirmationActivity'}})
39
40
df_in['time_stamp'] = df_in['time_stamp'].astype('datetime64[ns]')
41
42
df_in
43
Output should be this:
Code:
JavaScript
1
67
67
1
import pandas as pd
2
3
df_out = pd.DataFrame({'email_ID': {0: 'sachinlaltaprayoohoo',
4
1: 'sachinlaltaprayoohoo',
5
2: 'sachinlaltaprayoohoo',
6
3: 'sachinlaltaprayoohoo',
7
4: 'sachinlaltaprayoohoo',
8
5: 'sachinlaltaprayoohoo',
9
6: 'sheldon.yokoohoo',
10
7: 'sheldon.yokoohoo',
11
8: 'sheldon.yokoohoo',
12
9: 'sheldon.yokoohoo',
13
10: 'sheldon.yokoohoo',
14
11: 'sheldon.yokoohoo'},
15
'time_stamp': {0: '2021-09-10 09:01:56.340259',
16
1: '2021-09-10 09:01:56.672814',
17
2: '2021-09-10 09:01:57.471423',
18
3: '2021-09-10 09:01:57.480891',
19
4: '2021-09-10 09:01:57.484644',
20
5: '2021-09-10 09:01:57.984644',
21
6: '2021-09-10 09:01:56.340259',
22
7: '2021-09-10 09:01:56.672814',
23
8: '2021-09-10 09:01:57.471423',
24
9: '2021-09-10 09:01:57.480891',
25
10: '2021-09-10 09:01:57.484644',
26
11: '2021-09-10 09:01:57.984644'},
27
'screen': {0: 'rewardapp.SplashActivity',
28
1: 'i1',
29
2: 'rewardapp.Signup_in',
30
3: 'rewardapp.PaymentFinalConfirmationActivity',
31
4: 'rewardapp.Signup_in',
32
5: 'i1',
33
6: 'rewardapp.SplashActivity',
34
7: 'i1',
35
8: 'rewardapp.Signup_in',
36
9: 'i1',
37
10: 'rewardapp.Signup_in',
38
11: 'rewardapp.PaymentFinalConfirmationActivity'},
39
'series1': {0: 0,
40
1: 1,
41
2: 2,
42
3: 3,
43
4: 0,
44
5: 1,
45
6: 0,
46
7: 1,
47
8: 2,
48
9: 3,
49
10: 4,
50
11: 5},
51
'series2': {0: 0,
52
1: 0,
53
2: 0,
54
3: 0,
55
4: 1,
56
5: 1,
57
6: 2,
58
7: 2,
59
8: 2,
60
9: 2,
61
10: 2,
62
11: 2}})
63
64
df_out['time_stamp'] = df['time_stamp'].astype('datetime64[ns]')
65
66
df_out
67
‘series1’ column values starts row by row as 0, 1, 2, and so on but resets to 0 when:
- ’email_ID’ column value changes.
- ‘screen’ column value == ‘rewardapp.PaymentFinalConfirmationActivity’
‘series2’ column values starts with 0 and increments by 1 whenever ‘series1’ resets.
My progress:
JavaScript
1
41
41
1
series1 = [0]
2
3
x = 0
4
5
for index in df[1:].index:
6
7
if ((df._get_value(index - 1, 'email_ID')) == df._get_value(index, 'email_ID')) and (df._get_value(index - 1, 'screen') != 'rewardapp.PaymentFinalConfirmationActivity'):
8
9
x += 1
10
11
series1.append(x)
12
13
else:
14
x = 0
15
16
series1.append(x)
17
18
19
df['series1'] = series1
20
df
21
22
series2 = [0]
23
24
x = 0
25
26
for index in df[1:].index:
27
28
if df._get_value(index, 'series1') - df._get_value(index - 1, 'series1') == 1:
29
30
series2.append(x)
31
32
else:
33
34
x += 1
35
36
series2.append(x)
37
38
39
df['series2'] = series2
40
df
41
I think the code above is working, I’ll test answered codes and select the best in a few hours, thank you.
Advertisement
Answer
Let’s try
JavaScript
1
6
1
m = (df_in['email_ID'].ne(df_in['email_ID'].shift().bfill()) |
2
df_in['screen'].shift().eq('rewardapp.PaymentFinalConfirmationActivity'))
3
4
df_in['series1'] = df_in.groupby(m.cumsum()).cumcount()
5
df_in['series2'] = m.cumsum()
6
JavaScript
1
16
16
1
print(df_in)
2
3
email_ID time_stamp screen series1 series2
4
0 sachinlaltaprayoohoo 2021-09-10 09:01:56.340259 rewardapp.SplashActivity 0 0
5
1 sachinlaltaprayoohoo 2021-09-10 09:01:56.672814 i1 1 0
6
2 sachinlaltaprayoohoo 2021-09-10 09:01:57.471423 rewardapp.Signup_in 2 0
7
3 sachinlaltaprayoohoo 2021-09-10 09:01:57.480891 rewardapp.PaymentFinalConfirmationActivity 3 0
8
4 sachinlaltaprayoohoo 2021-09-10 09:01:57.484644 rewardapp.Signup_in 0 1
9
5 sachinlaltaprayoohoo 2021-09-10 09:01:57.984644 i1 1 1
10
6 sheldon.yokoohoo 2021-09-10 09:01:56.340259 rewardapp.SplashActivity 0 2
11
7 sheldon.yokoohoo 2021-09-10 09:01:56.672814 i1 1 2
12
8 sheldon.yokoohoo 2021-09-10 09:01:57.471423 rewardapp.Signup_in 2 2
13
9 sheldon.yokoohoo 2021-09-10 09:01:57.480891 i1 3 2
14
10 sheldon.yokoohoo 2021-09-10 09:01:57.484644 rewardapp.Signup_in 4 2
15
11 sheldon.yokoohoo 2021-09-10 09:01:57.984644 rewardapp.PaymentFinalConfirmationActivity 5 2
16