Dataset is something like this (there will be duplicate rows in the original):
Code:
import pandas as pd
df_in = pd.DataFrame({'email_ID': {0: 'sachinlaltaprayoohoo',
  1: 'sachinlaltaprayoohoo',
  2: 'sachinlaltaprayoohoo',
  3: 'sachinlaltaprayoohoo',
  4: 'sachinlaltaprayoohoo',
  5: 'sachinlaltaprayoohoo',
  6: 'sheldon.yokoohoo',
  7: 'sheldon.yokoohoo',
  8: 'sheldon.yokoohoo',
  9: 'sheldon.yokoohoo',
  10: 'sheldon.yokoohoo',
  11: 'sheldon.yokoohoo'},
 'time_stamp': {0: '2021-09-10 09:01:56.340259',
  1: '2021-09-10 09:01:56.672814',
  2: '2021-09-10 09:01:57.471423',
  3: '2021-09-10 09:01:57.480891',
  4: '2021-09-10 09:01:57.484644',
  5: '2021-09-10 09:01:57.984644',
  6: '2021-09-10 09:01:56.340259',
  7: '2021-09-10 09:01:56.672814',
  8: '2021-09-10 09:01:57.471423',
  9: '2021-09-10 09:01:57.480891',
  10: '2021-09-10 09:01:57.484644',
  11: '2021-09-10 09:01:57.984644'},
 'screen': {0: 'rewardapp.SplashActivity',
  1: 'i1',
  2: 'rewardapp.Signup_in',
  3: 'rewardapp.PaymentFinalConfirmationActivity',
  4: 'rewardapp.Signup_in',
  5: 'i1',
  6: 'rewardapp.SplashActivity',
  7: 'i1',
  8: 'rewardapp.Signup_in',
  9: 'i1',
  10: 'rewardapp.Signup_in',
  11: 'rewardapp.PaymentFinalConfirmationActivity'}})
df_in['time_stamp'] = df_in['time_stamp'].astype('datetime64[ns]')
df_in
Output should be this:
Code:
import pandas as pd
df_out = pd.DataFrame({'email_ID': {0: 'sachinlaltaprayoohoo',
  1: 'sachinlaltaprayoohoo',
  2: 'sachinlaltaprayoohoo',
  3: 'sachinlaltaprayoohoo',
  4: 'sachinlaltaprayoohoo',
  5: 'sachinlaltaprayoohoo',
  6: 'sheldon.yokoohoo',
  7: 'sheldon.yokoohoo',
  8: 'sheldon.yokoohoo',
  9: 'sheldon.yokoohoo',
  10: 'sheldon.yokoohoo',
  11: 'sheldon.yokoohoo'},
 'time_stamp': {0: '2021-09-10 09:01:56.340259',
  1: '2021-09-10 09:01:56.672814',
  2: '2021-09-10 09:01:57.471423',
  3: '2021-09-10 09:01:57.480891',
  4: '2021-09-10 09:01:57.484644',
  5: '2021-09-10 09:01:57.984644',
  6: '2021-09-10 09:01:56.340259',
  7: '2021-09-10 09:01:56.672814',
  8: '2021-09-10 09:01:57.471423',
  9: '2021-09-10 09:01:57.480891',
  10: '2021-09-10 09:01:57.484644',
  11: '2021-09-10 09:01:57.984644'},
 'screen': {0: 'rewardapp.SplashActivity',
  1: 'i1',
  2: 'rewardapp.Signup_in',
  3: 'rewardapp.PaymentFinalConfirmationActivity',
  4: 'rewardapp.Signup_in',
  5: 'i1',
  6: 'rewardapp.SplashActivity',
  7: 'i1',
  8: 'rewardapp.Signup_in',
  9: 'i1',
  10: 'rewardapp.Signup_in',
  11: 'rewardapp.PaymentFinalConfirmationActivity'},
 'series1': {0: 0,
  1: 1,
  2: 2,
  3: 3,
  4: 0,
  5: 1,
  6: 0,
  7: 1,
  8: 2,
  9: 3,
  10: 4,
  11: 5},
 'series2': {0: 0,
  1: 0,
  2: 0,
  3: 0,
  4: 1,
  5: 1,
  6: 2,
  7: 2,
  8: 2,
  9: 2,
  10: 2,
  11: 2}})
df_out['time_stamp'] = df['time_stamp'].astype('datetime64[ns]')
df_out
‘series1’ column values starts row by row as 0, 1, 2, and so on but resets to 0 when:
- ’email_ID’ column value changes.
- ‘screen’ column value == ‘rewardapp.PaymentFinalConfirmationActivity’
‘series2’ column values starts with 0 and increments by 1 whenever ‘series1’ resets.
My progress:
series1 = [0]
x = 0
for index in df[1:].index:
  if ((df._get_value(index - 1, 'email_ID')) == df._get_value(index, 'email_ID')) and (df._get_value(index - 1, 'screen') != 'rewardapp.PaymentFinalConfirmationActivity'):
    x += 1
    series1.append(x)
  
  else:
    x = 0
    series1.append(x)
df['series1'] = series1
df
series2 = [0]
x = 0
for index in df[1:].index:
  if df._get_value(index, 'series1') - df._get_value(index - 1, 'series1') == 1:
    series2.append(x)
  
  else:
    
    x += 1
    series2.append(x)
df['series2'] = series2
df
I think the code above is working, I’ll test answered codes and select the best in a few hours, thank you.
Advertisement
Answer
Let’s try
m = (df_in['email_ID'].ne(df_in['email_ID'].shift().bfill()) |
     df_in['screen'].shift().eq('rewardapp.PaymentFinalConfirmationActivity'))
df_in['series1'] = df_in.groupby(m.cumsum()).cumcount()
df_in['series2'] = m.cumsum()
print(df_in)
                email_ID                 time_stamp                                      screen  series1  series2
0   sachinlaltaprayoohoo 2021-09-10 09:01:56.340259                    rewardapp.SplashActivity        0        0
1   sachinlaltaprayoohoo 2021-09-10 09:01:56.672814                                          i1        1        0
2   sachinlaltaprayoohoo 2021-09-10 09:01:57.471423                         rewardapp.Signup_in        2        0
3   sachinlaltaprayoohoo 2021-09-10 09:01:57.480891  rewardapp.PaymentFinalConfirmationActivity        3        0
4   sachinlaltaprayoohoo 2021-09-10 09:01:57.484644                         rewardapp.Signup_in        0        1
5   sachinlaltaprayoohoo 2021-09-10 09:01:57.984644                                          i1        1        1
6       sheldon.yokoohoo 2021-09-10 09:01:56.340259                    rewardapp.SplashActivity        0        2
7       sheldon.yokoohoo 2021-09-10 09:01:56.672814                                          i1        1        2
8       sheldon.yokoohoo 2021-09-10 09:01:57.471423                         rewardapp.Signup_in        2        2
9       sheldon.yokoohoo 2021-09-10 09:01:57.480891                                          i1        3        2
10      sheldon.yokoohoo 2021-09-10 09:01:57.484644                         rewardapp.Signup_in        4        2
11      sheldon.yokoohoo 2021-09-10 09:01:57.984644  rewardapp.PaymentFinalConfirmationActivity        5        2

