I am trying to match each transaction to the corresponding offer_id. This is the dataset:
       time            event                          offer_id  amount
2077      0   offer received  f19421c1d4aa40978ebb69ca19b0e20d     NaN
15973     6     offer viewed  f19421c1d4aa40978ebb69ca19b0e20d     NaN
15974     6      transaction                               NaN    3.43
18470    12      transaction                               NaN    6.01
18471    12  offer completed  f19421c1d4aa40978ebb69ca19b0e20d     NaN
43417   108      transaction                               NaN   11.00
44532   114      transaction                               NaN    1.69
50587   150      transaction                               NaN    3.23
55277   168   offer received  9b98b8c7a33c4b65b9aebfe6a799e6d9     NaN
96598   258      transaction                               NaN    2.18
The rule is that once an offer has been viewed, the transactions that follow belong to that offer_id. If the offer was received but not viewed, the transaction does not belong to that offer_id. I hope the time variable makes it clear. This is the desired result:
       time            event                          offer_id  amount
2077      0   offer received  f19421c1d4aa40978ebb69ca19b0e20d     NaN
15973     6     offer viewed  f19421c1d4aa40978ebb69ca19b0e20d     NaN
15974     6      transaction  f19421c1d4aa40978ebb69ca19b0e20d    3.43
18470    12      transaction  f19421c1d4aa40978ebb69ca19b0e20d    6.01
18471    12  offer completed  f19421c1d4aa40978ebb69ca19b0e20d     NaN
43417   108      transaction                               NaN   11.00
44532   114      transaction                               NaN    1.69
50587   150      transaction                               NaN    3.23
55277   168   offer received  9b98b8c7a33c4b65b9aebfe6a799e6d9     NaN
96598   258      transaction                               NaN    2.18
Answer
Example code:
import pandas as pd
import numpy as np

d = {'time': [0, 6, 6, 12, 12, 108, 144, 150, 168, 258],
     'event': ["offer received", "offer viewed", "transaction", "transaction", "offer completed", "transaction", "transaction", "transaction", "offer received", "transaction"],
     'offer_id': ["f19421c1d4aa40978ebb69ca19b0e20d", "f19421c1d4aa40978ebb69ca19b0e20d", np.nan, np.nan, "f19421c1d4aa40978ebb69ca19b0e20d", np.nan, np.nan, np.nan, "9b98b8c7a33c4b65b9aebfe6a799e6d9", np.nan]}

df = pd.DataFrame(d)

print("Original data:\n{}\n".format(df))

# State carried across rows: whether an offer is currently viewed, and which one
is_offer_viewed = False
now_offer_id = np.nan
for index, row in df.iterrows():
    if row['event'] == "offer viewed":
        # A viewed offer claims the transactions that follow it
        is_offer_viewed = True
        now_offer_id = row['offer_id']

    elif row['event'] == "transaction" and is_offer_viewed:
        df.at[index, 'offer_id'] = now_offer_id

    elif row['event'] == "offer completed":
        # Completion ends the offer; later transactions stay NaN
        is_offer_viewed = False
        now_offer_id = np.nan

print("Processed data:\n{}\n".format(df))
Outputs:
Original data:
   time            event                          offer_id
0     0   offer received  f19421c1d4aa40978ebb69ca19b0e20d
1     6     offer viewed  f19421c1d4aa40978ebb69ca19b0e20d
2     6      transaction                               NaN
3    12      transaction                               NaN
4    12  offer completed  f19421c1d4aa40978ebb69ca19b0e20d
5   108      transaction                               NaN
6   144      transaction                               NaN
7   150      transaction                               NaN
8   168   offer received  9b98b8c7a33c4b65b9aebfe6a799e6d9
9   258      transaction                               NaN

Processed data:
   time            event                          offer_id
0     0   offer received  f19421c1d4aa40978ebb69ca19b0e20d
1     6     offer viewed  f19421c1d4aa40978ebb69ca19b0e20d
2     6      transaction  f19421c1d4aa40978ebb69ca19b0e20d
3    12      transaction  f19421c1d4aa40978ebb69ca19b0e20d
4    12  offer completed  f19421c1d4aa40978ebb69ca19b0e20d
5   108      transaction                               NaN
6   144      transaction                               NaN
7   150      transaction                               NaN
8   168   offer received  9b98b8c7a33c4b65b9aebfe6a799e6d9
9   258      transaction                               NaN
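If the frame is large, iterating with iterrows can be slow. The same rule could probably be expressed without a Python loop by marking the rows that open ("offer viewed") or close ("offer completed") an offer and forward-filling that marker. This is only a sketch: the function name assign_viewed_offers and the "__closed__" sentinel are made up for illustration, and it assumes the rows are already sorted by time and belong to a single customer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": [0, 6, 6, 12, 12, 108, 144, 150, 168, 258],
    "event": ["offer received", "offer viewed", "transaction", "transaction",
              "offer completed", "transaction", "transaction", "transaction",
              "offer received", "transaction"],
    "offer_id": ["f19421c1d4aa40978ebb69ca19b0e20d", "f19421c1d4aa40978ebb69ca19b0e20d",
                 np.nan, np.nan, "f19421c1d4aa40978ebb69ca19b0e20d",
                 np.nan, np.nan, np.nan,
                 "9b98b8c7a33c4b65b9aebfe6a799e6d9", np.nan],
})

CLOSED = "__closed__"  # hypothetical sentinel; must not collide with a real offer_id


def assign_viewed_offers(frame):
    """Return a copy where transactions inherit the offer_id of the last viewed,
    not yet completed offer. Assumes rows are in chronological order."""
    out = frame.copy()
    is_view = out["event"].eq("offer viewed")
    is_done = out["event"].eq("offer completed")
    is_txn = out["event"].eq("transaction")

    # Keep the offer_id only on "offer viewed" rows, put the sentinel on
    # "offer completed" rows, and leave every other row as NaN.
    marker = out["offer_id"].where(is_view).mask(is_done, CLOSED)

    # Forward-fill so each row sees the most recent view/completion marker.
    active = marker.ffill()

    # Fill only transactions that fall inside an open (viewed) window.
    fill = is_txn & active.notna() & active.ne(CLOSED)
    out.loc[fill, "offer_id"] = active[fill]
    return out


result = assign_viewed_offers(df)
print(result)
```

As in the loop version, an "offer received" row does not change which offer is active, so a transaction after an offer that was received but never viewed keeps NaN. With several customers in one frame, the same function would need to be applied per customer (for example through groupby on a customer column, which this sample does not have).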