I have a dataset with supplier and its invoices and I need to determinate which of the invoices are consecutives marking it with a 1 or a 0.
For example:
JavaScript
x
4
1
df1 = pd.DataFrame()
2
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC']
3
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131']
4
JavaScript
1
13
13
1
Supplier Invoice
2
0 ABC 101
3
1 ABC 102
4
2 ABC 105
5
3 ABC 109
6
4 ABC 110
7
5 ABC 114
8
6 ABC 120
9
7 ABC 121
10
8 ABC 122
11
9 ABC 130
12
10 ABC 131
13
And what I want is a third column like this:
JavaScript
1
13
13
1
Supplier Invoice Consecutive
2
0 ABC 101 0
3
1 ABC 102 1
4
2 ABC 105 0
5
3 ABC 109 0
6
4 ABC 110 1
7
5 ABC 114 0
8
6 ABC 120 0
9
7 ABC 121 1
10
8 ABC 122 1
11
9 ABC 130 0
12
10 ABC 131 1
13
EDIT
Thanks for your answers, this options works great, but when I tried it in a real database I realized that I need to compare the suppliers, if the next row is a new supplier the consecutive must be 0.
For example:
JavaScript
1
16
16
1
Supplier Invoice Consecutive
2
0 ABC 101 0
3
1 ABC 102 1
4
2 ABC 105 0
5
3 ABC 109 0
6
4 ABC 110 1
7
5 ABC 114 0
8
6 ABC 120 0
9
7 ABC 121 1
10
8 ABC 122 1
11
9 ABC 130 0
12
10 ABC 131 1
13
11 DEF 132 0
14
12 DEF 133 1
15
13 GHI 134 0
16
Thanks in advance!
Advertisement
Answer
using npwhere and diff
JavaScript
1
9
1
# take difference b/w consecutive rows using diff
2
# update to 0 when difference is not 1
3
4
df1['Consecutive']=(df1.groupby('Supplier')['Invoice']
5
.transform(lambda x: np.where(
6
x.astype(int).diff() !=1, 0, 1)) )
7
df1
8
df1
9
JavaScript
1
16
16
1
Supplier Invoice Consecutive
2
0 ABC 101 0
3
1 ABC 102 1
4
2 ABC 105 0
5
3 ABC 109 0
6
4 ABC 110 1
7
5 ABC 114 0
8
6 ABC 120 0
9
7 ABC 121 1
10
8 ABC 122 1
11
9 ABC 130 0
12
10 ABC 131 1
13
11 DEF 132 0
14
12 DEF 133 1
15
13 GHI 134 0
16