I have a dataset with supplier and its invoices and I need to determinate which of the invoices are consecutives marking it with a 1 or a 0.
For example:
df1 = pd.DataFrame() df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC'] df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131']
Supplier Invoice 0 ABC 101 1 ABC 102 2 ABC 105 3 ABC 109 4 ABC 110 5 ABC 114 6 ABC 120 7 ABC 121 8 ABC 122 9 ABC 130 10 ABC 131
And what I want is a third column like this:
Supplier Invoice Consecutive 0 ABC 101 0 1 ABC 102 1 2 ABC 105 0 3 ABC 109 0 4 ABC 110 1 5 ABC 114 0 6 ABC 120 0 7 ABC 121 1 8 ABC 122 1 9 ABC 130 0 10 ABC 131 1
EDIT
Thanks for your answers, this options works great, but when I tried it in a real database I realized that I need to compare the suppliers, if the next row is a new supplier the consecutive must be 0.
For example:
Supplier Invoice Consecutive 0 ABC 101 0 1 ABC 102 1 2 ABC 105 0 3 ABC 109 0 4 ABC 110 1 5 ABC 114 0 6 ABC 120 0 7 ABC 121 1 8 ABC 122 1 9 ABC 130 0 10 ABC 131 1 11 DEF 132 0 12 DEF 133 1 13 GHI 134 0
Thanks in advance!
Advertisement
Answer
using npwhere and diff
# take difference b/w consecutive rows using diff # update to 0 when difference is not 1 df1['Consecutive']=(df1.groupby('Supplier')['Invoice'] .transform(lambda x: np.where( x.astype(int).diff() !=1, 0, 1)) ) df1 df1
Supplier Invoice Consecutive 0 ABC 101 0 1 ABC 102 1 2 ABC 105 0 3 ABC 109 0 4 ABC 110 1 5 ABC 114 0 6 ABC 120 0 7 ABC 121 1 8 ABC 122 1 9 ABC 130 0 10 ABC 131 1 11 DEF 132 0 12 DEF 133 1 13 GHI 134 0