What is the best way to loop through a Pandas dataframe employing a sequentially counted value in each row where a condition is true?

Question

Business Problem: For each row in a Pandas data frame where a condition is true, set a value in a column. When successive rows meet the condition, increase the value by one. The end goal is to create a column containing integers (e.g., 1, 2, 3, 4, …, n) upon which a pivot table can be made. As a side note…

Accepted Answer

You can try:

    import numpy as np
    import pandas as pd

    # sample DataFrame
    df = pd.DataFrame(np.random.randint(0, 2, 15).astype(str), columns=["Duplicate"])
    df = df.replace({'1': 'TRUE', '0': 'FALSE'})

    # start a new group at every non-TRUE row, then count the TRUE rows within each group
    df['sales_index'] = ((df['Duplicate'] == 'TRUE')
                 .groupby((df['Duplicate'] != 'TRUE')
                 .cumsum()).cumsum() + 1)
    print(df)

This gives (the sample is random, so your values will differ):

       Duplicate  sales_index
    0      FALSE            1
    1      FALSE            1
    2       TRUE            2
    3       TRUE            3
    4       TRUE            4
    5       TRUE            5
    6       TRUE            6
    7       TRUE            7
    8       TRUE            8
    9      FALSE            1
    10     FALSE            1
    11      TRUE            2
    12      TRUE            3
    13      TRUE            4
    14     FALSE            1
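To see why the grouped cumulative sum produces the counter, it may help to split the one-liner into its intermediate steps. The following is only a sketch on a fixed, hand-written input rather than a random one; the names `is_dup` and `run_id` are made up for illustration and do not appear in the answer.

```python
import pandas as pd

# Fixed (non-random) sample so the result is reproducible.
df = pd.DataFrame(
    {"Duplicate": ["FALSE", "TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE", "FALSE"]}
)

# Step 1: boolean mask of the rows that meet the condition.
is_dup = df["Duplicate"] == "TRUE"

# Step 2: every non-TRUE row bumps the running total, so it opens a new group
# that also contains the TRUE rows directly after it.
run_id = (~is_dup).cumsum()

# Step 3: inside each group, count the TRUE rows seen so far (0 for the
# non-TRUE row itself), then add 1 so the counter starts at 1.
df["sales_index"] = is_dup.astype(int).groupby(run_id).cumsum() + 1

print(df)
# sales_index column: 1, 2, 3, 1, 2, 3, 4, 1
```

As in the answer's output, a non-TRUE row resets the counter to 1 and the first TRUE row after it gets 2. If the counter should instead start at 1 on the first TRUE row, dropping the `+ 1` would be one way to do it, leaving non-TRUE rows at 0.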

Answer