Skip to content
Advertisement

What is best way to loop through Pandas dataframe employing a sequentially counted value in each row where condition is true?

Business Problem: For each row in a Pandas data frame where condition is true, set value in a column. When successive rows meet condition, then increase the value by one. The end goal is to create a column containing integers (e.g., 1, 2, 3, 4, … , n) upon which a pivot table can be made. As a side note, there will be a second index upon which the pivot will be made.

Below is my attempt, but I’m new to using Pandas.

sales_data_cleansed_2.loc[sales_data_cleansed_2['Duplicate'] == 'FALSE', 'sales_index'] = 1 
j = 2

    # loop through whether duplicate exists.
for i in range(0, len(sales_data_cleansed_2)):
    while sales_data_cleansed_2.loc[i,'Duplicate'] == 'TRUE':
        sales_data_cleansed_2.loc[i,'sales_index'] = j 
        j = j + 1
        break
    j = 2

Advertisement

Answer

You can try:

import pandas as pd

# sample DataFrame
df = pd.DataFrame(np.random.randint(0,2, 15).astype(str), columns=["Duplicate"])
df = df.replace({'1': 'TRUE', '0':'FALSE'})

df['sales_index'] = ((df['Duplicate'] == 'TRUE')
             .groupby((df['Duplicate'] != 'TRUE')
             .cumsum()).cumsum() + 1)
print(df)

This gives:

   Duplicate  sales_index
0      FALSE            1
1      FALSE            1
2       TRUE            2
3       TRUE            3
4       TRUE            4
5       TRUE            5
6       TRUE            6
7       TRUE            7
8       TRUE            8
9      FALSE            1
10     FALSE            1
11      TRUE            2
12      TRUE            3
13      TRUE            4
14     FALSE            1
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement