Business Problem: For each row in a Pandas data frame where condition is true, set value in a column. When successive rows meet condition, then increase the value by one. The end goal is to create a column containing integers (e.g., 1, 2, 3, 4, … , n) upon which a pivot table can be made. As a side note, there will be a second index upon which the pivot will be made.
Below is my attempt, but I’m new to using Pandas.
sales_data_cleansed_2.loc[sales_data_cleansed_2['Duplicate'] == 'FALSE', 'sales_index'] = 1 j = 2 # loop through whether duplicate exists. for i in range(0, len(sales_data_cleansed_2)): while sales_data_cleansed_2.loc[i,'Duplicate'] == 'TRUE': sales_data_cleansed_2.loc[i,'sales_index'] = j j = j + 1 break j = 2
Advertisement
Answer
You can try:
import pandas as pd # sample DataFrame df = pd.DataFrame(np.random.randint(0,2, 15).astype(str), columns=["Duplicate"]) df = df.replace({'1': 'TRUE', '0':'FALSE'}) df['sales_index'] = ((df['Duplicate'] == 'TRUE') .groupby((df['Duplicate'] != 'TRUE') .cumsum()).cumsum() + 1) print(df)
This gives:
Duplicate sales_index 0 FALSE 1 1 FALSE 1 2 TRUE 2 3 TRUE 3 4 TRUE 4 5 TRUE 5 6 TRUE 6 7 TRUE 7 8 TRUE 8 9 FALSE 1 10 FALSE 1 11 TRUE 2 12 TRUE 3 13 TRUE 4 14 FALSE 1