Business Problem: For each row in a Pandas data frame where condition is true, set value in a column. When successive rows meet condition, then increase the value by one. The end goal is to create a column containing integers (e.g., 1, 2, 3, 4, … , n) upon which a pivot table can be made. As a side note, there will be a second index upon which the pivot will be made.
Below is my attempt, but I’m new to using Pandas.
JavaScript
x
11
11
1
sales_data_cleansed_2.loc[sales_data_cleansed_2['Duplicate'] == 'FALSE', 'sales_index'] = 1
2
j = 2
3
4
# loop through whether duplicate exists.
5
for i in range(0, len(sales_data_cleansed_2)):
6
while sales_data_cleansed_2.loc[i,'Duplicate'] == 'TRUE':
7
sales_data_cleansed_2.loc[i,'sales_index'] = j
8
j = j + 1
9
break
10
j = 2
11
Advertisement
Answer
You can try:
JavaScript
1
11
11
1
import pandas as pd
2
3
# sample DataFrame
4
df = pd.DataFrame(np.random.randint(0,2, 15).astype(str), columns=["Duplicate"])
5
df = df.replace({'1': 'TRUE', '0':'FALSE'})
6
7
df['sales_index'] = ((df['Duplicate'] == 'TRUE')
8
.groupby((df['Duplicate'] != 'TRUE')
9
.cumsum()).cumsum() + 1)
10
print(df)
11
This gives:
JavaScript
1
17
17
1
Duplicate sales_index
2
0 FALSE 1
3
1 FALSE 1
4
2 TRUE 2
5
3 TRUE 3
6
4 TRUE 4
7
5 TRUE 5
8
6 TRUE 6
9
7 TRUE 7
10
8 TRUE 8
11
9 FALSE 1
12
10 FALSE 1
13
11 TRUE 2
14
12 TRUE 3
15
13 TRUE 4
16
14 FALSE 1
17