Skip to content
Advertisement

pandas how to iteratively count instances of a category by row and reset them when the other category appears?

I have a DataFrame that shows the behavior of a machine. This machine can be in two states: Production or cleaning. Hence, I have a dummy variable called “Production”, that shows 1 when the machine is producing and 0 when it is not. I would like to know the production cycles (how many hours does the machine stay producing until it stops, and how much time it stops until it starts the whole process again). Therefore, I would like to create a column that counts how much time (how many rows) the machine is under each state, but it should reset itself when the other category appears again.

Example:

production production_cycle
1          5
1          5
1          5
1          5
1          5
0          2
0          2
1          1
0          3
0          3
0          3

Advertisement

Answer

You can first detect the turning points by looking at the points where it differs from the previous one. Then cumulative sum of this gives the needed groupings. We transform this with count to get the size of each group:

>>> grouper = df.production.diff().ne(0).cumsum()
>>> df["production_cycle"] = df.groupby(grouper).transform("count")
>>> df

    production  production_cycle
0            1                 5
1            1                 5
2            1                 5
3            1                 5
4            1                 5
5            0                 2
6            0                 2
7            1                 1
8            0                 3
9            0                 3
10           0                 3

the grouper is

>>> grouper

0     1
1     1
2     1
3     1
4     1
5     2
6     2
7     3
8     4
9     4
10    4 
Advertisement