I have a DataFrame that shows the behavior of a machine. This machine can be in two states: Production or cleaning. Hence, I have a dummy variable called “Production”, that shows 1 when the machine is producing and 0 when it is not. I would like to know the production cycles (how many hours does the machine stay producing until it stops, and how much time it stops until it starts the whole process again). Therefore, I would like to create a column that counts how much time (how many rows) the machine is under each state, but it should reset itself when the other category appears again.
Example:
production production_cycle 1 5 1 5 1 5 1 5 1 5 0 2 0 2 1 1 0 3 0 3 0 3
Advertisement
Answer
You can first detect the turning points by looking at the points where it diff
ers from the previous one. Then cumulative sum of this gives the needed groupings. We transform
this with count
to get the size of each group:
>>> grouper = df.production.diff().ne(0).cumsum() >>> df["production_cycle"] = df.groupby(grouper).transform("count") >>> df production production_cycle 0 1 5 1 1 5 2 1 5 3 1 5 4 1 5 5 0 2 6 0 2 7 1 1 8 0 3 9 0 3 10 0 3
the grouper
is
>>> grouper 0 1 1 1 2 1 3 1 4 1 5 2 6 2 7 3 8 4 9 4 10 4