Skip to content
Advertisement

what is the best way to create running total columns in pandas

What is the most pandastic way to create running total columns at various levels (without iterating over the rows)?

input:

JavaScript

output:

JavaScript

The test column can only contain X’s or NaNs. The number of consecutive X’s is random.

In the ‘desired_output_level_1’ column, trying to count up the number of series of X’s.

In the ‘desired_output_level_2’ column, trying to find the duration of each series.

Can anyone help? Thanks in advance.

Advertisement

Answer

Perhaps not the most pandastic way, but seems to yield what you are after.

Three key points:

  1. we are operating on only rows that are not NaN, so let’s create a mask:
JavaScript
  1. For level 1 computation, it’s easy to compare when there is a change from NaN to not NaN by shifting rows by one:
JavaScript
  1. For level 2 computation, it’s a bit trickier. One way to do it is to run the computation for each level_1 group and do .transform to preserve the indexing:
JavaScript

Last step (if needed) is to transform columns to strings:

JavaScript
Advertisement