What is the most pandastic way to create running total columns at various levels (without iterating over the rows)?
input:
import pandas as pd
import numpy as np

df = pd.DataFrame()
df['test'] = np.nan, np.nan, 'X', 'X', 'X', 'X', np.nan, 'X', 'X', 'X', 'X', 'X', 'X', np.nan, np.nan, 'X', 'X'
df['desired_output_level_1'] = np.nan, np.nan, '1', '1', '1', '1', np.nan, '2', '2', '2', '2', '2', '2', np.nan, np.nan, '3', '3'
df['desired_output_level_2'] = np.nan, np.nan, '1', '2', '3', '4', np.nan, '1', '2', '3', '4', '5', '6', np.nan, np.nan, '1', '2'
output:
   test desired_output_level_1 desired_output_level_2
0   NaN                    NaN                    NaN
1   NaN                    NaN                    NaN
2     X                      1                      1
3     X                      1                      2
4     X                      1                      3
5     X                      1                      4
6   NaN                    NaN                    NaN
7     X                      2                      1
8     X                      2                      2
9     X                      2                      3
10    X                      2                      4
11    X                      2                      5
12    X                      2                      6
13  NaN                    NaN                    NaN
14  NaN                    NaN                    NaN
15    X                      3                      1
16    X                      3                      2
The test column can only contain X’s or NaNs. The number of consecutive X’s is random.
In the ‘desired_output_level_1’ column, I am trying to count the series of X’s, giving every row in a series the number of that series.
In the ‘desired_output_level_2’ column, I am trying to find the duration of each series, i.e. a running count within the series.
Can anyone help? Thanks in advance.
Answer
Perhaps not the most pandastic way, but seems to yield what you are after.
Three key points:
- We operate only on the rows that are not NaN, so let’s create a mask:
mask = df['test'].notna()
- For the level 1 computation, we can detect where each series of X’s begins by shifting the column one row and comparing: a new series starts right after a row that is NaN and is followed by a non-NaN row.

df.loc[mask, "level_1"] = (df["test"].isna() & df["test"].shift(-1).notna()).cumsum()
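One caveat worth noting (my addition, not part of the original answer): shift(-1) tests the NaN row *before* each series, so a series that starts on the very first row would never be counted. A sketch of an equivalent start-of-series test that compares each row with the previous one and handles that edge case:

```python
import pandas as pd
import numpy as np

# toy frame (my own example) where a series of X's starts on row 0
df = pd.DataFrame({"test": ["X", "X", np.nan, "X"]})
mask = df["test"].notna()

# a series starts on any non-NaN row whose previous row is NaN (or does not exist)
starts = df["test"].notna() & df["test"].shift().isna()
df.loc[mask, "level_1"] = starts.cumsum()
```

Because shift() on row 0 yields NaN, the first row correctly counts as the start of series 1 even without a preceding NaN row.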
- For the level 2 computation, it’s a bit trickier. One way to do it is to run the computation for each level_1 group and use .transform to preserve the indexing:

df.loc[mask, "level_2"] = (
    df.loc[mask, ["level_1"]]
    .assign(level_2=1)
    .groupby("level_1")["level_2"]
    .transform("cumsum")
)
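For comparison, a shorter variant of the same idea (my own addition, not from the answer): groupby(...).cumcount() numbers the rows within each group starting at 0, so adding 1 gives the same within-series counter without the assign/transform step:

```python
import pandas as pd
import numpy as np

# small toy frame (my own example) with two series of X's
df = pd.DataFrame({"test": [np.nan, "X", "X", np.nan, "X"]})
mask = df["test"].notna()

# level 1 as in the answer: count series starts via the shifted comparison
df.loc[mask, "level_1"] = (df["test"].isna() & df["test"].shift(-1).notna()).cumsum()

# cumcount numbers rows within each group from 0, so add 1 for a 1-based counter
df.loc[mask, "level_2"] = df.loc[mask].groupby("level_1").cumcount() + 1
```

The .loc[mask] on the left-hand side keeps the NaN gap rows untouched, exactly as in the transform version.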
The last step (if needed) is to transform the columns to strings:

df['level_1'] = df['level_1'].astype('Int64').astype('str')
df['level_2'] = df['level_2'].astype('Int64').astype('str')
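Putting it all together, a self-contained sketch that reproduces the steps above on the question’s data (checking the numeric results before the optional string conversion):

```python
import pandas as pd
import numpy as np

# build the example frame from the question
df = pd.DataFrame()
df["test"] = [np.nan, np.nan, "X", "X", "X", "X", np.nan,
              "X", "X", "X", "X", "X", "X", np.nan, np.nan, "X", "X"]

mask = df["test"].notna()

# level 1: number each series of X's via the shifted NaN-to-X comparison
df.loc[mask, "level_1"] = (df["test"].isna() & df["test"].shift(-1).notna()).cumsum()

# level 2: running count within each series, index-aligned via transform
df.loc[mask, "level_2"] = (
    df.loc[mask, ["level_1"]]
    .assign(level_2=1)
    .groupby("level_1")["level_2"]
    .transform("cumsum")
)
```

The masked rows now match the desired_output columns from the question, and the gap rows remain NaN.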