
Tag: cumsum

pandas cumsum on lag-differenced dataframe

Say I have a pd.DataFrame() that I differenced with .diff(5), which works like “new number at idx i = (number at idx i) – (number at idx i-5)”. Now I want to undo this operation using the first 5 entries of example_df and df_diff. If I had done .diff(1), I would simply use .cumsum(). But how can I achieve
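
A minimal sketch of one way to invert a lag-5 difference, assuming the first 5 original rows are known. The names example_df and df_diff come from the question; the sample data and the helper names seeded, chain and restored are made up for illustration. The idea is that rows i, i+5, i+10, ... form independent chains, so a cumulative sum within each chain plays the role that a plain .cumsum() plays for .diff(1).

```python
import numpy as np
import pandas as pd

lag = 5

# Illustrative data only; the question's real example_df is not shown.
example_df = pd.DataFrame({"a": np.arange(12.0) ** 2})
df_diff = example_df.diff(lag)  # the first `lag` rows become NaN

# Put the known first `lag` rows back in place of the NaNs.
seeded = df_diff.copy()
seeded.iloc[:lag] = example_df.iloc[:lag].to_numpy()

# Group rows by position modulo `lag` and accumulate within each group.
chain = np.arange(len(seeded)) % lag
restored = seeded.groupby(chain).cumsum()
```

This relies on positional order, so it assumes the rows of df_diff are still in the same order as when .diff(5) was applied.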

Pandas cumsum with keys

I have two DataFrames (first, second):

index_first value_1 value_2
0 100 1
1 200 2
2 300 3

index_second value_1 value_2
0 50 10
1 100 20
2 150 30

Next I concat the two DataFrames with keys: My goal is to calculate the cumulative sum of value_1 and value_2 in z considering the keys. So the final DataFrame should
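
The excerpt cuts off before the desired output, so the following is only a sketch under one reading: the cumulative sum should restart for each key. The key labels "first" and "second" are assumptions, since the original concat call is not shown.

```python
import pandas as pd

first = pd.DataFrame({"value_1": [100, 200, 300], "value_2": [1, 2, 3]})
second = pd.DataFrame({"value_1": [50, 100, 150], "value_2": [10, 20, 30]})

# Key labels are hypothetical; the question's own keys are not visible.
z = pd.concat([first, second], keys=["first", "second"])

# Level 0 of the MultiIndex holds the keys, so group on it and accumulate.
result = z.groupby(level=0).cumsum()
```

If the intended result is instead a running total across both frames, z.cumsum() without the groupby would be the simpler call.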

Split a dataframe based on a specific cumsum value

I have a solution working, but it seems cumbersome and I am wondering if there is a better way to achieve what I want. I need to achieve two things: Split a dataframe into two dataframes based on a specific cumsum value. If a row needs to be split to fulfill the cumsum condition, then this must happen. An example
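
The question's example is cut off, so this is a rough sketch with invented data. It assumes a single numeric column (called value here, a hypothetical name) with non-negative entries, and a hypothetical threshold. Each row contributes to the first part as much as still fits under the threshold, and the remainder goes to the second part, which splits the crossing row.

```python
import numpy as np
import pandas as pd

# Invented data; column names and threshold are assumptions.
df = pd.DataFrame({"item": ["a", "b", "c", "d"], "value": [4, 3, 5, 2]})
threshold = 9

# Running total before each row is counted.
before = df["value"].cumsum() - df["value"]

# Portion of each row that still fits under the threshold.
room = (threshold - before).clip(lower=0)
head_amount = np.minimum(df["value"], room)
tail_amount = df["value"] - head_amount

# Keep only rows that actually contribute to each part.
head = df.assign(value=head_amount)[head_amount > 0]
tail = df.assign(value=tail_amount)[tail_amount > 0]
```

With this data, row c is split into 2 (first part) and 3 (second part), so the first part sums to exactly the threshold.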

How can I use cumsum skipping the first entry?

I have a DF that contains the ids of several creators of certain projects and the outcomes of their projects over time. Each project can either be a success (outcome = 1) or a failure (outcome = 0). The DF looks like this: I’m looking for a way to create two new columns: previous projects and previous successes. The first should be
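
The column definitions are truncated in the excerpt, so this sketch assumes that "previous" means strictly before the current row within each creator, and that rows are already in time order. The column names creator_id, previous_projects and previous_successes are hypothetical, as is the sample data.

```python
import pandas as pd

# Invented data; rows are assumed to be sorted by time within each creator.
df = pd.DataFrame({
    "creator_id": [1, 1, 1, 2, 2],
    "outcome":    [1, 0, 1, 0, 1],
})

grouped = df.groupby("creator_id")

# Number of earlier rows for the same creator.
df["previous_projects"] = grouped.cumcount()

# Cumulative successes including the current row, minus the current row,
# leaves the successes that came before it.
df["previous_successes"] = grouped["outcome"].cumsum() - df["outcome"]
```

Subtracting the current value is one way to skip the current entry; shifting each group by one row before accumulating would be an equivalent alternative.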

Pandas sum() with character condition

I have the following dataframe: I want to use cumsum() in order to sum the values in column “1”, but only for specific variables: I want to sum all the variables that start with tt and all the variables that start with bb in my dataframe, so in the end I’ll have the following table: I know how to
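
The dataframe and target table are missing from the excerpt, so this is a sketch under assumptions: the variable names live in a column called name (hypothetical), the values are in column "1", and one total per prefix is wanted. It groups on the first two characters of the name rather than using cumsum().

```python
import pandas as pd

# Invented data; only the column label "1" comes from the question.
df = pd.DataFrame({
    "name": ["tt1", "tt2", "bb1", "cc1", "bb2"],
    "1":    [1, 2, 3, 4, 5],
})

# First two characters of each name act as the group label.
prefix = df["name"].str[:2]
mask = prefix.isin(["tt", "bb"])

# Sum column "1" per prefix, ignoring rows with any other prefix.
totals = df.loc[mask, "1"].groupby(prefix[mask]).sum()
```

If a running total per prefix is wanted instead of a single sum, replacing .sum() with .cumsum() keeps one value per row.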

Python pandas cumsum with reset every time there is a 0

I have a matrix with 0s and 1s, and want to do a cumsum on each column that resets to 0 whenever a zero is observed. For example, if we have the following: The result I desire is: However, when I try df.cumsum() * df, I am able to correctly identify the 0 elements, but the counter does not reset:
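
One common approach, sketched here on invented data since the question's matrix is not shown: take the plain cumulative sum, record its value at each zero, carry that value forward, and subtract it. The names running, offset and result are made up for illustration.

```python
import pandas as pd

# Invented 0/1 data; the question's own matrix is not visible.
df = pd.DataFrame({
    "a": [1, 1, 0, 1, 1, 1, 0, 1],
    "b": [0, 1, 1, 0, 1, 1, 1, 0],
})

running = df.cumsum()

# Value of the running total at the most recent zero (0 before any zero).
offset = running.where(df == 0).ffill().fillna(0)

# Subtracting the offset restarts the count after every zero.
result = (running - offset).astype(int)
```

For column a this gives 1, 2, 0, 1, 2, 3, 0, 1, whereas df.cumsum() * df only zeroes out the rows containing 0 and keeps counting from the overall total.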
