Split a dataframe based on a specific cumsum value

Question

I have a working solution, but it seems cumbersome, and I am wondering whether there is a better way to achieve what I want. I need to achieve two things:

1. Split a dataframe into two dataframes based on a specific cumsum value.
2. If a row needs to be split to fulfill the cumsum condition, then this must happen.

An exam…
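Stated as arithmetic, and assuming (as the accepted answer does) that the cumulative sum is taken over `vol * price`: let $e_i = v_i p_i$, $S_i = \sum_{j \le i} e_j$, and let $C$ be the cap (2500 in the answer). For the first row $k$ with $S_{k-1} \le C < S_k$, the row is split into

$$
v_k^{(1)} = v_k - \frac{S_k - C}{p_k}, \qquad v_k^{(2)} = \frac{S_k - C}{p_k},
$$

so that $S_{k-1} + v_k^{(1)} p_k = S_{k-1} + e_k - (S_k - C) = C$. For row D in the answer's data, $S_k = 2684$ and $p_k = 3.3$, giving $v_k^{(2)} = 184 / 3.3 \approx 55.76$ and $v_k^{(1)} \approx 24.24$.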

Accepted Answer

1. Calculate the columns you have noted.
2. Find the row where `cumsum()` goes above the magic number 2500.
3. On that row, make `vol` a list that splits the volume so the `cumsum()` is capped at the magic number.
4. Expand the list back out using `explode()`.
5. Calculate the derived numbers again and reuse the `split` column to identify which target DataFrame each row belongs to.
6. Finally, generate the target DataFrames as a dict.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'vol': [165, 70, 120, 80, 180, 172, 150],
                   'price': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   }, index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

magicv = 2500

df = (df.assign(eurvol=df.vol*df.price,
                eurvol_cs=lambda dfa: dfa.eurvol.cumsum(),
                # find row where cumsum goes above magic number;
                # shift(fill_value=0) treats the total before the first row as 0,
                # so a first row that alone exceeds magicv is split too
                split=lambda dfa: dfa.eurvol_cs.gt(magicv) & dfa.eurvol_cs.shift(fill_value=0).lt(magicv),
                # split vol on row where it goes above magic number into a list
                vol=lambda dfa: np.where(dfa.split,
                                         dfa.apply(lambda r: [r.vol-((r.eurvol_cs-magicv)/r.price),
                                                              (r.eurvol_cs-magicv)/r.price], axis=1),
                                         dfa.vol),
                )
      # explode list (the split row becomes two rows with the same index label)
      .explode("vol")
      # recalc and group DF
      .assign(eurvol=lambda dfa: dfa.vol*dfa.price,
              split=lambda dfa: dfa.eurvol.cumsum().gt(magicv),
              )
      .drop(columns="eurvol_cs")
      )

# finally a dict of multiple dataframes
dfs = {f"df_{i+1}": df.loc[df.split.eq(v), [c for c in df.columns if c != "split"]]
       for i, v in enumerate(df.split.unique())}
```

Output dict:

```
{'df_1':    Age        vol  price  eurvol
 A   30        165    4.6   759.0
 B   20         70    8.3   581.0
 C   22        120    9.0  1080.0
 D   40  24.242424    3.3    80.0,
 'df_2':    Age        vol  price  eurvol
 D   40  55.757576    3.3   184.0
 E   32        180    1.8   324.0
 F   28        172    9.5  1634.0
 G   39        150    2.2   330.0}
```
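If something less cumbersome than `apply` plus `explode` is wanted, one alternative (a different technique from the accepted answer, not its method) is to clip the running total at the cap and take row-to-row increments: the increments of `cumsum().clip(upper=cap)` are each row's share below the cap, and the increments of `(cumsum() - cap).clip(lower=0)` are its share above. The function name `split_at_cap` is hypothetical; the sketch assumes positive `vol` and `price` columns and returns a tuple of two frames instead of the answer's dict.

```python
import numpy as np
import pandas as pd


def split_at_cap(df, cap):
    """Split df into two frames so that vol * price in the first sums to at most cap.

    Sketch only: assumes positive 'vol' and 'price' columns. The row whose
    cumulative sum crosses cap is divided between the two frames.
    """
    eurvol = df["vol"] * df["price"]
    cs = eurvol.cumsum()
    # Running totals on each side of the cap. Their row-to-row increments are
    # each row's share below / above the cap (exactly 0 for a row that lies
    # entirely on the other side).
    below_cs = cs.clip(upper=cap)
    above_cs = (cs - cap).clip(lower=0)
    below_eur = below_cs - below_cs.shift(fill_value=0)
    above_eur = above_cs - above_cs.shift(fill_value=0)

    # Only the crossing row has a share on both sides; it gets a recomputed vol,
    # every other row keeps its original vol.
    crossing = (below_eur > 0) & (above_eur > 0)
    vol_below = np.where(crossing, below_eur / df["price"], df["vol"])
    vol_above = np.where(crossing, above_eur / df["price"], df["vol"])

    first = (df.assign(vol=vol_below, eurvol=lambda d: d["vol"] * d["price"])
               .loc[below_eur > 0])
    second = (df.assign(vol=vol_above, eurvol=lambda d: d["vol"] * d["price"])
                .loc[above_eur > 0])
    return first, second


# Usage with the data from the accepted answer
df = pd.DataFrame({'Age': [30, 20, 22, 40, 32, 28, 39],
                   'vol': [165, 70, 120, 80, 180, 172, 150],
                   'price': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2]},
                  index=list("ABCDEFG"))
df_1, df_2 = split_at_cap(df, 2500)
print(df_1)
print(df_2)
```

Because the increments are exactly 0 for rows that sit entirely on one side of the cap, no tolerance is needed to decide which frame a row goes to, and a first row that alone exceeds the cap is split as well. Unlike the answer's version, `vol` comes back as a float column.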

