Assume I have the dataframe df and I want to slice this in multiple dataframes and store each in a list (list_of_dfs).
Each sub-dataframe should only contain the rows “Result”. One sub-dataframe starts, when in column “Point” the value “P1” and in column “X_Y” the value “X” is given.
I tried this with first finding the indicies of each “P1” and then slicing the overall dataframe within a list comprehension using the indicies of “P1”. But I receive a list with two empty dataframes. Can someone advise? Thanks!
import pandas as pd df = pd.DataFrame( { "Step": ( "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result" ), "Point": ( "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", ), "X_Y": ( "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", ), "Value A": ( 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, ), "Value B": ( 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, ), } ) dff = df.loc[df["Step"] == "Result"] value = "P1" tuple_of_positions = list() result = dff.isin([value]) seriesObj = result.any() columnNames = list(seriesObj[seriesObj == True].index) for col in columnNames: rows = list(result[col][result[col] == True].index) for row in rows: tuple_of_positions.append((row, col)) length_of_one_df = (len(dff["Point"].unique().tolist()) * 2 ) - 1 list_of_dfs = [dff.iloc[x : x + length_of_one_df] for x in rows] print(list_of_dfs)
Advertisement
Answer
sub = df.query("Step == "Result"") pivots = sub[["Point", "X_Y"]].eq(["P1", "X"]).all(axis=1) out = [fr for _, fr in sub.groupby(pivots.cumsum())]
- get the subset of the frame where Step is equal to “Result”
- check in which rows there is “P1” and “X” sequence
- that gives a True/False series
- cumulative sum of it determines the group as the “pivoting” (turning) points will be True since False == 0 in numeric context
- iterating over a GroupBy object yields “group_label, sub_frame” pairs, out of which we pull the sub_frames
to get
>>> out [ Step Point X_Y Value A Value B 10 Result P1 X 70.00 70.00 11 Result P2 X 68.00 68.00 12 Result P2 Y 66.75 66.75 13 Result P3 X 68.08 68.08 14 Result P3 Y 66.72 66.72, Step Point X_Y Value A Value B 25 Result P1 X 70.00 70.00 26 Result P2 X 68.00 68.00 27 Result P2 Y 66.75 66.75 28 Result P3 X 68.08 68.08 29 Result P3 Y 66.72 66.72]
where the intermediares were
>>> sub Step Point X_Y Value A Value B 10 Result P1 X 70.00 70.00 11 Result P2 X 68.00 68.00 12 Result P2 Y 66.75 66.75 13 Result P3 X 68.08 68.08 14 Result P3 Y 66.72 66.72 25 Result P1 X 70.00 70.00 26 Result P2 X 68.00 68.00 27 Result P2 Y 66.75 66.75 28 Result P3 X 68.08 68.08 29 Result P3 Y 66.72 66.72
>>> pivots 10 True 11 False 12 False 13 False 14 False 25 True 26 False 27 False 28 False 29 False dtype: bool
# groups >>> pivots.cumsum() 10 1 11 1 12 1 13 1 14 1 25 2 26 2 27 2 28 2 29 2 dtype: int32