Assume I have the dataframe df and I want to slice this in multiple dataframes and store each in a list (list_of_dfs).
Each sub-dataframe should only contain the rows “Result”. One sub-dataframe starts, when in column “Point” the value “P1” and in column “X_Y” the value “X” is given.
I tried this with first finding the indicies of each “P1” and then slicing the overall dataframe within a list comprehension using the indicies of “P1”. But I receive a list with two empty dataframes. Can someone advise? Thanks!
JavaScript
x
48
48
1
import pandas as pd
2
3
df = pd.DataFrame(
4
{
5
"Step": (
6
"1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result",
7
"1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result"
8
),
9
"Point": (
10
"P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3",
11
"P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3",
12
),
13
"X_Y": (
14
"X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y",
15
"X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y",
16
),
17
"Value A": (
18
70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
19
70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
20
),
21
"Value B": (
22
70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
23
70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
24
),
25
}
26
)
27
28
dff = df.loc[df["Step"] == "Result"]
29
30
value = "P1"
31
tuple_of_positions = list()
32
33
result = dff.isin([value])
34
35
seriesObj = result.any()
36
columnNames = list(seriesObj[seriesObj == True].index)
37
38
for col in columnNames:
39
rows = list(result[col][result[col] == True].index)
40
for row in rows:
41
tuple_of_positions.append((row, col))
42
43
length_of_one_df = (len(dff["Point"].unique().tolist()) * 2 ) - 1
44
45
list_of_dfs = [dff.iloc[x : x + length_of_one_df] for x in rows]
46
47
print(list_of_dfs)
48
Advertisement
Answer
JavaScript
1
4
1
sub = df.query("Step == "Result"")
2
pivots = sub[["Point", "X_Y"]].eq(["P1", "X"]).all(axis=1)
3
out = [fr for _, fr in sub.groupby(pivots.cumsum())]
4
- get the subset of the frame where Step is equal to “Result”
- check in which rows there is “P1” and “X” sequence
- that gives a True/False series
- cumulative sum of it determines the group as the “pivoting” (turning) points will be True since False == 0 in numeric context
- iterating over a GroupBy object yields “group_label, sub_frame” pairs, out of which we pull the sub_frames
to get
JavaScript
1
15
15
1
>>> out
2
3
[ Step Point X_Y Value A Value B
4
10 Result P1 X 70.00 70.00
5
11 Result P2 X 68.00 68.00
6
12 Result P2 Y 66.75 66.75
7
13 Result P3 X 68.08 68.08
8
14 Result P3 Y 66.72 66.72,
9
Step Point X_Y Value A Value B
10
25 Result P1 X 70.00 70.00
11
26 Result P2 X 68.00 68.00
12
27 Result P2 Y 66.75 66.75
13
28 Result P3 X 68.08 68.08
14
29 Result P3 Y 66.72 66.72]
15
where the intermediares were
JavaScript
1
14
14
1
>>> sub
2
3
Step Point X_Y Value A Value B
4
10 Result P1 X 70.00 70.00
5
11 Result P2 X 68.00 68.00
6
12 Result P2 Y 66.75 66.75
7
13 Result P3 X 68.08 68.08
8
14 Result P3 Y 66.72 66.72
9
25 Result P1 X 70.00 70.00
10
26 Result P2 X 68.00 68.00
11
27 Result P2 Y 66.75 66.75
12
28 Result P3 X 68.08 68.08
13
29 Result P3 Y 66.72 66.72
14
JavaScript
1
14
14
1
>>> pivots
2
3
10 True
4
11 False
5
12 False
6
13 False
7
14 False
8
25 True
9
26 False
10
27 False
11
28 False
12
29 False
13
dtype: bool
14
JavaScript
1
15
15
1
# groups
2
>>> pivots.cumsum()
3
4
10 1
5
11 1
6
12 1
7
13 1
8
14 1
9
25 2
10
26 2
11
27 2
12
28 2
13
29 2
14
dtype: int32
15