I have the following DataFrame:
play_id | position | frame | x | y |
---|---|---|---|---|
1 | A_1 | 1 | 0.1 | 0.1 |
1 | A_2 | 1 | 0.1 | 0.1 |
1 | B_1 | 1 | 0.1 | 0.1 |
1 | A_1 | 2 | 0.1 | 0.1 |
1 | A_2 | 2 | 0.1 | 0.1 |
1 | B_1 | 2 | 0.1 | 0.1 |
2 | A_1 | 1 | 0.1 | 0.1 |
2 | B_1 | 1 | 0.1 | 0.1 |
2 | B_2 | 1 | 0.1 | 0.1 |
2 | A_1 | 2 | 0.1 | 0.1 |
2 | B_1 | 2 | 0.1 | 0.1 |
2 | B_2 | 2 | 0.1 | 0.1 |
I want to reshape it into the following layout, with MultiIndex columns:
position | A_1 | A_1 | A_1 | A_1 | A_2 | A_2 | A_2 | A_2 | B_1 | B_1 | B_1 | B_1 | B_2 | B_2 | B_2 | B_2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
coord | x | x | y | y | x | x | y | y | x | x | y | y | x | x | y | y |
frame | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
play_id | ||||||||||||||||
1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | NaN |
2 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | NaN | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
Importantly, note that not all positions exist for all play_ids. This will result in some cells being empty.
Answer
1. `sort_values()` so the index is in the order you want
2. `set_index()` on the existing columns
3. `stack()` the coords, then name every index level
4. `unstack()` to get multi-index columns
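    import io

    import pandas as pd

    # Sample data from the question; parsed as whitespace-separated here
    df = pd.read_csv(io.StringIO("""play_id position frame x y
    1 A_1 1 0.1 0.1
    1 A_2 1 0.1 0.1
    1 B_1 1 0.1 0.1
    1 A_1 2 0.1 0.1
    1 A_2 2 0.1 0.1
    1 B_1 2 0.1 0.1
    2 A_1 1 0.1 0.1
    2 B_1 1 0.1 0.1
    2 B_2 1 0.1 0.1
    2 A_1 2 0.1 0.1
    2 B_1 2 0.1 0.1
    2 B_2 2 0.1 0.1"""), sep=r"\s+")

    # Sort, move the key columns into the index, then stack x/y into an inner level
    df = (
        df.sort_values(["position", "frame", "play_id"])
        .set_index(["position", "frame", "play_id"])
        .stack()
    )

    # Name every index level (the stacked x/y level has no name yet),
    # then move position, frame and coord into the columns
    df.rename_axis(["position", "frame", "play_id", "coord"]).unstack([0, 1, 3])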
output
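position | A_1 | A_1 | A_1 | A_1 | A_2 | A_2 | A_2 | A_2 | B_1 | B_1 | B_1 | B_1 | B_2 | B_2 | B_2 | B_2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
frame | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 |
coord | x | y | x | y | x | y | x | y | x | y | x | y | x | y | x | y |
play_id | ||||||||||||||||
1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | NaN |
2 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | NaN | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |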
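Note that this output nests the column levels as position, frame, coord, while the layout in the question has coord above frame. If that exact order matters, one possible variation (a sketch, not part of the original answer) is to unstack the levels by name in the order you want and then sort the columns. The names `raw` and `wide` are only illustrative, and naming the column axis with `rename_axis(columns="coord")` before stacking is just one way to label the x/y level.

```python
import io

import pandas as pd

# Same sample data as in the question (whitespace-separated here).
raw = """play_id position frame x y
1 A_1 1 0.1 0.1
1 A_2 1 0.1 0.1
1 B_1 1 0.1 0.1
1 A_1 2 0.1 0.1
1 A_2 2 0.1 0.1
1 B_1 2 0.1 0.1
2 A_1 1 0.1 0.1
2 B_1 1 0.1 0.1
2 B_2 1 0.1 0.1
2 A_1 2 0.1 0.1
2 B_1 2 0.1 0.1
2 B_2 2 0.1 0.1
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

wide = (
    df.set_index(["play_id", "position", "frame"])
    .rename_axis(columns="coord")  # name the x/y column axis before stacking
    .stack()  # x/y become an inner index level called "coord"
    .unstack(["position", "coord", "frame"])  # column levels in the wanted order
    .sort_index(axis=1)  # group columns by position, then coord, then frame
)

print(wide)
```

Positions that never occur in a play (A_2 in play 2, B_2 in play 1) still come out as NaN, because `unstack()` fills every missing index/column combination.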