I have the following DataFrame:
| play_id | position | frame | x | y |
|---|---|---|---|---|
| 1 | A_1 | 1 | 0.1 | 0.1 |
| 1 | A_2 | 1 | 0.1 | 0.1 |
| 1 | B_1 | 1 | 0.1 | 0.1 |
| 1 | A_1 | 2 | 0.1 | 0.1 |
| 1 | A_2 | 2 | 0.1 | 0.1 |
| 1 | B_1 | 2 | 0.1 | 0.1 |
| 2 | A_1 | 1 | 0.1 | 0.1 |
| 2 | B_1 | 1 | 0.1 | 0.1 |
| 2 | B_2 | 1 | 0.1 | 0.1 |
| 2 | A_1 | 2 | 0.1 | 0.1 |
| 2 | B_1 | 2 | 0.1 | 0.1 |
| 2 | B_2 | 2 | 0.1 | 0.1 |
And I want to reformat to (Multi-Index columns):
| position | A_1 | A_1 | A_1 | A_1 | A_2 | A_2 | A_2 | A_2 | B_1 | B_1 | B_1 | B_1 | B_2 | B_2 | B_2 | B_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| coord | x | x | y | y | x | x | y | y | x | x | y | y | x | x | y | y |
| frame | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| play_id | ||||||||||||||||
| 1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | NaN |
| 2 | 0.1 | 0.1 | 0.1 | 0.1 | NaN | NaN | NaN | NaN | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
Importantly, note that not all positions exist for all play_ids. This will result in some cells being empty.
Advertisement
Answer
sort_values()so index is in order you wantset_index()existing columnssick()the coords- name everything
unstack()to get multi-index columns
df = pd.read_csv(io.StringIO("""play_id position frame x y
1 A_1 1 0.1 0.1
1 A_2 1 0.1 0.1
1 B_1 1 0.1 0.1
1 A_1 2 0.1 0.1
1 A_2 2 0.1 0.1
1 B_1 2 0.1 0.1
2 A_1 1 0.1 0.1
2 B_1 1 0.1 0.1
2 B_2 1 0.1 0.1
2 A_1 2 0.1 0.1
2 B_1 2 0.1 0.1
2 B_2 2 0.1 0.1"""), sep="t")
df = df.sort_values(["position","frame","play_id"]).set_index(["position","frame","play_id"]).stack()
df.reindex(df.index.set_names(["position","frame","play_id","coord"])).unstack([0,1,3])
output
position A_1 A_2 B_1 B_2 frame 1 2 1 2 1 2 1 2 coord x y x y x y x y x y x y x y x y play_id 1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 NaN NaN NaN NaN 2 0.1 0.1 0.1 0.1 NaN NaN NaN NaN 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1