I have a bunch of small dataframes, each representing a single match in a game. I would like to consolidate them into one dataframe per player, without knowing the players’ names ahead of time.
The starting dataframes look like this:
NAME VAL1 VAL2 VAL3
player1 3 5 7
player2 2 6 8
player3 3 6 7

NAME VAL1 VAL2 VAL3
player2 5 7 7
player3 2 6 8
player5 3 6 7
And I would like to end up with one frame per player, like this:
NAME VAL1 VAL2 VAL3
player1 3 5 7

NAME VAL1 VAL2 VAL3
player2 2 6 8
player2 5 7 7

NAME VAL1 VAL2 VAL3
player3 3 6 7
player3 2 6 8

NAME VAL1 VAL2 VAL3
player5 3 6 7
My problem is that the solutions I’ve found so far all require me to know the player names ahead of time and manually set up a dataframe for each player. Since I’ll be working with 40-50 players, and I won’t know all their names until I have the raw data, I’d like to avoid that if at all possible.
I have a loose plan to create a dictionary of players with each player key containing a dict of their rows from the dataframes. Once all the match dataframes are processed I would convert the dict of dicts into individual player dataframes. I’m not sure if this is the best approach though and am hoping that there’s a more efficient way to do this.
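For reference, the plan described above could be sketched roughly as follows. This is only an illustration of that idea, not a recommendation: the names match1, match2, rows_by_player and player_frames are made up for the example, and the sample values are copied from the two tables above.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data mirroring the two match tables in the question.
match1 = pd.DataFrame({
    "NAME": ["player1", "player2", "player3"],
    "VAL1": [3, 2, 3],
    "VAL2": [5, 6, 6],
    "VAL3": [7, 8, 7],
})
match2 = pd.DataFrame({
    "NAME": ["player2", "player3", "player5"],
    "VAL1": [5, 2, 3],
    "VAL2": [7, 6, 6],
    "VAL3": [7, 8, 7],
})

# Collect each player's rows (as plain dicts) under that player's name.
# No names need to be known up front; keys are created as they appear.
rows_by_player = defaultdict(list)
for match in (match1, match2):
    for row in match.to_dict("records"):
        rows_by_player[row["NAME"]].append(row)

# Convert each player's list of row dicts into its own DataFrame.
player_frames = {name: pd.DataFrame(rows)
                 for name, rows in rows_by_player.items()}
```

This works, but it loops over every row in Python, which is the part a built-in pandas operation could replace.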
Answer
Let’s try concat + groupby, then build out a dict:
import pandas as pd

# df1 and df2 are the two match DataFrames from the question
dfs = {group_name: df_
       for group_name, df_ in pd.concat([df1, df2]).groupby('NAME')}
dfs:
{'player1': NAME VAL1 VAL2 VAL3
0 player1 3 5 7,
'player2': NAME VAL1 VAL2 VAL3
1 player2 2 6 8
0 player2 5 7 7,
'player3': NAME VAL1 VAL2 VAL3
2 player3 3 6 7
1 player3 2 6 8,
'player5': NAME VAL1 VAL2 VAL3
2 player5 3 6 7}
Each player’s DataFrame can then be accessed like dfs['player1']:
NAME VAL1 VAL2 VAL3
0 player1 3 5 7
Or as a list:
dfs = [df_ for _, df_ in pd.concat([df1, df2]).groupby('NAME')]
dfs:
[ NAME VAL1 VAL2 VAL3
0 player1 3 5 7,
NAME VAL1 VAL2 VAL3
1 player2 2 6 8
0 player2 5 7 7,
NAME VAL1 VAL2 VAL3
2 player3 3 6 7
1 player3 2 6 8,
NAME VAL1 VAL2 VAL3
2 player5 3 6 7]
Each player’s DataFrame can then be accessed by position, like dfs[1] (groupby sorts by NAME by default, so position 1 is player2):
NAME VAL1 VAL2 VAL3
1 player2 2 6 8
0 player2 5 7 7
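With more than two matches, the same idea should carry over to a list of frames. The sketch below is one way to do it, assuming the match DataFrames are already collected in a list (called matches here, a made-up name, with sample values taken from the question). It also passes ignore_index=True and resets each group's index, which avoids the repeated index labels visible in the output above; whether that matters depends on how the frames are used afterwards.

```python
import pandas as pd

# Hypothetical stand-in for the real list of match DataFrames.
matches = [
    pd.DataFrame({"NAME": ["player1", "player2", "player3"],
                  "VAL1": [3, 2, 3], "VAL2": [5, 6, 6], "VAL3": [7, 8, 7]}),
    pd.DataFrame({"NAME": ["player2", "player3", "player5"],
                  "VAL1": [5, 2, 3], "VAL2": [7, 6, 6], "VAL3": [7, 8, 7]}),
]

# Stack all matches into one frame with a fresh 0..n-1 index.
combined = pd.concat(matches, ignore_index=True)

# One DataFrame per player; sort=False keeps players in order of first
# appearance, and reset_index(drop=True) renumbers each player's rows.
player_frames = {name: frame.reset_index(drop=True)
                 for name, frame in combined.groupby("NAME", sort=False)}
```

player_frames["player3"] then holds both of player3's rows, indexed 0 and 1, and no player name had to be written out by hand.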