I have a bunch of small dataframes each representing a single match in a game. I would like to take these dataframes and consolidate them into a single dataframe for each player without knowing the player’s names ahead of time.
The starting dataframes look like this:
NAME VAL1 VAL2 VAL3 player1 3 5 7 player2 2 6 8 player3 3 6 7 NAME VAL1 VAL2 VAL3 player2 5 7 7 player3 2 6 8 player5 3 6 7
And I would like to get to a series of frames looking like this
NAME VAL1 VAL2 VAL3 player1 3 5 7 NAME VAL1 VAL2 VAL3 player2 2 6 8 player2 5 7 7 NAME VAL1 VAL2 VAL3 player3 3 6 7 player3 2 6 8 NAME VAL1 VAL2 VAL3 player5 3 6 7
My problem is that the solutions that I’ve found so far all require me to know the player names ahead of time and manually set up a dataframe for each player. Since I’ll be working with 40-50 players and I won’t know all their names until I have the raw data I’d like to avoid that if at all possible.
I have a loose plan to create a dictionary of players with each player key containing a dict of their rows from the dataframes. Once all the match dataframes are processed I would convert the dict of dicts into individual player dataframes. I’m not sure if this is the best approach though and am hoping that there’s a more efficient way to do this.
Advertisement
Answer
Let’s try concat
+ groupby
then build out a dict
:
dfs = {group_name: df_ for group_name, df_ in pd.concat([df1, df2]).groupby('NAME')}
dfs
:
{'player1': NAME VAL1 VAL2 VAL3 0 player1 3 5 7, 'player2': NAME VAL1 VAL2 VAL3 1 player2 2 6 8 0 player2 5 7 7, 'player3': NAME VAL1 VAL2 VAL3 2 player3 3 6 7 1 player3 2 6 8, 'player5': NAME VAL1 VAL2 VAL3 2 player5 3 6 7}
Each player’s DataFrame can then be accessed like:
dfs['player1']
:
NAME VAL1 VAL2 VAL3 0 player1 3 5 7
Or as a list
:
dfs = [df_ for _, df_ in pd.concat([df1, df2]).groupby('NAME')]
dfs
:
[ NAME VAL1 VAL2 VAL3 0 player1 3 5 7, NAME VAL1 VAL2 VAL3 1 player2 2 6 8 0 player2 5 7 7, NAME VAL1 VAL2 VAL3 2 player3 3 6 7 1 player3 2 6 8, NAME VAL1 VAL2 VAL3 2 player5 3 6 7]
Each player’s DataFrame can then be accessed like:
dfs[1]
:
NAME VAL1 VAL2 VAL3 1 player2 2 6 8 0 player2 5 7 7