Skip to content
Advertisement

Loop through multiple small Pandas dataframes and create summary dataframes based on a single column

I have a bunch of small dataframes each representing a single match in a game. I would like to take these dataframes and consolidate them into a single dataframe for each player without knowing the player’s names ahead of time.

The starting dataframes look like this:

NAME     VAL1  VAL2  VAL3
player1  3     5     7
player2  2     6     8
player3  3     6     7

NAME     VAL1  VAL2  VAL3
player2  5     7     7
player3  2     6     8
player5  3     6     7

And I would like to get to a series of frames looking like this

NAME     VAL1  VAL2  VAL3
player1  3     5     7

NAME     VAL1  VAL2  VAL3
player2  2     6     8
player2  5     7     7

NAME     VAL1  VAL2  VAL3
player3  3     6     7
player3  2     6     8

NAME     VAL1  VAL2  VAL3
player5  3     6     7

My problem is that the solutions that I’ve found so far all require me to know the player names ahead of time and manually set up a dataframe for each player. Since I’ll be working with 40-50 players and I won’t know all their names until I have the raw data I’d like to avoid that if at all possible.

I have a loose plan to create a dictionary of players with each player key containing a dict of their rows from the dataframes. Once all the match dataframes are processed I would convert the dict of dicts into individual player dataframes. I’m not sure if this is the best approach though and am hoping that there’s a more efficient way to do this.

Advertisement

Answer

Let’s try concat + groupby then build out a dict:

dfs = {group_name: df_
       for group_name, df_ in pd.concat([df1, df2]).groupby('NAME')}

dfs:

{'player1':       NAME  VAL1  VAL2  VAL3
0  player1     3     5     7,
 'player2':       NAME  VAL1  VAL2  VAL3
1  player2     2     6     8
0  player2     5     7     7,
 'player3':       NAME  VAL1  VAL2  VAL3
2  player3     3     6     7
1  player3     2     6     8,
 'player5':       NAME  VAL1  VAL2  VAL3
2  player5     3     6     7}

Each player’s DataFrame can then be accessed like:

dfs['player1']:

      NAME  VAL1  VAL2  VAL3
0  player1     3     5     7

Or as a list:

dfs = [df_ for _, df_ in pd.concat([df1, df2]).groupby('NAME')]

dfs:

[      NAME  VAL1  VAL2  VAL3
0  player1     3     5     7,
       NAME  VAL1  VAL2  VAL3
1  player2     2     6     8
0  player2     5     7     7,
       NAME  VAL1  VAL2  VAL3
2  player3     3     6     7
1  player3     2     6     8,
       NAME  VAL1  VAL2  VAL3
2  player5     3     6     7]

Each player’s DataFrame can then be accessed like:

dfs[1]:

      NAME  VAL1  VAL2  VAL3
1  player2     2     6     8
0  player2     5     7     7
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement