I have this dataframe, df_match
:
# Column Non-Null Count Dtype --- ------ -------------- ----- 0 match_id 680 non-null int64 1 league_id 680 non-null object 2 from_home_player_1_to_home_player_11 680 non-null object
where each row on from_home_player_1_to_home_player_11
column keeps a list of tuples, like so:
df_match.sample(1)
:
... None match_id league_id from_home_player_1_to_home_player_11 167 243221 26 [(79066, GKP), (82634, MID), (79578, FWD), (34765, DEF), (23476, WING), (32456, MID),(55897, DEF),(45675, MID),(32345, FWD),(45765,FWD),(12354, WING)]
GOAL
Now I would like to set X/Y coordinates for each player on the field (using only coord X here in order to simplify it), per match (row)
Each player on from_home_player_1_to_home_player_11
needs an X value. So I need a list of newly created X columns, like so:
X_columns = ["home_player_X1", "home_player_X2", "home_player_X3","home_player_X4", "home_player_X5", "home_player_X6", "home_player_X7", "home_player_X8", "home_player_X9","home_player_X10", "home_player_X11",
Lastly, each position has an arbitrary set of X values. (When there’s more than one option, it can be ANY one of them, randomly chosen)
GKP = 1 DEF = [3,4] WING = [2,5] MID = [6,7,8] FWD = [9,10,11]
My aim here is to map -at each row, players positions to an X coordinate, ending up with:
None match_id league_id from_away_player_1_to_away_player_11 / 167 243221 26 [(79066, GKP), (82634, MID), (79578, FWD), (34765, DEF), (23476, WING), (32456, MID),(55897, DEF),(45675, MID),(32345, FWD),(45765,FWD),(12354, WING)] / home_player_X1 home_player_X2 home_player_X3 home_player_X4 1 7 10 3 home_player_X5 home_player_X6 home_player_X7 home_player_X8 5 7 4 7 home_player_X9 home_player_X10 home_player_X11 10 10 2
How can I do this mapping based on the position/value condition with pandas?
I started thinking of iterating through the dataframe with:
for index, value in df_match.iterrows(): pos = value.from_home_player_1_to_home_player_11[1][1] print (index, value)
But i haven’t gone very far with that.
Advertisement
Answer
Something like your data:
df_match = pd.DataFrame( { "match_id" : [243221, 234251], 'league_id' : [26, 11], 'from_home_player_1_to_home_player_11' : [ [(79066, 'GKP'), (82634, 'MID'), (79578, 'FWD'), (34765, 'DEF'), (23476, 'WING'), (32456, 'MID'), (55897, 'DEF'), (45675, 'MID'), (32345, 'FWD'), (45765,'FWD'), (12354, 'WING')], [(14825, 'GKP'), (82634, 'MID'), (79578, 'FWD'), (34765, 'DEF'), (23476, 'WING'), (32456, 'MID'), (55897, 'MID'), (45675, 'MID'), (32345, 'DEF'), (45765,'FWD'), (12354, 'WING')], ] }, index=[167, 1999])
Build a position mapping, noting that all are lists:
pmap = {'GKP' : [1], 'DEF': [3,4], 'WING' : [2,5], 'MID' : [6,7,8], 'FWD' : [9,10,11] }
Apply a lookup from the dictionary, choosing a random option, and then blowing up into individual columns. Rename the columns:
import random tmp = df_match['from_home_player_1_to_home_player_11'].apply(lambda x: [ random.choice(pmap.get(pos, -1)) for n, pos in x]).apply(pd.Series) tmp.columns = [f"home_player_X{i}" for i in range(1,12)]
Note that it puts -1
in the position if the key isn’t found. Then pd.concat()
them together:
df2 = pd.concat([df_match, tmp], axis=1)