I have a dataframe like this structure (in the real one there are more columns Game x, around 30, but for explaining I think it’s ok with these 2 columns):
JavaScript
x
7
1
Name Game 1 Game 2
2
0 Player 1 Starting 68 Starting
3
1 Player 2 Bench 74 Starting 80
4
2 Player 3 Starting Bench
5
3 Player 4 Bench Bench 50
6
4 Player 5 NaN Starting
7
I need new columns for counting the minutes of any player in the columns “Game x” based in these conditions:
- Starting: means the player has played 90 minutes
- Starting 68 (or whichever): means the player has played 68 minutes (or whichever)
- Bench and NaN: means the player has played 0 minutes
- Bench 74 (or whichever): means the player has played 16 minutes (the total is 90 so he started at the minute 74 and then is 90 – 74 = 16)
There would be 2 columns counting the number of the minutes the player has played when he started the game and when he entered the game from the bench.
The final dataframe would be:
JavaScript
1
7
1
Name Game 1 Game 2 Minutes Starting Minutes Bench
2
0 Player 1 Starting 68 Starting 158 0
3
1 Player 2 Bench 74 Starting 80 80 16
4
2 Player 3 Starting Bench 90 0
5
3 Player 4 Bench Bench 50 0 40
6
4 Player 5 NaN Starting 60 60 0
7
Advertisement
Answer
If you write a function that parses a text field and returns the corresponding number of minutes, you can apply that function to each game column and add up the results. For example, the time played from start:
JavaScript
1
20
20
1
def played_from_start(entry):
2
entry = str(entry) # Without this, np.nan is a float.
3
if entry == 'nan' or entry == '':
4
return 0
5
if entry.startswith('Bench'):
6
return 0
7
if entry == 'Starting':
8
return 90
9
if entry.startswith('Starting'):
10
return int(entry[9:])
11
print(f"Warning: Entry '{entry}' not recognized.")
12
return np.nan
13
14
15
games = ['Game 1', 'Game 2']
16
17
df['Minutes Starting'] = np.sum(np.array([df[game].apply(played_from_start).values
18
for game in games]),
19
axis=0)
20