I have a dataframe with the structure below (the real one has around 30 "Game x" columns, but these two are enough to explain the problem):
           Name       Game 1       Game 2
    0  Player 1  Starting 68     Starting
    1  Player 2     Bench 74  Starting 80
    2  Player 3     Starting        Bench
    3  Player 4        Bench     Bench 50
    4  Player 5          NaN     Starting
I need new columns that count the minutes each player has played across the "Game x" columns, based on these conditions:
- Starting: the player played the full 90 minutes
- Starting 68 (or any other number): the player started and played 68 minutes (or that number)
- Bench and NaN: the player played 0 minutes
- Bench 74 (or any other number): the player came on at minute 74 and played until the end of the 90-minute game, so he played 90 - 74 = 16 minutes
There should be two new columns: one with the total minutes played in games the player started, and one with the total minutes played after coming on from the bench.
The final dataframe would be:
           Name       Game 1       Game 2  Minutes Starting  Minutes Bench
    0  Player 1  Starting 68     Starting               158              0
    1  Player 2     Bench 74  Starting 80                80             16
    2  Player 3     Starting        Bench                90              0
    3  Player 4        Bench     Bench 50                 0             40
    4  Player 5          NaN  Starting 60                60              0
Answer
If you write a function that parses a text field and returns the corresponding number of minutes, you can apply that function to each game column and add up the results. For example, the time played from start:
    import numpy as np

    def played_from_start(entry):
        entry = str(entry)  # Without this, np.nan is a float.
        if entry == 'nan' or entry == '':
            return 0
        if entry.startswith('Bench'):
            return 0  # Bench minutes are counted separately.
        if entry == 'Starting':
            return 90
        if entry.startswith('Starting'):
            return int(entry[9:])  # Skip the 9 characters of 'Starting '.
        print(f"Warning: Entry '{entry}' not recognized.")
        return np.nan

    games = ['Game 1', 'Game 2']
    # One array of minutes per game, summed element-wise across games.
    df['Minutes Starting'] = np.sum(np.array([df[game].apply(played_from_start).values for game in games]), axis=0)
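The bench minutes can probably be handled the same way. Below is a minimal sketch, not a tested solution for the real data: `played_from_bench` and the `FULL_GAME` constant are names I made up for illustration, the sample frame is rebuilt from the question, and selecting the game columns by the `'Game '` prefix is an assumption about how the roughly 30 real columns are named. Unlike the function above, this one silently treats anything that is not `Bench <minute>` as 0 minutes.

```python
import numpy as np
import pandas as pd

FULL_GAME = 90  # assumed match length in minutes, taken from the question


def played_from_bench(entry):
    """Minutes played after coming on as a substitute (hypothetical helper)."""
    entry = str(entry)  # np.nan becomes the string 'nan'
    if entry.startswith('Bench '):
        # 'Bench 74' means the player came on at minute 74.
        return FULL_GAME - int(entry.split()[1])
    # 'Bench' alone, 'Starting ...', NaN and anything else count as 0 here.
    return 0


# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5'],
    'Game 1': ['Starting 68', 'Bench 74', 'Starting', 'Bench', np.nan],
    'Game 2': ['Starting', 'Starting 80', 'Bench', 'Bench 50', 'Starting'],
})

# Assumption: every game column is named 'Game <n>'.
games = [col for col in df.columns if col.startswith('Game ')]

# Parse every cell of every game column, then add up each row.
df['Minutes Bench'] = (
    df[games].apply(lambda col: col.map(played_from_bench)).sum(axis=1)
)

print(df[['Name', 'Minutes Bench']])
```

Picking the columns by prefix avoids typing out all the game names by hand; if the real columns are named differently, replace that line with an explicit list.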