Skip to content
Advertisement

Creating new columns within a dataframe, based on the latest value from previous columns

I’ve just completed a beginner’s course in python, so please bear with me if the code below doesn’t make sense or my issue is because of some rookie mistake.

I’ve been trying to put the learning to use by working with college production of NFL players, with a view to understanding which statistics can be predictive or at least correlate to NFL production. It turns out that there’s a lot of data out there so I have about 200 columns of data for 600 odd prospects from the last 20 years (just for running backs so far). However, one of the problems with this data is that each stat is only provided by the age the prospect was in that season giving me something like this:

JavaScript

What I want to do at the moment is to be able to take the last year of college production and put it into a new column (for 17 different statistics). I’ve therefore defined the following function:

JavaScript

Which I think should go backwards through the columns until I find a value which isn’t NaN, and then take that value as the output. I’ve then defined the columns via a list:

JavaScript

and have then run the function through a for loop based on this list:

JavaScript

The result I’m getting back is a slightly bizarre one – the for loop appears to work, as all the new columns I’m expecting are created, however they are only populated with data where the player had an age 23 season. The remainder of indexes are filled with ‘NaN’:

JavaScript

This suggests to me that the first ‘if’ statement in my function is working fine, but that all of the ‘elif’ statements aren’t triggering and I can’t work out why. I’m wondering whether it’s because I need to be more explicit about why they would trigger, rather than just relying on a logical test of ‘if the column is not, not equal to NaN, go to the next one’, or if I’m misunderstanding the elif aspect all together. I’ve put the whole segment of code in also, just because when I’ve run into issues so far the problem has often not been where I originally thought.

By all means tell me if you think I’ve gone about this in a weird way – this just seemed like a logical approach to the problem but open to other ways of getting the desired result.

Thanks in advance!

Advertisement

Answer

Use DataFrame.filter with regex parameter with ^ for start of strings, then forward filling missing values and last select last value by position:

JavaScript

Another alternative solution with Series.str.startswith for select columns:

JavaScript
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement