Is there a better, more readable way to coalesce columns in pandas?

I often need a new column that holds the best value available from several other columns, and I have a specific priority order for those columns. I am willing to take the first non-null value.
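The original snippet did not survive on this page. A row-wise approach of the kind described might look roughly like the sketch below. The `coalesce` name comes from the answer further down; the sample data and the `combined` column name are made up for illustration.

```python
import pandas as pd

def coalesce(row):
    """Return the first non-null value in the row, or None if every value is null."""
    for value in row:
        if pd.notnull(value):
            return value
    return None

# Hypothetical sample data (not from the original post).
df = pd.DataFrame({
    "first": [None, "a1", None, None],
    "second": ["b0", "b1", None, None],
    "third": ["c0", None, "c2", None],
})

# The order of the selected columns sets the priority.
priority = ["second", "third", "first"]
df["combined"] = df[priority].apply(coalesce, axis=1)
print(df)
```

Because `apply(..., axis=1)` calls the Python function once per row, this gets slow as the row count grows.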

Results

This code works (and the results are what I want), but it is not very fast.
I get to pick my priorities if I need to, for example [['second', 'third', 'first']].

Coalesce behaves somewhat like the function of the same name in T-SQL.
I suspect that I may have overlooked an easy way to achieve this with good performance on large DataFrames (400,000+ rows).

I know there are lots of ways to fill in missing data, which I often use on axis=0. This is what makes me think I may have missed an easy option for axis=1.

Can you suggest something nicer or faster, or confirm that this is as good as it gets?

Answer

You could use pd.isnull to find the null values (in this case, None):
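As a minimal sketch of this step (the sample DataFrame is made up, since the original snippet is missing here), `pd.isnull` applied to a DataFrame returns a boolean DataFrame of the same shape:

```python
import pandas as pd

# Hypothetical sample data (not from the original answer).
df = pd.DataFrame({
    "first": [None, "a1", None, None],
    "second": ["b0", "b1", None, None],
    "third": ["c0", None, "c2", None],
})

# True wherever a value is missing, False otherwise.
mask = pd.isnull(df)
print(mask)
```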

and then use np.argmin to find, for each row, the index of the first non-null value. If all the values in a row are null, np.argmin returns 0:
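A sketch of the argmin step, again on made-up data. Since `False < True`, the minimum of a boolean row is `False` wherever one exists, and `np.argmin` reports its first occurrence, which is the first non-null column. Passing `axis=1` is my assumption about how the original call was written.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original answer).
df = pd.DataFrame({
    "first": [None, "a1", None, None],
    "second": ["b0", "b1", None, None],
    "third": ["c0", None, "c2", None],
})

mask = pd.isnull(df).to_numpy()

# Position of the first False (non-null) in each row; 0 for an all-null row.
first_valid = np.argmin(mask, axis=1)
print(first_valid)
```

For the all-null last row the index is 0, so whatever gets picked from column 0 later is still null.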

Then you could select the desired values from df using NumPy integer-indexing:
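Putting the three steps together might look like the sketch below. `coalesce_columns` is a hypothetical helper name, and the sample data is made up. Pairing `np.arange(len(df))` with the per-row column indices picks exactly one element from each row.

```python
import numpy as np
import pandas as pd

def coalesce_columns(df, columns):
    """Hypothetical helper: first non-null value per row, in the given column priority."""
    sub = df[columns]
    values = sub.to_numpy()
    mask = pd.isnull(sub).to_numpy()
    first_valid = np.argmin(mask, axis=1)   # column position of first non-null
    rows = np.arange(len(sub))              # one row index per row
    return pd.Series(values[rows, first_valid], index=df.index)

# Hypothetical sample data (not from the original answer).
df = pd.DataFrame({
    "first": [None, "a1", None, None],
    "second": ["b0", "b1", None, None],
    "third": ["c0", None, "c2", None],
})

df["combined"] = coalesce_columns(df, ["second", "third", "first"])
print(df)
```

The column list controls the priority, so reordering it changes which value wins without touching the rest of the code.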

For example,

yields

Using argmin instead of df3.apply(coalesce, ...) is significantly quicker if the DataFrame has a lot of rows:
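The original timings are not reproduced on this page. One way to measure the difference yourself is sketched below. The random data, the row count, and the `coalesce_argmin` name are assumptions for illustration, and the numbers will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

def coalesce(row):
    """Row-wise version: first non-null value in the row, or NaN if none."""
    for value in row:
        if pd.notnull(value):
            return value
    return np.nan

def coalesce_argmin(df):
    """Vectorized version based on pd.isnull and np.argmin."""
    values = df.to_numpy()
    first_valid = np.argmin(pd.isnull(df).to_numpy(), axis=1)
    return pd.Series(values[np.arange(len(df)), first_valid], index=df.index)

# Hypothetical test data: random floats with roughly half the cells missing.
rng = np.random.default_rng(0)
n_rows = 20_000
data = rng.random((n_rows, 3))
data[rng.random((n_rows, 3)) < 0.5] = np.nan
df3 = pd.DataFrame(data, columns=["first", "second", "third"])

t_apply = timeit.timeit(lambda: df3.apply(coalesce, axis=1), number=1)
t_argmin = timeit.timeit(lambda: coalesce_argmin(df3), number=1)
print(f"apply:  {t_apply:.4f} s")
print(f"argmin: {t_argmin:.4f} s")
```

Raising `n_rows` toward the 400,000 rows mentioned in the question should show how each version scales.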

User contributions licensed under: CC BY-SA