Count how many column groups pass a set of conditions, per row, in a pandas DataFrame

I have a dataframe which looks something like this:

JavaScript

For every row, I am trying to evaluate a condition on each column grouping (A, B, C, D, etc.) and count how many of those groups pass it. For example, in pseudo-code:

JavaScript

Expected output:

JavaScript

For the example dataframe, the new column would hold the value 2, because column groups C and D pass the set of conditions.

Later I plan to do something for each row with a count value above some number.

The best way I was able to come up with looked something like this:

JavaScript

Not only is this incorrect, it is also extremely slow. What I like about it is that it is readable, and adding conditions for more columns or different data is relatively straightforward.

I am not sure whether this type of operation is too complex to perform on a dataframe the way I am attempting, or whether I need to change my data structure, but I figured I'd see if there is a way of going about this before I start rewriting things.

Ideally, the logic applied to the column groups (A, B, C, D in this case) would be modular, so that whatever method I use makes it easy to add or remove both data columns and logic conditions over time.

What I am trying to do is test an arbitrary set of conditions when one of the columns for a group (which is a correlation coefficient) is above a certain value. If enough of the column groups pass the conditions, do something.

Thanks in advance. I am a Python and pandas novice, and this has been giving me a headache for days.

Answer

Let’s ignore date and handle the other columns first.

Remove date, create a MultiIndex using str.split, and stack to long form:

JavaScript
JavaScript
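
A minimal sketch of this reshaping step. The column names (`A_corr`, `A_val`, ...), the `_` separator, and all values are hypothetical and only illustrate the idea; adjust them to the real frame.

```python
import pandas as pd

# Hypothetical frame: the "<group>_<field>" names and all values are assumptions.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "A_corr": [0.9, 0.2, 0.8],
    "A_val": [5, 1, 2],
    "B_corr": [0.1, 0.95, 0.7],
    "B_val": [7, 8, 9],
})

# Drop the non-group column, then split "A_corr" into ("A", "corr").
new_df = df.drop(columns="date")
new_df.columns = new_df.columns.str.split("_", expand=True)

# Move the group letter (column level 0) into the row index,
# giving one row per (original row, group) pair.
long_df = new_df.stack(level=0)
print(long_df)
```

After stacking, each group's fields sit side by side in ordinary columns (`corr` and `val` here), so one condition can be written once and applied to every group.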

Apply the condition row-wise:

JavaScript
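
A sketch of what the condition step might look like on a long-form frame. The rule itself (correlation above 0.5 and value above 3) and the column names are assumptions made up for illustration, not the conditions from the question.

```python
import pandas as pd

# Hypothetical long-form frame indexed by (row, group); names and values are assumptions.
long_df = pd.DataFrame(
    {"corr": [0.9, 0.1, 0.2, 0.95], "val": [5, 7, 1, 8]},
    index=pd.MultiIndex.from_product([[0, 1], ["A", "B"]], names=["row", "group"]),
)

def passes(row):
    # Hypothetical rule: only accept the value when the correlation is high enough.
    return row["corr"] > 0.5 and row["val"] > 3

# Row-wise version: readable, but it calls the Python function once per row.
long_df["passed"] = long_df.apply(passes, axis=1)

# Vectorized equivalent of the same rule, usually much faster on large frames.
vectorized = (long_df["corr"] > 0.5) & (long_df["val"] > 3)
```

Both forms give the same boolean result here; the vectorized form is worth preferring once the frame gets large.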

Then sum over index level 0 (the original row labels) and assign the result back to the original df:

JavaScript
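
A sketch of the aggregation, assuming a boolean Series named `passed` indexed by (row, group). The data and the `count` column name are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: names and values are assumptions.
df = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"]})
passed = pd.Series(
    [True, True, False, True],
    index=pd.MultiIndex.from_product([[0, 1], ["A", "B"]]),
)

# Level 0 of the index holds the original row labels, so grouping on it
# counts the passing groups per row (True counts as 1).
df["count"] = passed.groupby(level=0).sum()
print(df)
```

`groupby(level=0).sum()` is used here because the `level` argument of `Series.sum` was removed in pandas 2.0. The assignment aligns on the index, so it relies on `df` still having the same row labels it had before stacking.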

Alternatively, sum the condition results directly rather than assigning to both new_df and df:

JavaScript
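
A sketch of that shorter route, with the same caveat that the column names, separator, and thresholds are hypothetical:

```python
import pandas as pd

# Hypothetical frame: names, values and thresholds are assumptions.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02"],
    "A_corr": [0.9, 0.2],
    "A_val": [5, 1],
    "B_corr": [0.8, 0.95],
    "B_val": [7, 8],
})

long_df = df.drop(columns="date")
long_df.columns = long_df.columns.str.split("_", expand=True)
long_df = long_df.stack(level=0)

# Build the boolean mask and count it per original row in one expression,
# without storing an intermediate "passed" column.
df["count"] = (
    ((long_df["corr"] > 0.5) & (long_df["val"] > 3))
    .groupby(level=0)
    .sum()
)
print(df)
```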

df:

JavaScript

Complete Working Example:

JavaScript
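
Since the question asks for conditions that are easy to add or remove, one possible way to package the steps above is to keep the rules in a dictionary. This is only a sketch: `CONDITIONS`, `count_passing_groups`, the column names, and the thresholds are all hypothetical.

```python
import pandas as pd

# Hypothetical data and thresholds: every name and number here is an assumption.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "A_corr": [0.9, 0.2, 0.8], "A_val": [5, 1, 2],
    "B_corr": [0.8, 0.95, 0.7], "B_val": [7, 8, 9],
})

# Each entry takes the long-form frame and returns a boolean Series.
# Add or remove entries to change the rule set.
CONDITIONS = {
    "corr_high": lambda g: g["corr"] > 0.5,
    "val_high": lambda g: g["val"] > 3,
}

def count_passing_groups(frame, conditions, ignore=("date",), sep="_"):
    """Count, per row, how many column groups satisfy every condition."""
    wide = frame.drop(columns=list(ignore))
    wide.columns = wide.columns.str.split(sep, expand=True)
    long_df = wide.stack(level=0)
    passed = pd.Series(True, index=long_df.index)
    for cond in conditions.values():
        passed = passed & cond(long_df)
    return passed.groupby(level=0).sum()

df["count"] = count_passing_groups(df, CONDITIONS)

# Later step from the question: act only on rows with enough passing groups.
selected = df[df["count"] >= 2]
print(selected)
```

Adding a new column group only requires new columns that follow the same naming pattern, and adding a rule only requires a new dictionary entry.
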
User contributions licensed under: CC BY-SA