
A shorter and more efficient pandas approach to cumulative-value-based and column-based data selection

Below is the requirement.

There are two tables: brand_df (each brand's value) and score_df (each brand's score for every subject). [Generating samples below]

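Here is a minimal sketch of such sample data. The column names (`brand`, `value`) and the subject names are hypothetical. Each subject is assumed to be a column of score_df, with a score of 0 meaning "no score".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: names below are assumptions, not the asker's originals.
rng = np.random.default_rng(0)
brands = [f"brand_{i}" for i in range(10)]
subjects = [f"subject_{j}" for j in range(6)]

# One row per brand with its total value.
brand_df = pd.DataFrame({"brand": brands, "value": rng.integers(1, 1000, size=len(brands))})

# One row per brand, one column per subject; 0 is treated as "no score".
scores = rng.integers(0, 5, size=(len(brands), len(subjects)))
score_df = pd.DataFrame(scores, index=pd.Index(brands, name="brand"), columns=subjects)

print(brand_df.head())
print(score_df.head())
```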

What is being done:

  1. Pick only the top brands that make up 75% of the cumulative value
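One loop-free way to express step 1 is sketched below, assuming brand_df has hypothetical `brand` and `value` columns. The brands are sorted by value in descending order. `cumsum()` divided by the total gives each brand's cumulative share, and a boolean mask keeps the top ones. Whether the brand that *crosses* 75% should be kept is not stated in the question, so both conventions are shown.

```python
import pandas as pd

# Hypothetical brand values (column names are assumptions).
brand_df = pd.DataFrame({
    "brand": ["A", "B", "C", "D", "E"],
    "value": [50, 20, 15, 10, 5],
})

# Biggest brands first, then the running share of the total value.
ranked = brand_df.sort_values("value", ascending=False)
ranked["cum_percent"] = ranked["value"].cumsum() / ranked["value"].sum()

# Convention 1: keep brands whose cumulative share stays within 75%.
top_brands = ranked.loc[ranked["cum_percent"] <= 0.75, "brand"].tolist()
print(top_brands)  # ['A', 'B']

# Convention 2: also keep the brand that pushes the total past 75%
# (compare the share accumulated *before* each brand).
crossing = ranked.loc[ranked["cum_percent"].shift(fill_value=0) < 0.75, "brand"].tolist()
print(crossing)  # ['A', 'B', 'C']
```

Because `cum_percent` is a float, a brand whose cumulative share lands exactly on 0.75 may fall on either side of the cutoff due to rounding. Comparing raw cumulative sums against `0.75 * total` avoids some of that.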
  2. Pick the subjects for which 75% of the selected brands have a score (i.e., > 0)
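Step 2 can be sketched as a column-wise mean over a boolean frame. `score_df.loc[top_brands] > 0` marks which selected brands have a score. `.mean()` on that frame gives, per subject, the share of brands that do. The "at least 75%" reading of the threshold and the subject names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical scores: rows are brands, columns are subjects.
score_df = pd.DataFrame(
    {"math": [3, 1, 2, 4], "art": [0, 0, 5, 1], "music": [2, 0, 1, 3]},
    index=["A", "B", "C", "D"],
)
top_brands = ["A", "B", "C", "D"]  # assumed output of step 1

# True where the brand has a score for that subject.
has_score = score_df.loc[top_brands] > 0

# Mean of booleans per column = share of selected brands with a score.
coverage = has_score.mean()

# Keep subjects covered by at least 75% of the selected brands.
subjects = coverage[coverage >= 0.75].index.tolist()
print(subjects)  # ['math', 'music']

# Column-based selection of the final table.
selected = score_df.loc[top_brands, subjects]
```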

Running the scripts above produces the output below.


What is a more efficient or shorter way to achieve this?


Answer

Creating cum_percent is a good start. The next step is to remove the two loops:

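Here is a sketch of the whole pipeline without explicit loops. It is not the answerer's original code, which is not preserved. It chains a boolean mask on the cumulative share with a boolean column mask on the score coverage. `select_top`, its parameters, and the `brand`/`value` column names are all hypothetical.

```python
import pandas as pd


def select_top(brand_df, score_df, value_share=0.75, score_share=0.75):
    """Hypothetical helper: restrict score_df to the top brands by cumulative
    value share and to the subjects that enough of those brands score in."""
    # Step 1: rank brands by value; keep those within the cumulative share.
    ranked = brand_df.sort_values("value", ascending=False)
    cum_percent = ranked["value"].cumsum() / ranked["value"].sum()
    top_brands = ranked.loc[cum_percent <= value_share, "brand"]

    # Step 2: boolean column mask, True where the share of selected brands
    # with a score > 0 reaches score_share.
    subset = score_df.loc[top_brands]
    keep = (subset > 0).mean() >= score_share
    return subset.loc[:, keep]


brand_df = pd.DataFrame({"brand": ["A", "B", "C", "D", "E"],
                         "value": [30, 24, 18, 16, 12]})
score_df = pd.DataFrame(
    {"math": [3, 1, 2, 4, 0], "art": [0, 0, 5, 1, 2], "music": [2, 1, 1, 0, 0]},
    index=["A", "B", "C", "D", "E"],
)
result = select_top(brand_df, score_df)
print(result)  # brands A, B, C; subjects math and music
```

Both steps reduce to vectorized pandas operations (`cumsum`, comparisons, `mean`) plus `.loc` masks, so no Python-level loop over brands or subjects is needed.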
User contributions licensed under: CC BY-SA