Skip to content
Advertisement

Selecting rows based on condition in python pandas

I have a data-frame with columns as ['ID','Title','Category','Company','Field'] and it has both blank values and at some places missing values are put as N/A. I have to pick the row which has maximum information available.

For example one case could be.

JavaScript

In this case i want to select the row number 2 as it has maximum information available. I tried to add one length columns to the data-frame where i will add all the string lengths and will pick the one with maximum length with help of group-by. For Example catalog_11=catalog_1.groupby(['ID'], as_index = False).agg({'combined_len': max}) . And I will keep the index of that row in this case 2 and will filter out the row with index 2 from the old data-frame. But this is not working as i am loosing the index of the old data-frame after group-by. Not sure how to solve this. Please help me to pick the right row. Not Sure how to solve this with some other workaround.

If there is any other alternative than the combined length which i am using then please suggest.

Advertisement

Answer

You can use (~df.isin(["", "N/A"])).sum(1) to get the number of valid values in a row by summing the boolean values on axis=1. Combine it with groupby and idxmax:

Example data (added an extra ID to showcase the groupby):

JavaScript

Code:

JavaScript
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement