Selecting rows based on a condition in Python pandas

Question

I have a DataFrame with the columns ['ID', 'Title', 'Category', 'Company', 'Field']. It contains both blank values and, in some places, missing values entered as N/A. I have to pick the row that has the most information available. For example one case could …

Accepted Answer

You can use (~df.isin(["", "N/A"])).sum(axis=1) to get the number of valid values in each row: isin marks the blank and "N/A" cells, ~ inverts that mask, and summing the booleans along axis=1 counts the remaining cells. Combine it with groupby and idxmax to pick the most complete row per ID.

Example data (added an extra ID to showcase the groupby):

             ID   Title   Category   Company   Field
    0  ABD12567  Title1             Company1
    1  ABD12567  Title1        N/A            Field1
    2  ABD12567  Title1  Category1  Company1  Field1
    3  ABD12567  Title1             Company1
    4  ABD12567  Title1        N/A  Company1  Field1
    5  ABD12568  Title1        N/A  Company1  Field1

Code:

    idx = (df.assign(max=(~df.isin(["", "N/A"])).sum(axis=1))  # assign temp column with the valid-value count
             .groupby("ID")["max"].idxmax())                   # retrieve index of max value within each group

    print(df.loc[idx])

Output:

             ID   Title   Category   Company   Field
    2  ABD12567  Title1  Category1  Company1  Field1
    5  ABD12568  Title1        N/A  Company1  Field1
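For anyone who wants to run the accepted answer end to end, here is a minimal self-contained sketch. It assumes the blank cells are empty strings and the missing values are the literal string "N/A" (not NaN), which is how the example data reads. The names valid_counts and best_rows are illustrative and not from the original answer, and the count is grouped directly instead of being stored in a temporary column.

```python
import pandas as pd

# Example data rebuilt from the answer. Assumption: blanks are empty
# strings and missing values are the literal string "N/A".
df = pd.DataFrame(
    {
        "ID": ["ABD12567"] * 5 + ["ABD12568"],
        "Title": ["Title1"] * 6,
        "Category": ["", "N/A", "Category1", "", "N/A", "N/A"],
        "Company": ["Company1", "", "Company1", "Company1", "Company1", "Company1"],
        "Field": ["", "Field1", "Field1", "", "Field1", "Field1"],
    }
)

# Count the cells in each row that are neither blank nor "N/A".
valid_counts = (~df.isin(["", "N/A"])).sum(axis=1)

# For each ID, get the index label of the row with the highest count.
idx = valid_counts.groupby(df["ID"]).idxmax()

# Select those rows from the original frame.
best_rows = df.loc[idx]
print(best_rows)
```

When several rows of one ID share the highest count, idxmax returns the first of them, so the earliest such row is kept. If the data comes from read_csv with default settings, blank and "N/A" cells will probably already be NaN, in which case df.notna().sum(axis=1) would be the equivalent count.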

Answer