How to use a pandas groupby to filter this dataframe?

Question

Using Python, how can I use a groupby to filter this dataset? I want to keep every row that meets either of the two conditions below and filter out everything else:

ID1 - matches another row's ID1, and the Last3 values are the same
ID2 - matches another row's ID2, and the First3 values are the same

Accepted Answer

Based on the comments clarifying the problem statement, the goal is to group by ID1 or ID2 and then, depending on which ID is used, keep the rows whose Last3 or First3 values, respectively, are the same.

Try this approach:

    # Group by ID1 and flag rows whose Last3 is duplicated within the group,
    # then extract the index labels that satisfy the condition
    c1 = df.groupby('ID1').apply(pd.DataFrame.duplicated, subset=['Last3'], keep=False)
    c1_idx = c1[c1].droplevel(0).index

    # Group by ID2 and flag rows whose First3 is duplicated within the group,
    # then extract the index labels that satisfy the condition
    c2 = df.groupby('ID2').apply(pd.DataFrame.duplicated, subset=['First3'], keep=False)
    c2_idx = c2[c2].droplevel(0).index

    # Take the union of the two indexes, then filter the dataframe for the
    # rows that meet either of the two independent conditions
    # (.loc is used because the indexes hold labels, not positions)
    output = df.loc[c1_idx.union(c2_idx)]
    print(output)

       First   Last   Location             ID1            ID2 First3 Last3
    0   John  Smith    Toronto     JohnToronto   SmithToronto    Joh   Smi
    1    Joh  Smith    Toronto      JohToronto   SmithToronto    Joh   Smi
    2  Steph    Sax  Vancouver  StephVancouver   SaxVancouver    Ste   Sax
    3  Steph     Sa  Vancouver  StephVancouver  SaxeVancouver    Ste   Sax
    4  Stacy    Lee    Markham    StacyMarkham     LeeMarkham    Sta   Lee
    5   Stac    Lee    Markham     StacMarkham     LeeMarkham    Sta   Lee

EDIT: Modifying the above answer provided by @SomeDude, you can also run this as two independent conditions without a groupby and take an OR between them:

    m1 = df.duplicated(subset=['ID1', 'Last3'], keep=False)
    m2 = df.duplicated(subset=['ID2', 'First3'], keep=False)
    df[m1 | m2]
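For anyone who wants to run the mask-based variant from the EDIT end to end, here is a minimal sketch. The sample rows are hypothetical: they are modeled on the printed output, with two extra non-matching rows added, and the derived columns are built on the assumption that ID1 = First + Location, ID2 = Last + Location, and First3/Last3 are the first three letters of each name. The original dataset was not posted as text, so treat this as an illustration rather than the asker's exact data.

```python
import pandas as pd

# Hypothetical sample data modeled on the answer's printed output.
# Rows 6 and 7 are added here to show rows that get filtered out.
df = pd.DataFrame({
    "First": ["John", "Joh", "Steph", "Steph", "Stacy", "Stac", "Maria", "Maria"],
    "Last": ["Smith", "Smith", "Sax", "Saxe", "Lee", "Lee", "Gomez", "Lopez"],
    "Location": ["Toronto", "Toronto", "Vancouver", "Vancouver",
                 "Markham", "Markham", "Ottawa", "Ottawa"],
})

# Assumed derivation of the helper columns (not confirmed by the question).
df["ID1"] = df["First"] + df["Location"]
df["ID2"] = df["Last"] + df["Location"]
df["First3"] = df["First"].str[:3]
df["Last3"] = df["Last"].str[:3]

# Condition 1: another row has the same ID1 and the same Last3.
m1 = df.duplicated(subset=["ID1", "Last3"], keep=False)

# Condition 2: another row has the same ID2 and the same First3.
m2 = df.duplicated(subset=["ID2", "First3"], keep=False)

# Keep a row if it satisfies either condition.
output = df[m1 | m2]
print(output)
```

Rows 6 and 7 share an ID1 but have different Last3 values, so neither condition holds and both are dropped. Passing keep=False is what marks every member of a duplicate set, not just the later occurrences.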

Answer