Pandas return count of values and all the matching/associated values in another column

Question

I have two columns in a pandas dataframe FeatureID, and Issue ID. There can be multiple issues for a feature. An IssueID is unique and never repeated. For example (actual data is 1500 rows so let's say it's as follows): IssueID FeatureID 5612 65002 5613 65401 5614 65002 5615 65002 5616 65401 5617 65432 5618 65536 5619 65002 5620 65536

Accepted Answer

Use aggregate function.df2 = df.groupby('FeatureID').agg({'IssueID': [len, list]}).droplevel(0, axis=1).reset_index()    FeatureID  len                      list0      65002    4  [5612, 5614, 5615, 5619]1      65401    2              [5613, 5616]2      65432    1                    [5617]3      65536    3        [5618, 5620, 5621]df2[df2['len']>1]   FeatureID  len                      list0      65002    4  [5612, 5614, 5615, 5619]1      65401    2              [5613, 5616]3      65536    3        [5618, 5620, 5621]

IssueID	FeatureID
5612	65002
5613	65401
5614	65002
5615	65002
5616	65401
5617	65432
5618	65536
5619	65002
5620	65536
5621	65536
etc..	etc..

Advertisement

Answer