Skip to content
Advertisement

Pandas return count of values and all the matching/associated values in another column

I have two columns in a pandas dataframe FeatureID, and Issue ID. There can be multiple issues for a feature. An IssueID is unique and never repeated.

For example (actual data is 1500 rows so let’s say it’s as follows):

IssueID FeatureID
5612 65002
5613 65401
5614 65002
5615 65002
5616 65401
5617 65432
5618 65536
5619 65002
5620 65536
5621 65536
etc.. etc..

I am trying to return all of the features that have 2 or more Issue IDs associated with them and I would like to list all of the IssueIDs in a new column.

Using the following I can get the count of the FeatureIDs, but I don’t know how to get from that to also printing the IssueIDs (or just listing them in a new column).

(Assume I’ve already read the csv into a DataFrame)

df2 = pd.DataFrame([['IssueID', 'FeatureID']]

Features = df2['FeatureID'].value_counts().loc[lambda x : x>2]
print(FeatureS)

How can I display the FeatureIDs, the number of issues related to them, and then the IssueIDs for each FeatureID?

Advertisement

Answer

Use aggregate function.

df2 = df.groupby('FeatureID').agg({'IssueID': [len, list]}).droplevel(0, axis=1).reset_index() 

   FeatureID  len                      list
0      65002    4  [5612, 5614, 5615, 5619]
1      65401    2              [5613, 5616]
2      65432    1                    [5617]
3      65536    3        [5618, 5620, 5621]

df2[df2['len']>1]

   FeatureID  len                      list
0      65002    4  [5612, 5614, 5615, 5619]
1      65401    2              [5613, 5616]
3      65536    3        [5618, 5620, 5621]
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement