
Counting unique mentions in Pandas dataframe column while grouped by multiple other columns

For a school project I am trying to count how often specific words, namely stock tickers, are mentioned in Reddit titles and comments. Currently the dataframe looks like this (where type is a string, either title or comment):

JavaScript

The mentions column contains a set of the tickers mentioned in the body (there may be several). What I want to do is count the mentions of each unique ticker on a per-subreddit, per-type (either comment or title) basis. The result I am looking for would be similar to this:

JavaScript

Repeated for all unique tickers mentioned.

I had used counters to work this out with a separate dataframe for each case (i.e. one dataframe for wallstreetbets comments, one dataframe for wallstreetbets titles), but I could not figure out how to make it work when everything is confined to a single dataframe.


Answer

Sounds like a simple groupby should do it:

JavaScript

produces

JavaScript
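
For illustration, here is a minimal sketch of how a groupby over exploded mentions could look. The column names (`subreddit`, `type`, `mentions`) and the sample rows are assumptions made up for this example, not the asker's real data, and this is one possible way to do it rather than necessarily the exact code from the answer:

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "subreddit": ["wallstreetbets", "wallstreetbets", "wallstreetbets",
                  "stocks", "stocks"],
    "type": ["title", "comment", "comment", "title", "comment"],
    "mentions": [{"GME", "AMC"}, {"GME"}, {"GME", "TSLA"},
                 {"TSLA"}, {"AAPL", "TSLA"}],
})

# Turn each set into a sorted list so the row order after explode is
# deterministic, then give every ticker its own row.
exploded = df.assign(mentions=df["mentions"].apply(sorted)).explode("mentions")

# Count rows per (subreddit, type, ticker) combination.
counts = (
    exploded.groupby(["subreddit", "type", "mentions"])
    .size()
    .reset_index(name="count")
)

print(counts)
```

Because each row holds a set, a ticker is counted at most once per title or comment, so `count` is the number of posts of that type in that subreddit that mention the ticker.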
User contributions licensed under: CC BY-SA