I am working with tweets and I would like to report the mean sentiment score by topic and by community.
This is what my dataframe looks like where each row is a document (tweet):
tweet_text         sentiment   community_id   topic_1   topic_2   topic_3   ...   topic_k
"blah blah blah"    0.7        1233           1         0         0         ...   1
"blah blah blah"   -0.4        9845           0         1         1         ...   0
"blah blah blah"    0.1        1233           1         0         1         ...   0
I want to create a dataframe that contains a mean sentiment value in each cell like this:
community_id   topic 1   topic 2   topic 3   ...   topic k
1233            0.1      -0.8       0.5      ...    0.9
9845           -0.3       0.2       0.4      ...    0.1
...             ...       ...       ...      ...    ...
Any thoughts on how to go about this please? Thanks!
Answer
First, propagate the sentiment through the topic columns by multiplying each row's topic flags by that row's sentiment, then average by community_id:
(df.filter(like='topic')
   .mul(df.sentiment, axis=0)
   .groupby(df.community_id)
   .mean()
)
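One thing to watch with the snippet above: after the multiplication, tweets that do not belong to a topic contribute a 0 in that topic's column, and those zeros are counted by .mean(). If what you want is the average over only the tweets tagged with each topic, one possible variation is to mask the untagged cells as NaN first, since pandas skips NaN when averaging. The following is a minimal sketch on made-up toy data (the values and the three-topic layout are assumptions for illustration, not your real data):

```python
import pandas as pd

# Toy data mirroring the question's layout; values are invented for illustration.
df = pd.DataFrame({
    "tweet_text": ["a", "b", "c"],
    "sentiment": [0.7, -0.4, 0.1],
    "community_id": [1233, 9845, 1233],
    "topic_1": [1, 0, 1],
    "topic_2": [0, 1, 0],
    "topic_3": [0, 1, 1],
})

# Select the 0/1 topic indicator columns.
topics = df.filter(like="topic")

# Put each tweet's sentiment into the topic columns it belongs to,
# and NaN (instead of 0) into the topic columns it does not belong to.
masked = topics.mul(df["sentiment"], axis=0).where(topics.eq(1))

# mean() skips NaN, so each cell averages only the tweets tagged with that topic.
result = masked.groupby(df["community_id"]).mean()
print(result)
```

A community with no tweets for a given topic ends up with NaN in that cell rather than 0, which keeps "no data" distinguishable from a genuinely neutral mean sentiment.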