I am working with tweets and I would like to report the mean sentiment score by topic and by community.
This is what my dataframe looks like, where each row is a document (a tweet):
```
tweet_text        sentiment  community_id  topic_1  topic_2  topic_3  topic_k
"blah blah blah"        0.7          1233        1        0        0        1
"blah blah blah"       -0.4          9845        0        1        1        0
"blah blah blah"        0.1          1233        1        0        1        0
```
I want to create a dataframe that contains a mean sentiment value in each cell like this:
```
community_id  topic 1  topic 2  topic 3  topic k
1233              0.1     -0.8      0.5      0.9
9845             -0.3      0.2      0.4      0.1
```
Any thoughts on how to go about this please? Thanks!
Answer
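First, propagate each tweet's sentiment onto the topic columns it belongs to, then average by `community_id`. Turn the 0 flags into NaN before multiplying; otherwise every tweet that is not about a topic contributes a sentiment of 0 and pulls that topic's mean toward zero (`mean()` skips NaN by default):

```python
import numpy as np

(df.filter(like='topic')          # the topic_1 ... topic_k indicator columns
   .replace(0, np.nan)            # non-member flags become NaN so mean() ignores them
   .mul(df.sentiment, axis=0)     # member flags (1) become that tweet's sentiment
   .groupby(df.community_id)      # one row per community
   .mean()                        # mean sentiment per community and topic
)
```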
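For a quick check, here is a self-contained sketch that rebuilds the three sample rows from the question (the tweet text is just the placeholder shown there) and computes the table two ways. The first uses the multiply-and-group chain with 0 flags masked as NaN. The second is an alternative long-format route using `melt` and `pivot_table`, which keeps only (tweet, topic) pairs where the flag is 1. Both assume the topic columns are the ones whose names start with `topic_`.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; tweet text is the placeholder used there.
df = pd.DataFrame({
    "tweet_text": ["blah blah blah"] * 3,
    "sentiment": [0.7, -0.4, 0.1],
    "community_id": [1233, 9845, 1233],
    "topic_1": [1, 0, 1],
    "topic_2": [0, 1, 0],
    "topic_3": [0, 1, 1],
    "topic_k": [1, 0, 0],
})

# Assumption: every topic indicator column is named topic_<something>.
topic_cols = [c for c in df.columns if c.startswith("topic_")]

# Wide route: 0 -> NaN, 1 -> sentiment, then average per community (NaN skipped).
wide = (
    df[topic_cols]
    .replace(0, np.nan)
    .mul(df["sentiment"], axis=0)
    .groupby(df["community_id"])
    .mean()
)

# Long route: one row per (tweet, topic) pair, keep memberships, then pivot.
long = df.melt(
    id_vars=["community_id", "sentiment"],
    value_vars=topic_cols,
    var_name="topic",
    value_name="member",
)
long = long[long["member"] == 1]
pivot = long.pivot_table(
    index="community_id", columns="topic", values="sentiment", aggfunc="mean"
)

print(wide)
print(pivot)
```

With this sample, community 1233 gets 0.4 for `topic_1` (the mean of 0.7 and 0.1), 0.1 for `topic_3` and 0.7 for `topic_k`, while a community with no tweets on a topic gets NaN rather than 0. The long-format version is handy if you later want other aggregates (counts, medians) per community and topic, since `pivot_table` accepts any `aggfunc`.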