
Create table by grouping mean values by column and list of one-hot encoded columns (Python, pandas)

I am working with tweets and I would like to report the mean sentiment score by topic and by community.

This is what my dataframe looks like; each row is a document (a tweet):

tweet_text        sentiment  community_id   topic_1   topic_2   topic_3    ...    topic_k
"blah blah blah"      0.7      1233             1       0         0        ...       1
"blah blah blah"     -0.4      9845             0       1         1        ...       0
"blah blah blah"      0.1      1233             1       0         1        ...       0
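For experimenting, the three sample rows can be rebuilt as a DataFrame. This is a minimal sketch that assumes k = 3, i.e. it stops at topic_3 because the remaining topic columns are elided in the table. It also shows how `filter(like="topic")` picks out the whole one-hot block without listing every column name.

```python
import pandas as pd

# Sample rows from the table above; only topic_1..topic_3 are included
# (an assumption), since the remaining topic columns are elided.
df = pd.DataFrame({
    "tweet_text":   ["blah blah blah"] * 3,
    "sentiment":    [0.7, -0.4, 0.1],
    "community_id": [1233, 9845, 1233],
    "topic_1":      [1, 0, 1],
    "topic_2":      [0, 1, 0],
    "topic_3":      [0, 1, 1],
})

# filter(like=...) keeps every column whose name contains "topic".
topic_cols = df.filter(like="topic").columns.tolist()
print(topic_cols)
```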

I want to create a dataframe where each cell holds the mean sentiment of that community's tweets on that topic, like this:

community_id   topic 1   topic 2   topic 3   ...    topic k
 1233           0.1       -0.8       0.5     ...       0.9
 9845          -0.3        0.2       0.4     ...       0.1
 ...            ...        ...       ...     ...       ...

Any thoughts on how to go about this please? Thanks!


Answer

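Copy each tweet's sentiment into the topic columns it is tagged with, then average by community_id. Mask the zeros out first (turn them into NaN, which mean() skips); otherwise a tweet without a topic is still counted in that topic's denominator and pulls the average toward zero:

topics = df.filter(like='topic')       # the one-hot columns topic_1 ... topic_k
(topics.where(topics == 1)             # 1 where the tweet has the topic, NaN elsewhere
       .mul(df.sentiment, axis=0)      # copy each tweet's sentiment into its topic columns
       .groupby(df.community_id)
       .mean()                         # NaN is skipped, so only tagged tweets are averaged
)

A community that never mentions a topic gets NaN in that cell.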
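A self-contained check on the three sample rows from the question, assuming k = 3 and dropping tweet_text for brevity. It compares the mean over tagged tweets only (zeros masked with `DataFrame.where`) against an equivalent sum-divided-by-count formulation, and against a plain mean of the multiplied columns. In community 1233, topic_3 appears in one of its two tweets (sentiment 0.1), so the plain mean reports 0.05 while the mean over tagged tweets is 0.1.

```python
import pandas as pd

# Sample rows from the question; k = 3 is assumed for illustration.
df = pd.DataFrame({
    "sentiment":    [0.7, -0.4, 0.1],
    "community_id": [1233, 9845, 1233],
    "topic_1":      [1, 0, 1],
    "topic_2":      [0, 1, 0],
    "topic_3":      [0, 1, 1],
})
topics = df.filter(like="topic")

# Mean over tagged tweets only: untagged cells become NaN and are skipped.
masked = (topics.where(topics == 1)
                .mul(df["sentiment"], axis=0)
                .groupby(df["community_id"])
                .mean())

# Same numbers as sum / count: summed sentiment of tagged tweets divided by
# the number of tagged tweets (0 / 0 yields NaN for unused topics).
weighted_sum = topics.mul(df["sentiment"], axis=0).groupby(df["community_id"]).sum()
tag_count = topics.groupby(df["community_id"]).sum()
ratio = weighted_sum / tag_count

# Plain mean of the multiplied columns: untagged tweets add 0 but still count.
unmasked = topics.mul(df["sentiment"], axis=0).groupby(df["community_id"]).mean()

print(masked)
print(unmasked)
```

The sum/count form avoids the intermediate NaN frame, which may be preferable when k is large; both give NaN where a community has no tweets on a topic.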
User contributions licensed under: CC BY-SA