I have a dataframe and a sample of it looks like this
review_id ngram date rating attraction indo 4 bigram 2021 10 uss sangat lengkap 359 bigram 2019 10 uss sangat lengkap 911 bigram 2018 10 uss sangat lengkap 977 bigram 2018 10 uss sangat lengkap 1062 bigram 2019 10 uss agak bingung 2919 bigram 2019 9 uss agak bingung 3531 bigram 2018 10 uss sangat lengkap 4282 bigram 2019 10 sea_aquarium sangat lengkap
I would like to extract the review_id into a list for each word in indo column such that the output would be something like this
I tried the following code but it does not work as it returns the review_id of all counts that are more that one which may or may not be the same words in the indo column.
df_sentiment['count'] = df_sentiment['indo'].value_counts() def get_all_review_id(): all_review_id = [] for i in range(len(df_sentiment)): if df_sentiment['count'][i] > 1: all_review_id.append(df_sentiment['review_id'][i]) return all_review_id df_sentiment["all_review_id"] = df_sentiment['indo'].progress_apply(lambda x: get_all_review_id(x))
Any suggestions / code I can use? Thank you!
Advertisement
Answer
if you share the data, I can reproduce and add the result
This hopefully will answer your question
df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()
ngram date rating attraction indo review_id 0 bigram 2018 10 uss sangat lengkap [911, 977, 3531] 1 bigram 2019 9 uss agak bingung [2919] 2 bigram 2019 10 sea_aquarium sangat blengkap [4282] 3 bigram 2019 10 uss agak bingung [1062] 4 bigram 2019 10 uss sangat lengkap [359] 5 bigram 2021 10 uss sangat lengkap [4]