I have a dataframe and a sample of it looks like this
    review_id  ngram   date  rating  attraction    indo
    4          bigram  2021  10      uss           sangat lengkap
    359        bigram  2019  10      uss           sangat lengkap
    911        bigram  2018  10      uss           sangat lengkap
    977        bigram  2018  10      uss           sangat lengkap
    1062       bigram  2019  10      uss           agak bingung
    2919       bigram  2019  9       uss           agak bingung
    3531       bigram  2018  10      uss           sangat lengkap
    4282       bigram  2019  10      sea_aquarium  sangat lengkap
I would like to collect the review_id values into a list for each phrase in the indo column.
I tried the following code, but it does not work: it returns the review_id of every row whose count is greater than one, whether or not those rows share the same phrase in the indo column.
    df_sentiment['count'] = df_sentiment['indo'].value_counts()

    def get_all_review_id():
        all_review_id = []
        for i in range(len(df_sentiment)):
            if df_sentiment['count'][i] > 1:
                all_review_id.append(df_sentiment['review_id'][i])
        return all_review_id

    df_sentiment["all_review_id"] = df_sentiment['indo'].progress_apply(lambda x: get_all_review_id(x))
Any suggestions / code I can use? Thank you!
Answer
If you share the data, I can reproduce this and add the result.
Hopefully this answers your question:
    df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()
       ngram   date  rating  attraction    indo            review_id
    0  bigram  2018  10      uss           sangat lengkap  [911, 977, 3531]
    1  bigram  2019  9       uss           agak bingung    [2919]
    2  bigram  2019  10      sea_aquarium  sangat lengkap  [4282]
    3  bigram  2019  10      uss           agak bingung    [1062]
    4  bigram  2019  10      uss           sangat lengkap  [359]
    5  bigram  2021  10      uss           sangat lengkap  [4]
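Note that grouping on every column splits the same phrase across different years, ratings and attractions. If the goal is a single list per phrase in indo only, a minimal sketch could group on indo alone and then map the lists back onto the rows. This assumes the sample rows from the question, rebuilt by hand here; the names ids_per_phrase and all_review_id are only illustrative.

```python
import pandas as pd

# Sample rows copied by hand from the question (assumption: this matches the real data layout).
df = pd.DataFrame({
    "review_id": [4, 359, 911, 977, 1062, 2919, 3531, 4282],
    "ngram": ["bigram"] * 8,
    "date": [2021, 2019, 2018, 2018, 2019, 2019, 2018, 2019],
    "rating": [10, 10, 10, 10, 10, 9, 10, 10],
    "attraction": ["uss"] * 7 + ["sea_aquarium"],
    "indo": [
        "sangat lengkap", "sangat lengkap", "sangat lengkap", "sangat lengkap",
        "agak bingung", "agak bingung", "sangat lengkap", "sangat lengkap",
    ],
})

# One list of review_ids per distinct phrase in `indo` (a Series indexed by phrase).
ids_per_phrase = df.groupby("indo")["review_id"].agg(list)

# Attach the matching list to every row by looking up each row's phrase.
df["all_review_id"] = df["indo"].map(ids_per_phrase)

print(ids_per_phrase)
```

With these rows, "agak bingung" should collect [1062, 2919] and "sangat lengkap" the remaining six ids. If you only need the lookup table and not a new column, the groupby line on its own is enough.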