Extract values into a new column for each unique value in another column

I have a dataframe; a sample of it looks like this:

review_id   ngram   date    rating          attraction   indo
4           bigram  2021        10          uss          sangat lengkap
359         bigram  2019        10          uss          sangat lengkap
911         bigram  2018        10          uss          sangat lengkap
977         bigram  2018        10          uss          sangat lengkap
1062        bigram  2019        10          uss          agak bingung
2919        bigram  2019        9           uss          agak bingung
3531        bigram  2018        10          uss          sangat lengkap
4282        bigram  2019        10          sea_aquarium sangat lengkap

I would like to collect the review_id values into a list for each phrase in the indo column.

I tried the following code, but it does not work: it returns the review_id of every row whose count is greater than one, regardless of whether those rows share the same phrase in the indo column.

df_sentiment['count'] = df_sentiment['indo'].value_counts()

def get_all_review_id():
    all_review_id = []
    for i in range(len(df_sentiment)):

        if df_sentiment['count'][i] > 1:
            all_review_id.append(df_sentiment['review_id'][i])
    return all_review_id

df_sentiment["all_review_id"] = df_sentiment['indo'].progress_apply(lambda x: get_all_review_id(x))

Any suggestions / code I can use? Thank you!

Answer

If you share the data, I can reproduce this and add the result.

Hopefully this answers your question:

df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()

    ngram   date    rating  attraction   indo               review_id
0   bigram  2018    10      uss          sangat lengkap     [911, 977, 3531]
1   bigram  2019    9       uss          agak bingung       [2919]
2   bigram  2019    10      sea_aquarium sangat lengkap     [4282]
3   bigram  2019    10      uss          agak bingung       [1062]
4   bigram  2019    10      uss          sangat lengkap     [359]
5   bigram  2021    10      uss          sangat lengkap     [4]
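
The grouping above keys on every column, so the same phrase is split across several rows. If the goal is one list per distinct phrase in indo, as the question describes, a minimal sketch is to group on indo alone. The DataFrame below is rebuilt from the sample rows in the question, and the names ids_per_phrase and all_review_id are illustrative choices rather than anything from the original post.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "review_id": [4, 359, 911, 977, 1062, 2919, 3531, 4282],
    "ngram": ["bigram"] * 8,
    "date": [2021, 2019, 2018, 2018, 2019, 2019, 2018, 2019],
    "rating": [10, 10, 10, 10, 10, 9, 10, 10],
    "attraction": ["uss"] * 7 + ["sea_aquarium"],
    "indo": ["sangat lengkap"] * 4 + ["agak bingung"] * 2 + ["sangat lengkap"] * 2,
})

# One entry per distinct phrase, holding every review_id that contains it
ids_per_phrase = df.groupby("indo")["review_id"].agg(list)

# Optional: attach the full list to each original row by looking up its phrase
df["all_review_id"] = df["indo"].map(ids_per_phrase)

print(ids_per_phrase)
```

Looking each row's phrase up with map keeps the original row count, which may be closer to what the all_review_id column in the question was meant to hold.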

Answer