How do I traverse through a dataframe and get the polarity score of existing text (transcript) so I have one row per id in Python?

I am able to traverse through files in a directory with my script, but unable to apply the same logic when all the transcriptions are in a table/dataframe. My earlier script:

import os    
from glob import glob
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

files = glob('C:/Users/jj/Desktop/Bulk_Wav_Completed_CancelsvsSaves/*.csv')
sid = SentimentIntensityAnalyzer()
# use dict comprehension to apply your analysis
data = {os.path.basename(file): sid.polarity_scores(' '.join(pd.read_csv(file, encoding="utf-8")['transcript'])) for file in files}
# create a data frame from the dictionary above
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv("sentimentcancelvssaves.csv")

How do I apply the above to the table below?

    dfo
    Out[52]:
                    InteractionId           Agent         Transcript
    0      100392327420210105    David Michel     hi how are you
    1      100392327420210105    David Michel  yes i am not fine
    2      100390719220210104  Mindy Campbell            .,xyz..
    3      100390719220210104  Mindy Campbell                 no
    4      100390719220210104  Mindy Campbell              maybe
    ...                   ...             ...                ...
    93407  300390890320200915  Sandra Yacklin                ...
    93408  300390890320200915  Sandra Yacklin                ...
    93409  300390890320200915  Sandra Yacklin                ...

So as you can see, I have a column InteractionId which identifies each interaction. I need my final data set to give me one row per id, along with the polarity scores of the sentiment attached to that id.

Desired output for 100390719220210104 –

        InteractionId           Agent    Transcript  Positive  Compound
2  100390719220210104  Mindy Campbell  xyz no maybe     0.190    0.5457

How can I do this for every interaction id? I was able to do it when I had to apply my script to all the transcript csvs in a directory and iterate through them. However, how can I apply that to a dataframe where all the data is in one place rather than in different csvs?

Answer

So rather than looping through the files, you loop through the unique InteractionIds. You can get those using: for interaction_id in dfo['InteractionId'].unique()

And then you join the Transcript values for that id, which you can get with:
' '.join(dfo[dfo['InteractionId'] == interaction_id]['Transcript'])

Putting it together you have:

import os
from glob import glob

import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

dfo = pd.DataFrame(
    data={
        'InteractionId': [
            100392327420210105,
            100390719220210104,
            100390719220210104,
            100390719220210104,
        ],
        'Transcript': ['hi how are you', '.,xyz..', 'no', 'maybe'],
    }
)

sid = SentimentIntensityAnalyzer()
# use dict comprehension to apply your analysis
data = {
    interaction_id: sid.polarity_scores(
        ' '.join(dfo[dfo['InteractionId'] == interaction_id]['Transcript'])
    )
    for interaction_id in dfo['InteractionId'].unique()
}

# create a data frame from the dictionary above
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv("sentimentcancelvssaves.csv")
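
If you also want the Agent name and the joined transcript in the output, like in your desired table, a groupby-based variant of the same idea may be more convenient. This is only a minimal sketch, assuming your real dfo also has the Agent column (the toy frame above only has InteractionId and Transcript):

# group by id (and agent), joining each group's transcripts into one string
grouped = (
    dfo.groupby(['InteractionId', 'Agent'], as_index=False)['Transcript']
       .agg(' '.join)
)
# score each joined transcript and expand the VADER dict into neg/neu/pos/compound columns
scores = grouped['Transcript'].apply(sid.polarity_scores).apply(pd.Series)
result = pd.concat([grouped, scores], axis=1)
result.to_csv("sentimentcancelvssaves.csv", index=False)

This gives one row per InteractionId with the joined transcript next to its scores, which matches the desired output more closely than indexing by id alone.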