I have a list of terms in Python that look like this.
Fruit
apple
banana
grape
orange
I also have a data frame where each customer has a list of individual sentences, some of which mention one of those fruits. Something similar to this:
Customer  Review
1         ['the banana was delicious','he called the firetruck','I had only half an orange']
2         ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons']
3         ['It could use some more cheese','the grape and orange was sour']
I want to take the sentences in the Review column, match each one with the fruit it mentions, and get a data frame of that as the final result. So, something like this:
Fruit   Review
apple   ['there was a worm in my apple']
banana  ['the banana was delicious','I liked the banana']
grape   ['the grape and orange was sour']
orange  ['the grape and orange was sour','I had only half an orange']
How could I go about doing this?
Answer
While the exact answer depends on how you’re storing the data, I think the methodology is the same:
- Create an empty list for every fruit name to hold its reviews
- For each review, check each of the fruits to see whether it appears. If a fruit appears anywhere in the review, add the review to that fruit’s list
Here’s an example of what that would look like:

    # The list of fruits
    fruits = ['apple', 'banana', 'grape', 'orange']

    # The collection of reviews (based on the way it was presented,
    # I'm assuming it was in a dictionary)
    reviews = {
        '1': ['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
        '2': ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
        '3': ['It could use some more cheese', 'the grape and orange was sour']
    }

    fruitDictionary = {}

    # 1. Create and store an empty list for every fruit name to store its reviews
    for fruit in fruits:
        fruitDictionary[fruit] = []

    for customerReviews in reviews.values():
        # 2. For each review,...
        for review in customerReviews:
            # ...check each of the fruits to see if they appear.
            for fruit in fruits:
                # If a fruit appears in the comment at all
                # (both sides lowercased so the match ignores case),...
                if fruit.lower() in review.lower():
                    # ...add the review to that fruit's list
                    fruitDictionary[fruit].append(review)

This differs from previous answers in that sentences like “I enjoyed this grape. I thought the grape was very juicy” are only added to the grape section once.
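One caveat with the `in` test used above: it matches substrings, so a review mentioning "pineapple" would also land in the apple list. Here is a minimal sketch of a whole-word variant, assuming that behaviour is unwanted. The helper name `group_reviews_by_fruit` and the sample data are hypothetical, not part of the original answer; the matching relies on the standard `re` module with `\b` word boundaries.

```python
import re


def group_reviews_by_fruit(fruits, reviews_per_customer):
    """Map each fruit to the reviews that mention it as a whole word.

    Hypothetical helper for illustration: `reviews_per_customer` is any
    iterable of lists of review strings (for example `reviews.values()`
    for a dictionary, or a list of lists directly).
    """
    # Precompile one case-insensitive whole-word pattern per fruit.
    patterns = {
        fruit: re.compile(r"\b" + re.escape(fruit) + r"\b", re.IGNORECASE)
        for fruit in fruits
    }
    grouped = {fruit: [] for fruit in fruits}
    for customer_reviews in reviews_per_customer:
        for review in customer_reviews:
            for fruit, pattern in patterns.items():
                # search() only asks whether the fruit occurs at all, so a
                # review naming the same fruit twice is still added once.
                if pattern.search(review):
                    grouped[fruit].append(review)
    return grouped


# Made-up sample data, only to exercise the helper.
sample = [
    ['I enjoyed this grape. I thought the grape was very juicy'],
    ['The pineapple was fine', 'An Apple a day'],
]
result = group_reviews_by_fruit(['apple', 'grape'], sample)
```

The trade-off is that plurals such as "apples" no longer match, so whether this is preferable to the plain `in` check depends on how the reviews are worded.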
If your data is stored as a list of lists, the process is very similar:

    # The list of fruits
    fruits = ['apple', 'banana', 'grape', 'orange']

    # The collection of reviews
    reviews = [
        ['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
        ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
        ['It could use some more cheese', 'the grape and orange was sour']
    ]

    fruitDictionary = {}

    # 1. Create and store an empty list for every fruit name to store its reviews
    for fruit in fruits:
        fruitDictionary[fruit] = []

    for customerReviews in reviews:
        # 2. For each review,...
        for review in customerReviews:
            # ...check each of the fruits to see if they appear.
            for fruit in fruits:
                # If a fruit appears in the comment at all
                # (both sides lowercased so the match ignores case),...
                if fruit.lower() in review.lower():
                    # ...add the review to that fruit's list
                    fruitDictionary[fruit].append(review)

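Since the question asks for a data frame as the final result, the dictionary can be flattened into one record per fruit. This is a minimal sketch, assuming `fruitDictionary` has been filled by the loops above; a small stand-in is hard-coded here so the snippet runs on its own. The column names `Fruit` and `Review` come from the question, while `rows` is just an illustrative name.

```python
# Stand-in for the dictionary built by the loops above.
fruitDictionary = {
    'apple': ['there was a worm in my apple'],
    'banana': ['the banana was delicious', 'I liked the banana'],
    'grape': ['the grape and orange was sour'],
    'orange': ['I had only half an orange', 'the grape and orange was sour'],
}

# One record per fruit, keeping the whole list of reviews in a single cell.
rows = [
    {'Fruit': fruit, 'Review': fruit_reviews}
    for fruit, fruit_reviews in fruitDictionary.items()
]

for row in rows:
    print(row['Fruit'], row['Review'])
```

If pandas is installed, passing `rows` to `pandas.DataFrame(rows)` should give a two-column frame with one row per fruit, which is the shape shown in the question.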