I have two columns – one with sentences and the other with single words.
Sentence | word |
---|---|
“Such a day! It’s a beautiful day out there” | “beautiful” |
“Such a day! It’s a beautiful day out there” | “day” |
“I am sad by the sad weather” | “weather” |
“I am sad by the sad weather” | “sad” |
I want to count how many times each value in the “word” column occurs in the “Sentence” on the same row, and get this output:
Sentence | word | n |
---|---|---|
“Such a day! It’s a beautiful day out there” | “beautiful” | 1 |
“Such a day! It’s a beautiful day out there” | “day” | 2 |
“I am sad by the sad weather” | “weather” | 1 |
“I am sad by the sad weather” | “sad” | 2 |
I tried:
    ok = []
    for l in [x.split() for x in df['Sentence']]:
        for y in df['word']:
            ok.append(l.count(y))
However, this takes a very long time and never seems to finish, so it is not feasible for my actual dataset, which has 50k rows.
Can anyone help me achieve this?
Answer
You can do it with zip, which pairs each sentence with the word in the same row, so only one count is done per row:
    df['new'] = [x.count(y) for x, y in zip(df.Sentence, df.word)]
    df
    Out[419]:
                                         Sentence       word  new
    0  Such a day! It's a beautiful day out there  beautiful    1
    1  Such a day! It's a beautiful day out there        day    2
    2                 I am sad by the sad weather    weather    1
    3                 I am sad by the sad weather        sad    2
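One thing to keep in mind: `str.count` counts substring occurrences, so "sad" would also be counted inside "sadness". If you only want whole-word matches, a possible variation is sketched below. The helper name `count_whole_word` and the output column `n` are my own choices for illustration, and the `\b` word-boundary approach assumes your words start and end with letters, digits, or underscores.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sentence": [
        "Such a day! It's a beautiful day out there",
        "Such a day! It's a beautiful day out there",
        "I am sad by the sad weather",
        "I am sad by the sad weather",
    ],
    "word": ["beautiful", "day", "weather", "sad"],
})


def count_whole_word(sentence: str, word: str) -> int:
    """Count occurrences of `word` in `sentence` as a whole word (hypothetical helper)."""
    # re.escape protects any regex metacharacters in the word;
    # \b keeps "sad" from matching inside "sadness".
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, sentence))


# One count per row, pairing each sentence with the word on the same row.
df["n"] = [count_whole_word(s, w) for s, w in zip(df["Sentence"], df["word"])]
print(df)
```

This still does one pass per row, so it should scale to 50k rows far better than the nested loop in the question, which compares every sentence against every word.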