
Less Frequent Words appearing bigger – WordCloud in Python

I have been plotting a word cloud using Python’s wordcloud package. Here’s a sample of the code:

from wordcloud import WordCloud, STOPWORDS
import matplotlib
import matplotlib.pyplot as plt
stopwords = set(STOPWORDS)

def show_wordcloud(data, title = None):
    wordcloud = WordCloud(
        background_color='black',
        stopwords=stopwords,
        max_words=200,
        max_font_size=40, 
        scale=3,
        random_state=1 # chosen at random by flipping a coin; it was heads
    ).generate(str(data))

    fig = plt.figure(1, figsize=(15, 15))
    plt.axis('off')
    if title: 
        fig.suptitle(title, fontsize=20)
        fig.subplots_adjust(top=2.3)
    matplotlib.rcParams.update({'font.size': 22})
    plt.title('Most Used Words for Emotion Tag 2 (What is the highlight?)')    
    plt.imshow(wordcloud)
    plt.savefig('2.jpg')
    plt.show()

show_wordcloud(df2['words'])

[Image: the generated word cloud]

Now, what I understood from the official documentation of wordcloud is that the most frequent non-stop words appear bigger, but here “Chirping” appears bigger than “Bengal”. When I check the frequency of “Chirping”:

In [20]: df2[df2['words'].str.contains("Chirping")]
Out[20]:    words             tagid
           Chirping of birds    2
           Chirping of birds    2

And now, when I check the frequency of Bengal:

In [21]: df2[df2['words'].str.contains("Bengal")]
Out[21]:     words                 tagid
        The mighty Bay Of Bengal    2
        Royal Bengal Tigers🐯       2
        #NammaBengaluru             2
        Traditional Bengali Meal    2
        Royal Bengal Tiger          2
        Enterning Taj Bengal.       2

“Bengal” appears small, in yellow, just below the word “Part” and to the left of “Trekking”. I’m not able to understand why that is happening or how I can fix it. I would also like to know whether there is a way to remove prepositions such as at, beside, and inside from the word cloud.

Is there a way I can assign a weight or frequency to each word and then plot the word cloud?


Answer

Can you post a sample of the ‘data’ variable? It is possible that the entire text is not reaching the WordCloud object when you pass it in.
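One possible cause, assuming df2['words'] is a pandas Series of strings: str(data) returns the printed representation of the Series rather than its rows joined together. That representation carries the index labels and a trailing "Name: ..., dtype: ..." line, and for a long Series pandas elides the middle rows with "...", so most rows would never be counted. A minimal sketch of joining the rows explicitly instead (rows_to_text is a hypothetical helper name, not part of pandas or wordcloud):

```python
# Hypothetical helper: join every row into one string instead of calling
# str() on the whole column. It accepts any iterable of values, so a pandas
# Series such as df2['words'] can be passed in directly.
def rows_to_text(rows):
    return " ".join(str(row) for row in rows)


# Stand-in for a few rows of df2['words'], taken from the question.
sample_rows = ["Chirping of birds", "The mighty Bay Of Bengal"]
text = rows_to_text(sample_rows)
```

In the function from the question, this would mean calling .generate(rows_to_text(data)) in place of .generate(str(data)).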

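You can assign weights based on how often each word occurs in the text. The frequency example in the wordcloud documentation defines two helper functions for this: getFrequencyDictForText(), which builds a dictionary of word counts, and makeImage(), which draws the cloud from that dictionary by calling WordCloud.generate_from_frequencies(). These helpers belong to the example script, not to the wordcloud package itself, so you need to copy them or write your own.

Please refer to the example in the documentation here: https://amueller.github.io/word_cloud/auto_examples/frequency.html#sphx-glr-auto-examples-frequency-py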
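As a rough sketch of the same idea without the example's helpers: count the words yourself, then hand the counts to generate_from_frequencies(). The names word_frequencies and frequency_cloud below are hypothetical, and the tokenizing regex is only an assumption about what should count as a word. As far as I know, the stopwords argument of WordCloud is ignored by generate_from_frequencies(), so stop words and prepositions are filtered while counting.

```python
import re
from collections import Counter


# Hypothetical helper (not part of the wordcloud package): count how often
# each word occurs across an iterable of strings, skipping stop words.
def word_frequencies(texts, stopwords=frozenset()):
    stop = {w.lower() for w in stopwords}
    counts = Counter()
    for text in texts:
        # Assumption: a word is a run of ASCII letters or apostrophes.
        for word in re.findall(r"[A-Za-z']+", str(text)):
            if word.lower() not in stop:
                counts[word] += 1  # keys keep their original capitalization
    return dict(counts)


# Hypothetical helper: build the cloud from explicit word weights.
def frequency_cloud(frequencies):
    # Imported here so the counting helper works without wordcloud installed.
    from wordcloud import WordCloud
    cloud = WordCloud(background_color='black', max_words=200,
                      max_font_size=40, scale=3, random_state=1)
    return cloud.generate_from_frequencies(frequencies)
```

With the question's data this could be used as freqs = word_frequencies(df2['words'], STOPWORDS | {'beside', 'inside'}) followed by plt.imshow(frequency_cloud(freqs)). Counting whole words also differs from str.contains("Bengal"), which matches substrings: of the six rows listed in the question, “Bengali” and “#NammaBengaluru” are separate words, so “Bengal” itself occurs four times.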
