
String not visible when scraping with BeautifulSoup

I’m scraping a news article. Here is the link.

I want to get the “13” string inside the element with the comment__counter total_comment_share class. That string is visible in Inspect Element, and you can check it yourself at the link above. But when I call find() and print the result, the string is missing, so I can’t scrape it. This is my code:

import requests
from bs4 import BeautifulSoup

a = 'https://tekno.kompas.com/read/2020/11/12/08030087/youtube-down-pagi-ini-tidak-bisa-memutar-video'
b = requests.get(a)
c = b.content
d = BeautifulSoup(c, 'html.parser')
# attrs is a dict mapping the attribute name to the value to match
e = d.find('div', {'class': 'social--inline eee'})

f = d.find('div', {'class': 'comment__read__text'})
print(f)

In my code I’m using find() on the comment__read__text class to make it clearer that I can find the elements, just not that “13” string. The result is the same if I use find() on the comment__counter total_comment_share class. This is the output of the code above:

<div class="comment__read__text">
<a href="http://tekno.kompas.com/komentar/2020/11/12/08030087/youtube-down-pagi-ini-tidak-bisa-memutar-video">Komentar <div class="comment__counter total_comment_share"></div></a>
</div>

As you can see, the “13” string is not there. Does anyone know why? Any help would be appreciated.


Answer

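That’s because the comment count is not part of the HTML that requests downloads. The page makes a separate request while it loads and fills the count in dynamically, and requests does not run the page’s JavaScript. You can call that API directly instead. Try this out: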

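import requests

a = 'https://tekno.kompas.com/read/2020/11/12/08030087/youtube-down-pagi-ini-tidak-bisa-memutar-video'
# ask the comment API about this article; limit=1 keeps the response small
b = requests.get('https://apis.kompas.com/api/comment/list?urlpage={}&json&limit=1'.format(a))
c = b.json()
# the total number of comments is under result -> total
f = c["result"]["total"]
print(f)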
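
If you want something a bit more defensive, here is a minimal sketch of the same call wrapped in small functions. The endpoint and the result -> total keys come from the snippet above; the function names, the timeout value and the None fallback are my own assumptions, not part of any Kompas documentation.

```python
# Endpoint format taken from the answer above; everything else is illustrative.
API_TEMPLATE = "https://apis.kompas.com/api/comment/list?urlpage={url}&json&limit={limit}"


def build_api_url(article_url, limit=1):
    """Build the comment API URL for one article (hypothetical helper)."""
    return API_TEMPLATE.format(url=article_url, limit=limit)


def extract_total(payload):
    """Return payload["result"]["total"], or None if the shape is unexpected."""
    result = payload.get("result") if isinstance(payload, dict) else None
    if not isinstance(result, dict):
        return None
    return result.get("total")


def fetch_comment_total(article_url, timeout=10):
    """Fetch the comment count for an article (performs a network request)."""
    # Imported here so the two helpers above work without the dependency.
    import requests

    response = requests.get(build_api_url(article_url), timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_total(response.json())
```

Calling fetch_comment_total(a) with the article URL from the question should then give the same number as the snippet above, as long as the API still responds in that shape.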

PS: if you’re interested in scraping all the comments, just change limit to 100000 which will get you all the comments in one request as JSON.
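
To see that the count really is missing from the static HTML, you can parse the markup from the question’s output on its own. This is only a quick check: the html string is copied from the question, and the variable names are mine.

```python
from bs4 import BeautifulSoup

# Static markup copied from the output shown in the question.
html = """
<div class="comment__read__text">
<a href="http://tekno.kompas.com/komentar/2020/11/12/08030087/youtube-down-pagi-ini-tidak-bisa-memutar-video">Komentar <div class="comment__counter total_comment_share"></div></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches a single class even when the element carries several.
counter = soup.find("div", class_="total_comment_share")
counter_text = counter.get_text(strip=True)

print(repr(counter_text))  # '' : the div exists but holds no text
```

The element is found, but it has no text, so there is nothing for find() to return. The number only appears after the browser runs the page’s scripts.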
