I’m trying to scrape a video from a website. I can find the video link using Chrome DevTools, but when I use BeautifulSoup to get it, the link is missing from the parsed HTML. Please help me modify the code below to get the video link.
Here is a screenshot of Chrome DevTools. Basically, I need the ‘src’ attribute of the ‘video’ tag.
import re
import urllib.request
from bs4 import BeautifulSoup as BS

url_video = 'http://s.weibo.com/video?q=%23%E6%AC%A7%E9%98%B3%E5%A6%AE%E5%A6%AE%23&xsort=hot&hasvideo=1&tw=video&Refer=weibo_video'

# open and read page
page = urllib.request.urlopen(url_video)
html = page.read()

# create BeautifulSoup parse-able "soup"
soup = BS(html, "lxml")
lst_url_video = []
print(soup.body.find_all('div', class_='thumbnail')[0])
Please help modify the code to get the video link.
Answer
The site may be using client-side JavaScript to load some of its HTML content. When you make a request with urllib.request, no client-side JavaScript is executed, so anything the page loads that way is absent from the HTML you receive. If that is the case here, you’ll need a JavaScript engine to run it (i.e. a web browser). You can use a headless browser to execute client-side JavaScript while scraping a web page. Here’s a guide to using headless Chrome with Puppeteer.
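Since the question is in Python, the same idea can be sketched with Selenium driving headless Chrome instead of Puppeteer. This is only a rough sketch under several assumptions: that the selenium package and a matching Chrome/driver are installed, that the ‘video’ element appears after page load without a click, hover, or login, and that the hypothetical helper names `fetch_rendered_html` and `extract_video_srcs` suit your script. The parsing helper uses the standard library's `html.parser` to keep dependencies low; BeautifulSoup would work just as well on the rendered HTML.

```python
from html.parser import HTMLParser


class VideoSrcParser(HTMLParser):
    """Collect the src attribute of every <video> tag that has one."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def extract_video_srcs(html):
    """Return the src values of all <video> tags found in an HTML string."""
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.srcs


def fetch_rendered_html(url, timeout=15):
    """Load url in headless Chrome and return the HTML after scripts have run.

    Assumption: a <video> element shows up on its own within `timeout` seconds.
    """
    # Imported lazily so the parsing helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted at least one <video>.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "video"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Possible usage with the URL from the question:
#   lst_url_video = extract_video_srcs(fetch_rendered_html(url_video))
```

If the wait times out, the ‘video’ tag is probably created only after some interaction (for example clicking the thumbnail), in which case that step would have to be scripted before reading `page_source`.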