Scraping #document from an iframe tag using BeautifulSoup

I am trying to scrape a website for COVID-related data. The data is enclosed in an iframe tag. I tried to scrape the results using BeautifulSoup but couldn't extract the #document node. Here's my approach:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    # Initial request to pick up the site's cookies
    coo = s.get("", headers={'User-Agent': 'Mozilla/5.0'})
    cookies = dict(coo.cookies)
    url = ""
    webpage = s.get(url, headers={'User-Agent': 'Mozilla/5.0'}, cookies=cookies)
    soup = BeautifulSoup(webpage.content, "html.parser")
    # Look up the iframe that wraps the data
    frame = soup.find("iframe", class_="interactive-atom-fence")

My results:

[Screenshot: the end part of the output]

Inspecting the data on the website:

[Screenshot: website HTML code]

Can somebody explain why the #document part is missing from my results?


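The #document node in the browser's inspector is the iframe's own nested document, which the browser creates when it renders the frame. It is not a child element in the outer page's HTML, so BeautifulSoup only returns the iframe tag and its attributes.

However, The Guardian serves the underlying data as a plain .csv file, which you can spot by watching the network requests in the browser's Developer Tools.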
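
To illustrate the missing #document, here is a minimal sketch on a made-up HTML snippet (the markup and the example.com URL are assumptions, not the real page). It uses the standard library's html.parser, the same parser that the "html.parser" option hands to BeautifulSoup, and shows that the outer HTML carries only the iframe tag and its attributes. The framed content would have to be fetched from src (or read from a srcdoc attribute, if the page uses one) and parsed separately.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the outer page; the real markup may differ.
OUTER_HTML = """
<html><body>
  <h1>Outer page</h1>
  <iframe class="interactive-atom-fence" src="https://example.com/embed.html"></iframe>
</body></html>
"""


class IframeCollector(HTMLParser):
    """Record the attributes of every <iframe> start tag in the outer HTML."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))


collector = IframeCollector()
collector.feed(OUTER_HTML)

# All the outer page knows about the frame is its attributes;
# there is no nested document to walk into.
frame_attrs = collector.iframes[0]
print(frame_attrs)
```

In the BeautifulSoup code from the question, the equivalent check would be to look at frame.attrs rather than expecting child nodes under the frame.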

Here's how to grab the data for COVID-19 global deaths:

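import shutil

import requests

url = ""
# stream=True defers downloading the body so it can be copied to disk in chunks
data = requests.get(url, stream=True)
if data.status_code == 200:
    # Have the raw stream undo any gzip/deflate content encoding while it is read
    data.raw.decode_content = True
    with open("covid19_data.csv", 'wb') as f:
        shutil.copyfileobj(data.raw, f)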

If you swap the last part of the URL for time_series_covid19_confirmed_global.csv, you get the confirmed-cases time series back as a .csv file instead.
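
The URL swap can be sketched with the standard library alone. The base URL below is a placeholder on example.com and the deaths file name is an assumption made for illustration; only time_series_covid19_confirmed_global.csv comes from the text above. swap_filename is a hypothetical helper, not part of requests.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def swap_filename(url, new_name):
    """Return url with its last path segment replaced by new_name."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    new_path = posixpath.join(directory, new_name)
    return urlunsplit(parts._replace(path=new_path))


# Placeholder URL and assumed file name; substitute the real link you
# find in the Developer Tools network tab.
deaths_url = "https://example.com/data/time_series_covid19_deaths_global.csv"
confirmed_url = swap_filename(
    deaths_url, "time_series_covid19_confirmed_global.csv"
)
print(confirmed_url)
```

The resulting confirmed_url can then be passed to the same requests.get(..., stream=True) download shown earlier.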

Source: stackoverflow