Web scraping data from multiple TOCs using Python or R

Question

I am new to web scraping. I would like to collect data from: https://www.sec.gov/Archives/edgar/data/814453/000119312518067603/d494599d10k.htm#tx494599_11 I can see that there are a lot of TOCs. I would like to scrape the text "Income before income taxes" along with its amount. Please share any ideas and shed some light on this.

Accepted Answer

This will give you everything from the table; you can then pick out the specific row you want:

    import urllib2  # Python 2; on Python 3 use urllib.request instead
    from bs4 import BeautifulSoup

    quote_page = 'https://www.sec.gov/Archives/edgar/data/814453/000119312518067603/d494599d10k.htm#tx494599_11'
    page = urllib2.urlopen(quote_page)
    soup = BeautifulSoup(page, 'html.parser')

    # Locate the table through its "2017 (1)" column header
    header = soup.find("b", text="2017 (1)")
    table = header.find_parent("table")

    # Skip the first two rows and print the cell text of every remaining row
    for row in table.find_all("tr")[2:]:
        print([cell.get_text(strip=True) for cell in row.find_all("td")])
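The answer's loop yields each table row as a list of cell strings, so picking out "Income before income taxes" comes down to filtering those lists. Below is a minimal sketch of that step, not part of the original answer. The helper names `find_label_rows` and `parse_amounts` are hypothetical, and the sample rows are made-up values in the assumed shape of the output, not figures from the filing.

```python
import re

def find_label_rows(rows, label):
    """Return rows (empty cells dropped) whose first non-empty cell starts with `label`."""
    wanted = label.strip().lower()
    matches = []
    for cells in rows:
        non_empty = [c.strip() for c in cells if c.strip()]
        if non_empty and non_empty[0].lower().startswith(wanted):
            matches.append(non_empty)
    return matches

def parse_amounts(cells):
    """Turn cells such as '1,234', '(56)' or '(56' into floats; skip non-numeric cells."""
    amounts = []
    for cell in cells:
        text = cell.replace("$", "").replace(",", "").strip()
        # A leading "(" is treated as a negative amount; the closing ")" may sit in its own cell
        negative = text.startswith("(")
        text = text.strip("()")
        if re.fullmatch(r"\d+(\.\d+)?", text):
            amounts.append(-float(text) if negative else float(text))
    return amounts

# Hypothetical rows in the shape produced by the loop above (values are invented)
sample_rows = [
    ["Revenues", "", "$", "1,000", "", "$", "900", ""],
    ["Income before income taxes", "", "$", "250", "", "$", "(40)", ""],
]

for row in find_label_rows(sample_rows, "Income before income taxes"):
    print(row[0], parse_amounts(row[1:]))
```

To use this with the real page, collect the lists from the loop into a `rows` list instead of printing them, then pass that list to `find_label_rows`. The label is matched by prefix because a row label in a filing may carry a trailing footnote marker.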
