I’m very new to web scraping and am trying to build an algorithm to pull all of the information from my school’s course catalog. What I have so far is:
import requests
from bs4 import BeautifulSoup
import selenium
from selenium.webdriver.support.ui import Select
from selenium import webdriver
import os
import time
from selenium.common import exceptions

driver = webdriver.Chrome()
url = "https://webapps.lsa.umich.edu/CrsMaint/Public/CB_PublicBulletin.aspx?crselevel=ug/robots.txt"
driver.get(url)
time.sleep(1)

driver.find_element_by_xpath('//*[@id="ContentPlaceHolder1_ddlPage"]/option[4]').click()
driver.find_element_by_xpath('//*[@name="ctl00$ContentPlaceHolder1$ddlTerm"]/option[1]').click()
driver.find_element_by_xpath('//*[@name="ctl00$ContentPlaceHolder1$ddlSubject"]/option[8]').click()
driver.find_element_by_xpath('//*[@id="ContentPlaceHolder1_btnSearch"]').click()
I had a lot more, but I keep running into Selenium errors about not being able to locate elements even though the selectors look correct. Can anyone get me on the right track? I’m trying to pull all of the information!
Cheers
Answer
I’ve played around with your code and used it as a base for something a bit more like what you’d expect.
Try this:
import textwrap

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = False  # set to True to run without opening a browser window
driver = webdriver.Chrome(options=options)

driver.get("https://webapps.lsa.umich.edu/CrsMaint/Public/CB_PublicBulletin.aspx")

# Make the same selections as in your code, then run the search.
driver.find_element_by_xpath('//*[@id="ContentPlaceHolder1_ddlPage"]/option[4]').click()
driver.find_element_by_xpath('//*[@name="ctl00$ContentPlaceHolder1$ddlTerm"]/option[1]').click()
driver.find_element_by_xpath('//*[@name="ctl00$ContentPlaceHolder1$ddlSubject"]/option[8]').click()
driver.find_element_by_xpath('//*[@id="ContentPlaceHolder1_btnSearch"]').click()

# Every course entry lives in a <td valign="top"> cell of the results page.
tables = BeautifulSoup(driver.page_source, "html.parser").find_all("td", {"valign": "top"})


def wrapper(text: str, width: int = 120) -> str:
    # Wrap long text so the terminal output stays readable.
    return "\n".join(textwrap.wrap(text, width=width)) + "\n"


for table in tables:
    try:
        title = table.find("b").getText(strip=True)
        course_info = " ".join(table.find("i").text.split())
        desc = table.find("p").getText(strip=True)
        urls = [f"{a.text.strip()} - {a['href']}" for a in table.find_all("a")]
        print(title)
        print(wrapper(course_info))
        print(wrapper(desc))
        print("\n".join(urls) if urls else "No URLs found.")
        print("-" * 120)
    except AttributeError:
        # Cells that aren't full course entries are missing <b>/<i>/<p>; skip them.
        continue

driver.close()
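A side note on the “unable to locate element” errors you were seeing: this page posts back whenever you change a dropdown, so elements can load late or go stale, and a fixed time.sleep() is fragile. An explicit wait is more robust, and since you already import Select, you can use it to pick dropdown options instead of clicking option nodes by XPath. Here is a minimal sketch of both ideas, assuming the newer Selenium 4 find_element/By API (the find_element_by_xpath calls above are from the older API); the element names and IDs are the same ones from your code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://webapps.lsa.umich.edu/CrsMaint/Public/CB_PublicBulletin.aspx")

# Wait up to 10 seconds for each element instead of sleeping a fixed amount.
wait = WebDriverWait(driver, 10)

# Drive the dropdown through Select rather than clicking an <option> by XPath.
term = Select(wait.until(EC.presence_of_element_located(
    (By.NAME, "ctl00$ContentPlaceHolder1$ddlTerm"))))
term.select_by_index(0)  # same as option[1] in the XPath above

search = wait.until(EC.element_to_be_clickable(
    (By.ID, "ContentPlaceHolder1_btnSearch")))
search.click()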
Back to the script above: in my terminal, the output (just a small part of it, as there’s a lot) looks like this: