
Getting availabilities from a dynamic website with BeautifulSoup

I am trying to scrape a website like this one: https://seeksophie.com/options/1-5hr-basic-candle-workshop. From it, I'd like to get all scheduled dates for the activity (a full year). Every date on the page is rendered as a span element, and it is important for me to capture the notAllowed and flatpickr-disabled classes on those elements, since I will use them to filter the available dates from the rest. Ideally I would also like to get all available times for a given date (help with that would be much appreciated), but getting the spans is the priority. My approach is to iteratively click the next-month button and collect all spans along the way, something like this:

    def find_all_span(self, soup):
        new_soup = soup.__copy__()
        all_spans = []
        for i in range(12):
            days_container = new_soup.find_all("div", {"class": "dayContainer"})
            spans = days_container[2].find_all("span")
            all_spans.extend(spans)
            next_month_clicker = self.page_loader.driver.find_element_by_id(
                "js-placeholder-booking-form-accommodation-date")
            self.page_loader.driver.execute_script("arguments[0].click();", next_month_clicker)
            next_month_clicker = self.page_loader.driver.find_elements_by_class_name("flatpickr-next-month")
            self.page_loader.driver.execute_script("arguments[0].click();", next_month_clicker[2])
            page_response = self.page_loader.driver.page_source
            new_soup = BeautifulSoup(page_response, 'html.parser')

            for span in spans:
                print(span["aria-label"])

        return list(set(all_spans))

Note that soup is just the page response parsed with BeautifulSoup's html.parser. This only collects the spans for roughly one month, and clicking does not change the page response, so I never get the spans for the following months. What can I do to solve this? Any other approach would also be fine.


Answer

Finally, after 3 hours :) I am not going to go through everything that is wrong in your script; instead, I will explain my code.

I have to execute all of this JavaScript because the website would not let me click the next-month button (if it works fine without those scripts, you can delete the JavaScript lines). You are using html.parser as the parser, but I am using lxml because it is faster than html.parser. The rest is straightforward: click the next-month button and scrape the spans from the page source. You can then do whatever you need with those spans.
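The key point is that a BeautifulSoup object is a static snapshot of whatever source it was parsed from: clicking in the browser never mutates an existing soup, so you must re-read driver.page_source and re-parse it after every click. A minimal illustration, with plain strings standing in for page_source:

```python
from bs4 import BeautifulSoup

# First "page source": one day cell in the calendar.
page = '<div class="dayContainer"><span>1</span></div>'
soup = BeautifulSoup(page, "html.parser")

# Simulate the browser's DOM changing after a click. The old soup is a
# static snapshot and does not see the change.
page = '<div class="dayContainer"><span>1</span><span>2</span></div>'
assert len(soup.find_all("span")) == 1  # still the stale snapshot

# Only re-parsing the fresh source picks up the new content.
soup = BeautifulSoup(page, "html.parser")
assert len(soup.find_all("span")) == 2
```

This is why the loop below parses driver.page_source at the top of every iteration.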

Here's the code:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get('https://seeksophie.com/options/1-5hr-basic-candle-workshop')

    # The bottom booking bar overlaps the calendar, so remove it up front.
    driver.execute_script("""document.querySelector("#js-booking-bottom-bar").remove()""")

    # The "next month" arrow of the booking calendar.
    n = driver.find_element_by_xpath(
        "/html/body/div[3]/div[4]/div[5]/div/div/div[2]/div/div[1]/div/div[2]/div[1]/span[2]")

    all_spans = []
    for i in range(12):
        # Re-parse the current page source on every iteration; otherwise the
        # soup only ever contains the first month.
        page = driver.page_source
        soup = BeautifulSoup(page, "lxml")
        all_spans.extend(soup.find_all("div", class_="dayContainer")[1].find_all("span"))
        try:
            # A promo modal and its backdrop intercept clicks; drop them if present.
            driver.execute_script("""document.querySelector("#js-modal-first-order-bonus").remove()""")
            driver.execute_script("""document.querySelector(".modal-backdrop").remove()""")
        except Exception:
            pass
        n.click()

    print(all_spans)
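From there, filtering the available dates out of all_spans is just a class check on each span, as you described. A minimal sketch (the HTML snippet below is hypothetical, mirroring flatpickr's day-cell markup rather than copied from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical flatpickr-style day cells; the real page may differ.
html = """
<div class="dayContainer">
  <span class="flatpickr-day" aria-label="January 5, 2022">5</span>
  <span class="flatpickr-day notAllowed" aria-label="January 6, 2022">6</span>
  <span class="flatpickr-day flatpickr-disabled" aria-label="January 7, 2022">7</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
days = soup.find("div", class_="dayContainer").find_all("span")

# A day is available when it carries neither blocking class.
blocked = {"notAllowed", "flatpickr-disabled"}
available = [
    d["aria-label"] for d in days
    if not blocked & set(d.get("class", []))
]
print(available)  # ['January 5, 2022']
```

Note that BeautifulSoup exposes the class attribute as a list of individual class names, which is why a set intersection works here.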

And finally, if this solves your problem, don't forget to mark it as the answer.

User contributions licensed under: CC BY-SA