Web scraping: help needed to get the last post and find links

Question

First, sorry for my poor English.
I currently have a Python script that scrapes a web page to collect its comments.
It scrapes all the messages on the page, but I want to scrape only the last post. How can I do this, please?
I would also like to find any web links posted in the last message, as full links.
Is that possible?
Here is the webpage link and script:

Accepted Answer

First, I'd like you to use correct locators: instead of /html/body/main/div[4]/div[1]/div/div[1]/div[2]/button[2]/span, try this CSS selector: .btn--mode-primary.overflow--wrap-on

To get the last comment, you can use this XPath: (//div[@class='commentList-item'])[last()]

So, to get only the last comment's details, your code can be modified like this:

    #!/usr/bin/env python3
    # https://www.jeuxvideo.com/forums/42-47-66784467-1-0-1-0-aide-scraping-python-forum-dealabs.htm
    # scraping_dealabs.py
    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    actions = ActionChains(driver)

    # Accept cookies
    WebDriverWait(driver, 2).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn--mode-primary.overflow--wrap-on"))).click()

    # Scroll to the last comment, then look it up again once the page has settled
    last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-item'])[last()]")
    actions.move_to_element(last_comment).perform()
    time.sleep(0.5)
    last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-item'])[last()]")

    # Read the comment details with XPaths relative to the comment element
    _id = last_comment.get_attribute("id")
    author = last_comment.find_element(By.XPATH, ".//span[contains(@class,'userInfo-username')]").text
    content = last_comment.find_element(By.XPATH, ".//*[contains(@class,'userHtml-content')]").text
    timestamp = last_comment.find_element(By.XPATH, ".//*[contains(@class,'text--color-greyShade')]").text
    comment_url = f"{url}#{_id}"

    print("Posté par", author)
    print(content)
    print("Publication:", timestamp)
    print("Lien du commentaire:")
    print(comment_url)
    print('-' * 30)

    driver.quit()  # close the browser and end the WebDriver session

UPD: To get the last element on the page, as you described in the comments, you have to change the locator from

    last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-item'])[last()]")

to

    last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-comment'])[last()]")

So the entire code above becomes:

    #!/usr/bin/env python3
    # https://www.jeuxvideo.com/forums/42-47-66784467-1-0-1-0-aide-scraping-python-forum-dealabs.htm
    # scraping_dealabs.py
    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    actions = ActionChains(driver)

    # Accept cookies
    WebDriverWait(driver, 2).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn--mode-primary.overflow--wrap-on"))).click()

    # Scroll to the last comment, then look it up again once the page has settled
    last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-comment'])[last()]")
    actions.move_to_element(last_comment).perform()
    time.sleep(0.5)
    last_comment = driver.find_element(By.XPATH, "(//div[@class='commentList-comment'])[last()]")

    # Read the comment details with XPaths relative to the comment element
    _id = last_comment.get_attribute("id")
    author = last_comment.find_element(By.XPATH, ".//span[contains(@class,'userInfo-username')]").text
    content = last_comment.find_element(By.XPATH, ".//*[contains(@class,'userHtml-content')]").text
    timestamp = last_comment.find_element(By.XPATH, ".//*[contains(@class,'text--color-greyShade')]").text
    comment_url = f"{url}#{_id}"

    print("Posté par", author)
    print(content)
    print("Publication:", timestamp)
    print("Lien du commentaire:")
    print(comment_url)
    print('-' * 30)

    driver.quit()  # close the browser and end the WebDriver session
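The question also asks for the full links posted in the last message, which the answer's script does not cover. Here is a minimal sketch of one way to do it, using only the standard library: take the comment's HTML, collect every href, and resolve relative ones against the page URL. The names LinkCollector, extract_full_links and sample_html are made up for this illustration, and the sample fragment is invented; the real markup on the site may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs untouched and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


def extract_full_links(comment_html, base_url):
    """Return all links found in an HTML fragment as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(comment_html)
    collector.close()
    return collector.links


# Hypothetical fragment standing in for the HTML of the last comment.
sample_html = (
    '<div class="userHtml-content">'
    'Voir <a href="/bons-plans/exemple-123">ce deal</a> et '
    '<a href="https://example.com/produit?id=42">ce produit</a>'
    '</div>'
)
page_url = "https://www.dealabs.com/discussions/suivi-erreurs-de-prix-1063390?page=9999"

for link in extract_full_links(sample_html, page_url):
    print(link)
```

In the Selenium script, the fragment could presumably come from last_comment.get_attribute("innerHTML") and the base URL from driver.current_url. Another option is to read get_attribute("href") on each a element found inside the comment, which Selenium normally returns already resolved to a full URL.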

Answer