I am trying to scrape this particular site with Python: https://www.milanofinanza.it/quotazioni/ricerca/listino-completo-2ae?refresh_cens.
I need to get all the isin codes and the names. My idea was to get them all in 2 separated lists, to do that I try to get the entire column (by changing the Xpath to tr rather than tr1) and then add it to the list.
My code goes through the pages but at a certain point just stop working (even if I add time.sleep(10)
to be sure that the code starts scraping when the site is fully loaded).
My code looks like this:
wd = wd.Chrome() wd.implicitly_wait(10) wd.get('https://www.milanofinanza.it/quotazioni/ricerca/listino-completo-2ae') company_name = [] isin = [] for n in range(0,15): time.sleep(10) tickers = wd.find_elements(By.XPATH,"//*[@id='mainbox']/div[2]/div[2]/div[4]/div/table/tbody/tr/td[1]") isin = wd.find_elements(By.XPATH,"/html/body/div[3]/div/div/div/main/div[1]/div[2]/div[2]/div[4]/div/table/tbody/tr/td[10]/span") for el in tickers: company_name.append(el.text) for i in isin: isin.append(i.text) l=wd.find_element(By.XPATH,"/html/body/div[2]/div/div/div/main/div[1]/div[2]/div[2]/div[4]/div/div[1]/div/button[4]") wd.execute_script("arguments[0].click();",l) print("data collected")
How can I solve this problem? Here some pictures to better understand:
Name:
Isin:
Advertisement
Answer
I’m sorry, but I can’t see how your existing code is working.
On my side I see the locator you are using for isin
matching nothing.
I have updated locators here.
I would advice you never using automatically created locators.
Also the code you presenting here is missing indentations. I hope your actual code having proper indentations.
Please see if this will work better:
wd = wd.Chrome() wd.get('https://www.milanofinanza.it/quotazioni/ricerca/listino-completo-2ae') company_name = [] isin = [] for n in range(0,15): time.sleep(10) tickers = wd.find_elements(By.XPATH,"//table[contains(@class,'celled')]//tbody//tr//td[1]") isins = wd.find_elements(By.XPATH,"//table[contains(@class,'celled')]//tbody//tr//td[10]") for el in tickers: company_name.append(el.text) for is_el in isins: isin.append(is_el.text) l=wd.find_element(By.XPATH,'//nav//button[@ng-click="getDataTableNextClick()"]') wd.execute_script("arguments[0].click();",l)