
How do I make a crawler that extracts information from relative paths?

I am trying to make a simple crawler that extracts the links from the "See also" section of this page: https://en.wikipedia.org/wiki/Web_scraping. That is 19 links in total, which I have managed to extract using Beautiful Soup. However, I get them as relative links in a list, and I also need to turn them into absolute links.
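
For example, turning one relative href into an absolute URL could look like this (a minimal sketch; /wiki/Data_scraping is just one illustrative href from that section, not necessarily the first one):

from urllib.parse import urljoin

base = 'https://en.wikipedia.org/wiki/Web_scraping'
href = '/wiki/Data_scraping'   # an example relative href from the "See also" section
print(urljoin(base, href))     # https://en.wikipedia.org/wiki/Data_scraping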

Then I want to use those same 19 links and extract further information from them, for example the first paragraph of each of the 19 pages. So far I have this:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://en.wikipedia.org/wiki/Web_scraping'
data = requests.get(url).text

soup = BeautifulSoup(data, 'html.parser')

# the "See also" links sit in a <div class="div-col">
links = soup.find('div', {'class': 'div-col'})

data = []
for link in links.find_all('a', href=True):
    data.append(link.get('href'))
#print(data)

# my attempt: join the hrefs into one string and parse that as HTML
soupNew = BeautifulSoup(''.join(data), 'html.parser')
print(soupNew.find_all('p')[0].text)

# test if there is any <p> tag; it comes back empty, so I have not looped correctly
x = soupNew.find_all('p')
if x is not None and len(x) > 0:
    section = x[0]
print(x)

My main issue is that I simply can't find a way to loop through the 19 links and extract the information I need. I am trying to learn Beautiful Soup and Python, so I would prefer to stick with those for now, even though there might be better options out there. I just need some help, or preferably a simple example, explaining how to do the things described above. Thanks!


Answer

You should split your code like you split your problems.

  1. Your first problem is to get the list of links, so you could write a function called get_urls:

     def get_urls():
         url = 'https://en.wikipedia.org/wiki/Web_scraping'
         data = requests.get(url).text
         soup = BeautifulSoup(data, 'html.parser')
         # the "See also" links sit in a <div class="div-col">
         links = soup.find('div', {'class': 'div-col'})
         data = []
         for link in links.find_all('a', href=True):
             # urljoin handles relative hrefs like /wiki/Data_scraping
             data.append(urljoin(url, link.get('href')))
         return data
    
  2. You want to get the first paragraph of every URL. A little research gives this one:

     def get_first_paragraph(url):
         data = requests.get(url).text
         soup = BeautifulSoup(data, 'html.parser')
         # soup.p is the first <p> tag in the document
         return soup.p.text
    
  3. Now everything has to be wired up (a complete, runnable version of all three steps follows this list):

     def iterate_through_urls(urls):
         for url in urls:
             print(get_first_paragraph(url))
    
    
     def run():
         urls = get_urls()
         iterate_through_urls(urls)
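
For completeness, here is everything combined into one runnable sketch. Two assumptions worth flagging: it uses urljoin instead of string concatenation, and instead of soup.p it skips blank paragraphs, since on some Wikipedia pages the first <p> is an empty placeholder element:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_urls():
    url = 'https://en.wikipedia.org/wiki/Web_scraping'
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # the "See also" links sit in a <div class="div-col">
    links = soup.find('div', {'class': 'div-col'})
    return [urljoin(url, a['href']) for a in links.find_all('a', href=True)]

def get_first_paragraph(url):
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # return the first <p> that actually contains text,
    # skipping empty placeholder paragraphs
    for p in soup.find_all('p'):
        if p.text.strip():
            return p.text
    return ''

def iterate_through_urls(urls):
    for url in urls:
        print(get_first_paragraph(url))

def run():
    urls = get_urls()
    iterate_through_urls(urls)

if __name__ == '__main__':
    run()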
    
User contributions licensed under: CC BY-SA