How to extract link to Package Sources from Arch User Repository (AUR) website

Question

I'm using BeautifulSoup to extract a link from a webpage. Specifically, I want this part: iwgtk-0.8.tar.gz. I've written some code, and I assume one particular line in it is what fails. I've tried an alternative, but that failed too.

Accepted Answer

Try to select your elements more specifically:

    soup.find('ul', {'id': 'pkgsrcslist'}).find_all('a')

or, more conveniently, via a CSS selector:

    soup.select('#pkgsrcslist a')

Then use get('href') to get the URL, or text / get_text() to get the link text, or use both and store them as key/value pairs in a dict:

    ...
    soup = BeautifulSoup(page.text, 'html.parser')
    pkgs = {}
    for tag in soup.select('#pkgsrcslist a'):
        print('url: ' + tag.get('href'))
        print('text: ' + tag.text)
        # update the dict of source names and links
        pkgs.update({
            tag.text: tag.get('href')
        })

Example

    from bs4 import BeautifulSoup
    import requests

    url = "https://aur.archlinux.org/packages/iwgtk"
    page = requests.get(url)

    # define pkgs up front so the final print also works if the request fails
    pkgs = {}
    if page.status_code == 200:
        soup = BeautifulSoup(page.text, 'html.parser')
        for tag in soup.select('#pkgsrcslist a'):
            pkgs.update({
                tag.text: tag.get('href')
            })
    print(pkgs)

Output

    {'iwgtk-0.8.tar.gz': 'https://github.com/J-Lentz/iwgtk/archive/v0.8.tar.gz'}
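To try the idea without network access or third-party packages, here is a rough sketch that swaps BeautifulSoup for the standard library's html.parser. It collects the same text-to-href mapping for links inside the element with id pkgsrcslist. The HTML snippet, the class name SourceLinkParser and the helper extract_source_links are made up for illustration. The snippet only imitates the structure the answer's selector assumes and is not a copy of the real AUR page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the relevant part of the page (not the real AUR markup).
SAMPLE_HTML = """
<div id="pkgsrcs">
  <ul id="pkgsrcslist">
    <li><a href="https://github.com/J-Lentz/iwgtk/archive/v0.8.tar.gz">iwgtk-0.8.tar.gz</a></li>
  </ul>
</div>
<ul id="other">
  <li><a href="https://example.org/unrelated">unrelated link</a></li>
</ul>
"""

# Tags without a closing tag; skipped so the nesting counter stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SourceLinkParser(HTMLParser):
    """Collect {link text: href} for <a> tags nested inside one container id."""

    def __init__(self, container_id="pkgsrcslist"):
        super().__init__()
        self.container_id = container_id
        self.depth = 0            # 0 means "outside the container"
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.current_href = attrs["href"]
                self.current_text = []
        elif attrs.get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag == "a" and self.current_href is not None:
            self.links["".join(self.current_text).strip()] = self.current_href
            self.current_href = None
        self.depth -= 1

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


def extract_source_links(html, container_id="pkgsrcslist"):
    """Return a dict of link text to href; empty if the container is missing."""
    parser = SourceLinkParser(container_id)
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_source_links(SAMPLE_HTML)
print(links)
```

Links outside the container are ignored, which mirrors what the '#pkgsrcslist a' selector does, and a page without the list yields an empty dict rather than an exception. For real pages BeautifulSoup remains the more robust choice, since this sketch assumes reasonably well-formed markup.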
