
scraping multiple tags at once

I'm trying to scrape the IMDb Top 250 movies, and I want to get the links to all of those movies from this page: https://www.imdb.com/chart/top/. I tried:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://www.imdb.com/chart/top/')
bs = BeautifulSoup(html,'html.parser')
links = []
for link in bs.find('td',{'class':'titleColumn'}).find_all('a'):
    links.append(link['href'])
print(links)

But I'm only getting the first link. How can I scale this code to cover the whole list of 250 movies?


Answer

bs.find('td', {'class': 'titleColumn'}) returns only the first matching entry, and find_all('a') then returns all the <a> tags inside that single entry. To get every entry, you can use a CSS selector with select():

# 'td.titleColumn > a' matches every <a> that is a direct child of any <td class="titleColumn">
for link in bs.select('td.titleColumn > a'):
    links.append(link['href'])
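To see the difference between the two approaches without hitting the network, here is a minimal sketch that runs both against a small inline HTML string. The markup is a made-up, simplified stand-in for the chart table (the movie names and title IDs are hypothetical), and IMDb's live markup may have changed since this answer was written, so the real selector may need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the chart table (not IMDb's real markup).
HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0000001/">Movie A</a></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/tt0000002/">Movie B</a></td></tr>
  <tr><td class="titleColumn">3. <a href="/title/tt0000003/">Movie C</a></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() stops at the first matching <td>, so only that one cell's links are seen.
first_only = [a["href"] for a in soup.find("td", {"class": "titleColumn"}).find_all("a")]

# select() returns every <a> that is a direct child of any td.titleColumn.
all_links = [a["href"] for a in soup.select("td.titleColumn > a")]

print(first_only)  # ['/title/tt0000001/']
print(all_links)   # ['/title/tt0000001/', '/title/tt0000002/', '/title/tt0000003/']
```

The question's code reproduces the one-link result, while the selector version collects a link from every row.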

If you still want to iterate over the title cells (for example, to extract more information from each one), locate all of them with find_all() and pull the <a> out of each:

# find_all() returns every matching <td>, not just the first one
for title in bs.find_all('td', {'class': 'titleColumn'}):
    links.append(title.find('a')['href'])
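Iterating over the cells pays off when you want more than the href. The sketch below, again run against hypothetical inline markup, collects the title text, an absolute URL, and a year from each cell. The span.secondaryInfo element holding the year is an assumption about the markup, not something confirmed for the live page. The hrefs are relative, so urljoin() from the standard library resolves them against the page URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.imdb.com/chart/top/"

# Hypothetical, simplified markup; the real page's structure may differ.
HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0000001/">Movie A</a>
      <span class="secondaryInfo">(1994)</span></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/tt0000002/">Movie B</a>
      <span class="secondaryInfo">(1972)</span></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

movies = []
for title in soup.find_all("td", {"class": "titleColumn"}):
    link = title.find("a")
    # Assumed location of the release year; may be missing, so guard against None.
    year = title.find("span", {"class": "secondaryInfo"})
    movies.append({
        "title": link.get_text(strip=True),
        # hrefs are relative ("/title/..."), so resolve them against the page URL
        "url": urljoin(BASE_URL, link["href"]),
        "year": year.get_text(strip=True).strip("()") if year else None,
    })

for movie in movies:
    print(movie)
```

Each row yields a dict such as {'title': 'Movie A', 'url': 'https://www.imdb.com/title/tt0000001/', 'year': '1994'}.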
User contributions licensed under: CC BY-SA