I’m trying to scrape the IMDb Top 250 movies, and I want to get all the links to those movies from this page: https://www.imdb.com/chart/top/
I tried:
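```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

html = urlopen('https://www.imdb.com/chart/top/')
bs = BeautifulSoup(html, 'html.parser')
links = []
# find() returns only the first matching <td>, so this loop sees one row
for link in bs.find('td', {'class': 'titleColumn'}).find_all('a'):
    links.append(link['href'])
print(links)
```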
but I’m only getting the first link. How can I scale this code to include the whole list of 250 movies?
Answer
`bs.find('td', {'class': 'titleColumn'})` returns only the first matching entry, and `find_all('a')` then gives you all the `<a>` tags under that single entry. To get the links from all the entries, you can use a CSS selector:
```python
for link in bs.select('td.titleColumn > a'):
    links.append(link['href'])
```
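To see the difference between the two approaches, here is a minimal sketch that runs against a small hard-coded HTML snippet instead of the live page. The snippet is a hypothetical stand-in for the chart's old table markup (the live page may have changed since), and the helper names `first_row_links` and `all_row_links` are illustrative only. It also uses `urllib.parse.urljoin` to turn the relative hrefs into absolute URLs, which the loop above does not do.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical two-row snippet mimicking the chart's old table markup;
# the live page may no longer look like this.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0111161/">The Shawshank Redemption</a></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/tt0068646/">The Godfather</a></td></tr>
</table>
"""

BASE_URL = "https://www.imdb.com"


def first_row_links(bs):
    # find() stops at the first matching <td>, so only one row is searched.
    return [a["href"] for a in bs.find("td", {"class": "titleColumn"}).find_all("a")]


def all_row_links(bs):
    # select() returns every <a> that is a direct child of any td.titleColumn;
    # urljoin turns each relative href into an absolute URL.
    return [urljoin(BASE_URL, a["href"]) for a in bs.select("td.titleColumn > a")]


bs = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(first_row_links(bs))  # only the first row's link
print(all_row_links(bs))    # one absolute link per row
```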
If you still want to iterate over the title cells so that you can extract more information from each one, locate all of them with `find_all` and then extract the `<a>` from each:
```python
for title in bs.find_all('td', {'class': 'titleColumn'}):
    links.append(title.find('a')['href'])
```
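As a sketch of pulling more than the link out of each row, the version below collects the title text, an absolute URL, and the release year into a dict per movie. Several things here are assumptions rather than facts about the live page: the sample markup, including a `<span class="secondaryInfo">` holding the year, is a hypothetical reconstruction of the old chart layout; `fetch_html` and `parse_titles` are hypothetical helper names; and the browser-like `User-Agent` header is there because some sites reject urllib's default one. Cells without an `<a>` are skipped rather than raising a `TypeError`.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

BASE_URL = "https://www.imdb.com"

# Hypothetical row mimicking the chart's old markup (assumption).
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/tt0111161/">The Shawshank Redemption</a>
  <span class="secondaryInfo">(1994)</span></td></tr>
</table>
"""


def fetch_html(url):
    # Assumption: a browser-like User-Agent; some sites block urllib's default.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return resp.read()


def parse_titles(html):
    bs = BeautifulSoup(html, "html.parser")
    movies = []
    for title in bs.find_all("td", {"class": "titleColumn"}):
        link = title.find("a")
        if link is None:  # skip cells that have no link
            continue
        # Assumption: the year sits in <span class="secondaryInfo"> as "(1994)".
        year_tag = title.find("span", {"class": "secondaryInfo"})
        movies.append({
            "title": link.get_text(strip=True),
            "url": urljoin(BASE_URL, link["href"]),
            "year": year_tag.get_text(strip=True).strip("()") if year_tag else None,
        })
    return movies


print(parse_titles(SAMPLE_HTML))
```

To run it against the live page, call `parse_titles(fetch_html('https://www.imdb.com/chart/top/'))`. If IMDb has since changed its markup, the `td.titleColumn` lookup will simply match nothing and the function will return an empty list.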