Better way of capturing multiple same tags?

Question

I'm trying to create a scraper that collects download links. I wanted to use regex, but that would be a nightmare for me to do, so I found a library called BeautifulSoup. I'm trying to capture the URLs in the children of div class="article-content" which is

ta…

Accepted Answer

You might want to try this:

    import json
    import re

    import requests
    from bs4 import BeautifulSoup


    def scrape(source_url):
        soup = BeautifulSoup(
            requests.get(source_url).text,
            'html.parser',
        )
        # Headings that introduce each group of direct download links.
        headers = [
            h.getText() for h in soup.find_all("h3") if "Direct" in h.getText()
        ]
        # The href of every anchor whose text mentions "Direct".
        links = [
            anchor["href"] for anchor
            in soup.find_all(lambda t: t.name == "a" and "Direct" in t.text)
        ]
        # Put each link under the heading whose resolution (e.g. "1080p") appears in the URL.
        return {
            header: [
                link for link in links
                if re.search(r"\d{3,4}p", header).group(0) in link
            ] for header in headers
        }


    data = scrape("https://www.animeout.xyz/love-live-nijigasaki-gakuen-school-idol-doukoukai-1080p-300mb720p-150mbepisode-1/")
    print(json.dumps(data, indent=2))

The reason you have only one key is that dictionary keys have to be unique, but the names of the links are not. Use something unique as the key instead, for example an index number or the series title with the resolution.

Sample output:

    {
      "Love Live! Nijigasaki Gakuen School Idol Doukoukai (main) Direct Download Links (300MB \u2013 1080p)(Encoded)": [
        "http://nimbus.animeout.com/series/00RAPIDBOT/Love Live Nijigasaki Gakuen School Idol Doukoukai/[AnimeOut] Love Live Nijigasaki Gakuen School Idol Doukoukai - 01 [1080pp][1080pp][Erai-raws][RapidBot].mkv",
        "http://nimbus.animeout.com/series/00RAPIDBOT/Love Live Nijigasaki Gakuen School Idol Doukoukai/[AnimeOut] Love Live Nijigasaki Gakuen School Idol Doukoukai - 01 [v2][1080pp][1080pp][Erai-raws][RapidBot].mkv",
        "http://nimbus.animeout.com/series/00RAPIDBOT/Love Live Nijigasaki Gakuen School Idol Doukoukai/[AnimeOut] Love Live Nijigasaki Gakuen School Idol Doukoukai - 01 [1080pp][1080pp][Erai-raws][RapidBot].mkv",

and so on ...
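The point about unique keys can be checked without any network access. The following is only a minimal sketch: the `ANCHORS` data, the `example.com` URLs, and the helper names `by_link_text` and `by_resolution` are made up for illustration and do not come from the page being scraped.

```python
import re
from collections import defaultdict

# Hypothetical (link text, href) pairs standing in for the scraped anchors.
ANCHORS = [
    ("Direct Download", "http://example.com/show-01-[1080p].mkv"),
    ("Direct Download", "http://example.com/show-01-[720p].mkv"),
    ("Direct Download", "http://example.com/show-02-[1080p].mkv"),
]


def by_link_text(anchors):
    """Key by link text: every pair shares one key, so later values overwrite earlier ones."""
    result = {}
    for text, href in anchors:
        result[text] = href
    return result


def by_resolution(anchors):
    """Key by the resolution found in the URL and collect the hrefs in a list."""
    grouped = defaultdict(list)
    for _text, href in anchors:
        match = re.search(r"\d{3,4}p", href)
        # Links without a resolution tag go under "unknown" instead of raising.
        key = match.group(0) if match else "unknown"
        grouped[key].append(href)
    return dict(grouped)


print(len(by_link_text(ANCHORS)))      # one key survives
print(sorted(by_resolution(ANCHORS)))  # one key per resolution
```

Storing a list under each key keeps every link even when several share a resolution, and checking the match before calling `group(0)` avoids an `AttributeError` on a link that has no resolution in it.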

Answer