I have been trying to parse out text without any tags. Wanted to build a little scraping tool for myself to help find good DND games to play on Roll20 (I was going to take this data and attach it to a table within each link for the final goal).
The URL I am parsing out info is here: Roll20 Link
I had an idea to try to parse out the text and then put each new line into a list of its own and grab the elements needed. I wanted to grab the info on the game, current players, and current open slots. Here is the code I have done so far. Any suggestions on what I might need to do to scrape this particular data?
Here is my code:
```python
import requests
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36'}
url = r'https://app.roll20.net/lfg/search//?page=0&days=thursday,friday&dayhours=1652932800,1653019200&frequency=onceweekly,biweekly,monthly&timeofday=&timeofday_seconds=&language=English&avpref=Any&gametype=Any&newplayer=false&yesmaturecontent=false&nopaytoplay=false&playingstructured=dnd_next&sortby=relevance&for_event=&roll20con='

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
time.sleep(2)

games = soup.find_all('tr', {'class': 'lfglisting'})
game_urls = []
for item in games:
    # item_title = item.find('a', {'class': 'lfglistingname'}).text
    # item_url = 'https://app.roll20.net' + item.find('a', {'class': 'lfglistingname'})['href']
    current_players = item.get_text("\n", strip=True)
    print(current_players)
    # try:
    #     item_game = item.find('strong', {'class': 'label label-success'}).text
    # except:
    #     item_game = 'Role-Playing Game'
    # try:
    #     item_pay = item.find('strong', {'class': 'label label-danger'}).text
    # except:
    #     item_pay = 'Free to Play'
    # try:
    #     item_welcome = item.find('strong', {'class': 'label label-info'}).text
    # except:
    #     item_welcome = 'Experts Only'
    # print(f"Game: {item_title}. URL: {item_url}. Notes on Game: {item_game}, {item_pay}, {item_welcome}")
    # game_urls.append(item_url)
# print(game_urls)
```
Answer
I started off by looking at the source code of the page and searching for a known string (like part of a game description).
It seems every description is inside a `<td class='gminfo'>`, but its parent element, the `<tr>`, is more interesting, as it contains all the desired data. Notice that all of these `<tr>` tags have something in common: the `data-listingid` attribute. So let's get all of those:
```python
for x in soup.select('tr[data-listingid]'):
    print(x.text.strip())
```
Then we start parsing, with regex:
```python
import re

def print_data(dct):
    for item, amount in dct.items():
        print(f"{item} {'-'*(30 - len(item))} {amount}")

soup = BeautifulSoup(r.text, 'html.parser')
listings = soup.select('tr[data-listingid]')
listings_count = len(listings)
print(f"Expecting {listings_count} listings")

parsed_listings = []
for listing in listings:
    game = listing.text.strip()
    try:
        name = re.search("\n{6}(.*)", game).group(1)
        info = re.search("\n{3} (.*)", game).group(1) + "..."
        current_players = re.search("(.*) Current Players", game).group(1)
        open_slots = re.search("\((.*) Open Slots", game).group(1)
        game = {"Name": name, "Info": info,
                "Current_Players": current_players, "Open_Slots": open_slots}
        parsed_listings.append(game)
        print_data(game)
        print("\n=======\n")
    except Exception as e:
        # print(e)
        pass

print(f"parsed {len(parsed_listings)} of {listings_count} total")
```
Gives:
```
Expecting 30 listings
Name -------------------------- Curse of Strahd - Grim Hollow/High RP
Info -------------------------- Take this opportunity to play the most popular D&D module ever made with an expert DM who cares about your backstory and wants to...
Current_Players --------------- 1
Open_Slots -------------------- 5

=======

Name -------------------------- The Dragon of Icespire Peak (Monday)
Info -------------------------- Dragon of Icespire Peak is the introductory adventure for the 5th Edition Starter Set, designed for PC levels 1 – 6. It is a...
Current_Players --------------- 1
Open_Slots -------------------- 6

=======

Name -------------------------- Necropolis
Info -------------------------- What ancient horrors lie slumbering in a newly discovered tomb deep in Egypt's Valley of the Kings? Are you allowing local superstitions and the...
Current_Players --------------- 1
Open_Slots -------------------- 4

=======

Name -------------------------- Weekly One-shots (Monday)
Info -------------------------- My car for my primary means of income (Uber) has died and I'm **urgently** trying to raise funds to replace it. If you'd like...
Current_Players --------------- 1
Open_Slots -------------------- 7

=======

Name -------------------------- dragonball z
Info -------------------------- hello all those to whom love dragonball z! i have never DM before but i am willing to give it a chance. im trying...
Current_Players --------------- 1
Open_Slots -------------------- 3

=======

Name -------------------------- Weekly One-shots (Monday)
Info -------------------------- My car for my primary means of income (Uber) has died and I'm **urgently** trying to raise funds to replace it. If you'd like...
Current_Players --------------- 1
Open_Slots -------------------- 7

=======

Name -------------------------- Larula's Tomb
Info -------------------------- 3 Hour, Level 3 One Shot. Gritty, old school feel. Death possible. Backup characters provided. Roll 3d6 straight for stats. Roll for HP. The...
Current_Players --------------- 1
Open_Slots -------------------- 6

=======

Name -------------------------- Vast Stories of Erstonia
Info -------------------------- Vast Stories of Erstonia is a D&D 5e group devoted to playing a series of oneshots provided by the DM. The adventures will be...
Current_Players --------------- 1
Open_Slots -------------------- 4

=======

Name -------------------------- Beasts of Fortune 2
Info -------------------------- The Beasts of Fortune seeks adventures seeking fame, fortune, honor, or just a reason to smack some heads, come one come all to join...
Current_Players --------------- 1
Open_Slots -------------------- 20

=======
...

parsed 22 of 30 total
```
This is by no means a perfect solution; the parsing isn't perfect at all, but it should get you going.
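If the newline-counting regex proves too brittle, one alternative is to pull fields out of the cells directly. This is only a sketch against a hand-written sample row: the `data-listingid` attribute and the `td.gminfo` cell are the ones observed above, but the exact cell layout of the live page may differ, so verify the selectors against the real markup.

```python
from bs4 import BeautifulSoup

# Hand-written sample mirroring the structure described above; the real rows
# have more columns, but each carries data-listingid and a td.gminfo cell.
sample = """
<table>
  <tr data-listingid="123">
    <td class="gminfo">Curse of Strahd - play the most popular module...</td>
  </tr>
  <tr data-listingid="456">
    <td class="gminfo">Necropolis - what ancient horrors lie slumbering...</td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")
listings = []
for row in soup.select("tr[data-listingid]"):
    info_cell = row.select_one("td.gminfo")  # the description lives here
    listings.append({
        "id": row["data-listingid"],
        "info": info_cell.get_text(strip=True) if info_cell else "",
    })

print(listings)
```

Selecting by attribute and class this way survives cosmetic whitespace changes that would break the `\n{6}` style patterns.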
Of course, run this over each page number you want (the `/?page=0` in the URL).
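The pagination loop can be sketched by rewriting the `page` query parameter. The `search_url` below is a shortened stand-in for the full search URL from the question; the assumption that pages simply increment from 0 should be checked against the site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def page_url(base_url, page):
    """Return base_url with its 'page' query parameter set to `page`."""
    parts = urlsplit(base_url)
    # keep_blank_values preserves empty params like timeofday=
    query = parse_qs(parts.query, keep_blank_values=True)
    query["page"] = [str(page)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

# shortened stand-in for the question's full search URL
search_url = 'https://app.roll20.net/lfg/search//?page=0&days=thursday,friday&language=English'
for page in range(3):
    url = page_url(search_url, page)
    print(url)
    # r = requests.get(url, headers=headers)  # then parse each page as above
```

Rebuilding the query string this way beats string concatenation because it re-encodes every parameter consistently, whatever order they appear in.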
If you want the full description of a listing, you're going to have to GET it separately, specifically the URL in the Read More `<a>` tag. But then you can't use `listing.text`, as it strips that tag (and its link) away.
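Extracting that per-listing URL could look like the sketch below. The `lfglistingname` class is taken from the question's own code, and the sample row and its `href` are hypothetical, so confirm both against the live markup before relying on them.

```python
from bs4 import BeautifulSoup

# Hypothetical sample row; the a.lfglistingname selector comes from the
# question's code, and the href value here is made up for illustration.
sample = ('<tr data-listingid="1"><td>'
          '<a class="lfglistingname" href="/lfg/listing/123/some-game">Some Game</a>'
          '</td></tr>')

row = BeautifulSoup(sample, "html.parser").select_one("tr[data-listingid]")
link = row.select_one("a.lfglistingname")       # keep the tag, not just .text
full_url = "https://app.roll20.net" + link["href"]
print(full_url)

# detail = requests.get(full_url, headers=headers)  # then parse the full
# description out of detail.text with another BeautifulSoup pass
```

The key point is to read the `href` attribute off the tag object itself, rather than flattening the row with `.text` first.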
Also, this isn’t legal advice or anything, but I wouldn’t be surprised if this is against their site policy, so be wary.