
Parsing out text without a tag

I have been trying to parse out text that is not wrapped in any tag. I wanted to build a little scraping tool for myself to help find good DND games to play on Roll20 (the final goal is to take this data and attach it to a table alongside each link).

The URL I am parsing info from is here: Roll20 Link

I had an idea to try to parse out the text and then put each new line into a list of its own and grab the elements needed. I wanted to grab the info on the game, current players, and current open slots. Here is the code I have done so far. Any suggestions on what I might need to do to scrape this particular data?

Here is my code:

import requests
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36'}
url = r'https://app.roll20.net/lfg/search//?page=0&days=thursday,friday&dayhours=1652932800,1653019200&frequency=onceweekly,biweekly,monthly&timeofday=&timeofday_seconds=&language=English&avpref=Any&gametype=Any&newplayer=false&yesmaturecontent=false&nopaytoplay=false&playingstructured=dnd_next&sortby=relevance&for_event=&roll20con='
r = requests.get(url, headers = headers)
soup = BeautifulSoup(r.text, 'html.parser')

time.sleep(2)

games= soup.find_all('tr', {'class': 'lfglisting'})

game_urls = []

for item in games:
    # item_title = item.find('a', {'class': 'lfglistingname'}).text
    # item_url = 'https://app.roll20.net' + item.find('a', {'class': 'lfglistingname'})['href']
    current_players = item.get_text("\n", strip=True)
    print(current_players)
    # try:
    #   item_game = item.find('strong', {'class': 'label label-success'}).text
    # except:
    #   item_game = 'Role-Playing Game'
    # try: 
    #   item_pay = item.find('strong', {'class': 'label label-danger'}).text
    # except:
    #   item_pay = 'Free to Play'
    # try:
    #   item_welcome = item.find('strong', {'class': 'label label-info'}).text
    # except:
    #   item_welcome = 'Experts Only'
    # print(f"Game: {item_title}. URL: {item_url}. Notes on Game: {item_game}, {item_pay}, {item_welcome}")
    # game_urls.append(item_url)

# print(game_urls)

Answer

I started off by looking at the source code of the page and searching for a known string (like part of a game description). It seems every description is inside a <td class='gminfo'>, but its parent element, the <tr>, is more interesting, as it contains all the desired data. Notice that all of these <tr> tags have something in common: the data-listingid attribute.

So let’s get all of those.

for x in soup.select('tr[data-listingid]'):
    print(x.text.strip())

Then we start parsing, with regex.

import re

def print_data(dct):
    # Print each field name padded with dashes so the values line up.
    for item, amount in dct.items():
        print(f"{item} {'-' * (30 - len(item))} {amount}")


# `r` is the response from requests.get(url, headers=headers) in the question's code.
soup = BeautifulSoup(r.text, 'html.parser')

listings = soup.select('tr[data-listingid]')

listings_count = len(listings)

print(f"Expecting {listings_count} listings")
parsed_listings = []

for listing in listings:
    game = listing.text.strip()
    try:
        name = re.search(r"\n{6}(.*)", game).group(1)
        info = re.search(r"\n{3} (.*)", game).group(1) + "..."
        current_players = re.search(r"(.*) Current Players", game).group(1)
        open_slots = re.search(r"\((.*) Open Slots", game).group(1)
        game = {"Name": name, "Info": info, "Current_Players": current_players, "Open_Slots": open_slots}
        parsed_listings.append(game)
        print_data(game)
        print("\n=======\n")
    except Exception as e:
        # A listing whose text does not match one of the patterns is skipped.
        # print(e)
        pass

print(f"parsed {len(parsed_listings)} of {listings_count} total")

Gives:

Expecting 30 listings
Name -------------------------- Curse of Strahd - Grim Hollow/High RP
Info -------------------------- Take this opportunity to play the most popular D&D module ever made with an expert DM who cares about your backstory and wants to...
Current_Players --------------- 1
Open_Slots -------------------- 5

=======

Name -------------------------- The Dragon of Icespire Peak (Monday)
Info -------------------------- Dragon of Icespire Peak is the introductory adventure for the 5th Edition Starter Set, designed for PC levels 1 – 6. It is a...
Current_Players --------------- 1
Open_Slots -------------------- 6

=======

Name -------------------------- Necropolis
Info -------------------------- What ancient horrors lie slumbering in a newly discovered tomb deep in Egypt's Valley of the Kings? Are you allowing local superstitions and the...
Current_Players --------------- 1
Open_Slots -------------------- 4

=======

Name -------------------------- Weekly One-shots (Monday)
Info -------------------------- My car for my primary means of income (Uber) has died and I'm **urgently** trying to raise funds to replace it. If you'd like...
Current_Players --------------- 1
Open_Slots -------------------- 7

=======

Name -------------------------- dragonball z 
Info -------------------------- hello all those to whom love dragonball z! i have never DM before but i am willing to give it a chance. im trying...
Current_Players --------------- 1
Open_Slots -------------------- 3

=======

Name -------------------------- Weekly One-shots (Monday)
Info -------------------------- My car for my primary means of income (Uber) has died and I'm **urgently** trying to raise funds to replace it. If you'd like...
Current_Players --------------- 1
Open_Slots -------------------- 7

=======

Name -------------------------- Larula's Tomb
Info -------------------------- 3 Hour, Level 3 One Shot. Gritty, old school feel. Death possible. Backup characters provided. Roll 3d6 straight for stats. Roll for HP. The...
Current_Players --------------- 1
Open_Slots -------------------- 6

=======

Name -------------------------- Vast Stories of Erstonia
Info -------------------------- Vast Stories of Erstonia is a D&D 5e group devoted to playing a series of oneshots provided by the DM. The adventures will be...
Current_Players --------------- 1
Open_Slots -------------------- 4

=======

Name -------------------------- Beasts of Fortune 2
Info -------------------------- The Beasts of Fortune seeks adventures seeking fame, fortune, honor, or just a reason to smack some heads, come one come all to join...
Current_Players --------------- 1
Open_Slots -------------------- 20

=======
...
parsed 22 of 30 total

This is by no means a perfect solution (note that only 22 of the 30 listings were parsed), but it should get you going.
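
One way to find out why some listings are skipped is to record which field failed to match instead of silently passing. This is only a minimal sketch: the sample string, the parse_listing helper, and the "N Current Players (N Open Slots)" shape are assumptions inferred from the regexes above, not taken from the live page.

```python
import re

# Field patterns; the "(N Open Slots" shape is an assumption inferred from the answer's regexes.
PATTERNS = {
    "Current_Players": re.compile(r"(\d+) Current Players"),
    "Open_Slots": re.compile(r"(\d+) Open Slots"),
}

def parse_listing(text):
    """Return (fields, missing): matched values and the names of fields that did not match."""
    fields, missing = {}, []
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = int(match.group(1))
        else:
            missing.append(name)
    return fields, missing

# Hypothetical listing text, not copied from Roll20.
sample = "Some Game\n\n\n A short description\n1 Current Players (5 Open Slots)"
fields, missing = parse_listing(sample)
print(fields, missing)
```

Printing the missing list for each skipped listing should show quickly which pattern needs loosening.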

Of course, run this over each page number you want (the /?page=0 in the URL). If you want the full description of a listing, you’re going to have to GET it separately, specifically via the Read More <a> tag.
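
For the paging step, a rough sketch of how the page parameter could be swapped without editing the URL string by hand. The with_page helper is made up for illustration, and base is a shortened stand-in for the search URL from the question.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_page(url, page):
    """Return url with its 'page' query parameter set to the given page number."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as 'timeofday='
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "page"]
    query.insert(0, ("page", str(page)))
    # safe="," leaves comma-separated values like 'thursday,friday' unescaped
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))

# Shortened stand-in for the search URL from the question.
base = "https://app.roll20.net/lfg/search//?page=0&language=English&timeofday="
for page in range(3):
    print(with_page(base, page))
```

Each generated URL can then be passed to requests.get with the same headers as before, ideally with a short pause between requests.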

But then you can’t use listing.text, as it strips the tag (and its href) away.
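
To illustrate the difference, here is a small sketch on made-up markup. The row structure, the href, and the exact "Read More" link text are assumptions; the real Roll20 page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real Roll20 row structure and link text may differ.
html = """
<table><tr data-listingid="123">
  <td class="gminfo">Short teaser... <a href="/lfg/listing/123/example">Read More</a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.select_one("tr[data-listingid]")

# .text keeps only the visible text, so the link target is lost here.
print(row.text.strip())

# Going through the tag itself keeps the href attribute.
link = row.find("a", string="Read More")
detail_url = "https://app.roll20.net" + link["href"]
print(detail_url)
```

The resulting detail_url is what you would then GET to read the full description.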

Also, this isn’t legal advice or anything, but I wouldn’t be surprised if this is against their site policy, so be wary.

User contributions licensed under: CC BY-SA