
Beautiful Soup if Class “Contains” or Regex?

My class names vary constantly, for example:

listing-col-line-3-11 dpt 41
listing-col-block-1-22 dpt 41
listing-col-line-4-13 CWK 12

Normally I could do:

for EachPart in soup.find_all("div", {"class": "ClassNamesHere"}):
    print(EachPart.get_text())

There are far too many class names to list individually, so that approach is out.

I know Python doesn’t have the “.contains” I would normally use, but it does have “in”. I haven’t been able to work out a way to incorporate that, though.

I’m hoping there’s a way to do this with regex, though again my Python syntax is really letting me down. I’ve been trying variations on:

regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all(regex):
    print(EachPart.get_text())

But that doesn’t seem to be doing the trick.


Answer

BeautifulSoup supports CSS selectors, which let you select elements based on the content of particular attributes. This includes the *= operator, which matches when the attribute value contains a given substring.

The following will return all div elements with a class attribute containing the text ‘listing-col-’:

for EachPart in soup.select('div[class*="listing-col-"]'):
    print(EachPart.get_text())
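
For anyone who wants to try this end to end, here is a minimal sketch. The sample HTML is made up for illustration and only reuses the class names from the question; the variable names are assumptions too. It also shows a regex variant, since the attempt in the question most likely failed because a pattern passed as the first argument to find_all is matched against tag names, not class names. Passing it as the class_ filter should do what was intended.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class names come from the question.
html = """
<div class="listing-col-line-3-11 dpt 41">first</div>
<div class="listing-col-block-1-22 dpt 41">second</div>
<div class="listing-col-line-4-13 CWK 12">third</div>
<div class="sidebar">ignored</div>
<span class="listing-col-note">not a div</span>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: div elements whose class attribute
# contains the substring "listing-col-".
by_selector = [div.get_text() for div in soup.select('div[class*="listing-col-"]')]

# Regex alternative: the compiled pattern is passed as the class_ filter,
# not as the tag name. The pattern is searched for, so no ".*" is needed.
pattern = re.compile(r"listing-col-")
by_regex = [div.get_text() for div in soup.find_all("div", class_=pattern)]

print(by_selector)
print(by_regex)
```

Both lists should contain the text of the three listing divs and skip the sidebar div and the span. The selector version is usually the shorter of the two; the regex version may be handier if the match needs anchors or alternation.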
User contributions licensed under: CC BY-SA