My class names vary from element to element, for example:
listing-col-line-3-11 dpt 41
listing-col-block-1-22 dpt 41
listing-col-line-4-13 CWK 12
Normally I could do:
for EachPart in soup.find_all("div", {"class": "ClassNamesHere"}):
    print(EachPart.get_text())
There are far too many class names to list individually, so matching each one exactly is out.
I know Python doesn't have the ".contains" I would normally use, but it does have "in", though I haven't been able to work out a way to incorporate it.
I'm hoping there's a way to do this with regex, but my Python syntax is letting me down. I've been trying variations on:
regex = re.compile('.*listing-col-.*')
for EachPart in soup.find_all(regex):
But that doesn’t seem to be doing the trick.
Answer
BeautifulSoup supports CSS selectors, which let you select elements based on the content of particular attributes. This includes the *= selector for "contains".
The following returns all div elements whose class attribute contains the text 'listing-col-':
for EachPart in soup.select('div[class*="listing-col-"]'):
    print(EachPart.get_text())
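For anyone who wants to try this end to end, here is a minimal sketch. The HTML snippet and the variable names (html, css_matches, regex_matches) are made up for illustration; only the three class strings come from the question. It also shows what is probably the closest working form of the regex attempt: as far as I know, a compiled pattern passed as the first argument to find_all is tested against tag names rather than attributes, so passing it as class_ is likely what was intended.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup built around the class names from the question,
# plus one div that should NOT match.
html = """
<div class="listing-col-line-3-11 dpt 41">first</div>
<div class="listing-col-block-1-22 dpt 41">second</div>
<div class="other-col dpt 41">skipped</div>
<div class="listing-col-line-4-13 CWK 12">third</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: class attribute contains the substring.
css_matches = [div.get_text() for div in soup.select('div[class*="listing-col-"]')]

# Regex alternative: the pattern is searched within each class name.
pattern = re.compile(r"listing-col-")
regex_matches = [div.get_text() for div in soup.find_all("div", class_=pattern)]

print(css_matches)
print(regex_matches)
```

Both lists should contain the text of the three listing divs and leave out the "other-col" one. Note that the leading and trailing '.*' in the original pattern are unnecessary here, since the pattern is applied as a search rather than a full match.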