Extracting the URL from style: background-url: with BeautifulSoup and without regex?

Question

I have a div whose inline style attribute sets a background image through url(...). I want to get the URL, but I don't know how to do that without regex. Is it even possible? So far my only solution relies on regex.

Accepted Answer

You could try using the cssutils package. Something like this should work:

    import cssutils
    from bs4 import BeautifulSoup

    # Markup assumed from the output shown below: a div with an inline background-image.
    html = """
    <div style="background-image: url(/uploads/images/players/16113-1399107741.jpeg);"></div>
    """
    soup = BeautifulSoup(html, 'html.parser')
    div_style = soup.find('div')['style']      # raw text of the style attribute
    style = cssutils.parseStyle(div_style)     # parse it as a CSS declaration block
    url = style['background-image']

    >>> url
    'url(/uploads/images/players/16113-1399107741.jpeg)'
    >>> url = url.replace('url(', '').replace(')', '')  # or regex/split/find/slice etc.
    >>> url
    '/uploads/images/players/16113-1399107741.jpeg'

Although you are ultimately going to need to parse out the actual URL, this method should be more resilient to changes in the HTML. If you really dislike string manipulation and regex, you can pull the URL out in this roundabout way:

    # Wrap the inline style in a dummy rule so cssutils can treat it as a stylesheet.
    sheet = cssutils.css.CSSStyleSheet()
    sheet.add("dummy_selector { %s }" % div_style)
    url = list(cssutils.getUrls(sheet))[0]

    >>> url
    '/uploads/images/players/16113-1399107741.jpeg'

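For the plain string-manipulation route mentioned above (split/find/slice), a minimal standard-library sketch might look like the following. It is not part of the original answer: StyleCollector and background_url are hypothetical names, and it assumes a simple inline style with no semicolons inside the URL, so a data: URI would break it.

```python
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Hypothetical helper: collects the inline style attribute of every tag."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                self.styles.append(value)


def background_url(style_text):
    """Return the URL of a background-image declaration, or None if absent.

    Assumes declarations are separated by ';' and that the URL itself
    contains no ';' characters.
    """
    for declaration in style_text.split(";"):
        prop, _, value = declaration.partition(":")
        if prop.strip().lower() != "background-image":
            continue
        value = value.strip()
        if value.lower().startswith("url(") and value.endswith(")"):
            # Drop the url( ... ) wrapper, then any surrounding quotes.
            return value[4:-1].strip().strip("'\"")
    return None


# Sample markup (assumed for illustration, with a quoted URL this time).
sample = (
    "<div style=\"color: red; "
    "background-image: url('/uploads/images/players/16113-1399107741.jpeg');\"></div>"
)
collector = StyleCollector()
collector.feed(sample)
print(background_url(collector.styles[0]))
```

Unlike the replace('url(', '') call, this version also strips optional quotes around the URL, but it is still only a sketch; a real CSS parser such as cssutils remains the more robust choice.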