
How can I wrap all BeautifulSoup existing find/select methods in order to add additional logic and parameters?

I have a repetitive sanity-check process I go through with most calls to a BeautifulSoup object where I:

  1. Make the function call (.find, .find_all, .select_one, and .select mostly)
  2. Check to make sure the element(s) were found
    • If not found, I raise a custom MissingHTMLTagError, stopping the process there.
  3. Attempt to retrieve attribute(s) from the element(s) (using .get or getattr)
    • If not found, I raise a custom MissingHTMLAttributeError
  4. Return either a:
    • string, when it’s a single attribute of a single element (.find and .select_one)
    • list of strings, when it’s a single attribute of multiple elements (.find_all and .select)
    • dict, when it’s two attributes (key/value pairs) for multiple elements (.find_all and .select)
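The four steps above can be sketched as a single helper. This is a minimal illustration under assumptions, not the code from the original post: the names `extract`, `read_attr`, `attr`, and `value_attr` are hypothetical, and the two exception classes are stand-ins for the custom errors mentioned above. The helper only relies on the object exposing the named method, and on elements exposing `.get`, so no import is needed in the sketch itself.

```python
class MissingHTMLTagError(Exception):
    """Stand-in for the custom error raised when no element is found."""


class MissingHTMLAttributeError(Exception):
    """Stand-in for the custom error raised when an attribute is missing."""


def read_attr(element, name):
    # Step 3: try .get() first (HTML attributes), then fall back to
    # getattr() for Python-level attributes such as "text".
    value = element.get(name)
    if value is None:
        value = getattr(element, name, None)
    if value is None:
        raise MissingHTMLAttributeError(f"attribute {name!r} not found")
    return value


def extract(soup, method, *args, attr=None, value_attr=None, **kwargs):
    # Step 1: make the call; `method` is e.g. "find", "find_all",
    # "select_one" or "select" (hypothetical calling convention).
    result = getattr(soup, method)(*args, **kwargs)

    # Step 2: None (single lookups) or an empty list (multi lookups)
    # both count as "not found".
    if result is None or (isinstance(result, list) and not result):
        raise MissingHTMLTagError(f"{method}{args} matched nothing")

    # No attribute requested: hand back the element(s) unchanged.
    if attr is None:
        return result

    # Step 4: the shape of the return value depends on the lookup.
    if not isinstance(result, list):
        return read_attr(result, attr)                      # string
    if value_attr is None:
        return [read_attr(el, attr) for el in result]       # list of strings
    return {read_attr(el, attr): read_attr(el, value_attr)  # dict
            for el in result}
```

With a real document this would be called as, for example, `extract(soup, "find_all", "a", attr="href")`, where `soup` is a `BeautifulSoup` object. The list check works there because `find_all` and `select` return a list subclass.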

I’ve created the solution below, which acts (not so elegantly) as a proxy to the BeautifulSoup methods, but I’m hoping there is an easier way to accomplish this. Basically, I want to be able to patch all the BeautifulSoup methods so that they:

  1. Accept an extra parameter, so that the above steps are taken care of in a single call
  2. When called without the extra parameter, return the BeautifulSoup objects as normal, or raise MissingHTMLTagError if the return value is None or an empty list.

Most of the time the function below is used with an instance attribute (self._soup), which is just a BeautifulSoup object built from the most recent requests.Response.


Is there any way to wrap all of the exposed .find and .select-type methods of BeautifulSoup, so that I can still call them normally (e.g. soup.find()) instead of having to use my workaround function?


Answer

I believe I’ve figured out a succinct and reasonable way to accomplish what I’m looking for with the following wrapper:
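A wrapper along these lines could be sketched as follows. This is an illustration under assumptions, not necessarily the exact code referred to above: the class name `CheckedSoup` and the keyword `attr` are hypothetical, and the exception classes are stand-ins for the custom errors from the question. The idea is to delegate every attribute to the wrapped object and intercept only the four lookup methods.

```python
class MissingHTMLTagError(Exception):
    """Stand-in for the custom error raised when no element is found."""


class MissingHTMLAttributeError(Exception):
    """Stand-in for the custom error raised when an attribute is missing."""


class CheckedSoup:
    """Hypothetical proxy that adds checks to a BeautifulSoup-like object."""

    # Only these methods get the extra checks; everything else passes through.
    _CHECKED = {"find", "find_all", "select_one", "select"}

    def __init__(self, soup):
        self._soup = soup

    @staticmethod
    def _read(element, name):
        # .get() covers HTML attributes; getattr() covers e.g. "text".
        value = element.get(name)
        if value is None:
            value = getattr(element, name, None)
        if value is None:
            raise MissingHTMLAttributeError(f"attribute {name!r} not found")
        return value

    def __getattr__(self, name):
        # Called only for names not defined on CheckedSoup itself.
        target = getattr(self._soup, name)
        if name not in self._CHECKED:
            return target

        def checked(*args, attr=None, **kwargs):
            # NOTE: `attr` is an assumed keyword; it would shadow a keyword
            # filter on an HTML attribute literally named "attr".
            result = target(*args, **kwargs)
            if result is None or (isinstance(result, list) and not result):
                raise MissingHTMLTagError(f"{name}{args} matched nothing")
            if attr is None:
                return result  # normal behaviour, but never None/empty
            if isinstance(result, list):
                return [self._read(el, attr) for el in result]
            return self._read(result, attr)

        return checked
```

Usage would then look like `soup = CheckedSoup(BeautifulSoup(html, "html.parser"))` followed by `soup.find("a", attr="href")` or a plain `soup.select("li")`. Two caveats of this sketch: elements returned without `attr` are ordinary BeautifulSoup objects, so chained calls on them are unchecked, and the key/value dict case from the question is left out for brevity. Wrapping rather than patching the library's classes is a deliberate choice here, since replacing `find` on the library's own classes could also change the behaviour of calls the library makes internally.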
