I am currently working on a Python project with a couple class methods that are each called tens of thousands of times. One of the issues with these methods is that they rely on data being populated via another method first, so I want to be able to raise an error if the functions are called prior to populating the data.
And before anyone asks, I opted to separate the data population stage from the class constructor. This is because the data population (and processing) is intensive and I want to manage it separately from the constructor.
Simple (inefficient) implementation
A simple implementation of this might look like:
class DataNotPopulatedError(Exception): ... class Unblocker1: def __init__(self): self.data = None self._is_populated = False def populate_data(self, data): self.data = data self._is_populated = True # It will make sense later why this is its own method def _do_something(self): print("Data is:", self.data) def do_something(self): if not self._is_populated: raise DataNotPopulatedError return self._do_something() unblocker1 = Unblocker1() # Raise an error (We haven't populated the data yet) unblocker1.do_something() # Don't raise an error (We populated the data first) unblocker1.populate_data([1,2,3]) unblocker1.do_something()
My goal
Because the hypothetical do_something()
method is called tens (or hundreds) of thousands of times, I would think those extra checks to make sure that the data has been populated would start to add up.
While I may be barking up the wrong tree, my first thoughts to improve the efficiency of thefunction were to dynamically re-assign the method after the data is populated. I.e., when first creating the class, the do_something()
method would point to another function that only raises a DataNotPopulatedError
. The populate_data()
method would then both populate the data and also “unblock” do_something()
by dynamically reassigning do_something()
back to the function as written.
I figure the cleanest way to implement something like this would be using a decorator.
Hypothetical usage
I have no idea how to implement the technique described above, however, I did create a hypothetical usage with the inefficient method from before. Given the goal implementation, there might need to be two decorators–one for the blocked functions, and one to unblock them.
import functools def blocked2(attr, raises): def _blocked2(func): @functools.wraps(func) def wrapper(*args, **kwargs): # Assumes `args[0]` is `self` # If `self.key` is False, raise `raises`, otherwise call `func()` if not getattr(args[0], attr): raise raises return func(*args, **kwargs) return wrapper return _blocked2 class Unblocker2: def __init__(self): self.data = None self._is_populated = False def populate_data(self, data): self.data = data self._is_populated = True @blocked2("_is_populated", DataNotPopulatedError) def do_something(self): print("Data is:", self.data)
I’ve been having a hard time explaining what I am attempting to do, so I am open to other suggestions to accomplish a similar goal (and potentially better titles for the post). There is a decent chance I am taking the complete wrong approach here; that’s just part of learning. If there is a better way of doing what I am trying to do, I am all ears!
Advertisement
Answer
What you are trying to do does not seem especially difficult. I suspect you are overcomplicating the task a bit. Assuming you are willing to respect your own private methods, you can do something like
class Unblocker2: def __init__(self): self.data = None def populate_data(self, data): self.data = data self.do_something = self._do_something_populated def do_something(self): raise DataNotPopulatedError('Data not populated yet') def _do_something_populated(self): print("Data is:", self.data)
Since methods are non-data descriptors, assigning a bound method to the instance attribute do_something
will shadow the class attribute. That way, instances that have data populated can avoid making a check with the minimum of redundancy.
That being said, profile your code before going off and optimizing it. You’d be amazed at which parts take the longest.