
Improve the speed of code with if else statements

I am working on building a class, but I realized that one part is extremely slow. I believe it is the part below, because calling it takes a couple of minutes, which is too long. How can the function be made faster?

JavaScript

There is a second part which also includes a couple of if-else statements.

JavaScript


Answer

If your chisquare is scipy.stats.chisquare, it accepts numpy arrays, so you can pass np.fromiter(your_dict.values(), dtype=float) * 100 (or dtype=int as required) directly as the argument instead of converting the values to a list, then to an array, then to a list again.
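As a minimal sketch of that direct conversion, assuming a hypothetical dictionary named `observed` (the name and values are invented for illustration and are not from the question):

```python
import numpy as np

# Hypothetical input: the name and values are made up for illustration.
observed = {"a": 2, "b": 3, "c": 5}

# Build the array straight from the dict's values view, with no intermediate list.
# count= is optional; it tells NumPy the length up front so it can preallocate.
scaled = np.fromiter(observed.values(), dtype=float, count=len(observed)) * 100

print(scaled)
```

The resulting `scaled` array could then be passed to a function that accepts array-like input, such as scipy.stats.chisquare.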

Even if your function doesn’t take numpy arrays and absolutely must have a list for some reason, you could iterate directly over your_dict.values() and multiply the elements in a list comprehension: [i * 100 for i in your_dict.values()]. Alternatively, use np.ndarray.tolist() to convert the numpy array to a list.
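The two list-producing routes could look roughly like this. The dictionary `counts` is a hypothetical stand-in for your own data:

```python
import numpy as np

# Hypothetical input, for illustration only.
counts = {"a": 2, "b": 3, "c": 5}

# Pure-Python route: multiply while iterating over the values view.
as_list = [v * 100 for v in counts.values()]

# NumPy route: vectorised multiply, then convert back with ndarray.tolist(),
# which returns plain Python scalars rather than NumPy scalar objects.
via_numpy = (np.fromiter(counts.values(), dtype=int) * 100).tolist()

print(as_list, via_numpy)
```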

Out of interest, I wrote some code to time the different approaches.

  1. (f0) Direct conversion from dict to ndarray, multiply by 100
  2. (f1) dict to list, then to ndarray, multiply by 100, then to list
  3. (f2) dict to ndarray, multiply by 100, then list(...)
  4. (f3) dict to ndarray, multiply by 100, then np.ndarray.tolist()
  5. (f4) dict to list using a list comprehension
JavaScript

This gives the following plot: [plot of run time against dictionary size for f0 to f4, with an inset showing the largest sizes]

Some observations:

  1. At small dictionary sizes ( < 100 ), the dict->list approach (f4) is fastest by far. Next are dict->array (f0) and dict->array->np.tolist (f3). Finally, we have dict->array->list (f2) and dict->list->array->list (f1). This makes sense: converting everything to a numpy array takes time, so if your dictionary is small enough, just do the multiplication in plain Python. f1 is predictably on the slower side, since it involves the most conversions and therefore the most work. Interestingly, f2 performs similarly to f1, which suggests that converting a numpy array to a list using list() is a bottleneck.
  2. At larger input sizes ( > 10,000 ), f4 becomes slower than f0 and gradually falls behind the other numpy-based approaches as well. Converting the numpy array to a list is consistently faster using np.ndarray.tolist() than using list(...).
  3. The inset shows the zoomed-in plot at the largest values. The inset y axis has a linear scale to better demonstrate the differences in each method at scale, and clearly f1 is by far the slowest, and f0 by far the fastest.

To conclude:

  1. If your dicts are small, use a list comprehension.
  2. If your dicts are large, don’t convert to a list; use a numpy array.
  3. If your dicts are large and you must convert to a list, avoid list(...). Use np.fromiter to convert the dict directly to a numpy array, and np.ndarray.tolist() to convert the numpy array to a list.
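
To check these conclusions on your own machine, a rough timing sketch along the following lines could be used. The function bodies are assumptions reconstructed from the descriptions of f0 to f4 above, not the original benchmark code, and the sizes and repeat count are arbitrary:

```python
import timeit

import numpy as np


def f0(d):
    # dict -> ndarray, multiply by 100
    return np.fromiter(d.values(), dtype=float) * 100


def f1(d):
    # dict -> list -> ndarray, multiply by 100, then list(...)
    return list(np.array(list(d.values()), dtype=float) * 100)


def f2(d):
    # dict -> ndarray, multiply by 100, then list(...)
    return list(np.fromiter(d.values(), dtype=float) * 100)


def f3(d):
    # dict -> ndarray, multiply by 100, then ndarray.tolist()
    return (np.fromiter(d.values(), dtype=float) * 100).tolist()


def f4(d):
    # dict -> list via a list comprehension
    return [v * 100 for v in d.values()]


# Arbitrary sizes chosen for illustration; adjust to match your data.
for size in (10, 1_000, 100_000):
    data = {i: float(i) for i in range(size)}
    for func in (f0, f1, f2, f3, f4):
        seconds = timeit.timeit(lambda: func(data), number=20)
        print(f"n={size:>7}  {func.__name__}: {seconds / 20:.6f} s per call")
```

Absolute timings depend on hardware and on the numpy version, so only the relative ordering at each size is meaningful.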