
Fastest way to round random numbers in python

I want to generate random numbers up to a certain precision, one at a time (so I am not looking for a vectorized solution).

I found a method in this Q&A on Stack Overflow, and it gave me the benchmarks it promised: the method is almost twice as fast. Now, here’s what is puzzling me.

%timeit int(0.5192853551955484*(10**5)+0.5)/(10.**5)      #> 149 ns ± 5.76 ns per loop 
%timeit round(0.5192853551955484, 5)                      #> 432 ns ± 11.7 ns per loop
## Faster as expected

import random

fl = random.random()
pr = 5
%timeit int(fl*(10**pr)+0.5)/(10.**pr)                    #> 613 ns ± 27.9 ns per loop 
%timeit round(fl, pr)                                     #> 444 ns ± 9.25 ns per loop
## Slower?!

%timeit int(random.random()*(10**5)+0.5)/(10.**5)         #> 280 ns ± 29.3 ns per loop 
%timeit round(random.random(), 5)                         #> 538 ns ± 17.5 ns per loop
## Faster than using a variable even though it has the overhead
## of creating a random number for each call?

Why is the above method slower when I’m using variables? It regains the lost speed when I pass the randomly generated number directly. What am I missing?

In my code, since I require rounding in multiple places, I would like it wrapped in a function as follows. I understand that there is a small cost associated with the function call, but I don’t think that’s what is costing the bulk of the increased time. This answer says it still ought to be faster than Python’s standard round() function.

def rounder(fl, pr):
    p = float(10**pr)
    return int(fl * p + 0.5)/p

%timeit rounder(random.random(), 5)                       #> 707 ns ± 14.2 ns per loop 
%timeit round(random.random(), 5)                         #> 525 ns ± 22.1 ns per loop

## Having a global variable does make it faster by not having to do the 10**5 every time
p = float(10**5)
def my_round_5(fl):
    return int(fl * p + 0.5)/p

%timeit my_round_5(random.random())                       #> 369 ns ± 18.9 ns per loop

I would prefer not to have a global variable in my code. Even if I concede that requirement, the performance gain is smaller than when using the formula without variables.

So, the final question is: which method would be most beneficial for me to use? Switching to a function that requires a global variable gains only 100-150 ns. Or is there some way that can be faster?


Answer

The Python byte-compiler “knows” how numbers work, and it uses this knowledge to optimise things where it can. You can use the dis module to see what’s happening.

For example, your first “fast” example:

from dis import dis

def fn():
    return int(0.5192853551955484*(10**5)+0.5)/(10.**5)

dis(fn)

actually does:

  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_CONST               1 (51929.03551955484)
              4 CALL_FUNCTION            1
              6 LOAD_CONST               2 (100000.0)
              8 BINARY_TRUE_DIVIDE
             10 RETURN_VALUE

i.e. it knows what 0.5192853551955484*(10**5)+0.5 evaluates to and computes it once, when compiling the byte code (constant folding). If fl or pr is a variable or a parameter, the compiler can’t do that, so the arithmetic has to be done every time the code runs.
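A quick way to check this yourself is to compare the constants stored in the two compiled functions. The sketch below assumes CPython (constant folding is an implementation detail, and the exact bytecode differs between versions); the function names rounded_literal and rounded_param are made up for illustration.

```python
import dis

def rounded_literal():
    # Every operand is a literal, so CPython can fold the arithmetic at compile time.
    return int(0.5192853551955484 * (10 ** 5) + 0.5) / (10. ** 5)

def rounded_param(fl, pr):
    # fl and pr are only known at run time, so nothing can be pre-computed.
    return int(fl * (10 ** pr) + 0.5) / (10. ** pr)

# Constants stored in each compiled function (CPython-specific attribute)
literal_consts = rounded_literal.__code__.co_consts
param_consts = rounded_param.__code__.co_consts
print(literal_consts)  # expected to contain the pre-computed 100000.0
print(param_consts)    # only the small literals that appear in the source

# Full disassembly for a side-by-side comparison
dis.dis(rounded_literal)
dis.dis(rounded_param)
```

The parameterised version should show extra multiply, power and add instructions that are absent from the literal one.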

To answer the “what would be best” question, maybe something like:

import random

def fn(pr):
    # cache to prevent global lookup
    rng = random.random
    # evaluate precision once
    p = 10. ** pr
    # generate infinite stream of random numbers
    while True:
        yield int(rng() * p + 0.5) / p

can be benchmarked with:

x = fn(5)
%timeit next(x)

giving ~160ns per loop, while:

%timeit int(fl*(10**pr)+0.5)/(10**pr)
%timeit int(fl*(10.**pr)+0.5)/(10.**pr)

run in ~500 ns and ~250 ns respectively. Didn’t realise that int vs float exponentiation was so different!
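If you need to round values you already have, rather than draw new ones from a generator, the same "evaluate the precision once" idea can be packaged in a closure, which also avoids the global variable. This is only a sketch: make_rounder and round_5 are hypothetical names, and the timings it prints will vary with machine and Python version, so measure on your own setup.

```python
import random
import timeit

def make_rounder(pr):
    """Return a function that rounds to `pr` decimal places (hypothetical helper)."""
    p = 10. ** pr  # computed once and kept in the closure instead of a global

    def rounder(fl):
        # Same int(x + 0.5) trick as in the question. int() truncates toward
        # zero, so this is only suitable for non-negative inputs such as
        # random.random().
        return int(fl * p + 0.5) / p

    return rounder

round_5 = make_rounder(5)
fl, pr = random.random(), 5
env = {"fl": fl, "pr": pr, "round_5": round_5}

candidates = [
    "int(fl*(10**pr)+0.5)/(10**pr)",    # int exponentiation on every call
    "int(fl*(10.**pr)+0.5)/(10.**pr)",  # float exponentiation on every call
    "round_5(fl)",                      # power pre-computed in the closure
    "round(fl, pr)",                    # built-in for reference
]
n = 100_000
for stmt in candidates:
    # best of 3 repeats, reported as nanoseconds per call
    best = min(timeit.repeat(stmt, globals=env, number=n, repeat=3))
    print(f"{stmt:34s} {best / n * 1e9:7.1f} ns")
```

The closure still pays for one function call per value, so the generator above may remain the quicker option when you only need fresh random numbers.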

User contributions licensed under: CC BY-SA