threading.Thread.start() method execution time depends on the Thread target method

I am wondering why threading.Thread().start() method execution time is dependent on the Thread‘s target method. I assumed that start() method just signalizes system that Thread can execute and does not do any processing.

To illustrate this behavior:

import numpy as np
from time import time
import threading


# Function which should take no time to execute
def pass_target_function(idx):
    pass


# Function which should take non-zero time to execute
def calculate_target_function(idx):
    a = my_arr[..., idx] * mask
    a = np.sum(a)


# Create data
my_arr_size = (1000000, 300)
my_arr = np.random.randint(255, size=my_arr_size)
mask = np.random.randint(255, size=my_arr.shape[0])


for target_function in [pass_target_function, calculate_target_function]:
    print(target_function.__name__)

    threads = []

    # Instantiate Threads
    st = time()
    for i in range(my_arr.shape[-1]):
        threads.append(threading.Thread(target=target_function, args=[i], daemon=True))
    print('tThreads instantiated  in: {:.02f} ms'.format((time() - st) * 1000))

    # Run threads
    st = time()
    for thread in threads:
        thread.start()
    print('tThreads started in: {:.02f} ms'.format((time() - st) * 1000))

    # Join threads
    st = time()
    for thread in threads:
        thread.join()
    print('tThreads joined in: {:.02f} ms'.format((time() - st) * 1000))

JavaScript
​x
 
import numpy as np
from time import time
import threading
​
​
# Function which should take no time to execute
def pass_target_function(idx):
    pass
​
​
# Function which should take non-zero time to execute
def calculate_target_function(idx):
    a = my_arr[..., idx] * mask
    a = np.sum(a)
​
​
# Create data
my_arr_size = (1000000, 300)
my_arr = np.random.randint(255, size=my_arr_size)
mask = np.random.randint(255, size=my_arr.shape[0])
​
​
for target_function in [pass_target_function, calculate_target_function]:
    print(target_function.__name__)
​
    threads = []
​
    # Instantiate Threads
    st = time()
    for i in range(my_arr.shape[-1]):
        threads.append(threading.Thread(target=target_function, args=[i], daemon=True))
    print('tThreads instantiated  in: {:.02f} ms'.format((time() - st) * 1000))
​
    # Run threads
    st = time()
    for thread in threads:
        thread.start()
    print('tThreads started in: {:.02f} ms'.format((time() - st) * 1000))
​
    # Join threads
    st = time()
    for thread in threads:
        thread.join()
    print('tThreads joined in: {:.02f} ms'.format((time() - st) * 1000))
​

Results:

pass_target_function
    Threads instantiated  in: 1.99 ms
    Threads started in: 105.72 ms
    Threads joined in: 1.00 ms
calculate_target_function
    Threads instantiated  in: 1.99 ms
    Threads started in: 1111.03 ms
    Threads joined in: 26.93 ms

JavaScript
 
pass_target_function
    Threads instantiated  in: 1.99 ms
    Threads started in: 105.72 ms
    Threads joined in: 1.00 ms
calculate_target_function
    Threads instantiated  in: 1.99 ms
    Threads started in: 1111.03 ms
    Threads joined in: 26.93 ms
​

Why threads start time differs for pass_target_function and calculate_target_function?

Edit:

Based on Steve’s answer individual thread start time is measured.

# Run threads
thread_start_time = collections.deque()
for thread in threads:
    st_in = time()
    thread.start()
    thread_start_time.append(time() - st_in)

JavaScript
 
# Run threads
thread_start_time = collections.deque()
for thread in threads:
    st_in = time()
    thread.start()
    thread_start_time.append(time() - st_in)
​

Result match behavior described in Steve’s answer:

What you’ll see is that the earlier calls complete very quickly, but as more and more threads have already been started, the time for start to run starts to take longer and longer, and the exact times are inconsistent.

Answer

Time each start call independently, printing each result. What you’ll see is that the earlier calls complete very quickly, but as more and more threads have already been started, the time for start to run starts to take longer and longer, and the exact times are inconsistent. This is due to all of the existing threads, including the main thread, having to contend for CPU time. A call to start may begin to do its thing, but then get swapped out to let other threads run for a while. This is made worse by the GIL, which is a Mutex that Python uses to keep things thread-safe by not really running your code in parallel on multiple cores of the CPU.

The reason that no-op case’s start methods run so quickly is that the threads get started and then immediately go away because they have nothing to actually due. This avoids the contention for CPU time that occurs when there’s real work to be done in each thread.

Edit:

Advertisement

Answer