I am using the @jit signature to define the types of the incoming arguments, but when I call the function I get:
ValueError: cannot compute fingerprint of empty list
I know the list is empty, but my signature defines its type, so I am not sure why Numba does not use that signature.
I have tried both forms of signature (the string form and the tuple form) and still get the error. It is not clear to me from the documentation why these signatures do not define the types of the arguments as passed in, and why Numba still relies on inferring them.
    @nb.jit("void(List(int64), int64, List(List(int64)))", nopython=True, cache=True)
    def _set_indices(keys_as_int, n_keys, indices):
        for i, k in enumerate(keys_as_int):
            indices[k].append(i)
        indices = [([np.array(elt) for elt in indices])]

    def group_by(keys):
        _, first_occurrences, keys_as_int = np.unique(keys, return_index=True, return_inverse=True)
        n_keys = max(keys_as_int) + 1
        indices = [[] for _ in range(max(keys_as_int) + 1)]
        print(str(keys_as_int) + str(n_keys) + str(indices))
        _set_indices(keys_as_int, n_keys, indices)
        return indices

    result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
    print(str(result))
I expected the signature to enforce the data types of the incoming arguments, with no need to infer them. The actual error:

    <ipython-input-274-401e07cd4e63> in <module>
    ----> 1 result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
          2 print(str(result))

    <ipython-input-273-acdebb81069c> in group_by(keys)
          4     indices = [[] for _ in range(max(keys_as_int) + 1)]
          5     print(str(keys_as_int) + str(n_keys) + str(indices))
    ----> 6     _set_indices(keys_as_int, n_keys, indices)
          7     return indices

    ValueError: cannot compute fingerprint of empty list
Answer
I found a workaround to get your code working. There is a GitHub issue describing almost the same problem as the one you are facing. My first attempt was to create each inner list with a dummy value of -1, which is dropped at the end. However, that ran into a "reflected list" exception (cannot reflect element of reflected container), so I had to use Numba's typed list (numba.typed.List) instead. Long story short, here is the final code, which works in nopython mode and returns the result you would expect.
    import numba as nb
    import numpy as np
    from numba.typed import List

    @nb.jit(nopython=True, cache=True)
    def _set_indices(keys_as_int, n_keys, indices):
        # Do some operation
        for i, k in enumerate(keys_as_int):
            indices[k].append(i)
        # Drop the dummy element in the final result
        indices = [elem[1:] for elem in indices]
        # Return the final indices
        return indices

    def group_by(keys):
        _, first_occurrences, keys_as_int = np.unique(keys, return_index=True, return_inverse=True)
        n_keys = max(keys_as_int) + 1
        # Simply adding the dummy element doesn't work here
        # Error: cannot reflect element of reflected container: reflected list(reflected list(int64))
        # indices = [[-1] for _ in range(max(keys_as_int) + 1)]
        # A workaround is to create Numba's version of typed-list
        indices = List()
        for i in range(max(keys_as_int) + 1):
            l = List()
            l.append(-1)
            indices.append(l)
        print(str(keys_as_int), str(n_keys), str(indices))
        indices = _set_indices(keys_as_int, n_keys, indices)
        return indices

    result = group_by(['aaa', 'aab', 'aac', 'aaa', 'aac'])
    # Conversion of Numba's typed list inside NoPython mode returns error
    # Hence do it outside the function
    result = [np.asarray(elem) for elem in result]
    print(result)
The working code is also available in a Google Colab notebook. Go to its last cell if you want to dig into the reflected list exception.