I need help vectorizing a function in numpy. In Julia, I can do something like this:
((a,b,c) -> [a,b,c]).([[1,2],[3,4]],[[5,6],[7,8]],nothing)
which returns
2-element Vector{Vector{Union{Nothing, Vector{Int64}}}}:
 [[1, 2], [5, 6], nothing]
 [[3, 4], [7, 8], nothing]
It takes one sublist at a time from each iterable and repeats nothing for every call.
In Python, I can't get similar behaviour. I tried:
np.vectorize(lambda a,b,c: [a,b,c])([[1,2], [3,4]], [[5,6], [7,8]], None)
but it returns:
array([[list([1, 5, None]), list([2, 6, None])],
       [list([3, 7, None]), list([4, 8, None])]], dtype=object)
If I do:
np.vectorize(lambda a,b,c: print(a,b,c))([[1,2], [3,4]], [[5,6], [7,8]], np.nan)
I get back:
1 5 nan
1 5 nan
2 6 nan
3 7 nan
4 8 nan
I tried the excluded parameter, but it excludes the whole array:
np.vectorize(lambda a,b,c: print(a,b,c), excluded=[0])([[1,2], [3,4]], [[5,6], [7,8]], np.nan)
prints:
[[1, 2], [3, 4]] 5 nan
[[1, 2], [3, 4]] 5 nan
[[1, 2], [3, 4]] 6 nan
[[1, 2], [3, 4]] 7 nan
[[1, 2], [3, 4]] 8 nan
By the way, the actual function is a sklearn function, not a lambda.
Answer
You gave it (2,2), (2,2) and scalar arguments. np.vectorize called your function 4 times, each time with a tuple of values taken from those 3 arguments (broadcast together).
You can also see that with the print version. There's an additional call at the start (the repeated 1 5 nan), used to determine the return dtype. Here the function returns a list, so the result has dtype=object.
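If you want to check the call count yourself, here is a minimal sketch that records the calls instead of printing them. The names record and calls are made up for illustration. It also shows that passing otypes lets vectorize skip the extra dtype-probing call.

```python
import numpy as np

a, b, c = [[1, 2], [3, 4]], [[5, 6], [7, 8]], None

calls = []  # every (x, y, z) tuple the wrapped function receives


def record(x, y, z):
    # Hypothetical stand-in for the lambda in the question.
    calls.append((x, y, z))
    return [x, y, z]


# Without otypes: one probing call on the first elements,
# then one call per broadcast element.
np.vectorize(record)(a, b, c)
calls_without_otypes = len(calls)

calls.clear()

# With otypes the return dtype is known up front, so no probing call.
result = np.vectorize(record, otypes=[object])(a, b, c)
calls_with_otypes = len(calls)

print(calls_without_otypes, calls_with_otypes)
print(result.shape)
```

Either way the function only ever sees scalars, never a whole sublist.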
With excluded, it doesn't iterate over the values of the 1st argument; it just passes it whole.
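A small sketch of the same point, again with made-up helper names (record, seen_first_args): the excluded argument reaches the function untouched on every call, while the other two are still iterated element by element.

```python
import numpy as np

a, b, c = [[1, 2], [3, 4]], [[5, 6], [7, 8]], None

seen_first_args = []  # what the function receives as its first argument


def record(x, y, z):
    # Hypothetical stand-in for the lambda in the question.
    seen_first_args.append(x)
    return [x, y, z]


# otypes is given only to avoid the extra dtype-probing call.
f = np.vectorize(record, excluded=[0], otypes=[object])
result = f(a, b, c)

print(len(seen_first_args))                     # one call per element of b
print(all(x == a for x in seen_first_args))     # first argument is always the whole list
print(result.shape)
```

So excluded is all-or-nothing for an argument; it can't hand over one sublist at a time.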
Here’s the right way to create your list of lists:
In [811]: a,b,c = [[1,2], [3,4]], [[5,6], [7,8]], None
In [813]: [[i,j,None] for i,j in zip(a,b)]
Out[813]: [[[1, 2], [5, 6], None], [[3, 4], [7, 8], None]]
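Since your real function is not a lambda, the same idea can be written with map and itertools.repeat, which is roughly what the Julia dot-call does with nothing. This is only a sketch: combine is a hypothetical stand-in for the sklearn function.

```python
from itertools import repeat


def combine(x, y, z):
    # Hypothetical stand-in for the real (sklearn) function.
    return [x, y, z]


a, b, c = [[1, 2], [3, 4]], [[5, 6], [7, 8]], None

# repeat(c) yields c forever; map stops when the shortest input (a or b) runs out.
result = list(map(combine, a, b, repeat(c)))

print(result)
# [[[1, 2], [5, 6], None], [[3, 4], [7, 8], None]]
```

No numpy involved, and the function gets one whole sublist from each of a and b per call.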
If we add a signature (and otypes):
In [821]: f = np.vectorize(lambda a,b,c: [a,b,c], signature='(n),(n),()->()', otypes=[object])
In [822]: f(a,b,c)
Out[822]:
array([list([array([1, 2]), array([5, 6]), None]),
       list([array([3, 4]), array([7, 8]), None])], dtype=object)
Now it calls the function only twice. But this version runs much slower. Read, and reread, the notes about performance in the np.vectorize documentation.
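To see what a signature does without the object-dtype complications, here is a sketch with a numeric function instead of the list-returning lambda. scaled_dot and shapes_seen are made-up names. The core dimension (n) means each call receives a whole 1d row, and the scalar is repeated for every call.

```python
import numpy as np

shapes_seen = []  # shapes of the row arguments on each call


def scaled_dot(x, y, scale):
    # Hypothetical example function: x and y arrive as 1d arrays.
    shapes_seen.append((x.shape, y.shape))
    return np.dot(x, y) * scale


g = np.vectorize(scaled_dot, signature='(n),(n),()->()')
out = g([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2.0)

print(len(shapes_seen))   # number of calls: one per row
print(shapes_seen)
print(out)
```

Even here vectorize is still a Python-level loop over the rows, so don't expect it to be faster than the list comprehension.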
If we make the list arguments into arrays first:
In [825]: A,B = np.array(a), np.array(b)
In [826]: A,B
Out[826]:
(array([[1, 2],
        [3, 4]]),
 array([[5, 6],
        [7, 8]]))
the signature version of f returns the same thing, showing that vectorize does convert the lists to arrays:
In [827]: f(A,B,c)
Out[827]:
array([list([array([1, 2]), array([5, 6]), None]),
       list([array([3, 4]), array([7, 8]), None])], dtype=object)
If we pass the arrays to the list comprehension instead, we get:
In [829]: np.array([[i,j,None] for i,j in zip(A,B)], object)
Out[829]:
array([[array([1, 2]), array([5, 6]), None],
       [array([3, 4]), array([7, 8]), None]], dtype=object)
In [830]: _.shape
Out[830]: (2, 3)