Skip to content
Advertisement

Convert a 2d matrix to a 3d one hot matrix numpy

I have np matrix and I want to convert it to a 3d array with one hot encoding of the elements as third dimension. Is there a way to do with without looping over each row eg

JavaScript

should be made into

JavaScript

Advertisement

Answer

Approach #1

Here’s a cheeky one-liner that abuses broadcasted comparison –

JavaScript

Sample run –

JavaScript

For 0-based indexing, it would be –

JavaScript

If the one-hot enconding is to cover for the range of values ranging from the minimum to the maximum values, then offset by the minimum value and then feed it to the proposed method for 0-based indexing. This would be applicable for rest of the approaches discussed later on in this post as well.

Here’s a sample run on the same –

JavaScript

If you are okay with a boolean array with True for 1's and False for 0's, you can skip the .astype(int) conversion.

Approach #2

We can also initialize a zeros arrays and index into the output with advanced-indexing. Thus, for 0-based indexing, we would have –

JavaScript

Helper func –

JavaScript

This should be especially more performant when dealing with larger range of values.

For 1-based indexing, simply feed in a-1 as the input.

Approach #3 : Sparse matrix solution

Now, if you are looking for sparse array as output and AFAIK since scipy’s inbuilt sparse matrices support only 2D formats, you can get a sparse output that is a reshaped version of the output shown earlier with the first two axes merging and the third axis being kept intact. The implementation for 0-based indexing would look something like this –

JavaScript

Again, for 1-based indexing, simply feed in a-1 as the input.

Sample run –

JavaScript

This would be much better than previous two approaches if you are okay with having sparse output.

Runtime comparison for 0-based indexing

Case #1 :

JavaScript

Case #2 :

JavaScript

Squeezing out best performance

To squeeze out the best performance, we could modify approach #2 to use indexing on a 2D shaped output array and also use uint8 dtype for memory efficiency and that leading to much faster assignments, like so –

JavaScript

Timings –

JavaScript
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement