How to create a one-hot encoding for an intermediate class?

Let’s say I have 3 classes: 0, 1, 2
One-hot encoding an array of labels can be done with pandas.
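
A minimal sketch of one-hot encoding with pandas, assuming `pd.get_dummies` is the call meant here (the original post does not name it) and using made-up labels:

```python
import pandas as pd

# Example labels, made up for illustration
labels = pd.Series([0, 1, 2, 1])

# One column per distinct class; dtype=int gives 0/1 instead of booleans
one_hot = pd.get_dummies(labels, dtype=int)
print(one_hot)
```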

What I’m interested in is how to get an encoding that can handle an intermediate class, i.e. a class that lies between two of the existing classes.
For example:

  • for class 0.4, resulting encoding should be [0.4, 0.6, 0]
  • for class 1.8, resulting encoding should be [0, 0.2, 0.8]

Does anybody know such an encoder?
Thanks for your answer!

Answer

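You can write a function for this unusual encoding like the one below:

import math
import numpy as np

def strange_encode(num, cnt_lbl):
    # Start from an all-zero vector with one slot per class
    encode_arr = np.zeros(cnt_lbl)
    lbl = math.floor(num)  # lower neighbouring class
    if lbl != num:
        # Fractional label: split the weight between classes lbl and lbl + 1.
        # A fractional label above the last class (e.g. 2.5 with cnt_lbl=3)
        # raises ValueError because the slice below is too short.
        num -= lbl  # keep only the fractional part
        if num >= 0.5:
            encode_arr[lbl:lbl+2] = [1-num, num]
        else:
            encode_arr[lbl:lbl+2] = [num, 1-num]
    else:
        # Whole-number label: ordinary one-hot encoding
        encode_arr[lbl] = 1
    return encode_arr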

Output:

>>> strange_encode(0.0, cnt_lbl=3)
array([1., 0., 0.])

>>> strange_encode(2.0, cnt_lbl=3)
array([0., 0., 1.])

>>> strange_encode(0.4, cnt_lbl=3)
array([0.4, 0.6, 0. ])

>>> strange_encode(1.8, cnt_lbl=3)
array([0. , 0.2, 0.8])

# You can change the count of classes
>>> strange_encode(2.5, cnt_lbl=4)
array([0. , 0. , 0.5, 0.5])

>>> strange_encode(1.6, cnt_lbl=4)
array([0. , 0.4, 0.6, 0. ])

>>> strange_encode(2, cnt_lbl=4)
array([0., 0., 1., 0.])
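
Note that the rule above always puts the larger share on the upper class, which is what the two examples in the question ask for. If plain linear interpolation between the two nearest classes is what you actually want (an assumption; the question's 0.4 example does not follow that rule), a minimal sketch could look like this. The name `interpolated_encode` is hypothetical:

```python
import math
import numpy as np

def interpolated_encode(value, num_classes):
    """Hypothetical helper: split the weight linearly between the two nearest classes."""
    if not 0 <= value <= num_classes - 1:
        raise ValueError("value must lie between 0 and num_classes - 1")
    encoded = np.zeros(num_classes)
    lower = math.floor(value)        # lower neighbouring class
    frac = value - lower             # distance from the lower class
    encoded[lower] = 1.0 - frac      # the closer class gets the larger weight
    if frac > 0:
        encoded[lower + 1] = frac
    return encoded

print(interpolated_encode(1.8, 3))   # about 0.2 on class 1 and 0.8 on class 2
print(interpolated_encode(0.4, 3))   # about 0.6 on class 0 and 0.4 on class 1
```

For 1.8 both rules agree; for 0.4 this variant gives roughly [0.6, 0.4, 0] instead of [0.4, 0.6, 0].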

To encode a whole array of labels into a DataFrame, we can write a function like the one below:

import numpy as np
import pandas as pd

def generate_df_encoding(arr_nums, num_classes):
    # One row per label, one column per class
    arr = np.zeros((len(arr_nums), num_classes))
    for idx, num in enumerate(arr_nums):
        # Reuses strange_encode from above
        arr[idx] = strange_encode(num, cnt_lbl=num_classes)
    return pd.DataFrame(arr)

Output:

>>> generate_df_encoding([0,0,1,2,0.4,1.8], num_classes=3)

     0    1    2
0  1.0  0.0  0.0
1  1.0  0.0  0.0
2  0.0  1.0  0.0
3  0.0  0.0  1.0
4  0.4  0.6  0.0
5  0.0  0.2  0.8
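
For larger arrays, the Python loop can be replaced by NumPy indexing. The following is a sketch under the assumption that the labels form a 1-D sequence; `encode_many` is a hypothetical name, and it applies the same rule as `strange_encode` (the smaller share goes to the lower class, the larger to the upper class) while rejecting labels that fall outside the class range:

```python
import numpy as np
import pandas as pd

def encode_many(values, num_classes):
    """Hypothetical vectorized variant of the loop-based encoder."""
    values = np.asarray(values, dtype=float)
    base = np.floor(values).astype(int)   # lower neighbouring class
    frac = values - base                  # fractional part in [0, 1)
    # A fractional label also needs the class above it to exist
    if np.any(base < 0) or np.any(base + (frac > 0) >= num_classes):
        raise ValueError("label outside the range covered by num_classes")
    low = np.minimum(frac, 1 - frac)      # share for the lower class
    high = np.maximum(frac, 1 - frac)     # share for the upper class
    out = np.zeros((len(values), num_classes))
    rows = np.arange(len(values))
    whole = frac == 0
    out[rows[whole], base[whole]] = 1.0   # whole-number labels: plain one-hot
    out[rows[~whole], base[~whole]] = low[~whole]
    out[rows[~whole], base[~whole] + 1] = high[~whole]
    return pd.DataFrame(out)

print(encode_many([0, 0, 1, 2, 0.4, 1.8], num_classes=3))
```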
User contributions licensed under: CC BY-SA