How to create a one-hot-encoding for the intermediate class?

Question

Let's say I have 3 classes: 0, 1, 2. One-hot-encoding an array of labels can be done via pandas as follows:

What I'm interested in is how to get an encoding that can handle an intermediate class, e.g. a class in the middle between 2 classes. For example:

for class 0.4, the resulting encoding should be [0.4, 0.6, 0]
for class 1.8,
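For reference, the ordinary one-hot encoding of integer labels mentioned above can be sketched with `pd.get_dummies`. This is a minimal sketch, not the question's own snippet: the sample labels and the choice of `dtype=float` are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample labels drawn from the three classes 0, 1, 2
labels = pd.Series([0, 0, 1, 2])

# One column per class; dtype=float gives 0.0/1.0 instead of booleans
one_hot = pd.get_dummies(labels, dtype=float)
print(one_hot)
```

Each row contains a single 1.0 in the column of its class, which is the behaviour the intermediate-class encoding is meant to generalize.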

Accepted Answer

You can write a function for your strange encoding like the one below:

```python
import math

import numpy as np


def strange_encode(num, cnt_lbl):
    # Start from an all-zero vector with one slot per class.
    encode_arr = np.zeros(cnt_lbl)
    lbl = math.floor(num)  # index of the class just below num
    if lbl != num:
        # Fractional label: split the weight between classes lbl and lbl + 1.
        num -= lbl
        if num >= 0.5:
            encode_arr[lbl:lbl+2] = [1-num, num]
        else:
            encode_arr[lbl:lbl+2] = [num, 1-num]
    else:
        # Integer label: ordinary one-hot encoding.
        encode_arr[lbl] = 1
    return encode_arr
```

Output:

```
>>> strange_encode(0.0, cnt_lbl=3)
array([1., 0., 0.])
>>> strange_encode(2.0, cnt_lbl=3)
array([0., 0., 1.])
>>> strange_encode(0.4, cnt_lbl=3)
array([0.4, 0.6, 0. ])
>>> strange_encode(1.8, cnt_lbl=3)
array([0. , 0.2, 0.8])

# You can change the count of classes
>>> strange_encode(2.5, cnt_lbl=4)
array([0. , 0. , 0.5, 0.5])
>>> strange_encode(1.6, cnt_lbl=4)
array([0. , 0.4, 0.6, 0. ])
>>> strange_encode(2, cnt_lbl=4)
array([0., 0., 1., 0.])
```

We can also write a function that generates a DataFrame of encodings:

```python
import pandas as pd


def generate_df_encoding(arr_nums, num_classes):
    # Encode each label with strange_encode and stack the rows.
    arr = np.zeros((len(arr_nums), num_classes))
    for idx, num in enumerate(arr_nums):
        arr[idx] = strange_encode(num, cnt_lbl=num_classes)
    return pd.DataFrame(arr)
```

Output:

```
>>> generate_df_encoding([0,0,1,2,0.4,1.8], num_classes=3)
     0    1    2
0  1.0  0.0  0.0
1  1.0  0.0  0.0
2  0.0  1.0  0.0
3  0.0  0.0  1.0
4  0.4  0.6  0.0
5  0.0  0.2  0.8
```
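For large label arrays, the Python loop in `generate_df_encoding` could be replaced by a vectorized NumPy version. In both branches of `strange_encode`, the lower class receives the smaller of the two shares and the upper class the larger, so the branch reduces to a `minimum`/`maximum` pair. The sketch below relies on that observation. The name `strange_encode_batch` and the explicit range checks are assumptions added for illustration and are not part of the answer above.

```python
import numpy as np


def strange_encode_batch(nums, num_classes):
    """Encode a 1-D sequence of (possibly fractional) labels in one pass.

    Hypothetical helper, not part of the original answer.
    """
    nums = np.atleast_1d(np.asarray(nums, dtype=float))
    lower = np.floor(nums).astype(int)  # class index just below each label
    frac = nums - lower                 # fractional part, in [0, 1)
    whole = frac == 0                   # labels that sit exactly on a class
    part = ~whole                       # labels between two classes

    # Assumption: reject labels that would fall outside the class range.
    if np.any((lower < 0) | (lower >= num_classes)):
        raise ValueError("label outside [0, num_classes - 1]")
    if np.any(part & (lower + 1 >= num_classes)):
        raise ValueError("fractional label above the last class")

    out = np.zeros((nums.size, num_classes))
    rows = np.arange(nums.size)

    # Integer labels: ordinary one-hot rows.
    out[rows[whole], lower[whole]] = 1.0

    # Fractional labels: smaller share to the lower class, larger to the upper.
    out[rows[part], lower[part]] = np.minimum(frac[part], 1 - frac[part])
    out[rows[part], lower[part] + 1] = np.maximum(frac[part], 1 - frac[part])
    return out


encoded = strange_encode_batch([0, 0, 1, 2, 0.4, 1.8], num_classes=3)
print(encoded)
```

Wrapping the result in `pd.DataFrame(encoded)` should give the same table as `generate_df_encoding`, up to floating-point rounding.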

Answer