Let’s say I have 3 classes: 0, 1, 2
One-hot-encoding an array of labels can be done via pandas as follows:
What I’m interested in, is how to get an encoding that can handle an intermediate class, e.g. class in the middle between 2 classes.
For example:
- for class
0.4
, resulting encoding should be[0.4, 0.6, 0]
- for class
1.8
, resulting encoding should be[0, 0.2, 0.8]
Does anybody know such an encoder?
Thanks for your answer!
Advertisement
Answer
You can write a function for your strange encoding like the below:
import numpy as np import math def strange_encode(num, cnt_lbl): encode_arr = np.zeros(cnt_lbl) lbl = math.floor(num) if lbl != num: num -= lbl if num >= 0.5: encode_arr[lbl:lbl+2] = [1-num, num] else: encode_arr[lbl:lbl+2] = [num, 1-num] else: encode_arr[lbl] = 1 return encode_arr
Output:
>>> encode(0.0, cnt_lbl=3) array([1., 0., 0.]) >>> encode(2.0, cnt_lbl=3) array([0., 0., 1.]) >>> encode(0.4, cnt_lbl=3) array([0.4, 0.6, 0. ]) >>> encode(1.8, cnt_lbl=3) array([0. , 0.2, 0.8]) # You can change the count of classes >>> encode(2.5, cnt_lbl=4) array([0. , 0. , 0.5, 0.5]) >>> encode(1.6, cnt_lbl=4) array([0. , 0.4, 0.6, 0. ]) >>> encode(2, cnt_lbl=4) array([0., 0., 1., 0.])
We can write a function for generating a dataframe for encoding like below:
import pandas as pd def generate_df_encoding(arr_nums, num_classes): arr = np.zeros((len(arr_nums), num_classes)) for idx, num in enumerate(arr_nums): arr[idx] = strange_encode(num, cnt_lbl=num_classes) return pd.DataFrame(arr)
Output:
>>> generate_df_encoding([0,0,1,2,0.4,1.8], num_classes=3) 0 1 2 0 1.0 0.0 0.0 1 1.0 0.0 0.0 2 0.0 1.0 0.0 3 0.0 0.0 1.0 4 0.4 0.6 0.0 5 0.0 0.2 0.8