Let’s say I have 3 classes: 0, 1, 2
One-hot-encoding an array of labels can be done via pandas as follows:
What I’m interested in, is how to get an encoding that can handle an intermediate class, e.g. class in the middle between 2 classes.
For example:
- for class
0.4
, resulting encoding should be[0.4, 0.6, 0]
- for class
1.8
, resulting encoding should be[0, 0.2, 0.8]
Does anybody know such an encoder?
Thanks for your answer!
Advertisement
Answer
You can write a function for your strange encoding like the below:
JavaScript
x
16
16
1
import numpy as np
2
import math
3
4
def strange_encode(num, cnt_lbl):
5
encode_arr = np.zeros(cnt_lbl)
6
lbl = math.floor(num)
7
if lbl != num:
8
num -= lbl
9
if num >= 0.5:
10
encode_arr[lbl:lbl+2] = [1-num, num]
11
else:
12
encode_arr[lbl:lbl+2] = [num, 1-num]
13
else:
14
encode_arr[lbl] = 1
15
return encode_arr
16
Output:
JavaScript
1
22
22
1
>>> encode(0.0, cnt_lbl=3)
2
array([1., 0., 0.])
3
4
>>> encode(2.0, cnt_lbl=3)
5
array([0., 0., 1.])
6
7
>>> encode(0.4, cnt_lbl=3)
8
array([0.4, 0.6, 0. ])
9
10
>>> encode(1.8, cnt_lbl=3)
11
array([0. , 0.2, 0.8])
12
13
# You can change the count of classes
14
>>> encode(2.5, cnt_lbl=4)
15
array([0. , 0. , 0.5, 0.5])
16
17
>>> encode(1.6, cnt_lbl=4)
18
array([0. , 0.4, 0.6, 0. ])
19
20
>>> encode(2, cnt_lbl=4)
21
array([0., 0., 1., 0.])
22
We can write a function for generating a dataframe for encoding like below:
JavaScript
1
7
1
import pandas as pd
2
def generate_df_encoding(arr_nums, num_classes):
3
arr = np.zeros((len(arr_nums), num_classes))
4
for idx, num in enumerate(arr_nums):
5
arr[idx] = strange_encode(num, cnt_lbl=num_classes)
6
return pd.DataFrame(arr)
7
Output:
JavaScript
1
10
10
1
>>> generate_df_encoding([0,0,1,2,0.4,1.8], num_classes=3)
2
3
0 1 2
4
0 1.0 0.0 0.0
5
1 1.0 0.0 0.0
6
2 0.0 1.0 0.0
7
3 0.0 0.0 1.0
8
4 0.4 0.6 0.0
9
5 0.0 0.2 0.8
10