Hi, I have a dataset with two columns, one of which is `target`. If I take all the unique values in `target`, I get an array of 826 elements. My problem arises when I try to assign new values based on these unique values.
I have a second array, called `array`, which contains a total of 826 values (of string type). I want to assign these to the rows of my dataset based on their value in the column `target`. Here is an example:
```python
print(len(df['target'].unique()))  # 826
print(len(array))                  # 826
print(array[0])                    # "Some string value"
```
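Equal lengths alone do not guarantee a one-to-one mapping: if some strings in `array` repeat, several targets end up with the same new value. Below is a minimal stdlib sketch of a check for that, assuming `array` is a plain Python list of strings. The helper name `duplicated_values` and the sample values are hypothetical, not from the question.

```python
from collections import Counter


def duplicated_values(values):
    """Return the values that occur more than once, with their counts."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical stand-in for the question's `array`
array = ["Some string value", "x", "y", "x"]

print(len(array))                # 4
print(len(set(array)))           # 3 distinct strings
print(duplicated_values(array))  # {'x': 2}
```

On the real data, `len(set(array))` smaller than 826 would mean some targets necessarily share a new value.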
When I iterate over the unique values of `target` and over `array` together, assigning each value of `array` to the rows in the dataset with the corresponding target, the newly created column, called `final_target`, has only 822 unique values!
```python
for target, new_value in zip(df['target'].unique(), array):
    df.loc[df["target"] == target, 'final_target'] = new_value
```
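A toy reproduction on hypothetical data (not the asker's) shows one way this loop can yield fewer unique values than `target` has: every target still receives its own value, but if two entries of `array` are the same string, the new column collapses them.

```python
import pandas as pd

# Hypothetical toy data: 4 distinct targets and 4 replacement strings,
# of which two are identical ("b" appears twice).
df = pd.DataFrame({"target": [10, 20, 30, 40, 10, 20]})
array = ["a", "b", "b", "c"]

# Same loop as in the question: the i-th unique target gets array[i].
for target, new_value in zip(df["target"].unique(), array):
    df.loc[df["target"] == target, "final_target"] = new_value

print(df["target"].nunique())        # 4
print(df["final_target"].nunique())  # 3, because targets 20 and 30 both map to "b"
```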
In theory the code seems fine, but when checking the unique values in the column `final_target`, I get:
```python
len(df['final_target'].unique())  # 822
```
I can't figure out what is wrong with this. I should note that both columns (`target` and `final_target`) have the same total length (100,000 samples).
Answer
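Let us do:
```python
#df['final_target'] = df['target'].astype('category').cat.codes  # alternative: integer labels
categories = df['target'].astype('category').cat.categories  # sorted unique target values
df['final_target'] = df['target'].map(dict(zip(categories, array)))
```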
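A small usage sketch of the category-based mapping on hypothetical toy data (three distinct targets, three distinct strings), showing that each target is paired with the string at the same position as its category:

```python
import pandas as pd

# Hypothetical toy data: 3 distinct targets and 3 distinct replacement strings.
df = pd.DataFrame({"target": ["x", "y", "z", "x", "z"]})
array = ["first", "second", "third"]

# Categories are the sorted unique target values: ['x', 'y', 'z'].
categories = df["target"].astype("category").cat.categories
mapping = dict(zip(categories, array))
df["final_target"] = df["target"].map(mapping)

print(df["final_target"].tolist())   # ['first', 'second', 'third', 'first', 'third']
print(df["final_target"].nunique())  # 3
```

Categories come out sorted, so the pairing differs from the question's loop, which follows the order of first appearance returned by `unique()`. If that order matters, zip `df['target'].unique()` with `array` instead. Either way, the result can only have as many unique values as `array` has distinct strings.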