
Pandas Assign Same Value to All Unique Values in Column

Hi, I have a dataset with two columns, one of which is target. Collecting the unique values in target gives an array of 826 elements. My problem comes when I try to assign values based on these unique values.

I have a second array, called array, which contains 826 values (of string type). I want to assign one of them to each row in my dataset based on the row's value in the target column. Here is an example:

print(len(df['target'].unique()))
# 826

print(len(array))
# 826

print(array[0])
# "Some string value"

When I iterate over both the unique values of target and array, assigning each value of array to the rows in the dataset with that target, the newly created column, called final_target, ends up with only 822 unique values!

for target, new_value in zip(df['target'].unique(), array):
    df.loc[df["target"] == target, 'final_target'] = new_value

In theory the code seems to be fine, but when I check the unique values in the column final_target, I get:

len(df['final_target'].unique())
# 822

I can't figure out what is wrong with this. I should note that both columns (target and final_target) have the same total length (100,000 samples).


Answer

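Let us build a dictionary from each unique target to its new value and map the column through it in one step:

# Integer codes alone, if string labels are not needed:
#df['final_target'] = df['target'].astype('category').cat.codes

# Pair each unique target (in order of first appearance) with its new value
mapping = dict(zip(df['target'].unique(), array))
df['final_target'] = df['target'].map(mapping)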
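
One possible explanation for the 822, which the question does not confirm, is that array itself contains repeated strings: 826 entries do not have to be 826 distinct entries. The sketch below uses made-up toy data (the values "a", "b", "x", "y" and so on are purely illustrative) to show how a repeated label reduces the number of unique values in final_target, and how to list the repeated labels.

```python
import pandas as pd

# Toy data standing in for the real dataset (hypothetical values).
df = pd.DataFrame({"target": ["a", "b", "a", "c", "d", "b"]})

# One new label per unique target, but "x" appears twice (assumed scenario).
array = ["x", "y", "x", "z"]

# Pair each unique target (in order of first appearance) with its new label.
mapping = dict(zip(df["target"].unique(), array))
df["final_target"] = df["target"].map(mapping)

n_targets = df["target"].nunique()       # distinct targets
n_labels = len(set(array))               # distinct labels in `array`
n_final = df["final_target"].nunique()   # distinct values actually assigned

# Labels that occur more than once in `array`.
label_counts = pd.Series(array).value_counts()
duplicates = label_counts[label_counts > 1]

print(n_targets, n_labels, n_final)
print(duplicates)
```

If len(set(array)) on the real data comes out as 822, the assignment itself is working and the missing values are simply duplicates in array. If it comes out as 826, the cause lies elsewhere and this check rules duplicates out.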
User contributions licensed under: CC BY-SA