Here is what I am trying to do: I want to replace the values in this data frame with numbers.
For example, Bernard should become 1, Drake should become 2, and so on. How can I iterate through the column, or write a function, to do this?
Answer
The function already exists: pd.factorize.
It returns a tuple of two items. The first is an array of integer codes, one per row, giving the value each item has been mapped to (starting at 0). The second is an index of the unique values, in order of first appearance.
import pandas as pd

df = pd.DataFrame({'name': ['Bernard', 'Bernard', 'Drake', 'Drake', 'Lance']})
pd.factorize(df.name)
(array([0, 0, 1, 1, 2]), Index(['Bernard', 'Drake', 'Lance'], dtype='object'))
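To make the two parts of that tuple concrete, here is a small sketch that unpacks them. The variable names `codes`, `uniques`, and `restored` are just illustrative choices, not anything pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bernard', 'Bernard', 'Drake', 'Drake', 'Lance']})

# Unpack the tuple: integer codes per row, and the unique values they point to.
codes, uniques = pd.factorize(df['name'])

# Each code is a position in `uniques`, so the original values can be recovered.
restored = uniques.take(codes)

print(codes.tolist())     # codes as a plain list
print(list(uniques))      # unique names in order of first appearance
print(list(restored))     # same names as the original column
```

Since each code is simply a position in `uniques`, the mapping can be reversed whenever the original names are needed again.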
Using that, we'd just assign a new column, adding 1 because the codes start at 0:
df = df.assign(codes=pd.factorize(df.name)[0] + 1)
df
      name  codes
0  Bernard      1
1  Bernard      1
2    Drake      2
3    Drake      2
4    Lance      3
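Since the question asks for a function, the same idea can be wrapped in a small helper. This is only a sketch: `encode_column` and its parameters `new_column` and `start` are hypothetical names made up for this example, not part of pandas.

```python
import pandas as pd

def encode_column(df, column, new_column='codes', start=1):
    """Hypothetical helper: number the unique values of `column` from `start`."""
    codes, uniques = pd.factorize(df[column])
    # Build an explicit value -> number lookup for later reference.
    mapping = {value: i + start for i, value in enumerate(uniques)}
    # Note: with default settings, missing values get code -1,
    # so they would end up as start - 1 here.
    return df.assign(**{new_column: codes + start}), mapping

df = pd.DataFrame({'name': ['Bernard', 'Bernard', 'Drake', 'Drake', 'Lance']})
encoded, mapping = encode_column(df, 'name')
print(encoded)
print(mapping)
```

Returning the mapping alongside the new frame is a design choice that makes it easy to check later which number belongs to which name.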