Here is what I am trying to do.
I want to replace the values in a column of this data frame with numbers.
For example, Bernard should be replaced with 1, Drake with 2, and so on. How can I write a function that iterates through the column and does this?
Answer
The function already exists: pd.factorize.
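It returns a tuple. The first element is an array of integer codes, one per row, giving the value each item has been mapped to. The second is an Index of the unique values. Codes start at 0 and are assigned in order of first appearance.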
import pandas as pd

df = pd.DataFrame({'name': ['Bernard', 'Bernard', 'Drake', 'Drake', 'Lance']})
pd.factorize(df.name)
(array([0, 0, 1, 1, 2]), Index(['Bernard', 'Drake', 'Lance'], dtype='object'))
Using that, we’d just assign a new column, adding 1 so that the numbering starts at 1 instead of 0:
df = df.assign(codes=pd.factorize(df.name)[0] + 1)
df
      name  codes
0  Bernard      1
1  Bernard      1
2    Drake      2
3    Drake      2
4    Lance      3
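Since the question asked for a function, here is a minimal sketch of one way to wrap the steps above. The helper name `add_codes` and its parameters (`new_column`, `start`, `sort`) are made up for illustration and are not part of pandas; only `pd.factorize` and `DataFrame.assign` are library calls. It also shows the effect of `sort=True`, which numbers the values in sorted order rather than in order of first appearance.

```python
import pandas as pd


def add_codes(df, column, new_column="codes", start=1, sort=False):
    """Hypothetical helper: return (copy of df with a code column, unique values)."""
    codes, uniques = pd.factorize(df[column], sort=sort)
    # factorize numbers values from 0, in order of first appearance
    # (or in sorted order when sort=True); shift so numbering begins at `start`.
    # By default missing values get code -1, so they end up as start - 1 here.
    return df.assign(**{new_column: codes + start}), uniques


# Sample data deliberately not in alphabetical order, to show the difference.
df = pd.DataFrame({"name": ["Drake", "Bernard", "Drake", "Lance", "Bernard"]})

by_appearance, uniques = add_codes(df, "name")
alphabetical, sorted_uniques = add_codes(df, "name", sort=True)

print(by_appearance["codes"].tolist())  # [1, 2, 1, 3, 2]
print(alphabetical["codes"].tolist())   # [2, 1, 2, 3, 1]
```

In the original example the names already appear in alphabetical order, so both variants would give the same codes there. If the mapping needs to be stable regardless of row order, `sort=True` is probably the safer choice.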