If I have the following dataframe:
| ID | other |
|---|---|
| 219218 | 34 |
| 823#32 | 47 |
| unknown | 42 |
| 8#3#32 | 32 |
| 1#3#5# | 97 |
| 6#3### | 27 |
I want to obtain the following result:
| ID | other |
|---|---|
| 219218 | 34 |
| 823#32 | 47 |
| unknown | 42 |
| 8#3#32 | 32 |
| unknown | 97 |
| unknown | 27 |
I am using the following code which works.
for i in range(len(df)):
ident = testing.loc[i, 'ID']
if ident.count('#') > 2:
df.loc[i, 'ID'] = 'unknown'
Is there a way to make it more optimal, bearing in mind that I am going to apply the code to a dataframe of more than 60,000 rows?
Thank you for your help.
Advertisement
Answer
For an efficient solution, use vectorial methods and assign with loc:
df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'
output:
ID other 0 219218 34 1 823#32 47 2 unknown 42 3 8#3#32 32 4 unknown 97 5 unknown 27