If I have the following dataframe:
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
1#3#5# | 97 |
6#3### | 27 |
I want to obtain the following result:
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
unknown | 97 |
unknown | 27 |
I am using the following code which works.
for i in range(len(df)): ident = testing.loc[i, 'ID'] if ident.count('#') > 2: df.loc[i, 'ID'] = 'unknown'
Is there a way to make it more optimal, bearing in mind that I am going to apply the code to a dataframe of more than 60,000 rows?
Thank you for your help.
Advertisement
Answer
For an efficient solution, use vectorial methods and assign with loc
:
df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'
output:
ID other 0 219218 34 1 823#32 47 2 unknown 42 3 8#3#32 32 4 unknown 97 5 unknown 27