If I have the following dataframe:
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
1#3#5# | 97 |
6#3### | 27 |
I want to obtain the following result:
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
unknown | 97 |
unknown | 27 |
I am using the following code, which works:

    for i in range(len(df)):
        ident = df.loc[i, 'ID']
        if ident.count('#') > 2:
            df.loc[i, 'ID'] = 'unknown'

Is there a way to make it more efficient, bearing in mind that I am going to apply the code to a dataframe of more than 60,000 rows?
Thank you for your help.
Answer
For an efficient solution, use vectorized string methods and assign with `loc`:

    df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'

Output:

            ID  other
    0   219218     34
    1   823#32     47
    2  unknown     42
    3   8#3#32     32
    4  unknown     97
    5  unknown     27
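
To make the one-liner easier to follow, here is a minimal sketch that rebuilds the sample frame from the question and splits the expression into its three steps. The intermediate names `hash_counts` and `too_many` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["219218", "823#32", "unknown", "8#3#32", "1#3#5#", "6#3###"],
    "other": [34, 47, 42, 32, 97, 27],
})

# Step 1: count '#' occurrences in every row at once.
# str.count treats the pattern as a regex; '#' has no special meaning there.
hash_counts = df["ID"].str.count("#")

# Step 2: build a boolean mask that is True where a row has more than two '#'.
too_many = hash_counts.gt(2)

# Step 3: overwrite only the matching rows in a single assignment.
df.loc[too_many, "ID"] = "unknown"

print(df)
```

Because the count and the comparison each run over the whole column, there is no Python-level loop over rows, which is what should help on a frame of 60,000+ rows. If the `ID` column contains missing values, `str.count` yields NaN for them and the `gt(2)` comparison evaluates to False, so those rows are left unchanged.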