If I have the following dataframe: ID other 219218 34 823#32 47 unknown 42 8#3#32 32 1#3#5# 97 6#3### 27 I want to obtain the following result: ID other 219218 34 823#32 47 unknown 42 8#3#32 32 unknown 97 unknown 27 I am using the following code which works. Is there a way to make it more optimal, bearing in

For an efficient solution, use vectorial methods and assign with loc:df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'output: ID other0 219218 341 823#32 472 unknown 423 8#3#32 324 unknown 975 unknown 27

Rename a row with X unknown characters

If I have the following dataframe:

ID	other
219218	34
823#32	47
unknown	42
8#3#32	32
1#3#5#	97
6#3###	27

I want to obtain the following result:

ID	other
219218	34
823#32	47
unknown	42
8#3#32	32
unknown	97
unknown	27

I am using the following code which works.

for i in range(len(df)):
  ident = testing.loc[i, 'ID']
  if ident.count('#') > 2:
    df.loc[i, 'ID'] = 'unknown'

JavaScript
​x
 
for i in range(len(df)):
  ident = testing.loc[i, 'ID']
  if ident.count('#') > 2:
    df.loc[i, 'ID'] = 'unknown'
​

Is there a way to make it more optimal, bearing in mind that I am going to apply the code to a dataframe of more than 60,000 rows?

Thank you for your help.

Answer

For an efficient solution, use vectorial methods and assign with loc:

df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'

JavaScript
 
df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'
​

output:

        ID  other
0   219218     34
1   823#32     47
2  unknown     42
3   8#3#32     32
4  unknown     97
5  unknown     27

JavaScript
 
        ID  other
0   219218     34
1   823#32     47
2  unknown     42
3   8#3#32     32
4  unknown     97
5  unknown     27
​

Advertisement

Answer