Remove unwanted strings in a Pandas DataFrame

I am reading a CSV file with pandas read_csv. The file contains data like this:

Id;LibId;1;mod;modId;28;Index=10, Step=0, data=d720983f0000c0bf0000000014ae47bf0fe7c23ad1de3039;
Id;LibId;1;mod;modId;4;f9e9003e;
.
.
.
. 

In the last column, I want to remove the Index, Step, and data= parts and keep only the hex value.

I have created a list of the unwanted values and used a regex, but nothing seems to work.

to_remove = ['Index','Step','data=']
rex = '[' + re.escape(''.join(to_remove)) + ']'
output_csv['Column_name'].str.replace(rex, '', regex=True)

Answer

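Your pattern wraps the joined strings in [...], which creates a character class that matches any single one of those characters rather than the whole substrings. The result of str.replace is also never assigned back to the column. I suggest that you fix your code using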

import re

to_remove = ['Index','Step','data=']
output_csv['Column_name'] = output_csv['Column_name'].str.replace('|'.join([re.escape(x) for x in to_remove]), '', regex=True)

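The '|'.join([re.escape(x) for x in to_remove]) part creates a regex like Index|Step|data=, an alternation that matches any one of the to_remove substrings as a whole. re.escape makes sure that any regex metacharacters in those substrings are treated literally.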
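
To see the difference on the sample data, here is a minimal sketch that compares the character-class pattern from the question with the alternation from this answer. The frame df and its two rows are made up from the values shown in the question. The last pattern (prefix, hex_only) is not part of the answer above: it is a hypothetical variant that assumes every prefixed row has exactly the form "Index=<digits>, Step=<digits>, data=" and that you want only the hex value left.

```python
import re

import pandas as pd

# Sample values modeled on the last column shown in the question.
df = pd.DataFrame({
    "Column_name": [
        "Index=10, Step=0, data=d720983f0000c0bf0000000014ae47bf0fe7c23ad1de3039",
        "f9e9003e",
    ]
})

to_remove = ["Index", "Step", "data="]

# Pattern from the question: a character class. It deletes every single
# character that occurs in "IndexStepdata=", so hex digits such as
# 'a', 'd' and 'e' are removed from the data as well.
char_class = "[" + re.escape("".join(to_remove)) + "]"
by_char_class = df["Column_name"].str.replace(char_class, "", regex=True)

# Pattern from the answer: an alternation. It deletes only the whole
# substrings, so the hex value is untouched, but "=10, =0, " remains.
alternation = "|".join(re.escape(x) for x in to_remove)
by_alternation = df["Column_name"].str.replace(alternation, "", regex=True)

# Hypothetical variant (assumption: the prefix always has this exact layout):
# strip the whole "Index=<n>, Step=<n>, data=" prefix in one go.
prefix = r"^Index=\d+,\s*Step=\d+,\s*data="
hex_only = df["Column_name"].str.replace(prefix, "", regex=True)

print(by_char_class.tolist())
print(by_alternation.tolist())
print(hex_only.tolist())
```

Rows without the prefix, such as f9e9003e, pass through the last two patterns unchanged. If the real file has other layouts in that column, the prefix pattern would need to be adjusted.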
