I am trying to write some code that splits a string in a dataframe column on commas (so it becomes a list) and removes a certain string from that list if it is present. After removing the unwanted string, I want to join the list elements again with commas. My dataframe looks like this:
df:
  Column1 Column2
0       a   a,b,c
1       y   b,n,m
2       d   n,n,m
3       d   b,b,x
So basically my goal is to remove all b values from column2 so that I get:
df:
  Column1 Column2
0       a     a,c
1       y     n,m
2       d   n,n,m
3       d       x
The code I have written is the following:
df=df['Column2'].apply(lambda x: x.split(','))
def exclude_b(df):
    for index, liste in df['column2].iteritems():
        if 'b' in liste:
            liste.remove('b')
            return liste
        else:
            return liste
The first line splits each value in the column into a list at the commas. With the function, I then tried to iterate through all the lists and remove the b if it is present, or return the list unchanged if it is not. If I print 'liste' at the end, it only returns the first row of Column2, but not the others. What am I doing wrong? And would there be a way to implement my if condition in a lambda function?
Answer
You can simply apply the regex b,? which matches any b together with the comma that follows it, if there is one, and replace each match with an empty string:
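# regex=True is required in pandas 2.0 and later, where str.replace treats the pattern literally by default
df['Column2'] = df.Column2.str.replace('b,?', '', regex=True)

Out[238]:
  Column1 Column2
0       a     a,c
1       y     n,m
2       d   n,n,m
3       d       x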
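As for the two questions asked: the function only gives back the first row because `return` sits inside the `for` loop, so the function exits during the first iteration. Also note that `list.remove` drops only the first matching element, so a row like `b,b,x` would still keep one `b`. Below is a minimal sketch of the split, filter, and join approach the question describes, written so it can also be used as a lambda. The helper name `drop_token` and its `unwanted` parameter are made up for illustration, and the dataframe is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column1": ["a", "y", "d", "d"],
    "Column2": ["a,b,c", "b,n,m", "n,n,m", "b,b,x"],
})

def drop_token(cell, unwanted="b"):
    """Hypothetical helper: split on commas, drop every item equal to
    `unwanted`, and join the remaining items with commas again."""
    return ",".join(item for item in cell.split(",") if item != unwanted)

# apply() calls the helper once per cell and collects the results.
df["Column2"] = df["Column2"].apply(drop_token)

# The same logic as a lambda, if a named function is not wanted:
# df["Column2"] = df["Column2"].apply(
#     lambda s: ",".join(t for t in s.split(",") if t != "b")
# )

print(df)
```

Unlike the regex, this compares whole items, so a value such as `abc` is left alone and a `b` at the end of the string (`a,b`) does not leave a trailing comma behind. For single-letter values like the ones in the question, both approaches should give the same result.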