I’m trying to use LOC with an AND condition. It works fine with OR conditions, but I can’t get it to work with ANDs when there are duplicate values in a column.
def locreplace(df,col,needle,replace,needle2=''): if (needle2==''): df.loc[df[col].str.contains(needle, case=False)==True,col] = replace else: df.loc[[df[col].str.contains(needle, case=False)==True] and df[col].str.contains(needle2, case=False)==True,col] = replace
This table with no duplicates works as expected:
#Create a data frame data = [['granny apple', 'juicy'], ['blood orange', 'refreshing'], ['spanish lemon', 'tangy']] fruitdf = pd.DataFrame(data, columns = ['fruit', 'taste']) #Single replace - works #locreplace(fruitdf,'fruit','apple','big red nice apple') #Will fail - works #locreplace(fruitdf,'fruit','apple','big red apple','uncle') #Double replace - works locreplace(fruitdf,'fruit','apple','big huge red apple','granny')
But when you create a data frame with two “granny” entries the double replace AND condition replaces both instances of “granny” even though “apple” in the AND condition isn’t being matched.
data = [['granny apple', 'juicy'], ['granny blood orange', 'refreshing'], ['spanish lemon', 'tangy']] fruitdf = pd.DataFrame(data, columns = ['fruit', 'taste']) #Single replace - works #locreplace(fruitdf,'fruit','apple','big red nice apple') #Will fail - works #locreplace(fruitdf,'fruit','apple','big red apple','uncle') #Double replace - fails locreplace(fruitdf,'fruit','apple','big huge red apple','granny')
No doubt my fault, and a misplacing of brackets (or misunderstanding of code), but what is the correct way to achieve an AND condition replace with loc (or other easier method)?
Current output:
fruit taste 0 big huge red apple juicy 1 big huge red apple refreshing 2 spanish lemon tangy
Desired output:
fruit taste 0 big huge red apple juicy 1 granny blood orange refreshing 2 spanish lemon tangy
Advertisement
Answer
The issue is in the else
block in locreplace
[df[col].str.contains(needle, case=False)==True]
Which is a list with a series in the first index rather than a series. You need to remove the brackets and replace and
with &
df.loc[df[col].str.contains(needle, case=False) & df[col].str.contains(needle2, case=False), col] = replace
Output
fruit taste 0 big huge red apple juicy 1 granny blood orange refreshing 2 spanish lemon tangy