I’m trying to use LOC with an AND condition. It works fine with OR conditions, but I can’t get it to work with ANDs when there are duplicate values in a column.
def locreplace(df,col,needle,replace,needle2=''):
if (needle2==''):
df.loc[df[col].str.contains(needle, case=False)==True,col] = replace
else:
df.loc[[df[col].str.contains(needle, case=False)==True] and df[col].str.contains(needle2, case=False)==True,col] = replace
This table with no duplicates works as expected:
#Create a data frame
data = [['granny apple', 'juicy'], ['blood orange', 'refreshing'], ['spanish lemon', 'tangy']]
fruitdf = pd.DataFrame(data, columns = ['fruit', 'taste'])
#Single replace - works
#locreplace(fruitdf,'fruit','apple','big red nice apple')
#Will fail - works
#locreplace(fruitdf,'fruit','apple','big red apple','uncle')
#Double replace - works
locreplace(fruitdf,'fruit','apple','big huge red apple','granny')
But when you create a data frame with two “granny” entries the double replace AND condition replaces both instances of “granny” even though “apple” in the AND condition isn’t being matched.
data = [['granny apple', 'juicy'], ['granny blood orange', 'refreshing'], ['spanish lemon', 'tangy']]
fruitdf = pd.DataFrame(data, columns = ['fruit', 'taste'])
#Single replace - works
#locreplace(fruitdf,'fruit','apple','big red nice apple')
#Will fail - works
#locreplace(fruitdf,'fruit','apple','big red apple','uncle')
#Double replace - fails
locreplace(fruitdf,'fruit','apple','big huge red apple','granny')
No doubt my fault, and a misplacing of brackets (or misunderstanding of code), but what is the correct way to achieve an AND condition replace with loc (or other easier method)?
Current output:
fruit taste
0 big huge red apple juicy
1 big huge red apple refreshing
2 spanish lemon tangy
Desired output:
fruit taste
0 big huge red apple juicy
1 granny blood orange refreshing
2 spanish lemon tangy
Advertisement
Answer
The issue is in the else
block in locreplace
[df[col].str.contains(needle, case=False)==True]
Which is a list with a series in the first index rather than a series. You need to remove the brackets and replace and
with &
df.loc[df[col].str.contains(needle, case=False) & df[col].str.contains(needle2, case=False), col] = replace
Output
fruit taste
0 big huge red apple juicy
1 granny blood orange refreshing
2 spanish lemon tangy