
Everything is str, but I still get this error (Python): unsupported operand type(s) for &: 'str' and 'int'

tablet = ['ipad', 'tablet']
connection_issue = ['load', 'loading','error loading', 'connection issue']
blankQuestion = ['blank question', 'question loading', 'question does not load', "question doesn't load", 'no question']
mocktest = ['mock test','mock tests']
    
df['bug_types'] = np.where(df['Ticket description'].str.contains(*tablet),'tablet',
                  np.where(df['Ticket description'].str.contains(*connection_issue),'connectionz',
                  np.where(df['Ticket description'].isin(blankQuestion),'blankQuestion',
                  np.where(df['Ticket description'].str.contains(*mocktest),'mock tests', 'others'))))

There is no int in connection_issue :(. And the code works fine for tablet: if I just change .str.contains(*connection_issue) back to .isin(connection_issue), the rest, including .str.contains(*tablet), runs perfectly fine.


Answer

@matchifang has the right explanation as to why!
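
That answer is not reproduced here, so the following is only a sketch of what is most likely going on. `Series.str.contains` has the signature `contains(pat, case=True, flags=0, na=None, regex=True)`, so unpacking a list with `*` spreads its elements across those positional parameters instead of searching for each of them. With four elements, the third one lands in `flags`, which is expected to be an integer of `re` flags. The sample Series below is made up for illustration.

```python
import re

import pandas as pd

connection_issue = ['load', 'loading', 'error loading', 'connection issue']

# s.str.contains(*connection_issue) is the same call as
# s.str.contains('load', 'loading', 'error loading', 'connection issue'),
# i.e. pat='load', case='loading', flags='error loading', na='connection issue'.
# The string that ends up in `flags` is what breaks the regex compilation:
try:
    re.compile('load', flags='error loading')
    flags_error = None
except TypeError as exc:
    flags_error = exc

# With tablet = ['ipad', 'tablet'] only two parameters are filled
# (pat='ipad', case='tablet'), so nothing fails, but 'tablet' is never searched for.

# Joining the keywords with '|' gives a single pattern that matches any of them.
s = pd.Series(["The page keeps loading", "The iPad keeps crashing"])  # made-up sample data
mask = s.str.contains('|'.join(connection_issue))
```

Here `mask` is True for the first row only, and `flags_error` holds the `TypeError` raised by the regex compilation.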

If you want to add more tags in the future based on different keywords, it would be good to have a more dynamic way of checking for tags. I recommend the following solution:

#!/usr/bin/env python
from collections import OrderedDict
import pandas as pd

tags_keywords = OrderedDict([
    ('tablet', ['ipad', 'tablet']),
    ('connection_issue', ['load', 'connection issue']),  # 'loading' and 'error loading' will be picked up by 'load'
    ('blank_question', ['blank question', 'question loading', 'question does not load', "question doesn't load", 'no question']),
    ('mock_test', ['mock test']),  # 'mock tests' will be found by 'mock test'
    ('app_quit', ['quit']),  # 'quitting' will be picked up by 'quit'
    ('scoring', ['sas', 'decile', 'attainment', 'stars']),  # going to make everything lowercase for easier comparison
    ('alp', ['alp', 'atom learning point']),
    ('learning_journey', ['learning journey', 'world']),
    ('transcript', ['transcript', 'score card']),
    ('practice', ['practice', 'custom', 'suggested'])  # 'practice' covers 'suggested practice', 'custom practice', etc
])

df = pd.DataFrame([{
    'Ticket description': "I'm having trouble loading the world"
}, {
    'Ticket description': "ALPs make no sense at all"
}, {
    'Ticket description': "My attainment score keeps going up and down"
}, {
    'Ticket description': "I can't finish my MOCK TEST"
}])


df['bug_types'] = 'others'
for tag, keywords in tags_keywords.items():
    df.loc[df['Ticket description'].str.contains('|'.join(keywords), case=False), 'bug_types'] = tag

print(df)

In this solution, tags that appear later in the dict take priority, because each pass of the loop overwrites the matches from earlier passes. We use an OrderedDict to guarantee that the order is respected: regular dictionaries only guarantee insertion order from Python 3.7 onward (in CPython 3.6 it is an implementation detail). We create the OrderedDict from a list of tuples because, on a Python version before 3.6, building it from a regular dict would create the unordered dict first, so the order would already be lost.
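
To make the priority rule concrete, here is a small sketch using only two of the tags above. It assumes Python 3.7+ so that a plain dict keeps insertion order; the second ticket text is made up for illustration.

```python
import pandas as pd

# Plain dict: insertion order is guaranteed on Python 3.7+ (assumption of this sketch).
tags_keywords = {
    'connection_issue': ['load', 'connection issue'],
    'learning_journey': ['learning journey', 'world'],
}

df = pd.DataFrame({
    'Ticket description': [
        "I'm having trouble loading the world",  # matches both tags
        "The page will not load",                # matches only connection_issue
    ]
})

df['bug_types'] = 'others'
for tag, keywords in tags_keywords.items():
    mask = df['Ticket description'].str.contains('|'.join(keywords), case=False)
    # Rows matched in this pass are overwritten, so the last matching tag wins.
    df.loc[mask, 'bug_types'] = tag

print(df)
```

The first ticket matches both 'load' and 'world', and ends up tagged with the later entry, learning_journey. Reordering the dict would flip that result.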

Then we iterate over all the tag/keyword combinations and look for any of each tag's keywords, joined with '|' into a single regular expression (the format @matchifang mentioned). Adding case=False makes the search case-insensitive, so both uppercase and lowercase text will match.
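
As a quick illustration of what the joined pattern and case=False do, here is a sketch using the mocktest keywords from the question. The sample Series is made up, apart from the first ticket, which comes from the example data above.

```python
import pandas as pd

mocktest = ['mock test', 'mock tests']
pattern = '|'.join(mocktest)  # 'mock test|mock tests'

s = pd.Series([
    "I can't finish my MOCK TEST",
    "mock tests are slow",
    "nothing relevant here",
])

# Default search is case-sensitive, so the uppercase ticket is missed.
case_sensitive = s.str.contains(pattern)

# case=False ignores case, so the uppercase ticket matches too.
case_insensitive = s.str.contains(pattern, case=False)
```

Note that the pattern is treated as a regular expression, so a keyword containing characters such as '(' or '+' would need escaping first.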
