Everything is str, but I still get this error (Python): unsupported operand type(s) for &: 'str' and 'int'

Question

Everything in connection_issue is a string :(. And the code works fine for tablet: if I just change .str.contains(*connection_issue) back to .isin(connection_issue), the rest, including .str.contains(*tablet), runs perfectly fine.
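Why does `.str.contains(*connection_issue)` fail when every element is a string? The `*` unpacks the list into positional arguments, and pandas' `Series.str.contains` takes `pat, case, flags, na, regex` in that order. Only the first keyword becomes the pattern, the second becomes `case`, and a third one lands in `flags`, which Python's `re` expects to be an integer. The sketch below shows that mechanism with the standard library only; `contains_params` is a hypothetical stand-in that just mirrors the parameter order, and both keyword lists are assumptions because the question does not show the real ones. If `tablet` holds two keywords, the same unpacking means `.str.contains(*tablet)` "works" but only ever searches for the first keyword.

```python
import re


# Hypothetical stand-in with the same positional order as pandas'
# Series.str.contains(pat, case, flags, na, regex). It only reports how
# the arguments were bound; it does no matching.
def contains_params(pat, case=True, flags=0, na=None, regex=True):
    return {"pat": pat, "case": case, "flags": flags, "na": na, "regex": regex}


# Hypothetical keyword lists (the question does not show the real ones).
tablet = ["ipad", "tablet"]
connection_issue = ["load", "connection issue", "loading"]

tablet_args = contains_params(*tablet)
issue_args = contains_params(*connection_issue)
print(tablet_args)  # 'tablet' silently became `case`; flags is still 0
print(issue_args)   # 'loading' landed in `flags`

# For object-dtype text, pandas passes `flags` on to re.compile,
# which combines flags with integers using bitwise &.
error_message = None
try:
    re.compile(issue_args["pat"], flags=issue_args["flags"])
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # starts with: unsupported operand type(s) for &: 'str'

# One alternation pattern instead of unpacking: matches any of the keywords.
pattern = "|".join(connection_issue)
print(pattern)  # load|connection issue|loading
```

Switching to `.isin(connection_issue)` avoids the error only because `isin` takes the whole list as a single argument; it tests for exact equality with a list element, not for substrings.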

Accepted Answer

@matchifang has the right explanation as to why! If you want to add more tags in the future based on different keywords, it would be good to have a more dynamic way of checking for tags. I recommend the following solution:

    #!/usr/bin/env python
    from collections import OrderedDict

    import pandas as pd

    tags_keywords = OrderedDict([
        ('tablet', ['ipad', 'tablet']),
        ('connection_issue', ['load', 'connection issue']),  # 'loading' and 'error loading' will be picked up by 'load'
        ('blank_question', ['blank question', 'question loading', 'question does not load', "question doesn't load", 'no question']),
        ('mock_test', ['mock test']),  # 'mock tests' will be found by 'mock test'
        ('app_quit', ['quit']),  # 'quitting' will be picked up by 'quit'
        ('scoring', ['sas', 'decile', 'attainment', 'stars']),  # keywords are lowercase; case=False below ignores case
        ('alp', ['alp', 'atom learning point']),
        ('learning_journey', ['learning journey', 'world']),
        ('transcript', ['transcript', 'score card']),
        ('practice', ['practice', 'custom', 'suggested']),  # 'practice' covers 'suggested practice', 'custom practice', etc.
    ])

    df = pd.DataFrame([
        {'Ticket description': "I'm having trouble loading the world"},
        {'Ticket description': "ALPs make no sense at all"},
        {'Ticket description': "My attainment score keeps going up and down"},
        {'Ticket description': "I can't finish my MOCK TEST"},
    ])

    df['bug_types'] = 'others'  # default tag when no keyword matches
    for tag, keywords in tags_keywords.items():
        # A later matching tag overwrites an earlier one, so order sets priority
        df.loc[df['Ticket description'].str.contains('|'.join(keywords), case=False), 'bug_types'] = tag

    print(df)

In this solution, the "later" tags in the dict take higher priority, because each matching tag overwrites whatever an earlier tag assigned, and we use an OrderedDict to guarantee that the order is respected (plain dicts only guarantee insertion order from Python 3.7 onward).
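The priority rule comes from the loop itself: every `df.loc[...] = tag` assignment overwrites whatever an earlier tag wrote for the same row. A minimal sketch of that logic on a single string, using plain `re.search` with `re.IGNORECASE` in place of pandas; the helper `tag_ticket` is hypothetical and not part of the answer, and the two tag pairs are trimmed from its `tags_keywords`.

```python
import re

# Two (tag, keywords) pairs taken from the answer's tags_keywords;
# a plain list keeps the order explicit.
tags_keywords = [
    ("connection_issue", ["load", "connection issue"]),
    ("learning_journey", ["learning journey", "world"]),
]


def tag_ticket(text, pairs, default="others"):
    """Hypothetical helper mirroring the answer's loop for one string:
    each matching tag overwrites the previous label, so the last match wins."""
    label = default
    for tag, keywords in pairs:
        if re.search("|".join(keywords), text, flags=re.IGNORECASE):
            label = tag
    return label


text = "I'm having trouble loading the world"
print(tag_ticket(text, tags_keywords))                      # learning_journey
print(tag_ticket(text, list(reversed(tags_keywords))))      # connection_issue
print(tag_ticket("The app keeps crashing", tags_keywords))  # others
```

The same ticket gets a different tag once the order flips, which is why the order of `tags_keywords` has to be reliable.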
We create the OrderedDict from a list of tuples rather than from another dict because, on Python versions before 3.6, Python would first build that unordered dict, so the order still could not be guaranteed. Then we iterate over all the tag/keyword combinations and look for any of each tag's keywords, joined into a single regex with '|' in the format @matchifang mentioned. Passing case=False makes the search case-insensitive, so both uppercase and lowercase values match.
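Two details of the `'|'.join(keywords)` approach may be worth guarding against as the keyword list grows. Keywords containing regex metacharacters (such as `+` or `.`) change meaning unless each one is passed through `re.escape`, and with object-dtype text a missing description makes `str.contains` return NaN, which boolean indexing with `df.loc` rejects; `na=False` treats it as a non-match instead. A small sketch with hypothetical tickets and keywords (none of this data comes from the question):

```python
import re

import pandas as pd

# Hypothetical tickets; None stands in for a missing description.
# dtype=object so the matching goes through Python's `re`.
descriptions = pd.Series(
    ["Loading takes forever", "CONNECTION ISSUE again", "Is 1+1 scored?", None],
    dtype=object,
)

# Hypothetical keywords; '+' is a regex metacharacter.
keywords = ["load", "connection issue", "1+1"]

raw_pattern = "|".join(keywords)                         # '1+1' means "one or more 1s, then 1"
safe_pattern = "|".join(re.escape(k) for k in keywords)  # every keyword matched literally

# case=False: case-insensitive match; na=False: missing text counts as no match.
raw_mask = descriptions.str.contains(raw_pattern, case=False, na=False)
safe_mask = descriptions.str.contains(safe_pattern, case=False, na=False)

print(raw_mask.tolist())   # [True, True, False, False]
print(safe_mask.tolist())  # [True, True, True, False]
```

In the answer's loop, the same change would mean building the pattern with `re.escape` and adding `na=False` to the existing `str.contains(..., case=False)` call.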


Answer