I have a dataframe (df1) where I would like to search each row for items from listA. If the dataframe has a row that contains ‘positive’ and one or more of the items from listA, I would like to generate another dataframe (df2) by adding a column called result, listing the listA item + present. Items in list A, may exist as a stand alone item in each row of df1 or they may exist as part of a larger string. I’ve tried using pandas.DataFrame.loc but I am only able to search through one column at a time which isn’t ideal.
df1 = pd.DataFrame({'column no': ['1', '2', '3', '4'], 'name': ['fred', 'sammy', 'tom', 'sam'], 'test': ['positive', 'positive', 'negative', 'negative'], 'date': ["15-'05", "13-'02", "12-'01", "29-'08"], 'food':['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs']}) listA = ["15-'05",'ham','tom','cake']
Output:
df2 = pd.DataFrame({'column no': ['1', '2', '3', '4'], 'name': ['fred', 'sammy', 'tom', 'sam'], 'test': ['positive', 'positive', 'negative', 'negative'], 'date': ["15-'05", "13-'02", "12-'01", "29-'08"], 'food':['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs'], 'result': ["15-'05, ham, present", "cake, present", 'tom, present', 'not found']})
Advertisement
Answer
Updated:
I have created a function first which is applied to every row (‘axis=1’) and the results are added to the result column.
def check_rows(row): same_values = ', '.join([term for term in listA for substring in row.values if term in substring]) if same_values: return same_values+", present" else: return 'not found' df1['result'] = df1.apply(lambda x: check_rows(x), axis=1)
Output:
column no name test date food result 0 1 fred positive 15-'05 lemon-2.v4*?-10%;ham-12?-0%;orange?-58% 15-'05, ham, present 1 2 sammy positive 13-'02 cake cake, present 2 3 tom negative 12-'01 cheese tom, present 3 4 sam negative 29-'08 eggs not found