I have a dataframe (df1) where I would like to search each row for items from listA. If the dataframe has a row that contains ‘positive’ and one or more of the items from listA, I would like to generate another dataframe (df2) by adding a column called result, listing the listA item + present. Items in list A, may exist as a stand alone item in each row of df1 or they may exist as part of a larger string. I’ve tried using pandas.DataFrame.loc but I am only able to search through one column at a time which isn’t ideal.
df1 = pd.DataFrame({'column no': ['1', '2', '3', '4'],
'name': ['fred', 'sammy', 'tom', 'sam'],
'test': ['positive', 'positive', 'negative', 'negative'],
'date': ["15-'05", "13-'02", "12-'01", "29-'08"],
'food':['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs']})
listA = ["15-'05",'ham','tom','cake']
Output:
df2 = pd.DataFrame({'column no': ['1', '2', '3', '4'],
'name': ['fred', 'sammy', 'tom', 'sam'],
'test': ['positive', 'positive', 'negative', 'negative'],
'date': ["15-'05", "13-'02", "12-'01", "29-'08"],
'food':['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs'],
'result': ["15-'05, ham, present", "cake, present", 'tom, present', 'not found']})
Advertisement
Answer
Updated:
I have created a function first which is applied to every row (‘axis=1’) and the results are added to the result column.
def check_rows(row):
same_values = ', '.join([term for term in listA for substring in row.values if term in substring])
if same_values:
return same_values+", present"
else:
return 'not found'
df1['result'] = df1.apply(lambda x: check_rows(x), axis=1)
Output:
column no name test date food result 0 1 fred positive 15-'05 lemon-2.v4*?-10%;ham-12?-0%;orange?-58% 15-'05, ham, present 1 2 sammy positive 13-'02 cake cake, present 2 3 tom negative 12-'01 cheese tom, present 3 4 sam negative 29-'08 eggs not found