I have a dataframe (df1) and would like to search each row for items from listA. If a row contains 'positive' and one or more of the items from listA, I would like to generate another dataframe (df2) by adding a column called result that lists the matching listA items followed by 'present'. Items in listA may appear as a standalone value in a row of df1 or as part of a larger string. I've tried using pandas.DataFrame.loc, but I can only search one column at a time, which isn't ideal.
import pandas as pd

df1 = pd.DataFrame({'column no': ['1', '2', '3', '4'],
                    'name': ['fred', 'sammy', 'tom', 'sam'],
                    'test': ['positive', 'positive', 'negative', 'negative'],
                    'date': ["15-'05", "13-'02", "12-'01", "29-'08"],
                    'food': ['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs']})

listA = ["15-'05", 'ham', 'tom', 'cake']
Desired output:
df2 = pd.DataFrame({'column no': ['1', '2', '3', '4'],
                    'name': ['fred', 'sammy', 'tom', 'sam'],
                    'test': ['positive', 'positive', 'negative', 'negative'],
                    'date': ["15-'05", "13-'02", "12-'01", "29-'08"],
                    'food': ['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs'],
                    'result': ["15-'05, ham, present", "cake, present", 'tom, present', 'not found']})
Answer
Updated:
I first define a function that checks a single row against listA, then apply it to every row (axis=1) and store the results in the result column.
def check_rows(row):
    # Collect every listA term that occurs as a substring of any cell in the row.
    same_values = ', '.join([term for term in listA for substring in row.values if term in substring])
    if same_values:
        return same_values + ", present"
    else:
        return 'not found'

# Apply the function to each row (axis=1) and store the result in a new column.
df1['result'] = df1.apply(check_rows, axis=1)
Output:
   column no   name      test    date                                     food                result
0          1   fred  positive  15-'05  lemon-2.v4*?-10%;ham-12?-0%;orange?-58%  15-'05, ham, present
1          2  sammy  positive  13-'02                                     cake         cake, present
2          3    tom  negative  12-'01                                   cheese          tom, present
3          4    sam  negative  29-'08                                     eggs             not found
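Note that check_rows assumes every cell is a string, and it lists a term once per column it matches in. It also does not test for 'positive', which agrees with the desired output in the question (the 'negative' row for tom is still reported). The following is a minimal sketch of a variant, not the original answer's method: find_terms and its require_positive flag are hypothetical names introduced here. It casts each cell to str, reports each listA item at most once, and can optionally enforce the 'positive' condition described in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'column no': ['1', '2', '3', '4'],
    'name': ['fred', 'sammy', 'tom', 'sam'],
    'test': ['positive', 'positive', 'negative', 'negative'],
    'date': ["15-'05", "13-'02", "12-'01", "29-'08"],
    'food': ['lemon-2.v4*?-10%;ham-12?-0%;orange?-58%', 'cake', 'cheese', 'eggs'],
})
listA = ["15-'05", 'ham', 'tom', 'cake']


def find_terms(row, terms, require_positive=False):
    # Hypothetical helper; require_positive is an illustrative flag.
    # Cast every cell to str so non-string columns do not raise a TypeError.
    cells = [str(value) for value in row.values]
    # Optionally skip rows that have no cell equal to 'positive'.
    if require_positive and 'positive' not in cells:
        return 'not found'
    # any() reports each term at most once, in listA order.
    found = [term for term in terms if any(term in cell for cell in cells)]
    return ', '.join(found) + ', present' if found else 'not found'


df2 = df1.copy()  # leave df1 untouched
# Extra keyword arguments given to apply() are forwarded to find_terms.
df2['result'] = df1.apply(find_terms, axis=1, terms=listA)
df2['result_positive_only'] = df1.apply(
    find_terms, axis=1, terms=listA, require_positive=True
)
print(df2[['name', 'test', 'result', 'result_positive_only']])
```

With require_positive left at its default, the result column matches the desired output from the question. With require_positive=True, the row for tom becomes 'not found' because its test value is 'negative'.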