I have two Python lists of strings that I would like to compare. The first is my main list, containing a series of long codes. The second is a list of partial strings.
input:
list1 = ['fda3232', 'fcg3224', 'kgj5543', '3323fda9832', 'ffz3392', '221gks9483', 'mnx8385', 'aaz9323', '332kgj4323']

list2 = ['fda', 'kgj', 'mxx', 'mnx']
The desired result is a mask of list1, populated by the matching substrings from list2. If no match is found, list3 can contain 0, np.nan, '-', or similar at that position. In other words, I'm looking for the following:
output:
list3 = ['fda', np.nan, 'kgj', 'fda', np.nan, np.nan, 'mnx', np.nan, 'kgj']
With help from folks in another thread, I was able to get close. However, both solutions below return the matching values from list1, whereas I would like the result to contain the matching substring from list2.
solution 1:
list3 = [x if any(y in x for y in list2) else np.nan for x in list1]

solution 2:
list3 = np.where([np.sum(np.char.find(x, sub=list2)+1) for x in list1], list1, np.nan)
Answer
You may use:
import numpy as np

list1 = ['fda3232', 'fcg3224', 'kgj5543', '3323fda9832', 'ffz3392', '221gks9483', 'mnx8385', 'aaz9323', '332kgj4323']

list2 = ['fda', 'kgj', 'mxx', 'mnx']

def isin(haystack):
    # Return the first substring from list2 found in haystack, or NaN if none match.
    for needle in list2:
        if needle in haystack:
            return needle
    return np.nan

# Build the mask: one entry per element of list1.
list3 = [isin(haystack) for haystack in list1]
print(list3)
This yields:
['fda', nan, 'kgj', 'fda', nan, nan, 'mnx', nan, 'kgj']
You could even put it in a comprehension:
list3 = [result[0]
         for haystack in list1
         for result in [[needle for needle in list2 if needle in haystack] or [np.nan]]]
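Both versions above return the first entry of list2 that occurs in the string, so the order of list2 decides the result when several substrings match. As a minimal sketch of the same idea, the loop can also be written with the built-in next() and a default value. The helper name first_match and its default parameter are hypothetical and not part of the original answer; math.nan is used here so the sketch needs only the standard library, and np.nan could be passed as the default instead.

```python
import math

# Same sample data as in the question.
list1 = ['fda3232', 'fcg3224', 'kgj5543', '3323fda9832', 'ffz3392',
         '221gks9483', 'mnx8385', 'aaz9323', '332kgj4323']
list2 = ['fda', 'kgj', 'mxx', 'mnx']


def first_match(haystack, needles, default=math.nan):
    """Return the first needle contained in haystack, else default.

    Hypothetical helper for illustration; np.nan, 0 or '-' can be
    passed as default instead of math.nan.
    """
    # next() takes the first item the generator produces and falls back
    # to default when no needle is a substring of haystack.
    return next((needle for needle in needles if needle in haystack), default)


list3 = [first_match(haystack, list2) for haystack in list1]
print(list3)
```

Passing the substrings as an argument rather than reading the global list2 makes the helper reusable with other lists, and the default argument covers the 0, np.nan, or '-' placeholders mentioned in the question.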