I have a list A:
A = [['512', '102'], ['410', '105'], ['820', '520']]
And list B:
B = [['510', '490', '512', '912'], ['512', '108', '102', '520', '901', '821'], ['510', '118', '284']]
I would like to keep only those rows of list A whose values are all contained in at least one row of list B. So my expected output is:
[['512', '102']]
Because the values '512' and '102' are both in the second row of list B.
I know how to achieve that by iterating over every item in list A and comparing it with every element in list B, but the problem is that I have ~500000 rows in list A and ~10000 rows in list B, and it is extremely slow.
Is there a way to achieve that in a more optimal way?
Answer
You should work with sets here, since membership tests on a set are much faster than on a list.
Here is one solution:
[i for i in A if any(set(i)-set(k)==set() for k in B)]
result
[['512', '102']]
Explanation:
set(i)-set(k)==set()
checks if all items of i are included in k
any(set(i)-set(k)==set() for k in B)
checks whether the above holds for at least one row of B, for a given row of A, and finally
[i for i in A if any(set(i)-set(k)==set() for k in B)]
returns all items of A that satisfy the above condition
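Note that the one-liner above rebuilds set(k) for every pair of rows. A minimal sketch of the same idea that converts each row of B to a set only once is shown below; the names B_sets and rows_covered are illustrative and not part of the original answer, and the subset operator <= replaces the "difference is empty" test.

```python
A = [['512', '102'], ['410', '105'], ['820', '520']]
B = [['510', '490', '512', '912'],
     ['512', '108', '102', '520', '901', '821'],
     ['510', '118', '284']]

# Convert each row of B to a frozenset once, instead of on every comparison.
B_sets = [frozenset(row) for row in B]


def rows_covered(rows, candidate_sets):
    """Keep the rows whose values all appear in at least one candidate set."""
    kept = []
    for row in rows:
        row_set = set(row)
        # row_set <= s is True when every value of the row is in s.
        if any(row_set <= s for s in candidate_sets):
            kept.append(row)
    return kept


result = rows_covered(A, B_sets)
print(result)  # [['512', '102']]
```

This still compares each row of A against rows of B until a match is found, so the worst case remains len(A) * len(B) subset tests; the saving comes from not rebuilding the sets of B each time.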