Skip to content
Advertisement

Generating Maximal Subsets of a Set Under Some Constraints in Python

I have a set of attributes A= {a1, a2, ...an} and a set of clusters C = {c1, c2, ... ck} and I have a set of correspondences COR which is a subset of A x C and |COR|<< A x C. Here is a sample set of correspondences

COR = {(a1, c1), (a1, c2), (a2, c1), (a3, c3), (a4, c4)}

Now, I want to generate all the subsets of COR such that each pair in the subset represents an injective function from set A to set C. Let’s call each of such subset a mapping then the valid mappings from the above set COR would be
m1 = {(a1, c1), (a3, c3), (a4, c4)} and m2 = {(a1, c2), (a2, c1), (a3, c3), (a4, c4)}
m1 is interesting here because adding any of the remaining elements from COR to m1 would either violate the definition of the function or it would violate the condition of being an injective function. For instance, if we add the pair (a1,c2) to m1, m1 would not be a function anymore and if we add (a2,c1) to m1, it will cease to be an injective function. So, I am interested in some code snippets or algorithm that I can use to generate all such mappings. Here is what I have tried so far in python import collections import itertools

JavaScript

And as expected the code produces m2 but not m1 because m1 will not be generated from itertools.product. Can anyone guide me on this? I would also like some guidance on performance because the actual sets would be larger than COR set used here and may contain many more conflicting sets.

Advertisement

Answer

A simpler definition of your requirements is:

  • You have a set of unique tuples.
  • You want to generate all subsets for which:
    • all of the first elements of the tuples are unique (to ensure a function);
    • and all of the second elements are unique (to ensure injectivity).
  • Your title suggests you only want the maximal subsets, i.e. it must be impossible to add any additional elements from the original set without breaking the other requirements.

I’m also assuming any a<x> or c<y> is unique.

Here’s a solution:

JavaScript

The test is_injective_function checks if the provided set f represents a valid injective function, by getting all the values from the domain and range of the function and checking that both only contain unique values.

The generator takes an f, and if it represents an injective valid function, it checks to see that none of the elements that have been removed from the original corr to reach f can be added back in while still having it represent an injective valid function. If that’s the case, it yields f as a valid result.

If f isn’t an injective valid function to begin with, it will try to remove each of the elements in f in turn and generate any injective valid functions from each of those subsets.

Finally, the whole function removes duplicates from the resulting generator and returns it as a list of unique sets.

Output:

JavaScript

Note, there’s several approaches to deduplicating a list of non-hashable values, but this approach turns all the sets in the list into a frozenset to make them hashable, then turns the list into a set to remove duplicates, then turns the contents into sets again and returns the result as a list.

You can prevent removing duplicates at the end by keeping track of what removed subsets have already been tried, which may perform better depending on your actual data set:

JavaScript

This is probably a generally better performing solution, but I liked the clean algorithm of the first one better for explanation.

I was annoyed by the slowness of the above solution after the comment asking whether it scales up to 100 elements with ~15 conflicts (it would run for many minutes to solve it), so here’s a faster solution that runs under 1 second for 100 elements with 15 conflicts, although the execution time still goes up exponentially, so it has its limits):

JavaScript
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement