Problem:
I am generating a search query from key=value pairs. The system being queried does not support searching by the same field twice. I need to generate all unique permutations (assuming that is the correct word) of the pairs so I can generate multiple queries.
Example query:
python test.py --search field_1="books" and (field_2="paper" or (field_2="abcd" and field_4="test")) and field_20=80 and field_20="443" and not field_13=test or field19="test" and field19="4"
Ignore the boolean operations. After parsing I end up with:
['field_1="books"', 'field_2="paper"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_20="443"', 'field_13="test"', 'field19="test"', 'field19="4"']
The number and names of the fields used or re-used depend on the user. I wish to use this list to generate the output below.
Desired Output:
['field_1="books"', 'field_4="test"', 'field_13="test"', 'field_2="paper"', 'field_20="80"', 'field19="test"']
['field_1="books"', 'field_4="test"', 'field_13="test"', 'field_2="abcd"', 'field_20="443"', 'field19="4"']
and so on...
Or a list of dicts is fine too. I just need every permutation where the same key (field_x) is not used twice in the same list.
Attempts:
Tried to break apart repeated fields and only generate permutations of repeats, then was going to append to the non-repeated fields. Seems way more involved than it should be.
repeat_pairs = []
once_pairs = []
for pair in search_pairs:
    key = pair.split('=')[0]
    if key in repeat_keys:
        repeat_pairs.append(pair)
    else:
        once_pairs.append(pair)
print(search_pairs)

def gen_queries(repeat_list):
    master_query_list = []
    for item in repeat_list:
        tmp_list = repeat_list[:]
        key = item.split('=')[0]
        value = item.split('=')[1]
        build = []
        build.append(item)
        tmp_list.remove(item)
        for sub in tmp_list:
            sub_key = sub.split('=')[0]
            sub_value = sub.split('=')[1]
            if key != sub_key:
                build.append(sub)
                tmp_list.remove(sub)
        master_query_list.append(build)
    master_query_list.sort()
    for item in master_query_list:
        print(item)

gen_queries(repeat_pairs)
Outputs:
['field19="4"', 'field_2="paper"', 'field_20="80"', 'field_2="test"']
['field19="test"', 'field_2="paper"', 'field_20="80"', 'field_2="test"']
['field_20="443"', 'field_2="paper"', 'field_2="test"', 'field19="4"']
['field_20="80"', 'field_2="paper"', 'field_2="test"', 'field19="4"']
['field_2="abcd"', 'field_20="80"', 'field19="test"']
['field_2="paper"', 'field_20="80"', 'field19="test"']
['field_2="test"', 'field_20="80"', 'field19="test"']
This feels like something simple and doable with recursion but my brain just isn’t clicking.
Answer
Group these strings into “bins” by their key, then take the Cartesian product of the bins with itertools.product, which picks exactly one condition per key in every combination:
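from collections import defaultdict
from itertools import product

conds = ['field_1="books"', 'field_2="paper"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_20="443"', 'field_13="test"', 'field19="test"', 'field19="4"']

# Collect every condition under its key, keeping first-seen key order.
bins = defaultdict(list)
for c in conds:
    # Split on the first '=' only, so a value containing '=' does not break unpacking.
    k, _ = c.split('=', 1)
    bins[k].append(c)

# Each tuple takes one condition from every bin, so no key is repeated.
for q in product(*bins.values()):
    print(q)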
Result
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="4"')
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="paper"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="4"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="80"', 'field_13="test"', 'field19="4"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="test"')
('field_1="books"', 'field_2="abcd"', 'field_4="test"', 'field_20="443"', 'field_13="test"', 'field19="4"')
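Since the question says a list of dicts would be fine too, the same binning idea could be sketched roughly as follows. The function name unique_key_queries is hypothetical (it is not part of the original answer), and the sketch assumes every pair contains at least one '=' with the key being everything before the first one. Values keep their surrounding quotes, exactly as they appear in the parsed list.

```python
from collections import defaultdict
from itertools import product


def unique_key_queries(conds):
    """Return one dict per combination in which every key appears exactly once.

    Hypothetical helper: assumes each entry looks like key=value and splits
    on the first '=' only.
    """
    bins = defaultdict(list)
    for cond in conds:
        key, value = cond.split('=', 1)
        bins[key].append(value)
    keys = list(bins)  # keys in first-seen order
    # product() picks one value per key; zip pairs each value with its key.
    return [dict(zip(keys, combo)) for combo in product(*bins.values())]


conds = ['field_1="books"', 'field_2="paper"', 'field_2="abcd"',
         'field_4="test"', 'field_20="80"', 'field_20="443"',
         'field_13="test"', 'field19="test"', 'field19="4"']

queries = unique_key_queries(conds)
print(len(queries))  # 8
print(queries[0])
```

The number of results is the product of the bin sizes (here 1 x 2 x 1 x 2 x 1 x 2 = 8), so it grows quickly when many fields repeat. In that case it may be better to iterate over product(...) lazily instead of building the whole list.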