I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and their text content as values.
To perform an ‘AND’ search for two terms, I thus need to intersect their postings lists (dictionaries). What is a clear (not necessarily overly clever) way to do this in Python? I started out by trying it the long way with iter
:
p1 = index[term1] p2 = index[term2] i1 = iter(p1) i2 = iter(p2) while ... # not sure of the 'iter != end 'syntax in this case ...
Advertisement
Answer
In general, to construct the intersection of dictionaries in Python, you can first use the &
operator to calculate the intersection of sets of the dictionary keys (dictionary keys are set-like objects in Python 3):
dict_a = {"a": 1, "b": 2} dict_b = {"a": 2, "c": 3} intersection = dict_a.keys() & dict_b.keys() # {'a'}
On Python 2 you have to convert the dictionary keys to sets yourself:
keys_a = set(dict_a.keys()) keys_b = set(dict_b.keys()) intersection = keys_a & keys_b
Then given the intersection of the keys, you can then build the intersection of your values however is desired. You have to make a choice here, since the concept of set intersection doesn’t tell you what to do if the associated values differ. (This is presumably why the &
intersection operator is not defined directly for dictionaries in Python).
In this case it sounds like your values for the same key would be equal, so you can just choose the value from one of the dictionaries:
dict_of_dicts_a = {"a": {"x":1}, "b": {"y":3}} dict_of_dicts_b = {"a": {"x":1}, "c": {"z":4}} shared_keys = dict_of_dicts_a.keys() & dict_of_dicts_b.keys() # values equal so choose values from a: dict_intersection = {k: dict_of_dicts_a[k] for k in shared_keys } # {"a":{"x":1}}
Other reasonable methods of combining values would depend on the types of the values in your dictionaries, and what they represent. For example you might also want the union of values for shared keys of dictionaries of dictionaries. Since the union of dictionaries doesn’t depend on the values, it is well defined, and in python you can get it using the |
operator:
# union of values for each key in the intersection: dict_intersection_2 = { k: dict_of_dicts_a[k] | dict_of_dicts_b[k] for k in shared_keys }
Which in this case, with identical dictionary values for key "a"
in both, would be the same result.