I am working on some sentence formation like this:
```python
sentence = "PERSON is ADJECTIVE"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"]}
```
I now need every possible combination of dictionary values that forms this sentence, like:
```
Alice is cute
Alice is intelligent
Bob is cute
Bob is intelligent
Carol is cute
Carol is intelligent
```
The use case above was relatively simple, and I solved it with the following code:
```python
dictionary = {"PERSON": ["Alice", "Bob", "Carol"], "ADJECTIVE": ["cute", "intelligent"]}
for i in dictionary["PERSON"]:
    for j in dictionary["ADJECTIVE"]:
        print(f"{i} is {j}")
```
But can this also scale up to longer sentences?
Example:
```python
sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}
```
This should again produce every possible combination, like:
```
Alice is cute and is from USA
Alice is intelligent and is from USA
...
Carol is intelligent and is from India
```
I tried using https://www.pythonpool.com/python-permutations/, but then the words of the sentence all get mixed up. How can I keep some words fixed, like "and is from" in this example?
Essentially, if a word in the string is equal to a key in the dictionary, that word should be replaced by each of the values listed for that key.
Any thoughts would be really helpful.
Answer
I would base my answer on two building blocks: `itertools.product` and `zip`.
`itertools.product` gives us every combination of the dictionary's list values, taking one value from each list.
`zip`, applied to the original keys and one of those combinations, gives us `(key, value)` tuples that we can pass straight to `str.replace`.
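To make the two building blocks concrete, here is a small sketch using a trimmed-down version of the dictionary (two people only, to keep the output short) that prints what each one produces:

```python
import itertools

# Trimmed-down dictionary, only to keep the printed output short
dictionary = {"PERSON": ["Alice", "Bob"], "ADJECTIVE": ["cute", "intelligent"]}

# product() yields one tuple per combination, taking one item from each list
combos = list(itertools.product(*dictionary.values()))
print(combos)
# [('Alice', 'cute'), ('Alice', 'intelligent'), ('Bob', 'cute'), ('Bob', 'intelligent')]

# zip() pairs each key with the value chosen for it in a single combination
pairs = list(zip(dictionary.keys(), combos[0]))
print(pairs)
# [('PERSON', 'Alice'), ('ADJECTIVE', 'cute')]
```

Each pair can then be unpacked into `replace()`, e.g. `sentence.replace(*("PERSON", "Alice"))`.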
```python
import itertools

sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
# One iteration per combination, e.g. ("Alice", "cute", "USA")
for values in itertools.product(*dictionary.values()):
    new_sentence = sentence
    # Swap each placeholder for the value chosen in this combination
    for tpl in zip(keys, values):
        new_sentence = new_sentence.replace(*tpl)
    print(new_sentence)
```
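A variant that follows the rule stated in the question more literally (a word that equals a key is swapped for each of its values) is to split the template into words and feed each word's options to `product()`. This is a sketch, not part of the original answer; `fill_template` is a hypothetical helper name, and it assumes placeholders are separated by whitespace and not glued to punctuation (e.g. `COUNTRY.` would not match):

```python
import itertools


def fill_template(sentence, dictionary):
    """Yield every sentence obtained by swapping each placeholder word for
    each of its values; words that are not keys stay fixed."""
    # Options per word: the dictionary values if the word is a key,
    # otherwise a one-item list holding the word itself
    options = [dictionary.get(word, [word]) for word in sentence.split()]
    for combo in itertools.product(*options):
        yield " ".join(combo)


sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

results = list(fill_template(sentence, dictionary))
print(results[0])    # Alice is cute and is from USA
print(results[-1])   # Carol is intelligent and is from India
print(len(results))  # 24
```

One difference from the `replace()` version: if the same placeholder appears twice, each occurrence is filled independently here, whereas `replace()` puts the same value in both spots. Keys that never occur in the template are simply never looked up, so they cannot create duplicates.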
If you can control the "sentence" template and write it as:
```python
sentence = "{PERSON} is {ADJECTIVE} and is from {COUNTRY}"
```
Then you can simplify this to:
```python
import itertools

sentence = "{PERSON} is {ADJECTIVE} and is from {COUNTRY}"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
for values in itertools.product(*dictionary.values()):
    # Build {"PERSON": ..., "ADJECTIVE": ..., "COUNTRY": ...} and fill the fields
    new_sentence = sentence.format(**dict(zip(keys, values)))
    print(new_sentence)
```
Both should give you results like:
```
Alice is cute and is from USA
Alice is cute and is from Japan
...
Carol is intelligent and is from China
Carol is intelligent and is from India
```
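As a quick sanity check (an addition, assuming Python 3.8+ for `math.prod`), the number of generated sentences should equal the product of the list lengths, 3 * 2 * 4 = 24 here, with no repeats:

```python
import itertools
import math

sentence = "{PERSON} is {ADJECTIVE} and is from {COUNTRY}"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"]}

keys = dictionary.keys()
sentences = [sentence.format(**dict(zip(keys, values)))
             for values in itertools.product(*dictionary.values())]

# product() yields len(list1) * len(list2) * ... combinations
expected = math.prod(len(v) for v in dictionary.values())
print(len(sentences), expected)               # 24 24
print(len(set(sentences)) == len(sentences))  # True: no duplicates
```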
Note that the order in which the placeholders appear in the template does not matter; both solutions work with a template such as:
```python
sentence = "PERSON is from COUNTRY and is ADJECTIVE"
```
or, in the second case:
```python
sentence = "{PERSON} is from {COUNTRY} and is {ADJECTIVE}"
```
Follow-up:
What happens if the dictionary might contain keys that are not in the sentence template? At the moment that is not ideal: generating sentences with `product()` assumes that every key is used in the template, so we would currently generate duplicates.
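To see the duplicate problem concretely, the sketch below adds a hypothetical `PET` key that the template never uses and runs the `replace()` version from above unchanged:

```python
import itertools

sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"],
              "PET": ["cat", "dog"]}  # hypothetical key, not in the template

keys = dictionary.keys()
generated = []
for values in itertools.product(*dictionary.values()):
    new_sentence = sentence
    for tpl in zip(keys, values):
        new_sentence = new_sentence.replace(*tpl)
    generated.append(new_sentence)

print(len(generated))       # 48: product() also loops over "cat" and "dog"
print(len(set(generated)))  # 24: each distinct sentence appears twice
```

`product()` still iterates over both `PET` values, but since `PET` never appears in the sentence, each sentence comes out once per `PET` value.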
The easiest fix is to make sure the dictionary contains only the keys of interest.
In the first case, this might do it:
```python
dictionary = {key: value for key, value in dictionary.items() if key in sentence}
```
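One caveat with `key in sentence`: it is a substring test, so a key that happens to occur inside another word is kept. The sketch below (with a hypothetical `SON` key, which is found inside `PERSON`) compares it with a whole-word test based on `sentence.split()`, assuming placeholders are separated by whitespace:

```python
sentence = "PERSON is ADJECTIVE and is from COUNTRY"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"],
              "SON": ["x"]}  # hypothetical key that only matches as a substring

# Substring test: "SON" is found inside "PERSON", so it is kept
by_substring = {k: v for k, v in dictionary.items() if k in sentence}

# Whole-word test: only keys that appear as separate words are kept
words = set(sentence.split())
by_word = {k: v for k, v in dictionary.items() if k in words}

print(sorted(by_substring))  # ['ADJECTIVE', 'COUNTRY', 'PERSON', 'SON']
print(sorted(by_word))       # ['ADJECTIVE', 'COUNTRY', 'PERSON']
```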
Or, in the second case:
```python
dictionary = {key: value for key, value in dictionary.items() if f"{{{key}}}" in sentence}
```
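For the `{placeholder}` template, the standard library can list the field names exactly instead of searching for substrings: `string.Formatter().parse()` yields `(literal_text, field_name, format_spec, conversion)` tuples. A sketch, assuming plain field names such as `{PERSON}` (no attribute or index access like `{PERSON.name}`), with a hypothetical unused `PET` key:

```python
import itertools
import string

sentence = "{PERSON} is {ADJECTIVE} and is from {COUNTRY}"
dictionary = {"PERSON": ["Alice", "Bob", "Carol"],
              "ADJECTIVE": ["cute", "intelligent"],
              "COUNTRY": ["USA", "Japan", "China", "India"],
              "PET": ["cat", "dog"]}  # hypothetical key, not in the template

# field_name is None for trailing literal text, so keep only real names
fields = {name for _, name, _, _ in string.Formatter().parse(sentence) if name}
used = {k: v for k, v in dictionary.items() if k in fields}

keys = used.keys()
sentences = [sentence.format(**dict(zip(keys, values)))
             for values in itertools.product(*used.values())]

print(sorted(fields))  # ['ADJECTIVE', 'COUNTRY', 'PERSON']
print(len(sentences))  # 24
```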