
How to create a dictionary whose key:value pairs are the values of two different lists of dictionaries?

I have two lists of dictionaries that result from a pymongo extraction.

A list of dicts containing ids (strings) and lemmas (strings):

lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': 'lemma2'}, {'id': 'id3', 'lemma': 'lemma3'}, ...]

A list of dicts containing ids and words, with possibly several words per id:

words = [{'id': 'id1', 'word': 'word1.1'}, {'id': 'id1', 'word': 'word1.2'}, {'id': 'id2', 'word': 'word2.1'}, {'id': 'id3', 'word': 'word3.1'}, {'id': 'id3', 'word': 'word3.2'}, ...]

As you can see, the two lists of dictionaries have different lengths, because several words can be associated with each id, but only one lemma.

My goal is to obtain a dictionary whose key:value pairs are word:lemma pairs, for the words and lemmas that share the same id. This way, I can replace every word with the corresponding lemma in a text that I am analyzing. For example:

word_lemma_dict = {'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2', 'word3.1': 'lemma3', 'word3.2': 'lemma3', ...}

Is there a simple way to do this?

The best I could achieve was to use two for loops, but it’s not very “Pythonic”:

id_lemma_dict = {}
word_lemma_dict = {}

for dico in lemmas:
    id_lemma_dict[dico['id']] = dico['lemma']  # create id:lemma dict from list of dicts

for dico in words:
    word_lemma_dict[dico['word']] = id_lemma_dict[dico['id']]

print(word_lemma_dict)

Answer

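Here’s an option with two dict comprehensions: the first builds an id-to-lemma lookup, and the second maps each word to the lemma that has the same id: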

lemmas = [{"id": "id1", "lemma":"lemma1"}, {"id": "id2", "lemma":"lemma2"}, {"id": "id3", "lemma": "lemma3"}]
words = [{"id": "id1", "word": "word1.1"}, {"id": "id1", "word": "word1.2"}, {"id": "id2", "word": "word2.1"}, {"id": "id3", "word": "word3.1"}, {"id": "id3", "word": "word3.2"}]

lemmas_dict = {item["id"]: item["lemma"] for item in lemmas}
word_to_lemma = {word["word"]: lemmas_dict[word["id"]] for word in words}

print(word_to_lemma)

Output:

{'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2', 'word3.1': 'lemma3', 'word3.2': 'lemma3'}
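
The comprehension above raises a KeyError if a word carries an id that has no entry in lemmas. A minimal sketch of a more defensive variant follows, assuming such words should simply be skipped. The helper names build_word_to_lemma and lemmatize, the 'id9' entry, and the token list are hypothetical and only serve as illustration. It also shows one possible way to apply the mapping to an already tokenized text, which is the stated goal of the question:

```python
def build_word_to_lemma(lemmas, words):
    """Map each word to the lemma sharing its id, skipping unmatched ids."""
    id_to_lemma = {item["id"]: item["lemma"] for item in lemmas}
    return {
        item["word"]: id_to_lemma[item["id"]]
        for item in words
        if item["id"] in id_to_lemma  # assumption: drop words whose id has no lemma
    }


def lemmatize(tokens, word_to_lemma):
    """Replace each token by its lemma; unknown tokens are left unchanged."""
    return [word_to_lemma.get(token, token) for token in tokens]


lemmas = [{"id": "id1", "lemma": "lemma1"}, {"id": "id2", "lemma": "lemma2"}]
words = [
    {"id": "id1", "word": "word1.1"},
    {"id": "id2", "word": "word2.1"},
    {"id": "id9", "word": "orphan"},  # hypothetical entry: id9 has no lemma
]

word_to_lemma = build_word_to_lemma(lemmas, words)
print(word_to_lemma)
# {'word1.1': 'lemma1', 'word2.1': 'lemma2'}

print(lemmatize(["word1.1", "other", "word2.1"], word_to_lemma))
# ['lemma1', 'other', 'lemma2']
```

Note that if the same word appears under two different ids, the later entry in words overwrites the earlier one, since dictionary keys are unique.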
User contributions licensed under: CC BY-SA