I want to make this more readable for the other programmers on my team, but I am not sure how to properly refactor this function, which merges two dicts and removes duplicates based on value:
```python
def mergeDict(json1, json2):
    cveids = set([n['id'] for n in json1]).union(set([n['id'] for n in json2]))
    jf1={s['id']:s['url'] for s in json1}
    jf2={s['id']:s['url'] for s in json2}
    return [{'id':cveid,'url':list(set(jf1.get(cveid,[])+jf2.get(cveid,[])))} for cveid in cveids]
```
Answer
When you are working with novice programmers, splitting the work into separate, named steps is a good first move toward making the code easier for them to follow.
For example:
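def mergeDict(json1, json2):
    # Collect the ids from each input, then combine them.
    cveids1 = {n['id'] for n in json1}
    cveids2 = {n['id'] for n in json2}
    cveids = cveids1.union(cveids2)
    # Map each id to its list of URLs, one lookup table per input.
    jf1 = {s['id']: s['url'] for s in json1}
    jf2 = {s['id']: s['url'] for s in json2}
    def makeUniq(cveid):
        # Merge both URL lists for this id and drop duplicates.
        urls1 = jf1.get(cveid, [])
        urls2 = jf2.get(cveid, [])
        urls = list(set(urls1 + urls2))
        return {'id': cveid, 'url': urls}
    # sorted() accepts a set directly and returns a list.
    return [makeUniq(cveid) for cveid in sorted(cveids)]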
A list comprehension is usually good for performance, but it becomes confusing when it has to do complex things. For beginners, it is better to use a plain 'for' loop (best), or an auxiliary function that the comprehension calls (acceptable), as I did here.
Good names matter too. My first thought was "what is 'cveids'? Why not 'keys'?", but I kept your variable names because I don't know your domain, and these names may be meaningful in context.
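For comparison, here is a rough sketch of the plain 'for' loop variant mentioned above. It assumes each input is a list of dicts with an 'id' key and a 'url' key holding a list of strings. The names merge_url_lists, entries1, entries2 and urls_by_id are hypothetical and only there for illustration. It also differs from the original in two small ways: URLs keep their first-seen order instead of the arbitrary order of a set, and if the same id appears twice within one input, all of its URLs are merged instead of only the last entry being kept.

```python
def merge_url_lists(entries1, entries2):
    """Merge two lists of {'id': ..., 'url': [...]} dicts, de-duplicating URLs per id."""
    urls_by_id = {}  # hypothetical name: maps each id to its merged URL list

    # Walk both inputs in order and collect every URL seen for each id.
    for entries in (entries1, entries2):
        for entry in entries:
            urls = urls_by_id.setdefault(entry['id'], [])
            for url in entry['url']:
                if url not in urls:  # skip duplicates, keep first-seen order
                    urls.append(url)

    # Build the result in sorted id order, one dict per id.
    merged = []
    for entry_id in sorted(urls_by_id):
        merged.append({'id': entry_id, 'url': urls_by_id[entry_id]})
    return merged


# Made-up sample data, only to show the shape of the input and output.
first = [{'id': 'X-1', 'url': ['a', 'b']}]
second = [{'id': 'X-1', 'url': ['b', 'c']}, {'id': 'X-0', 'url': ['d']}]
print(merge_url_lists(first, second))
# [{'id': 'X-0', 'url': ['d']}, {'id': 'X-1', 'url': ['a', 'b', 'c']}]
```

The membership test on a list is slower than a set for very long URL lists, so this version trades a little speed for code that can be read line by line.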