I want to make this more readable for the other programmers on my team, but I am not sure how to properly refactor this function, which merges two dicts and removes duplicates based on value.
```
def mergeDict(json1, json2):
    cveids = set([n['id'] for n in json1]).union(set([n['id'] for n in json2]))
    jf1={s['id']:s['url'] for s in json1}
    jf2={s['id']:s['url'] for s in json2}
    return [{'id':cveid,'url':list(set(jf1.get(cveid,[])+jf2.get(cveid,[])))} for cveid in cveids]
```
Answer
When you are working with novice programmers, splitting the work into separate steps is a good first move toward making the code easier for them to understand.
For example:
```
def mergeDict(json1, json2):
    cveids1 = set([n['id'] for n in json1])
    cveids2 = set([n['id'] for n in json2])
    cveids = cveids1.union(cveids2)

    jf1={s['id']:s['url'] for s in json1}
    jf2={s['id']:s['url'] for s in json2}

    def makeUniq(cveid):
        urls1 = jf1.get(cveid, [])
        urls2 = jf2.get(cveid, [])
        urls = list(set(urls1 + urls2))
        return { 'id': cveid, 'url': urls }

    return [makeUniq(cveid) for cveid in sorted(list(cveids))]
```
A list comprehension tends to perform well, but it gets confusing once it does complex things. For beginners, a plain ‘for’ loop is the best choice; the next-best option is to move the logic into an auxiliary function and apply the comprehension to that, as I did here.
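To illustrate the plain ‘for’ option, here is a minimal sketch of the same merge written with loops only. The name `merge_dict_loop` is made up for this example, and it assumes every record looks like `{'id': ..., 'url': [...]}` with a list of strings under `'url'`. It is not a drop-in replacement: it sorts the URLs so the output is predictable, and if the same id appears twice within one input it combines the URLs instead of keeping only the last record.

```python
def merge_dict_loop(json1, json2):
    """Merge two lists of {'id': ..., 'url': [...]} records.

    Hypothetical loop-only variant: URLs are de-duplicated per id.
    """
    urls_by_id = {}

    # Walk both inputs and collect every URL under its id.
    for records in (json1, json2):
        for record in records:
            cveid = record['id']
            if cveid not in urls_by_id:
                urls_by_id[cveid] = set()  # a set drops duplicate URLs
            for url in record['url']:
                urls_by_id[cveid].add(url)

    # Build the result list, one record per id, in sorted order.
    merged = []
    for cveid in sorted(urls_by_id):
        # Sorting the URLs is an addition for this sketch (stable output).
        merged.append({'id': cveid, 'url': sorted(urls_by_id[cveid])})
    return merged
```

Each step does one thing, so a beginner can add a `print` anywhere to see the intermediate state.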
Good names matter too. My first thought was “what’s ‘cveids’? Why not keys?”, but I kept your variable names because I don’t know your domain, and these names may be expressive in context.