
using map/reduce on lists of lists

I have a very large list of lists, and I want to use map/reduce techniques (in Python/PySpark) to efficiently calculate the PageRank of the network formed by the elements of the lists, where sharing a list means a link between two elements. I have no clue how to deal with the elements in the lists, because considering all possible pairs naively would be prohibitively expensive.

Suppose this is the data (the real list is very large, sometimes with hundreds of elements per inner list):

data = [["n1", "n2"], ["n1", "n3", "n4", "n5"], ["n2", "n5", "n7"]]

For example, having an edge list like the following would be much better than what I have:

n1 n2
n1 n3
n1 n4
n1 n5
n3 n4
n3 n5
n4 n5
n2 n5
n2 n7
n5 n7

Actually, I want to use the MapReduce technique so that I can see how to handle situations like this in the future.


Answer

At first I thought of using map and then reduce to remove duplicate pairs, but the solution below using itertools.combinations also works well:

import itertools

data = [["n1", "n2"], ["n1", "n3", "n4", "n5"], ["n2", "n5", "n7"]]

# sc is an existing SparkContext
rd = sc.parallelize(data)
# expand each inner list into all unordered pairs of its elements (the edges)
rd = rd.flatMap(lambda x: itertools.combinations(x, 2))
rd.collect()

# output
[('n1', 'n2'),
 ('n1', 'n3'),
 ('n1', 'n4'),
 ('n1', 'n5'),
 ('n3', 'n4'),
 ('n3', 'n5'),
 ('n4', 'n5'),
 ('n2', 'n5'),
 ('n2', 'n7'),
 ('n5', 'n7')]
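
If the same pair can occur in more than one inner list, the duplicate removal mentioned at the start of the answer can be done by normalizing the order of each pair and calling distinct(). A minimal sketch, continuing from the rd RDD above:

# sort each pair so (a, b) and (b, a) count as the same edge,
# then drop duplicate edges across all the inner lists
edges = rd.map(lambda p: tuple(sorted(p))).distinct()
edges.collect()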
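Since the goal stated in the question is PageRank, here is a minimal sketch of how this edge list could feed a plain RDD-based PageRank, modelled on the standard Spark example; the iteration count (10) and damping factor (0.85) are assumed values, not part of the original answer:

# treat each undirected edge as two directed links and group by source node
links = edges.flatMap(lambda e: [e, (e[1], e[0])]).groupByKey().cache()

# start every node with rank 1.0
ranks = links.mapValues(lambda _: 1.0)

for _ in range(10):  # assumed number of iterations
    # each node spreads its current rank evenly over its neighbours
    contribs = links.join(ranks).flatMap(
        lambda kv: [(dst, kv[1][1] / len(kv[1][0])) for dst in kv[1][0]]
    )
    # re-accumulate the contributions with a damping factor of 0.85
    ranks = contribs.reduceByKey(lambda a, b: a + b) \
                    .mapValues(lambda r: 0.15 + 0.85 * r)

ranks.collect()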