Counting repeated pairs in a list

I have an assignment that has a data mining element. I need to find which authors collaborate the most across several publication webpages.

I’ve scraped the webpages and compiled the author text into a list.

My current output looks like this:

for authors in lists:
    print(authors)

# output:
['Author 1', 'Author 2', 'Author 3']
['Author 2', 'Author 4', 'Author 1']
['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4']

...and so on for roughly 100 more rows.

My idea is, for each row of the list, to produce another list containing each unique pair in that row. E.g. the third demo row would give ‘Author 1 + Author 5’, ‘Author 1 + Author 6’, ‘Author 1 + Author 7’, ‘Author 1 + Author 4’, ‘Author 5 + Author 6’, ‘Author 5 + Author 7’, ‘Author 5 + Author 4’, ‘Author 6 + Author 7’, ‘Author 6 + Author 4’, ‘Author 7 + Author 4’. Then I’d append these pair lists to one large list and put it through a Counter to see which pairs come up most often.
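The plan above can be sketched directly with two nested loops: pair each name only with the names that come after it in the row, which gives every unique pair exactly once. This is a minimal sketch that assumes each row is a plain list of name strings. One addition not in the original plan: the two names in each pair are sorted before joining. Without that, row 1 yields ‘Author 1 + Author 2’ while row 2 yields ‘Author 2 + Author 1’, and the Counter would treat them as different pairs. The helper name unique_pairs is hypothetical.

```python
from collections import Counter

# Sample rows copied from the demo output above; the real data has ~100 rows.
rows = [
    ['Author 1', 'Author 2', 'Author 3'],
    ['Author 2', 'Author 4', 'Author 1'],
    ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4'],
]

def unique_pairs(authors):
    """Hypothetical helper: every unordered pair in one row as an 'A + B' string.

    Each name is paired only with names after it (j > i), so no pair is
    produced twice within a row. The two names are sorted so that
    'Author 2 + Author 1' and 'Author 1 + Author 2' become the same key.
    """
    pairs = []
    for i in range(len(authors)):
        for j in range(i + 1, len(authors)):
            first, second = sorted((authors[i], authors[j]))
            pairs.append(f"{first} + {second}")
    return pairs

# Append every row's pairs to one large list, then count them.
all_pairs = []
for authors in rows:
    all_pairs.extend(unique_pairs(authors))

pair_counts = Counter(all_pairs)
print(pair_counts.most_common(3))
```

With the three demo rows, ‘Author 1 + Author 2’ and ‘Author 1 + Author 4’ each get a count of 2; every other pair appears once.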

The problem is I’m just not sure how to actually implement that pair matcher, so if anyone has any pointers that would be great. I’m sure it can’t be that complicated an answer, but I’ve been unable to find it. Alternative ideas on how to measure collaboration would be good too.
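On the "alternative ideas" point, one simple measure (an assumption for illustration, not something proposed elsewhere in the thread) is to look at authors rather than pairs: how many distinct people each author has worked with, and how many rows (papers) they appear in. A minimal sketch, assuming the same list-of-rows structure; the names coauthors and paper_count are hypothetical:

```python
from collections import defaultdict

rows = [
    ['Author 1', 'Author 2', 'Author 3'],
    ['Author 2', 'Author 4', 'Author 1'],
    ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4'],
]

coauthors = defaultdict(set)   # author -> set of distinct co-authors
paper_count = defaultdict(int) # author -> number of rows they appear in

for authors in rows:
    unique_authors = set(authors)  # ignore a name scraped twice in one row
    for author in unique_authors:
        paper_count[author] += 1
        coauthors[author] |= unique_authors - {author}

# Rank authors by how many different people they have collaborated with.
ranking = sorted(coauthors, key=lambda a: len(coauthors[a]), reverse=True)
for author in ranking[:3]:
    print(author, len(coauthors[author]), paper_count[author])
```

Pair counts tell you which specific partnerships are strongest; the per-author numbers tell you who is the most connected overall. The two views can disagree, so it may be worth reporting both.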

Answer

It seems like you want to generate all subsets of size 2 (all 2-element combinations) of each row. itertools.combinations from the standard library does exactly that:

import itertools

for authors in lists:
    # every unordered pair of names in this row, e.g. ('Author 1', 'Author 2')
    pairs = list(itertools.combinations(authors, 2))
    print(pairs)

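To finish the counting step from the question, the pairs can be fed straight into collections.Counter. This is a sketch assuming lists holds one list of names per row. Sorting each row first (an addition, not part of the snippet above) makes combinations emit every pair in a fixed order, so ('Author 2', 'Author 1') from one row and ('Author 1', 'Author 2') from another are counted together. Wrapping the row in set() also drops a name that was accidentally scraped twice in the same row.

```python
import itertools
from collections import Counter

lists = [
    ['Author 1', 'Author 2', 'Author 3'],
    ['Author 2', 'Author 4', 'Author 1'],
    ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4'],
]

pair_counts = Counter()
for authors in lists:
    # sorted(set(...)) removes duplicates and fixes the order of names,
    # so each unordered pair always produces the same tuple.
    pair_counts.update(itertools.combinations(sorted(set(authors)), 2))

# Show the most frequent collaborations first.
for pair, count in pair_counts.most_common(5):
    print(pair, count)
```

Counter.update counts each tuple it receives, so there is no need to build the large intermediate list of pairs first.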
Answer