Generate all permutations including abbreviations with weightages

My string –

name_target = "ARUN GULABRAO INDULKAR"

I want to generate all permutations with the original name and abbreviations and assign weightages to each –

[ARUNGULABRAOINDULKAR, 1]
[ARUNGINDULKAR, 0.9]
[ARUNGULABRAOI, 0.9]
[AGULABRAOINDULKAR, 0.9]
[ARUNGI, 0.8]
[AGINDULKAR, 0.8]
[AGULABRAOI, 0.8]
[ARUNINDULKARGULABRAO, 1]
[ARUNIGULABRAO, 0.9]
[ARUNINDULKARG, 0.9]
[AINDULKARGULABRAO, 0.9]
[ARUNIG, 0.8]
[AIGULABRAO, 0.8]
[AINDULKARG, 0.8]
[GULABRAOARUNINDULKAR, 1]
[GULABRAOAINDULKAR, 0.9]
[GULABRAOARUNI, 0.9]
[GARUNINDULKAR, 0.9]
[GULABRAOAI, 0.8]
[GAINDULKAR, 0.8]
[GARUNI, 0.8]
[GULABRAOINDULKARARUN, 1]
[GULABRAOIARUN, 0.9]
[GULABRAOINDULKARA, 0.9]
[GINDULKARARUN, 0.9]
[GULABRAOIA, 0.8]
[GIARUN, 0.8]
[GINDULKARA, 0.8]
[INDULKARARUNGULABRAO, 1]
[INDULKARAGULABRAO, 0.9]
[INDULKARARUNG, 0.9]
[IARUNGULABRAO, 0.9]
[INDULKARAG, 0.8]
[IAGULABRAO, 0.8]
[IARUNG, 0.8]
[INDULKARGULABRAOARUN, 1]
[INDULKARGARUN, 0.9]
[INDULKARGULABRAOA, 0.9]
[IGULABRAOARUN, 0.9]
[INDULKARGA, 0.8]
[IGARUN, 0.8]
[IGULABRAOA, 0.8]

I am not concerned about the output data structure; it can be anything. The weightage is 1 if all names are used in full, with no abbreviations.

Each abbreviation lowers the weight by 0.1 (10 percentage points). For example, ARUNGINDULKAR in the 2nd output row got 0.9 because the middle name was abbreviated. ARUNGI got 0.8 because both the middle name and the last name were abbreviated.
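The weighting rule can be sketched as a small helper. The name `weight_for`, the `penalty` parameter, and the rounding step are assumptions made for illustration; they are not part of the question.

```python
def weight_for(abbreviated_count, penalty=0.1):
    """Return the weight of a name form containing `abbreviated_count` initials."""
    # Hypothetical helper: every abbreviated part subtracts `penalty` from 1.
    # round() hides binary floating-point noise in the subtraction.
    return round(1 - penalty * abbreviated_count, 1)

print(weight_for(0))  # all names in full, e.g. ARUNGULABRAOINDULKAR
print(weight_for(1))  # one initial, e.g. ARUNGINDULKAR
print(weight_for(2))  # two initials, e.g. ARUNGI
```

Subtracting a fixed 0.1 per abbreviation, rather than multiplying by 0.9, is what matches the 0.8 given for ARUNGI.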

I have used itertools.permutations on the split name, i.e. itertools.permutations(name_target.split()), to generate the first set of permutations (the full-name orderings).
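A minimal sketch of that first step, assuming the name is split on whitespace before permuting (passing the raw string would permute its individual characters instead). The variable names `parts` and `orderings` are illustrative only.

```python
from itertools import permutations

name_target = "ARUN GULABRAO INDULKAR"
parts = name_target.split()  # split on whitespace into the name parts

# Permute the name parts, not the characters of the raw string.
orderings = [''.join(p) for p in permutations(parts)]
print(orderings)
```

With three parts this gives the six full-name orderings that carry weight 1 in the expected output.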

I am unable to wrap my head around how to combine the abbreviations. name_target can contain a variable number of parts when split by ' '.

Please ignore duplicates in the expected output.

Answer

You can use recursion with a generator to build the name abbreviation combinations. itertools.permutations creates all possible orderings of the original input names, and each ordering is passed to get_combos, where the abbreviation combinations are produced. A boolean flag (True for a full name, False for an abbreviation) is associated with each name component generated in get_combos, allowing the weightage to be calculated:

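from itertools import permutations as prmt

def get_combos(d, l, c=()):
    # d: names still to place, l: total number of names,
    # c: (text, is_full_name) pairs chosen so far
    if d:
        # Keep the next name in full.
        yield from get_combos(d[1:], l, c + ((d[0], True),))
        # Abbreviate it to its initial, but only while at least one
        # name would still appear in full.
        if sum(not b for _, b in c) + 1 < l:
            yield from get_combos(d[1:], l, c + ((d[0][0], False),))
    else:
        # Each abbreviation lowers the weight by 0.1.
        yield [''.join(a for a, _ in c), 1-sum(0.1 for _, b in c if not b)]

name_target = "ARUN GULABRAO INDULKAR"
n = name_target.split()
l = len(n)
result = [i for b in prmt(n, l) for i in get_combos(b, l)]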

Output:

[['ARUNGULABRAOINDULKAR', 1], ['ARUNGULABRAOI', 0.9], ['ARUNGINDULKAR', 0.9], ['ARUNGI', 0.8], ['AGULABRAOINDULKAR', 0.9], ['AGULABRAOI', 0.8], ['AGINDULKAR', 0.8], ['ARUNINDULKARGULABRAO', 1], ['ARUNINDULKARG', 0.9], ['ARUNIGULABRAO', 0.9], ['ARUNIG', 0.8], ['AINDULKARGULABRAO', 0.9], ['AINDULKARG', 0.8], ['AIGULABRAO', 0.8], ['GULABRAOARUNINDULKAR', 1], ['GULABRAOARUNI', 0.9], ['GULABRAOAINDULKAR', 0.9], ['GULABRAOAI', 0.8], ['GARUNINDULKAR', 0.9], ['GARUNI', 0.8], ['GAINDULKAR', 0.8], ['GULABRAOINDULKARARUN', 1], ['GULABRAOINDULKARA', 0.9], ['GULABRAOIARUN', 0.9], ['GULABRAOIA', 0.8], ['GINDULKARARUN', 0.9], ['GINDULKARA', 0.8], ['GIARUN', 0.8], ['INDULKARARUNGULABRAO', 1], ['INDULKARARUNG', 0.9], ['INDULKARAGULABRAO', 0.9], ['INDULKARAG', 0.8], ['IARUNGULABRAO', 0.9], ['IARUNG', 0.8], ['IAGULABRAO', 0.8], ['INDULKARGULABRAOARUN', 1], ['INDULKARGULABRAOA', 0.9], ['INDULKARGARUN', 0.9], ['INDULKARGA', 0.8], ['IGULABRAOARUN', 0.9], ['IGULABRAOA', 0.8], ['IGARUN', 0.8]]
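For comparison, the same combinations can be sketched without recursion by pairing each ordering with every full/initial mask from itertools.product. This is an alternative illustration, not the approach used above; the function name `weighted_forms`, the `penalty` parameter, and the rounding are assumptions.

```python
from itertools import permutations, product

def weighted_forms(name, penalty=0.1):
    """Yield (joined_name, weight) for every ordering and abbreviation mask."""
    parts = name.split()
    for order in permutations(parts):
        # Each mask says, per position, whether to keep the full name (True)
        # or only its initial (False).
        for mask in product((True, False), repeat=len(order)):
            if not any(mask):
                continue  # skip the all-initials form, like the recursive version
            text = ''.join(p if full else p[0] for p, full in zip(order, mask))
            # round() hides floating-point noise when several parts are abbreviated.
            yield text, round(1 - penalty * mask.count(False), 1)

forms = list(weighted_forms("ARUN GULABRAO INDULKAR"))
print(len(forms))
print(forms[:4])
```

For three names this produces 6 orderings times 7 masks, i.e. 42 entries. Unlike the recursive version, it yields tuples and reports the full-name weight as 1.0 rather than 1.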
User contributions licensed under: CC BY-SA