My string:
name_target = "ARUN GULABRAO INDULKAR"
I want to generate all permutations of the name parts, using either the full part or its initial, and assign a weightage to each:
[ARUNGULABRAOINDULKAR, 1] [ARUNGINDULKAR, 0.9] [ARUNGULABRAOI, 0.9] [AGULABRAOINDULKAR, 0.9] [ARUNGI, 0.8] [AGINDULKAR, 0.8] [AGULABRAOI, 0.8] [ARUNINDULKARGULABRAO, 1] [ARUNIGULABRAO, 0.9] [ARUNINDULKARG, 0.9] [AINDULKARGULABRAO, 0.9] [ARUNIG, 0.8] [AIGULABRAO, 0.8] [AINDULKARG, 0.8] [GULABRAOARUNINDULKAR, 1] [GULABRAOAINDULKAR, 0.9] [GULABRAOARUNI, 0.9] [GARUNINDULKAR, 0.9] [GULABRAOAI, 0.8] [GAINDULKAR, 0.8] [GARUNI, 0.8] [GULABRAOINDULKARARUN, 1] [GULABRAOIARUN, 0.9] [GULABRAOINDULKARA, 0.9] [GINDULKARARUN, 0.9] [GULABRAOIA, 0.8] [GIARUN, 0.8] [GINDULKARA, 0.8] [INDULKARARUNGULABRAO, 1] [INDULKARAGULABRAO, 0.9] [INDULKARARUNG, 0.9] [IARUNGULABRAO, 0.9] [INDULKARAG, 0.8] [IAGULABRAO, 0.8] [IARUNG, 0.8] [INDULKARGULABRAOARUN, 1] [INDULKARGARUN, 0.9] [INDULKARGULABRAOA, 0.9] [IGULABRAOARUN, 0.9] [INDULKARGA, 0.8] [IGARUN, 0.8] [IGULABRAOA, 0.8]
I am not concerned about the output data structure; it can be anything. The weightage is 1 if no abbreviations are used, i.e. every part appears in full. Each abbreviation decreases the weightage by 10% (0.1). For example, ARUNGINDULKAR in the 2nd output entry gets 0.9 because the middle name is abbreviated, and ARUNGI gets 0.8 because both the middle name and the last name are abbreviated.
I have already used itertools.permutations(name_target) to generate the first set of permutations, but I am unable to wrap my head around how to combine the abbreviations. name_target can have a variable number of parts when split by ' '.
Please ignore duplicates in the expected output.
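The examples above imply that the penalty is additive rather than compounding: two abbreviations give 0.8, not 0.9 × 0.9 = 0.81. A minimal sketch of that rule, where the function name `weight_for` and the `penalty` parameter are illustrative and not part of the original question:

```python
def weight_for(num_abbreviated, penalty=0.1):
    """Weight of a variant with `num_abbreviated` parts reduced to initials.

    Assumption: a flat 0.1 is subtracted per abbreviated part, which matches
    the 1 / 0.9 / 0.8 values in the expected output.
    """
    # round() keeps the result free of float noise for larger counts
    return round(1 - penalty * num_abbreviated, 2)


print(weight_for(0), weight_for(1), weight_for(2))
```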
Answer
You can use recursion with a generator to build the name abbreviation combinations. itertools.permutations creates all possible orderings of the original name parts, and each ordering is passed to get_combos, which produces the abbreviation combinations. A boolean flag (True for a full name, False for an abbreviation) is associated with each name component generated in get_combos, allowing the weightage to be calculated:
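from itertools import permutations as prmt

def get_combos(d, l, c=()):
    # d: name parts still to place, l: total number of parts,
    # c: (text, is_full) pairs chosen so far
    if d:
        # Branch 1: use the next part in full
        yield from get_combos(d[1:], l, c + ((d[0], True),))
        # Branch 2: use its initial, as long as at least one part stays in full
        if sum(not b for _, b in c) + 1 < l:
            yield from get_combos(d[1:], l, c + ((d[0][0], False),))
    else:
        # All parts placed: join them and subtract 0.1 per abbreviation
        yield [''.join(a for a, _ in c), 1 - sum(0.1 for _, b in c if not b)]

name_target = "ARUN GULABRAO INDULKAR"
n = name_target.split()
l = len(n)
result = [i for b in prmt(n, l) for i in get_combos(b, l)]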
Output:
[['ARUNGULABRAOINDULKAR', 1], ['ARUNGULABRAOI', 0.9], ['ARUNGINDULKAR', 0.9], ['ARUNGI', 0.8], ['AGULABRAOINDULKAR', 0.9], ['AGULABRAOI', 0.8], ['AGINDULKAR', 0.8], ['ARUNINDULKARGULABRAO', 1], ['ARUNINDULKARG', 0.9], ['ARUNIGULABRAO', 0.9], ['ARUNIG', 0.8], ['AINDULKARGULABRAO', 0.9], ['AINDULKARG', 0.8], ['AIGULABRAO', 0.8], ['GULABRAOARUNINDULKAR', 1], ['GULABRAOARUNI', 0.9], ['GULABRAOAINDULKAR', 0.9], ['GULABRAOAI', 0.8], ['GARUNINDULKAR', 0.9], ['GARUNI', 0.8], ['GAINDULKAR', 0.8], ['GULABRAOINDULKARARUN', 1], ['GULABRAOINDULKARA', 0.9], ['GULABRAOIARUN', 0.9], ['GULABRAOIA', 0.8], ['GINDULKARARUN', 0.9], ['GINDULKARA', 0.8], ['GIARUN', 0.8], ['INDULKARARUNGULABRAO', 1], ['INDULKARARUNG', 0.9], ['INDULKARAGULABRAO', 0.9], ['INDULKARAG', 0.8], ['IARUNGULABRAO', 0.9], ['IARUNG', 0.8], ['IAGULABRAO', 0.8], ['INDULKARGULABRAOARUN', 1], ['INDULKARGULABRAOA', 0.9], ['INDULKARGARUN', 0.9], ['INDULKARGA', 0.8], ['IGULABRAOARUN', 0.9], ['IGULABRAOA', 0.8], ['IGARUN', 0.8]]
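The same combinations can also be produced without recursion by pairing each ordering with a full/initial mask from itertools.product. The following is only a sketch of that alternative, not the method used above; the function name `name_variants`, the `penalty` parameter, and the choice to keep the highest weight for a duplicate string are all assumptions made for illustration.

```python
from itertools import permutations, product


def name_variants(name, penalty=0.1):
    """Return {variant: weight} for every ordering and abbreviation pattern.

    Assumptions: parts are separated by whitespace, at least one part is
    always written in full (as in the expected output), and a duplicate
    string keeps the highest weight seen.
    """
    parts = name.split()
    variants = {}
    for order in permutations(parts):
        # Each mask says, per part, whether it is written in full (True)
        # or reduced to its initial (False).
        for mask in product((True, False), repeat=len(order)):
            if not any(mask):
                continue  # skip the all-initials form
            text = "".join(p if full else p[0] for p, full in zip(order, mask))
            # round() avoids float noise when three or more parts are abbreviated
            weight = round(1 - penalty * mask.count(False), 2)
            variants[text] = max(weight, variants.get(text, 0))
    return variants


variants = name_variants("ARUN GULABRAO INDULKAR")
print(len(variants))
print(variants["ARUNGINDULKAR"], variants["ARUNGI"])
```

Storing the results in a dict removes duplicate strings, which can occur when two name parts are identical or share an initial, and rounding keeps the weights clean for names with four or more parts.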