Adding an increment to duplicates within a Python DataFrame

Question

I'm looking to concatenate two columns in a data frame and, where there are duplicates, append an integer at the end. The wrinkle here is that I will keep receiving feeds of data, and the increment needs to be aware of historical values that were already generated and must not reuse them. I've been trying to do this with an apply function.

Accepted Answer

I'm not completely sure what you want to achieve, but you can update blacklist as you go. Inside the function, blacklist is just a reference to the same list object the caller holds, so appending to it there changes the list everywhere. If you slightly modify gen_summary by adding blacklist.append(summary) before the return statement

    def gen_summary(color, car, blacklist):
        ...
                exists = False  # Exit this loop
        blacklist.append(summary)
        return summary

you will get the following result

       color         car          summary
    0    Red      Toyota       RedToyota1
    1   Blue  Volkswagon  BlueVolkswagon3
    2   Blue  Volkswagon  BlueVolkswagon4
    3  Green     Hyundai     GreenHyundai

Grouping would be a bit more efficient. This should produce the same result:

    def gen_summary(ser, blacklist):
        color_car = ser.iat[0]
        summary = color_car
        increment = 0
        exists = True
        while exists:
            if summary in blacklist:
                increment += 1
                summary = color_car + str(increment)  # Append increment if in burn list
            else:
                exists = False  # Exit this loop
        return ([color_car + ('' if increment == 0 else str(increment))]
                + [color_car + str(i + increment) for i in range(1, len(ser))])

    df['summary'] = df['color'] + df['car']
    df['summary'] = df.groupby(['color', 'car']).transform(gen_summary, blacklist)

Is that the result you are looking for?
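The body of gen_summary is elided above, so the following is only a sketch of what the complete per-row version might look like, assuming it uses the same while-loop as the grouped variant. The starting contents of blacklist are also an assumption, chosen so that the output matches the result shown above. Plain lists stand in for the two DataFrame columns so the snippet runs without pandas.

```python
def gen_summary(color, car, blacklist):
    """Return color + car, suffixed with the first increment not yet in blacklist."""
    color_car = color + car
    summary = color_car
    increment = 0
    while summary in blacklist:
        increment += 1
        summary = color_car + str(increment)
    blacklist.append(summary)  # remember the value so later rows and feeds skip it
    return summary


# Assumed history: values generated by earlier feeds (hypothetical data).
blacklist = ["RedToyota", "BlueVolkswagon", "BlueVolkswagon1", "BlueVolkswagon2"]

# Stand-ins for df['color'] and df['car'].
colors = ["Red", "Blue", "Blue", "Green"]
cars = ["Toyota", "Volkswagon", "Volkswagon", "Hyundai"]

summaries = [gen_summary(color, car, blacklist) for color, car in zip(colors, cars)]
print(summaries)
```

With a DataFrame, the same comprehension can be assigned directly, for example df['summary'] = [gen_summary(c, k, blacklist) for c, k in zip(df['color'], df['car'])]. Because the function mutates blacklist, the rows have to be processed in order.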
If that is the result you want, I'd like to suggest an optimisation: use a dictionary instead of a list for blacklist, mapping each color + car key to the highest increment used so far (0 means the bare key has already been used):

    def gen_summary(color, car, blacklist):
        key = color + car
        num = blacklist.get(key, -1) + 1
        blacklist[key] = num
        return key if num == 0 else f'{key}{num}'

    blacklist = {'RedToyota': 0, 'BlueVolkswagon': 2}

or with grouping

    def gen_summary(ser, blacklist):
        key = ser.iat[0]
        num = blacklist.get(key, -1) + 1
        return ([f'{key}{"" if num == 0 else num}']
                + [f'{key}{i + num}' for i in range(1, len(ser))])

    blacklist = {'RedToyota': 0, 'BlueVolkswagon': 2}
    df['summary'] = df['color'] + df['car']
    df['summary'] = df.groupby(['color', 'car']).transform(gen_summary, blacklist)

This should produce the same result without the while-loop and with a much faster lookup.
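Since new feeds keep arriving, the dictionary has to survive from one run to the next. Note that the grouped variant above reads blacklist but never writes the new counts back. Below is a minimal sketch of handling two successive feeds with the per-row dictionary version, carrying the state between runs as JSON. The helper summarize_feed, the feed data, and the choice of JSON are illustrative assumptions, not part of the original code.

```python
import json


def gen_summary(color, car, blacklist):
    """Per-row dictionary version: blacklist maps key -> highest increment used."""
    key = color + car
    num = blacklist.get(key, -1) + 1
    blacklist[key] = num  # record the increment so it is never reused
    return key if num == 0 else f"{key}{num}"


def summarize_feed(rows, blacklist):
    """Hypothetical helper: label every (color, car) pair of one feed, in order."""
    return [gen_summary(color, car, blacklist) for color, car in rows]


blacklist = {"RedToyota": 0, "BlueVolkswagon": 2}

feed_1 = [("Red", "Toyota"), ("Blue", "Volkswagon"),
          ("Blue", "Volkswagon"), ("Green", "Hyundai")]
first = summarize_feed(feed_1, blacklist)

# Round-trip through JSON, as if saving at the end of one run
# and loading at the start of the next.
saved = json.dumps(blacklist)
blacklist = json.loads(saved)

feed_2 = [("Green", "Hyundai"), ("Red", "Toyota")]
second = summarize_feed(feed_2, blacklist)

print(first)
print(second)
```

The second feed continues from the counts left by the first, so no value is issued twice. In a real pipeline the JSON string would be written to wherever the history is kept between feeds.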

Answer