Divide into groups according to the specified attribute

Question

I need to group the data so that adjacent values in column a1 belong to the same group whenever the difference between them equals a pre-specified value. If the difference between two adjacent values is anything else, the second value and all subsequent data belong to a new group. For example, I have data like this

Accepted Answer

Assuming that your data frame is sorted by a1 and that I understood your problem correctly, I think you could do something like this:

    import pandas as pd
    from numba import njit

    data = [
        [5, 2],
        [100, 23],
        [101, -2],
        [303, 9],
        [304, 4],
        [709, 14],
        [710, 3],
        [711, 3],
        [988, 21]
    ]
    columns = ['a1', 'a2']
    df = pd.DataFrame(data=data, columns=columns)

    @njit
    def get_groups(vals):
        counter = 0
        group = []
        for i in range(len(vals) - 1):
            # Row i always belongs to the current group.
            group.append(counter)
            # A difference other than 1 means the next row starts a new group.
            if vals[i + 1] - vals[i] != 1:
                counter += 1
        # The last row gets the group that is current after the final comparison.
        group.append(counter)
        return group

    groups = get_groups(df['a1'].values)
    assert len(groups) == len(df)
    df['group'] = groups

    # Collect the row indices of each group into a list of lists.
    final_ls = df.reset_index().groupby(['group']).agg({'index': list})['index'].to_list()
    final_ls

Output:

    [[0], [1, 2], [3, 4], [5, 6, 7], [8]]

Here the required difference is hard-coded as 1; replace it with your own pre-specified value. The njit decorator from numba compiles the loop to machine code, which keeps the looping approach efficient on large arrays.
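If you would rather avoid the numba dependency, the same grouping can probably be expressed with pandas alone. The following is only a sketch under the same assumption (the frame is sorted by a1); the helper name group_by_step and its step parameter are made up for illustration and are not part of the answer above. It marks every row whose difference from the previous row is not equal to step, then takes a cumulative sum of those marks to get group labels.

```python
import pandas as pd


def group_by_step(df, column="a1", step=1):
    """Return a 0-based group label per row (hypothetical helper).

    Adjacent rows share a label when their difference in `column`
    equals `step`; any other difference starts a new group.
    """
    # diff() is NaN for the first row, and NaN != step is True,
    # so the first row always opens a group.
    starts_new_group = df[column].diff().ne(step)
    # Each True bumps the running total by one; subtract 1 for 0-based labels.
    return starts_new_group.cumsum() - 1


df = pd.DataFrame({
    "a1": [5, 100, 101, 303, 304, 709, 710, 711, 988],
    "a2": [2, 23, -2, 9, 4, 14, 3, 3, 21],
})
df["group"] = group_by_step(df, "a1", step=1)

# Collect the row indices of each group, mirroring final_ls above.
index_lists = df.reset_index().groupby("group")["index"].agg(list).tolist()
print(index_lists)
```

For the sample data this should produce the same list of index groups as the loop. Passing a different step covers the "pre-specified value" from the question without editing the function body.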

Answer