Pandas unique values per row, variable number of columns with data

Question

Consider the below dataframe: Assuming my index is unique, I'm looking to retrieve the unique values per index row, to an output like the one below. I wish to keep the empty rows. I have a working, albeit slow, solution, see below. The output number order is not relevant, as long as all values are presente…

Accepted Answer

Another option, albeit longer:

    import numpy as np
    import pandas as pd

    outcome = (df.melt(ignore_index=False)
                 # keep the index as a tracker
                 .reset_index()
                 # get the unique rows
                 .drop_duplicates(subset=['index', 'value'])
                 .dropna()
                 # use this to build the new column names
                 .assign(counter=lambda df: df.groupby('index').cumcount() + 1)
                 # pivot arguments are keyword-only from pandas 2.0
                 .pivot(index='index', columns='counter', values='value')
                 .add_prefix('num')
                 .reindex(df.index)
                 .rename_axis(columns=None))

    outcome

        num1   num2   num3
    0  111.0    NaN    NaN
    1  112.0  115.0    NaN
    2  113.0    NaN    NaN
    3    NaN    NaN    NaN
    4  118.0  110.0  117.0

If you want it to exactly match your output, you can dump it into numpy, sort and return to pandas:

    pd.DataFrame(np.sort(outcome, axis=1),
                 index=outcome.index,
                 columns=outcome.columns)

        num1   num2   num3
    0  111.0    NaN    NaN
    1  112.0  115.0    NaN
    2  113.0    NaN    NaN
    3    NaN    NaN    NaN
    4  110.0  117.0  118.0

Another option is to do the sorting within numpy before reshaping in Pandas:

    (pd.DataFrame(np.sort(df, axis=1), index=df.index)
       .apply(pd.unique, axis=1)
       .apply(pd.Series)
       .dropna(how='all', axis=1)
       .set_axis(['num1', 'num2', 'num3'], axis=1))

        num1   num2   num3
    0  111.0    NaN    NaN
    1  112.0  115.0    NaN
    2  113.0    NaN    NaN
    3    NaN    NaN    NaN
    4  110.0  117.0  118.0
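For anyone who wants to try the melt-and-pivot idea end to end, here is a minimal sketch. The input frame is not shown on this page, so the `df` below (column names `a` to `d` and its values) is a made-up stand-in chosen only to be consistent with the outputs above. The helper name `unique_per_row` and the temporary `row` column are likewise illustrative, not part of the original answer. It sorts the values before numbering them, so the columns come out ordered without a separate numpy step.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the question's frame is not shown, so these column
# names and values are assumptions picked to match the outputs above.
df = pd.DataFrame(
    {
        "a": [111, 112, 113, np.nan, 118],
        "b": [111, 115, 113, np.nan, 110],
        "c": [np.nan, 112, np.nan, np.nan, 117],
        "d": [np.nan, np.nan, 113, np.nan, 118],
    }
)


def unique_per_row(frame, prefix="num"):
    """Return the sorted unique non-null values of each row, NaN-padded."""
    long = (
        frame.rename_axis("row")           # name the index so it can be tracked
        .reset_index()
        .melt(id_vars=["row"])             # one (row, value) pair per cell
        .dropna(subset=["value"])
        .drop_duplicates(subset=["row", "value"])
        .sort_values(["row", "value"])     # ascending values within each row
        .assign(counter=lambda d: d.groupby("row").cumcount() + 1)
    )
    wide = long.pivot(index="row", columns="counter", values="value")
    return (
        wide.add_prefix(prefix)            # 1, 2, 3 -> num1, num2, num3
        .reindex(frame.index)              # bring back rows with no values
        .rename_axis(columns=None)
    )


result = unique_per_row(df)
print(result)
```

Because the column labels are built from the counter, the number of `num` columns follows the data rather than being hardcoded, which may be handy when the widest row is not known in advance.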

Answer