Skip to content
Advertisement

pandas: Explode (duplicate) by group

I have a df that looks like this :

df = pd.DataFrame(data={
    'id': ["idx1"],
    'test_A': ["ABC", ],
    'test_B': ["X"],
    'test_C': ["ABC / XYZ"],
    'test_D': ["ABC / JKL / XYZ"]})

It’s an example for one row but there are thousands of rows. I want to explode each value where there are multiple values in these four “TEST” columns ie. I want each one to duplicate the row for each one of the “test” that is the same and if there are many (separed by a “/”) i want to duplicate too. I want an “X” if the duplication contains nothing.

Of course there are a lot of other columns, for which the values doesn’t matter for the duplication: it should stay the same. I’m not sure how to explain it, but here’s an example of the output I want for the input I provided:

df = pd.DataFrame(data={
    'id': ["idx1", "idx1", "idx1"],
    'test_A': ["ABC", "X", "X"],
    'test_B': ["X", "X", "X"],
    'test_C': ["ABC", "XYZ", "X"],
    'test_D': ["ABC", "XYZ", "JKL"]})

Advertisement

Answer

Here is an alternative, using a helper function:

def split(df):
    return (df.apply(lambda c: c.str.split(' / '))  # split cells
              .apply(lambda x: x.explode().reset_index(drop=True)) # explode
              .fillna({c: 'X' for c in df.filter(like='test_').columns}) # fill missing test with X
              .ffill() # fill non-test columns
            )

## single row
split(df)

## multiple rows
df.groupby('id').apply(split).droplevel(0)

output:

     id test_A test_B test_C test_D
0  idx1    ABC      X    ABC    ABC
1  idx1      X      X    XYZ    JKL
2  idx1      X      X      X    XYZ

output on @jezrael’s better example:

     id test_A test_B test_C test_D
0  idx1    ABC      X    ABC    ABC
1  idx1      X      X    XYZ    JKL
2  idx1      X      X      X    XYZ
0  idx2    SSD   "ABC     aa    ABC
1  idx2      X    JKL      X    JKL
2  idx2      X    XYZ      X      X
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement