Is there a way, in pandas, to apply a function to some chosen columns while strictly keeping a functional pipeline? That is: no side effects, no intermediate assignments before the result, the result of the function depends only on its arguments, and the other columns are not dropped. In other words, what is the equivalent of `across` in R?
```python
import pandas as pd

df = (
    pd.DataFrame({
        "column_a": [0, 3, 4, 2, 1],
        "column_b": [1, 2, 4, 5, 18],
        "column_c": [2, 4, 25, 25, 26],
        "column_d": [2, 4, -1, 5, 2],
        "column_e": [-1, -7, -8, -9, 3],
    })
    .assign(column_a=lambda df: df["column_a"] + 20)
    .assign(column_c=lambda df: df["column_c"] + 20)
    .assign(column_e=lambda df: df["column_e"] / 3)
    .assign(column_b=lambda df: df["column_b"] / 3)
)
print(df)
#    column_a  column_b  column_c  column_d  column_e
# 0        20  0.333333        22         2 -0.333333
# 1        23  0.666667        24         4 -2.333333
# 2        24  1.333333        45        -1 -2.666667
# 3        22  1.666667        45         5 -3.000000
# 4        21  6.000000        46         2  1.000000
```
In R, I would have written:
```r
library(dplyr)

df <- tibble(
  column_a = c(0, 3, 4, 2, 1),
  column_b = c(1, 2, 4, 5, 18),
  column_c = c(2, 4, 25, 25, 26),
  column_d = c(2, 4, -1, 5, 2),
  column_e = c(-1, -7, -8, -9, 3)
) %>%
  mutate(across(c(column_a, column_c), ~ .x + 20),
         across(c(column_e, column_b), ~ .x / 3))
# # A tibble: 5 × 5
#   column_a column_b column_c column_d column_e
#      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1       20    0.333       22        2   -0.333
# 2       23    0.667       24        4   -2.33
# 3       24    1.33        45       -1   -2.67
# 4       22    1.67        45        5   -3
# 5       21    6           46        2    1
```
Answer
One option is to unpack the computation within `assign`:
```python
(df
 .assign(**df.loc(axis=1)[['column_a', 'column_c']].add(20),
         **df.loc[:, ['column_e', 'column_b']].div(3))
)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
```
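This works because a DataFrame behaves like a mapping of column names to Series, so `**` expands a sub-frame into exactly the keyword arguments `assign` expects. A minimal sketch (with toy column names, not the ones from the question):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Transform a sub-frame; ** will expand it as {column_name: Series}.
sub = df[["a"]].add(10)
print(list(dict(**sub)))  # ['a']

# assign() overwrites matching columns and leaves the rest untouched;
# the original df is not mutated.
out = df.assign(**sub)
print(out)
```

Because `assign` returns a new frame, this keeps the pipeline free of side effects, which is the constraint stated in the question.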
For readability purposes, I’d suggest splitting it up:
```python
first = df.loc(axis=1)[['column_a', 'column_c']].add(20)
second = df.loc[:, ['column_e', 'column_b']].div(3)
df.assign(**first, **second)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
```
Another option, still based on the unpacking idea, is to iterate through the columns and match on a name pattern:
```python
mapper = {key: value.add(20) if key.endswith(('a', 'c'))
          else value.div(3) if key.endswith(('e', 'b'))
          else value
          for key, value in df.items()}
df.assign(**mapper)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
```
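If you prefer to express the name pattern declaratively rather than in a comprehension, `DataFrame.filter(regex=...)` selects columns by name and its result can be unpacked into `assign` the same way. A sketch on a shortened two-row version of the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "column_a": [0, 3], "column_b": [1, 2], "column_c": [2, 4],
    "column_d": [2, 4], "column_e": [-1, -7],
})

# filter(regex=...) keeps only columns whose names match the pattern;
# non-matching columns simply pass through assign() untouched.
out = df.assign(**df.filter(regex="[ac]$").add(20),
                **df.filter(regex="[eb]$").div(3))
print(out)
```

`"[ac]$"` matches names ending in `a` or `c` (here `column_a` and `column_c`), so this is the regex analogue of the `str.endswith` test above.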
You can dump it into a function and then `pipe` it:
```python
def func(f):
    mapp = {}
    for key, value in f.items():
        if key in ('column_a', 'column_c'):
            value = value + 20
        elif key in ('column_e', 'column_b'):
            value = value / 3
        mapp[key] = value
    return f.assign(**mapp)

df.pipe(func)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
```
We can take the function declaration a step further for easier reuse:
```python
def across(df, columns, func):
    result = func(df.loc[:, columns])
    return df.assign(**result)

(df
 .pipe(across, ['column_a', 'column_c'], lambda df: df + 20)
 .pipe(across, ['column_e', 'column_b'], lambda df: df / 3)
)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
```
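dplyr's `across` also supports selecting columns by a predicate, as in `across(where(is.numeric), ...)`. The helper above can be extended the same way; a sketch, where `across_where` is a hypothetical name and the predicate receives each column as a Series:

```python
import pandas as pd

def across_where(df, predicate, func):
    # Select the columns whose Series satisfies the predicate, apply
    # func to that sub-frame, and return a new frame -- the input df
    # is never mutated.
    cols = [c for c in df.columns if predicate(df[c])]
    return df.assign(**func(df.loc[:, cols]))

df = pd.DataFrame({"x": [1, 2], "y": [0.5, 1.5], "z": ["a", "b"]})

# Add 100 to every integer column, keeping the others untouched.
out = df.pipe(across_where,
              lambda s: pd.api.types.is_integer_dtype(s),
              lambda sub: sub + 100)
print(out)
```

This composes with `pipe` exactly like the column-list version, so both can coexist in one chain.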
pyjanitor has a `transform_columns` function that can be handy for this:
```python
(df
 .transform_columns(['column_a', 'column_c'], lambda df: df + 20)
 .transform_columns(['column_e', 'column_b'], lambda df: df / 3)
)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
```