Is there a way, in pandas, to apply a function to some chosen columns while strictly keeping a functional pipeline (no side effects, no assignment before the result, the result of the function depends only on its arguments) and without dropping the other columns? That is, what is the equivalent of across in R? For example, this is what I currently write, with one assign per column:
import pandas as pd

df = (
    pd.DataFrame({
        "column_a": [0, 3, 4, 2, 1],
        "column_b": [1, 2, 4, 5, 18],
        "column_c": [2, 4, 25, 25, 26],
        "column_d": [2, 4, -1, 5, 2],
        "column_e": [-1, -7, -8, -9, 3]
    })
    .assign(column_a=lambda df: df["column_a"] + 20)
    .assign(column_c=lambda df: df["column_c"] + 20)
    .assign(column_e=lambda df: df["column_e"] / 3)
    .assign(column_b=lambda df: df["column_b"] / 3)
)
print(df)

#    column_a  column_b  column_c  column_d  column_e
# 0        20  0.333333        22         2 -0.333333
# 1        23  0.666667        24         4 -2.333333
# 2        24  1.333333        45        -1 -2.666667
# 3        22  1.666667        45         5 -3.000000
# 4        21  6.000000        46         2  1.000000
In R, I would have written:
library(dplyr)

df <-
  tibble(
    column_a = c(0, 3, 4, 2, 1),
    column_b = c(1, 2, 4, 5, 18),
    column_c = c(2, 4, 25, 25, 26),
    column_d = c(2, 4, -1, 5, 2),
    column_e = c(-1, -7, -8, -9, 3)
  ) %>%
  mutate(across(c(column_a, column_c), ~ .x + 20),
         across(c(column_e, column_b), ~ .x / 3))

# # A tibble: 5 × 5
#   column_a column_b column_c column_d column_e
#      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1       20    0.333       22        2   -0.333
# 2       23    0.667       24        4   -2.33
# 3       24    1.33        45       -1   -2.67
# 4       22    1.67        45        5   -3
# 5       21    6           46        2    1
Answer
One option is to unpack the computation within assign:
(df
 .assign(**df.loc(axis=1)[['column_a', 'column_c']].add(20),
         **df.loc[:, ['column_e', 'column_b']].div(3))
)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
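The unpacking works because a DataFrame behaves like a mapping from column labels to Series, so ** hands each selected column to assign as a keyword argument, and assign returns a new frame rather than modifying the original. The sketch below rebuilds the question's input frame so that it runs on its own; the names shifted, as_kwargs and result are only illustrative, and the approach assumes the column labels are strings.

```python
import pandas as pd

# Input frame from the question, before any transformation.
df = pd.DataFrame({
    "column_a": [0, 3, 4, 2, 1],
    "column_b": [1, 2, 4, 5, 18],
    "column_c": [2, 4, 25, 25, 26],
    "column_d": [2, 4, -1, 5, 2],
    "column_e": [-1, -7, -8, -9, 3],
})

# Transform only the chosen columns; this is itself a (smaller) DataFrame.
shifted = df[["column_a", "column_c"]].add(20)

# Unpacking the small frame yields one keyword per column label.
# This only works when the labels are strings (assumption).
as_kwargs = dict(**shifted)
print(list(as_kwargs))

# assign overwrites the matching columns in a copy; df itself is unchanged.
result = df.assign(**shifted)
print(result)
print(df["column_a"].tolist())
```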
For readability purposes, I’d suggest splitting it up:
first = df.loc(axis=1)[['column_a', 'column_c']].add(20)
second = df.loc[:, ['column_e', 'column_b']].div(3)
df.assign(**first, **second)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
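If the columns to transform share a naming pattern, the same split-then-assign approach could select them by pattern instead of by explicit list. The following is a minimal sketch using DataFrame.filter with a regular expression; the regexes and the names plus_twenty and divided are assumptions made for this example, and the frame is rebuilt so the block runs on its own.

```python
import pandas as pd

# Input frame from the question, before any transformation.
df = pd.DataFrame({
    "column_a": [0, 3, 4, 2, 1],
    "column_b": [1, 2, 4, 5, 18],
    "column_c": [2, 4, 25, 25, 26],
    "column_d": [2, 4, -1, 5, 2],
    "column_e": [-1, -7, -8, -9, 3],
})

# Select columns whose label ends in _a or _c, then add 20.
plus_twenty = df.filter(regex=r"_[ac]$").add(20)

# Select columns whose label ends in _b or _e, then divide by 3.
divided = df.filter(regex=r"_[be]$").div(3)

# Unpack both partial results into a single assign call.
result = df.assign(**plus_twenty, **divided)
print(result)
```

Columns matched by neither pattern (here column_d) pass through unchanged, and the column order of the original frame is preserved.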
Another option, still with the unpacking idea, is to iterate through the columns and choose the transformation based on a pattern in the column name:
mapper = {key: value.add(20)
          if key.endswith(('a', 'c'))
          else value.div(3)
          if key.endswith(('e', 'b'))
          else value
          for key, value
          in df.items()}

df.assign(**mapper)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
You can dump it into a function and then pipe it:
def func(f):
    mapp = {}
    for key, value in f.items():
        if key in ('column_a', 'column_c'):
            value = value + 20
        elif key in ('column_e', 'column_b'):
            value = value / 3
        mapp[key] = value
    return f.assign(**mapp)

df.pipe(func)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
We can take the function a step further and make it reusable by passing the columns and the transformation as arguments:
def across(df, columns, func):
    result = func(df.loc[:, columns])
    return df.assign(**result)

(df
 .pipe(across, ['column_a', 'column_c'], lambda df: df + 20)
 .pipe(across, ['column_e', 'column_b'], lambda df: df / 3)
)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000
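To get closer to a single mutate call with several across clauses, the helper could be wrapped so that one pipe step takes several (columns, function) pairs. This is only a sketch: across_many is a hypothetical name introduced here, not a pandas function, and the frame is rebuilt so the block runs on its own.

```python
from functools import reduce

import pandas as pd

# Input frame from the question, before any transformation.
df = pd.DataFrame({
    "column_a": [0, 3, 4, 2, 1],
    "column_b": [1, 2, 4, 5, 18],
    "column_c": [2, 4, 25, 25, 26],
    "column_d": [2, 4, -1, 5, 2],
    "column_e": [-1, -7, -8, -9, 3],
})


def across(df, columns, func):
    """Apply func to the selected columns and return a new DataFrame."""
    return df.assign(**func(df.loc[:, columns]))


def across_many(df, *specs):
    """Hypothetical helper: apply each (columns, func) pair in order."""
    return reduce(lambda acc, spec: across(acc, *spec), specs, df)


result = df.pipe(
    across_many,
    (["column_a", "column_c"], lambda d: d + 20),
    (["column_e", "column_b"], lambda d: d / 3),
)
print(result)
```

Each pair is applied to the result of the previous one, so a later pair sees the columns as already transformed by earlier pairs; the original df is never modified.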
pyjanitor has a transform_columns function that can be handy for this:
import janitor  # importing pyjanitor registers transform_columns on DataFrame

(df
 .transform_columns(['column_a', 'column_c'], lambda df: df + 20)
 .transform_columns(['column_e', 'column_b'], lambda df: df / 3)
)

   column_a  column_b  column_c  column_d  column_e
0        20  0.333333        22         2 -0.333333
1        23  0.666667        24         4 -2.333333
2        24  1.333333        45        -1 -2.666667
3        22  1.666667        45         5 -3.000000
4        21  6.000000        46         2  1.000000