I am trying to translate my R script to python. I have a survey data with several date of birth and education level columns for each family member(from family member 1 to member 10): here a sample:
JavaScript
x
7
1
id_name dob_1 dob_2 dob_3 education_1 education_2 education_3
2
12 1958 2001 2005 1 5 1
3
13 1990 1999 1932 2 1 3
4
14 1974 1965 1965 3 3 3
5
15 1963 1963 1990 4 3 1
6
16 2020 1995 1988 1 1 2
7
I had a function in R in order to check the logic and re code wrong education level in all columns.Like this
JavaScript
1
13
13
1
# R function
2
3
edu_recode <- function(dob, edu){
4
case_when(
5
dob >= 2003 & (edu == 1 | edu == 2 | edu == 3 | edu == 4) ~ 8,
6
dob > 2000 & (edu == 1 | edu == 2 | edu == 3 | edu == 4) ~ 1,
7
dob >= 1996 & (edu == 3 | edu == 4) ~ 2,
8
dob > 1995 & edu == 4 ~ 3,
9
(dob >= 2001 & dob <= 2002) & edu == 8 ~ 1,
10
TRUE ~ as.numeric(edu)
11
)
12
}
13
and apply it for all columns like this:
JavaScript
1
14
14
1
library(tidyverse)
2
df %>%
3
mutate(education_1 = edu_recode(dob_1,education_1),
4
education_2 = edu_recode(dob_2,education_2),
5
education_3 = edu_recode(dob_3,education_3),
6
education_4 = edu_recode(dob_4,education_4),
7
education_5 = edu_recode(dob_5,education_5),
8
education_6 = edu_recode(dob_6,education_6),
9
education_7 = edu_recode(dob_7,education_7),
10
education_8 = edu_recode(dob_8,education_8),
11
education_9 = edu_recode(dob_9,education_9),
12
education_10 = edu_recode(dob_10,education_10)
13
)
14
is there a way to do similar process in Python instead of manually recoding each column?
Advertisement
Answer
You can write a function that combines pipe with np.select, as well as a dictionary (to abstract as much manual processing as possible):
JavaScript
1
27
27
1
def edu_recode(df, dob, edu):
2
df = df.copy()
3
cond1 = (df[dob] >= 2003) & (df[edu].isin([1, 4]))
4
cond2 = (df[dob] > 2000) & (df[edu].isin([1, 4]))
5
cond3 = (df[dob] > 1996) & (df[edu].isin([3, 4]))
6
cond4 = (df[dob] > 1995) & (df[edu] == 4)
7
cond5 = (df[dob].isin([2001, 2002])) & (df[edu] == 8)
8
condlist = [cond1, cond2, cond3, cond4, cond5]
9
choicelist = [8, 1, 2, 3, 1]
10
return np.select(condlist, choicelist, pd.to_numeric(df[edu]))
11
12
# sticking to the sample data, you can extend this
13
mapping = {f"education_{num}": df.pipe(edu_recode, f"dob_{num}",
14
f"education_{num}")
15
for num in range(1, 4)}
16
17
df.assign(**mapping)
18
19
20
21
id_name dob_1 dob_2 dob_3 education_1 education_2 education_3
22
0 12 1958 2001 2005 1 5 8
23
1 13 1990 1999 1932 2 1 3
24
2 14 1974 1965 1965 3 3 3
25
3 15 1963 1963 1990 4 3 1
26
4 16 2020 1995 1988 8 1 2
27