Recode multiple values in several columns in Python [similar to R]

Question

I am trying to translate my R script to Python. I have survey data with several date-of-birth and education-level columns for each family member (from family member 1 to member 10). Here is a sample: I had a function in R that checks the logic and recodes wrong education levels in all columns, like this:

Accepted Answer

You can write a function that combines pipe with np.select, plus a dictionary comprehension (to abstract away as much manual processing as possible):

    import numpy as np
    import pandas as pd

    def edu_recode(df, dob, edu):
        df = df.copy()
        cond1 = (df[dob] >= 2003) & (df[edu].isin([1, 4]))
        cond2 = (df[dob] > 2000) & (df[edu].isin([1, 4]))
        cond3 = (df[dob] > 1996) & (df[edu].isin([3, 4]))
        cond4 = (df[dob] > 1995) & (df[edu] == 4)
        cond5 = (df[dob].isin([2001, 2002])) & (df[edu] == 8)
        condlist = [cond1, cond2, cond3, cond4, cond5]
        choicelist = [8, 1, 2, 3, 1]
        # np.select uses the first condition that matches;
        # rows matching none keep their original education value
        return np.select(condlist, choicelist, pd.to_numeric(df[edu]))

    # sticking to the sample data (members 1 to 3); you can extend this,
    # e.g. range(1, 11) for all ten family members
    mapping = {f"education_{num}": df.pipe(edu_recode, f"dob_{num}", f"education_{num}")
               for num in range(1, 4)}

    df.assign(**mapping)

This gives:

       id_name  dob_1  dob_2  dob_3  education_1  education_2  education_3
    0       12   1958   2001   2005            1            5            8
    1       13   1990   1999   1932            2            1            3
    2       14   1974   1965   1965            3            3            3
    3       15   1963   1963   1990            4            3            1
    4       16   2020   1995   1988            8            1            2
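The order of the conditions matters here, because np.select assigns the choice for the first condition that is true in each row. The snippet below is a small sketch of that behaviour. It uses a made-up frame (not the asker's survey data) and only three of the rules, so treat the column values and the names cond_a, cond_b and cond_c as assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only: one dob column, one education column.
df = pd.DataFrame({
    "dob_1": [1958, 1990, 2001, 2020, 1997],
    "education_1": [1, 2, 8, 4, 3],
})

# Three of the rules from the answer, kept in the same relative order.
cond_a = (df["dob_1"] >= 2003) & df["education_1"].isin([1, 4])
cond_b = (df["dob_1"] > 1996) & df["education_1"].isin([3, 4])
cond_c = df["dob_1"].isin([2001, 2002]) & (df["education_1"] == 8)

# Row 3 (2020, level 4) satisfies both cond_a and cond_b;
# cond_a comes first in the list, so that row is recoded to 8, not 2.
# Rows that match no condition fall back to the original value.
recoded = np.select(
    [cond_a, cond_b, cond_c],
    [8, 2, 1],
    default=df["education_1"].to_numpy(),
)

print(recoded.tolist())  # [1, 2, 1, 8, 2]
```

If you swapped cond_a and cond_b in the list, row 3 would come out as 2 instead, which is why the stricter rules go first.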

Answer