Creating a function to standardize categorical variables (Python)

Question

I don’t know if it is right to say “standardize” for a categorical string variable, but basically I want to create a function that sets all observations F or f in the column below to 0 and M or m to 1: I tried this: But I got an error: Any ideas? Thanks!

Accepted Answer

There is no replace function defined in your code.

Back to your goal: use a vectorized operation.

Convert to lowercase and map f->0, m->1:

    df['gender_num'] = df['gender'].str.lower().map({'f': 0, 'm': 1})

Or use a comparison (not equal to f) and convert the boolean result to an integer:

    df['gender_num'] = df['gender'].str.lower().ne('f').astype(int)

Note that this second form sets every value other than f (including missing values) to 1.

output:

      gender  gender_num
    0      f           0
    1      F           0
    2      f           0
    3      M           1
    4      M           1
    5      m           1

generalization

You can generalize to any number of categories using pandas.factorize. Advantage: it returns the integer codes together with the unique values, so each code can be mapped back to its label.

NB. the codes are assigned in order of first appearance, or in lexicographic order if sort=True:

    s, key = pd.factorize(df['gender'].str.lower(), sort=True)
    df['gender_num'] = s
    key = dict(enumerate(key))
    # {0: 'f', 1: 'm'}
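Since the question asked for a function, the mapping approach above could be wrapped as in the sketch below. The name encode_gender, the whitespace stripping, and the choice of the nullable Int64 dtype are assumptions made for illustration and are not part of the original answer. Values other than f or m come out as missing rather than raising an error.

```python
import pandas as pd


def encode_gender(series: pd.Series) -> pd.Series:
    """Map 'F'/'f' to 0 and 'M'/'m' to 1 (hypothetical helper).

    Any other value is not in the mapping and becomes a missing value.
    """
    mapping = {"f": 0, "m": 1}
    # Normalize case (and stray whitespace, an added assumption) before mapping.
    codes = series.str.strip().str.lower().map(mapping)
    # Nullable integer dtype keeps whole-number codes even when some are missing.
    return codes.astype("Int64")


# Sample data taken from the output shown in the answer.
df = pd.DataFrame({"gender": ["f", "F", "f", "M", "M", "m"]})
df["gender_num"] = encode_gender(df["gender"])
print(df)
```

Keeping the mapping in a dict makes it easy to extend to more categories, and unmapped values stay visible as missing instead of being silently coded as 1.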
