How can I avoid for-loops using pandas?

Question

I would love to know how to optimize this code without using for-loops, if that is possible. I am trying to categorize all the values in the series df['Состояние'] by checking them against the keywords in the lists list_rep and list_dem, one by one. Thank you!

Accepted Answer

Use Series.str.lower first, then Series.str.contains with the keywords joined by | (regex OR) to build one boolean mask per list. Set the new values with numpy.select, then use Series.str.extract and replace the missing values:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Состояние': ['abc', 'def', 'opa1', 'ujb2', 'a1sb1d', 'B21op']})
    print(df)

      Состояние
    0       abc
    1       def
    2      opa1
    3      ujb2
    4    a1sb1d
    5     B21op

    conditions = ['a', 'b']
    list_rep = ['a1', 'a2']
    list_dem = ['b1', 'b2']

    # Lowercase once so that matching ignores case
    s = df['Состояние'].str.lower()
    # One boolean mask per keyword list
    m1 = s.str.contains('|'.join(list_rep))
    m2 = s.str.contains('|'.join(list_dem))

    # The first mask that is True wins; rows matching neither keep their lowercased value
    df['Состояние'] = np.select([m1, m2], [conditions[0], conditions[1]], s)
    # Extract the first condition label found in each value; values without one become '-'
    df['Состояние'] = df['Состояние'].str.extract(f'({"|".join(conditions)})').fillna('-')
    print(df)

      Состояние
    0         a
    1         -
    2         a
    3         b
    4         a
    5         b

Another idea is to create a dictionary for mapping: first use Series.str.lower and Series.str.extract, then Series.map, and finally replace the missing values:

    conditions = ['a', 'b']
    list_rep = ['a1', 'a2']
    list_dem = ['b1', 'b2']

    # Map every keyword, and every condition label itself, to its condition
    d = {**dict.fromkeys(list_rep, conditions[0]),
         **dict.fromkeys(list_dem, conditions[1]),
         **dict(zip(conditions, conditions))}
    print(d)

    {'a1': 'a', 'a2': 'a', 'b1': 'b', 'b2': 'b', 'a': 'a', 'b': 'b'}

    # A single pattern with one capture group over all dictionary keys
    pat = rf'({"|".join(d.keys())})'
    df['Состояние'] = (df['Состояние'].str.lower()
                                      .str.extract(pat, expand=False)
                                      .map(d)
                                      .fillna('-'))
    print(df)

      Состояние
    0         a
    1         -
    2         a
    3         b
    4         a
    5         b
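If the keyword lists may contain regex metacharacters such as . or +, joining them with | directly would change what the pattern matches. The following is a minimal sketch of the mask-based approach wrapped in a helper that escapes each keyword first. The function name categorize and its parameters are hypothetical, chosen only for this illustration. It also assumes a simplification: it leaves out the answer's final step of falling back to the bare condition labels, so 'abc' becomes '-' here instead of 'a'.

```python
import re

import numpy as np
import pandas as pd


def categorize(series, keyword_lists, labels, default='-'):
    """Label each value by the first keyword list that matches it.

    `categorize` and its parameter names are hypothetical, used only for
    this sketch; they are not part of pandas or of the original answer.
    """
    s = series.astype(str)
    # re.escape makes every keyword match literally, even if it contains
    # regex metacharacters; case=False replaces the explicit str.lower()
    masks = [
        s.str.contains('|'.join(map(re.escape, words)), case=False, regex=True)
         .to_numpy(dtype=bool)
        for words in keyword_lists
    ]
    # np.select picks the label of the first mask that is True per row
    return pd.Series(np.select(masks, labels, default=default), index=series.index)


df = pd.DataFrame({'Состояние': ['abc', 'def', 'opa1', 'ujb2', 'a1sb1d', 'B21op']})
result = categorize(df['Состояние'], [['a1', 'a2'], ['b1', 'b2']], ['a', 'b'])
print(result.tolist())
# ['-', '-', 'a', 'b', 'a', 'b']
```

As in the answer's first approach, the order of the keyword lists decides ties: 'a1sb1d' contains both 'a1' and 'b1', and it is labelled 'a' because the first list is checked first.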

