I have built a dataframe from data extracted with a scraper. One column holds the job positions I extracted, and it currently looks like this:
    Title                                              Research Number
 1  Dean                                               NaN
 2  Professor of Law                                   NaN
 3  Associate Dean for Information & Technology Se...  NaN
 4  Professor of Lawn                                  NaN
 5  Associate Dean for Faculty DevelopmentnCharle...   NaN
 6  Associate Dean for Faculty DevelopmentnCharle...   NaN
 7  Assistant Professor of Clinical Education & Di...  NaN
 8  Judge George Howard, Jr., Distinguished Profes...  NaN
 9  Visiting Assistant Professor of Law                NaN
10  Associate Dean for Academic AffairsnArkansas ...   NaN
11  Distinguished Professor in Constitutional Law      NaN
12  Assistant Professor of Law                         NaN
13  Instructor of Clinical Education; Supervising ...  NaN
14  Associate Professor of Law                         NaN
15  Assistant Professor of Lawn                        NaN
16  Assistant Professor of Clinical Education; Tax...  NaN
17  Assistant Professor of Law Librarianship;          NaN
18  Byron M. Eiseman Distinguished Professor of Ta...  NaN
19  Professor of Lawn                                  NaN
20  Associate Professor of Law; Mediation Clinic D...  NaN
21  Assistant Professor of Clinical Education; Fam...  NaN
22  Assistant Professor of Clinical Education; Co...   NaN
23  Associate Professor of Lawn                        NaN
24  Professor of Law Librarianship; Electronic Res...  NaN
25  Professor of Lawn                                  NaN
26  Professor of Lawn                                  NaN
27  Associate Dean for Experiential Learning & Cli...  NaN
28  Associate Professor of Lawn                        NaN
29  Assistant Professor of Clinical Education; Bus...  NaN
30  Associate Professor of Law Librarianship;          NaN
I would like to replace these titles with the following titles:
titles=["Adjunct Professor","Professor Emeritus","Associate Professor","Assistant Professor","Professor"]
How can I look for partial text and replace it? The original title will rarely be a 100% match for one of these titles, so I need to match on part of the text rather than the whole string. For example, 'Visiting Assistant Professor of Law' should be replaced with 'Assistant Professor'.
Thank you!
Answer
Use str.extract:
df['Title2'] = df['Title'].str.extract(f'({"|".join(titles)})')
output:
    Title
1   NaN
2   Professor
3   NaN
...
29  Assistant Professor
30  Associate Professor
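To see why this works, it can help to look at the pattern the f-string builds. The sketch below uses only Python's re module on two titles from the question; the variable names first and no_match are illustrative and not part of the answer above.

```python
import re

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# Same expression as in the answer: one capture group containing every
# title, separated by the regex "or" operator.
pattern = f'({"|".join(titles)})'

# The regex engine scans left to right and returns the first position where
# any alternative matches, so "Visiting " is skipped and the capture group
# holds only the matched title.
first = re.search(pattern, "Visiting Assistant Professor of Law").group(1)

# A title containing none of the alternatives yields no match at all,
# which str.extract reports as NaN.
no_match = re.search(pattern, "Dean")

print(pattern)
print(first)
print(no_match)
```

Because alternatives are tried in list order at each position, "Professor Emeritus" has to appear before the bare "Professor" in titles, as it does here; otherwise the shorter alternative would win.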
If you want to keep the original Title in case of no match, use:
df['Title'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False).fillna(df['Title'])
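For reference, here is a minimal end-to-end sketch of the same approach on four titles taken from the question. Two details are additions of mine rather than part of the answer above: re.escape, which guards against regex metacharacters appearing in a title, and sorting the alternatives longest first, so that the result does not depend on the order of the titles list.

```python
import re
import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# Small sample frame; the values are copied from the question's Title column.
df = pd.DataFrame({"Title": ["Dean",
                             "Professor of Law",
                             "Visiting Assistant Professor of Law",
                             "Associate Professor of Law Librarianship;"]})

# Assumption (not in the original answer): escape each title and try longer
# titles first, so "Professor Emeritus" is always preferred over "Professor".
ordered = sorted(titles, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"

# expand=False returns a Series, with NaN wherever no title matched.
normalized = df["Title"].str.extract(pattern, expand=False)

# Fall back to the original text for rows without a match.
df["Title2"] = normalized.fillna(df["Title"])

print(df)
```

Writing to a separate Title2 column keeps the scraped text available for checking the result; assign back to df['Title'] instead once the mapping looks right.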