Skip to content
Advertisement

Python replacing partial matching text based on a list of elements in data frame

I have built a dataframe that extracts data through a scraper. I extracted job positions, and currently, this column contains job positions as follows:

                                                Title Research Number  
1                                                Dean             NaN   
2                                    Professor of Law             NaN   
3   Associate Dean for Information & Technology Se...             NaN   
4                                  Professor of Lawn             NaN   
5   Associate Dean for Faculty DevelopmentnCharle...             NaN   
6   Associate Dean for Faculty DevelopmentnCharle...             NaN   
7   Assistant Professor of Clinical Education & Di...             NaN   
8   Judge George Howard, Jr., Distinguished Profes...             NaN   
9                 Visiting Assistant Professor of Law             NaN   
10  Associate Dean for Academic AffairsnArkansas ...             NaN   
11      Distinguished Professor in Constitutional Law             NaN   
12                         Assistant Professor of Law             NaN   
13  Instructor of Clinical Education; Supervising ...             NaN   
14                         Associate Professor of Law             NaN   
15                       Assistant Professor of Lawn             NaN   
16  Assistant Professor of Clinical Education; Tax...             NaN   
17         Assistant Professor of Law Librarianship;              NaN   
18  Byron M. Eiseman Distinguished Professor of Ta...             NaN   
19                                 Professor of Lawn             NaN   
20  Associate Professor of Law; Mediation Clinic D...             NaN   
21  Assistant Professor of Clinical Education; Fam...             NaN   
22   Assistant Professor of Clinical Education; Co...             NaN   
23                       Associate Professor of Lawn             NaN   
24  Professor of Law Librarianship; Electronic Res...             NaN   
25                                 Professor of Lawn             NaN   
26                                 Professor of Lawn             NaN   
27  Associate Dean for Experiential Learning & Cli...             NaN   
28                       Associate Professor of Lawn             NaN   
29  Assistant Professor of Clinical Education; Bus...             NaN   
30         Associate Professor of Law Librarianship;              NaN 

I would like to replace these titles with the following titles:

titles=["Adjunct Professor","Professor Emeritus","Associate Professor","Assistant Professor","Professor"]

How can I look for partial text and replace it? I don’t want to fully replace the text if it’s not a 100% match. For example ‘Visiting Assistant Professor of Law’ should be replaced with ‘Assistant Professor’

Thank you!

Advertisement

Answer

Use str.extract:

df['Title2'] = df['Title'].str.extract(f'({"|".join(titles)})')

output:

                  Title
1                   NaN
2             Professor
3                   NaN
...
29  Assistant Professor
30  Associate Professor

If you want to keep the original Title in case of no match, use:

df['Title'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False).fillna(df['Title'])
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement