Python replacing partial matching text based on a list of elements in data frame

I have built a dataframe that extracts data through a scraper. I extracted job positions, and currently, this column contains job positions as follows:

                                                Title Research Number  
1                                                Dean             NaN   
2                                    Professor of Law             NaN   
3   Associate Dean for Information & Technology Se...             NaN   
4                                  Professor of Lawn             NaN   
5   Associate Dean for Faculty DevelopmentnCharle...             NaN   
6   Associate Dean for Faculty DevelopmentnCharle...             NaN   
7   Assistant Professor of Clinical Education & Di...             NaN   
8   Judge George Howard, Jr., Distinguished Profes...             NaN   
9                 Visiting Assistant Professor of Law             NaN   
10  Associate Dean for Academic AffairsnArkansas ...             NaN   
11      Distinguished Professor in Constitutional Law             NaN   
12                         Assistant Professor of Law             NaN   
13  Instructor of Clinical Education; Supervising ...             NaN   
14                         Associate Professor of Law             NaN   
15                       Assistant Professor of Lawn             NaN   
16  Assistant Professor of Clinical Education; Tax...             NaN   
17         Assistant Professor of Law Librarianship;              NaN   
18  Byron M. Eiseman Distinguished Professor of Ta...             NaN   
19                                 Professor of Lawn             NaN   
20  Associate Professor of Law; Mediation Clinic D...             NaN   
21  Assistant Professor of Clinical Education; Fam...             NaN   
22   Assistant Professor of Clinical Education; Co...             NaN   
23                       Associate Professor of Lawn             NaN   
24  Professor of Law Librarianship; Electronic Res...             NaN   
25                                 Professor of Lawn             NaN   
26                                 Professor of Lawn             NaN   
27  Associate Dean for Experiential Learning & Cli...             NaN   
28                       Associate Professor of Lawn             NaN   
29  Assistant Professor of Clinical Education; Bus...             NaN   
30         Associate Professor of Law Librarianship;              NaN

I would like to replace these titles with the following titles:

titles=["Adjunct Professor","Professor Emeritus","Associate Professor","Assistant Professor","Professor"]

How can I look for a partial match and replace the text with it? I don’t want to replace the text entirely when it is not an exact match. For example, ‘Visiting Assistant Professor of Law’ should be replaced with ‘Assistant Professor’.

Thank you!

Answer

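Use str.extract with a regular expression that joins the titles with | (alternation). The pattern returns the leftmost match in each string and, at the same position, tries the alternatives in list order, so "Professor Emeritus" has to be listed before "Professor":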

df['Title2'] = df['Title'].str.extract(f'({"|".join(titles)})')

output:

                 Title2
1                   NaN
2             Professor
3                   NaN
...
29  Assistant Professor
30  Associate Professor
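
The result can be reproduced with a small self-contained sketch. The four sample rows are copied from the question. Sorting the titles longest first and escaping them with re.escape are precautions added here, not part of the original answer: they are not needed for this particular list, but would matter if a title contained regex metacharacters or if a shorter title were listed before a longer one that starts with it.

```python
import re
import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# Sample rows copied from the question's Title column (illustrative subset).
df = pd.DataFrame({"Title": ["Dean",
                             "Professor of Law",
                             "Visiting Assistant Professor of Law",
                             "Associate Professor of Law Librarianship;"]})

# Longest first, so a short title cannot win over a longer one that
# starts at the same position; re.escape makes each title a literal.
ordered = sorted(titles, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"

# expand=False returns a Series instead of a one-column DataFrame.
df["Title2"] = df["Title"].str.extract(pattern, expand=False)
print(df)
```

Here "Dean" has no match and becomes NaN, while the other three rows become "Professor", "Assistant Professor" and "Associate Professor".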

If you want to keep the original Title in case of no match, use:

df['Title'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False).fillna(df['Title'])
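
A minimal sketch of the fallback, assuming a three-row sample Series taken from the question (the names s, matched and result are only illustrative): str.extract yields NaN where nothing matches, and fillna then puts the original text back in those positions.

```python
import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# Sample values copied from the question's Title column.
s = pd.Series(["Dean",
               "Visiting Assistant Professor of Law",
               "Professor of Law"], name="Title")

# NaN wherever none of the titles occurs in the string.
matched = s.str.extract(f'({"|".join(titles)})', expand=False)

# Fall back to the original text for the unmatched rows.
result = matched.fillna(s)
print(result.tolist())  # ['Dean', 'Assistant Professor', 'Professor']
```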
