I have built a DataFrame from data extracted with a scraper. I extracted job positions, and the column currently contains the following:
```
    Title                                           Research Number
1   Dean                                            NaN
2   Professor of Law                                NaN
3   Associate Dean for Information & Technology Se  NaN
4   Professor of Law\n                              NaN
5   Associate Dean for Faculty Development\nCharle  NaN
6   Associate Dean for Faculty Development\nCharle  NaN
7   Assistant Professor of Clinical Education & Di  NaN
8   Judge George Howard, Jr., Distinguished Profes  NaN
9   Visiting Assistant Professor of Law             NaN
10  Associate Dean for Academic Affairs\nArkansas   NaN
11  Distinguished Professor in Constitutional Law   NaN
12  Assistant Professor of Law                      NaN
13  Instructor of Clinical Education; Supervising   NaN
14  Associate Professor of Law                      NaN
15  Assistant Professor of Law\n                    NaN
16  Assistant Professor of Clinical Education; Tax  NaN
17  Assistant Professor of Law Librarianship;       NaN
18  Byron M. Eiseman Distinguished Professor of Ta  NaN
19  Professor of Law\n                              NaN
20  Associate Professor of Law; Mediation Clinic D  NaN
21  Assistant Professor of Clinical Education; Fam  NaN
22  Assistant Professor of Clinical Education; Co   NaN
23  Associate Professor of Law\n                    NaN
24  Professor of Law Librarianship; Electronic Res  NaN
25  Professor of Law\n                              NaN
26  Professor of Law\n                              NaN
27  Associate Dean for Experiential Learning & Cli  NaN
28  Associate Professor of Law\n                    NaN
29  Assistant Professor of Clinical Education; Bus  NaN
30  Associate Professor of Law Librarianship;       NaN
```
I would like to replace these titles with the following titles:
```python
titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor", "Assistant Professor", "Professor"]
```
How can I search for partial text and replace it? I don't want to require a 100% match; for example, 'Visiting Assistant Professor of Law' should be replaced with 'Assistant Professor'.
Thank you!
Answer
Use `str.extract`:
```python
# expand=False returns a Series (the first match per row, NaN if none)
df['Title2'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False)
```
Output:

```
                  Title
1                   NaN
2             Professor
3                   NaN
...
29  Assistant Professor
30  Associate Professor
```
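For a self-contained way to try this, here is a minimal sketch on a handful of titles copied from the question, plus one hypothetical row ("Professor Emeritus of Law") that is not in the question's data. It also wraps each title in `re.escape`, which is an addition of mine rather than part of the answer, in case a title ever contains regex metacharacters. One thing worth checking: the regex returns the leftmost match, and among alternatives that start at the same position it takes the first one listed, so "Professor Emeritus" has to come before "Professor" in `titles` (as it already does).

```python
import re

import pandas as pd

# A few titles copied from the question, plus one hypothetical row
df = pd.DataFrame({"Title": [
    "Dean",
    "Professor of Law",
    "Visiting Assistant Professor of Law",
    "Associate Professor of Law Librarianship;",
    "Professor Emeritus of Law",  # hypothetical, not in the question's data
]})

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# One capture group with every title as an alternative; re.escape guards
# against regex metacharacters (".", "(", ...) inside a title
pattern = "(" + "|".join(map(re.escape, titles)) + ")"

# expand=False returns a Series: the leftmost match in each row, NaN if none
df["Title2"] = df["Title"].str.extract(pattern, expand=False)
print(df)

# Alternatives starting at the same position are tried in list order, so a
# shorter title listed first ("Professor") hides "Professor Emeritus"
wrong = df["Title"].str.extract("(Professor|Professor Emeritus)", expand=False)
print(wrong.iloc[4])
```

If the list might be reordered later, sorting it longest-first (`sorted(titles, key=len, reverse=True)`) before building the pattern is one simple safeguard.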
If you want to keep the original Title in case of no match, use:
```python
df['Title'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False).fillna(df['Title'])
```
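A small runnable sketch of that fallback, using three titles from the question as sample data: rows where no title matches come back from `str.extract` as NaN, and `fillna` with a Series aligns on the index, so exactly those rows get their original text back.

```python
import re

import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]
pattern = "(" + "|".join(map(re.escape, titles)) + ")"

# Sample rows copied from the question's data
df = pd.DataFrame({"Title": ["Dean",
                             "Visiting Assistant Professor of Law",
                             "Distinguished Professor in Constitutional Law"]})

# NaN where nothing matched, then fill those rows from the original column
df["Title"] = df["Title"].str.extract(pattern, expand=False).fillna(df["Title"])
print(df["Title"].tolist())  # ['Dean', 'Assistant Professor', 'Professor']
```

Note that "Distinguished Professor in Constitutional Law" becomes plain "Professor", because partial matching only looks for one of the listed titles anywhere in the text.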