I am trying to scrape the "salary" from each subpage. For one of the subpages, I am posting part of the contents of soup = BeautifulSoup(requests.get('url_of_job').text). I copied the soup content to a Word file, cut out the section surrounding the salary, and pasted it here, because the full text exceeds the post length limit.
soup =
```
</style>
<script type="application/ld+json">
{
"@context" : "https://schema.org",
"@type" : "Organization",
"url" : "https://www.higheredjobs.com",
"logo" : "https://www.higheredjobs.com/assets/hej/img/HEJ_Logo_2c.png",
"potentialAction" : {
"@type" : "SearchAction",
"target" : "https://www.higheredjobs.com/search/advance_action.cfm?Keyword={Keyword}",
"query-input" : "required name=Keyword"
}
}
</script>
<script type="application/ld+json">
{
"@context" : "http://schema.org/",
"@type" : "JobPosting",
"industry" : ["Academic/Education","Education"],
"title" : "Assistant Professor of Engineering F99507",
"description" : "<p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of Engineering.</strong></p>
<p><strong>DEPARTMENT: </strong>Engineering and Computer Science&nbsp;</p>
<p><strong>POSITION INFORMATION: </strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p>
<p><strong>POSITION NUMBER: </strong>F99507</p>
<p><strong>SALARY: </strong>$76,000</p>
<p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering and Computer Science</p>
<p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for functions related to teaching courses, advising students, and conducting research. The assistant professor will work closely with faculty and staff in the Department of Engineering and Computer Science.</p>
<p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p>
<p><u>Required</u>:&nbsp;</p>
<ul>
<li>Master of Engineering with 18 hours in Electrical Engineering</li>
<li>Experience with Power Engineering</li>
</ul>
<p><u>Preferred</u>:</p>
<ul>
<li>PhD in Electrical Engineering</li>
</ul>
<p><strong>DEADLINE: </strong>Position will remain open until filled.&nbsp;</p>
<p><strong>APPLICATION PROCESS AND MATERIALS: </strong>For the application process, applicants are required and <strong><span style="text-decoration: underline;">MUST</span></strong> complete an electronic application and upload required documents listed below:</p>
<ul>
<li>Letter of Application (Cover Letter)</li>
<li>Resume/Vitae</li>
<li>Three Professional References (include: name, phone number, and e-mail address)</li>
<li>Unofficial Transcripts</li>
</ul>
<p><strong>TO APPLY FOR THIS VACANCY, </strong>click on the <strong>&ldquo;APPLY&rdquo;</strong> button at the top of advertisement to complete the electronic application, which can be used for this vacancy as well as future job opportunities. Applicants are responsible for checking the status of their application to determine where they are in the recruitment process. Further status message information is located under the Information section of the Current Job Opportunities page.</p>
<p>ALL JOB OFFERS ARE CONTINGENT UPON THE SUCCESSFUL RESULT OF A CRIMINAL BACKGROUND CHECK AND RECEIPT OF TRANSCRIPT(S) IF APPLICABLE.</p>
<p><strong>If you have questions regarding this recruitment, please contact the HR Liaison:</strong></p>
<p>Kim Dronett at <a rel="noopener" href="kdronett@mcneese.edu" target="_blank">kdronett@mcneese.edu</a> or (337) 475-5413.</p>",
"datePosted" : "2022-09-02 09:48:23.527",
"validThrough" : "2023-03-01 23:59:59.9",
"baseSalary" : {
"@type" : "MonetaryAmount",
"currency" : "USD",
"value" : {
"@type" : "QuantitativeValue",
"value" : "76,000.00",
"unitText" : "YEAR"
}
},
"hiringOrganization" : {
"@type" : "Organization",
"id" : "http://www.mcneese.edu",
"url" : "http://www.mcneese.edu",
"name" : "McNeese State University"
},
"jobLocation" : {
"@type" : "Place",
"address" : {
"@type" : "PostalAddress",
"addressLocality" : "Lake Charles",
"addressRegion" : "LA",
"addressCountry" : "US"
}
},
"employmentType" : "FULL_TIME",
"image" : "https://www.higheredjobs.com/assets/hej/img/HEJ_Logo_2c.png"
}
</script>
<noscript>
<style type="text/css">
@media (min-width: 768px) { /*menus work different at XS*/
HEADER ul.nav li.dropdown:hover > .arrow,
HEADER ul.nav li.dropdown:hover > ul.dropdown-menu {
display: block;
}
}
</style>
</noscript>

<div class="col-6">
<h5>Resources</h5>
<ul>
<li><a href="/career/">Career Resources</a></li>
<li><a href="/salary/">Salary Data</a></li>
<li><a href="/career/resumes.cfm">Job Search Tips</a></li>
<li><a href="/career/ResumeService.cfm">Resume/CV Writing<br/> Service</a></li>
<li><a href="/articles/DiversityResources.cfm">Diversity Resources</a></li>
<li><a href="/career/SiteListings.cfm">Search Firms</a></li>
</ul>
</div>
</div>
</li>
</ul>
</li>
<li class="nav-item dropdown js-emplogin">
<a class="nav-link dropdown-toggle" data-toggle="dropdown" href="/employers/">

<div class="" id="jobAttrib">
<div class="job-info">
<strong>Type:</strong>
Full-Time
<br/>
<strong>Salary:</strong>
$76,000.00 USD Per Year
<br/>
<strong>Posted:</strong>
09/02/2022
<br/>
<strong>Application Due:</strong>
Open Until Filled
<br/>
<strong>Category:</strong>
<a class="job-cat" href="/faculty/search.cfm?JobCat=117">
Electrical Engineering
</a>
<br/>
</div>
</div>

</div>
</div>
<div id="jobDesc">
<p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of
Engineering.</strong></p> <p><strong>DEPARTMENT: </strong>Engineering and Computer Science </p> <p><strong>POSITION INFORMATION:
</strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p> <p><strong>POSITION
NUMBER: </strong>F99507</p> <p><strong>SALARY: </strong>$76,000</p> <p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering
and Computer Science</p> <p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for
functions related to teaching courses, advising students, and conducting research. The assistant professor will work closely with faculty
and staff in the Department of Engineering and Computer Science.</p> <p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p>
<p><u>Required</u>: </p> <ul>
<li>Master of Engineering with 18 hours in Electrical Engineering</li>
<li>Experience with Power
Engineering</li>
</ul> <p><u>Preferred</u>:</p> <ul>
<li>PhD in Electrical Engineering</li>
</ul> <p><strong>DEADLINE: </strong>Position
will remain open until filled.
```
My code:
'salary': soup.select_one('strong:-soup-contains("Salary:")').get_text(strip=True)
Current output:
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
Expected output:
'salary':$76,000.00 USD Per Year
Answer
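The NotImplementedError most likely means an old BeautifulSoup release is installed: versions before 4.7 shipped a limited built-in selector engine that does not understand pseudo-classes such as :-soup-contains. Even with a newer version, selecting the strong tag would return only the label "Salary:", because the value is the text node that follows it. Here is a solution which allows you to programmatically access that job's info: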
from bs4 import BeautifulSoup
import json
# Raw string (r'''...''') so the \" escapes inside the JSON "description" reach json.loads intact.
# Assumption: the live page escapes embedded double quotes this way; left unescaped, json.loads would fail.
html = r'''
<script type="application/ld+json">
{
"@context" : "http://schema.org/",
"@type" : "JobPosting",
"industry" : ["Academic/Education","Education"],
"title" : "Assistant Professor of Engineering F99507",
"description" : "<p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of Engineering.</strong></p>
<p><strong>DEPARTMENT: </strong>Engineering and Computer Science&nbsp;</p>
<p><strong>POSITION INFORMATION: </strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p>
<p><strong>POSITION NUMBER: </strong>F99507</p>
<p><strong>SALARY: </strong>$76,000</p>
<p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering and Computer Science</p>
<p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for functions related to teaching courses, advising students, and conducting research. The assistant professor will work closely with faculty and staff in the Department of Engineering and Computer Science.</p>
<p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p>
<p><u>Required</u>:&nbsp;</p>
<ul>
<li>Master of Engineering with 18 hours in Electrical Engineering</li>
<li>Experience with Power Engineering</li>
</ul>
<p><u>Preferred</u>:</p>
<ul>
<li>PhD in Electrical Engineering</li>
</ul>
<p><strong>DEADLINE: </strong>Position will remain open until filled.&nbsp;</p>
<p><strong>APPLICATION PROCESS AND MATERIALS: </strong>For the application process, applicants are required and <strong><span style=\"text-decoration: underline;\">MUST</span></strong> complete an electronic application and upload required documents listed below:</p>
<ul>
<li>Letter of Application (Cover Letter)</li>
<li>Resume/Vitae</li>
<li>Three Professional References (include: name, phone number, and e-mail address)</li>
<li>Unofficial Transcripts</li>
</ul>
<p><strong>TO APPLY FOR THIS VACANCY, </strong>click on the <strong>&ldquo;APPLY&rdquo;</strong> button at the top of advertisement to complete the electronic application, which can be used for this vacancy as well as future job opportunities. Applicants are responsible for checking the status of their application to determine where they are in the recruitment process. Further status message information is located under the Information section of the Current Job Opportunities page.</p>
<p>ALL JOB OFFERS ARE CONTINGENT UPON THE SUCCESSFUL RESULT OF A CRIMINAL BACKGROUND CHECK AND RECEIPT OF TRANSCRIPT(S) IF APPLICABLE.</p>
<p><strong>If you have questions regarding this recruitment, please contact the HR Liaison:</strong></p>
<p>Kim Dronett at <a rel=\"noopener\" href=\"kdronett@mcneese.edu\" target=\"_blank\">kdronett@mcneese.edu</a> or (337) 475-5413.</p>",
"datePosted" : "2022-09-02 09:48:23.527",
"validThrough" : "2023-03-01 23:59:59.9",
"baseSalary" : {
"@type" : "MonetaryAmount",
"currency" : "USD",
"value" : {
"@type" : "QuantitativeValue",
"value" : "76,000.00",
"unitText" : "YEAR"
}
},
"hiringOrganization" : {
"@type" : "Organization",
"id" : "http://www.mcneese.edu",
"url" : "http://www.mcneese.edu",
"name" : "McNeese State University"
},
"jobLocation" : {
"@type" : "Place",
"address" : {
"@type" : "PostalAddress",
"addressLocality" : "Lake Charles",
"addressRegion" : "LA",
"addressCountry" : "US"
}
},
"employmentType" : "FULL_TIME",
"image" : "https://www.higheredjobs.com/assets/hej/img/HEJ_Logo_2c.png"
}
</script>
<noscript>
<style type="text/css">
@media (min-width: 768px) { /*menus work different at XS*/
HEADER ul.nav li.dropdown:hover > .arrow,
HEADER ul.nav li.dropdown:hover > ul.dropdown-menu {
display: block;
}
}
</style>
</noscript>

<div class="col-6">
<h5>Resources</h5>
<ul>
<li><a href="/career/">Career Resources</a></li>
<li><a href="/salary/">Salary Data</a></li>
<li><a href="/career/resumes.cfm">Job Search Tips</a></li>
<li><a href="/career/ResumeService.cfm">Resume/CV Writing<br/> Service</a></li>
<li><a href="/articles/DiversityResources.cfm">Diversity Resources</a></li>
<li><a href="/career/SiteListings.cfm">Search Firms</a></li>
</ul>
</div>
</div>
</li>
</ul>
</li>
<li class="nav-item dropdown js-emplogin">
<a class="nav-link dropdown-toggle" data-toggle="dropdown" href="/employers/">

<div class="" id="jobAttrib">
<div class="job-info">
<strong>Type:</strong>
Full-Time
<br/>
<strong>Salary:</strong>
$76,000.00 USD Per Year
<br/>
<strong>Posted:</strong>
09/02/2022
<br/>
<strong>Application Due:</strong>
Open Until Filled
<br/>
<strong>Category:</strong>
<a class="job-cat" href="/faculty/search.cfm?JobCat=117">
Electrical Engineering
</a>
<br/>
</div>
</div>

</div>
</div>
<div id="jobDesc">
<p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of
Engineering.</strong></p> <p><strong>DEPARTMENT: </strong>Engineering and Computer Science </p> <p><strong>POSITION INFORMATION:
</strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p> <p><strong>POSITION
NUMBER: </strong>F99507</p> <p><strong>SALARY: </strong>$76,000</p> <p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering
and Computer Science</p> <p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for
functions related to teaching courses, advising students, and conducting research. The assistant professor will work closely with faculty
and staff in the Department of Engineering and Computer Science.</p> <p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p>
<p><u>Required</u>: </p> <ul>
<li>Master of Engineering with 18 hours in Electrical Engineering</li>
<li>Experience with Power
Engineering</li>
</ul> <p><u>Preferred</u>:</p> <ul>
<li>PhD in Electrical Engineering</li>
</ul> <p><strong>DEADLINE: </strong>Position
will remain open until filled.
'''
soup = BeautifulSoup(html, 'html.parser')
# First JSON-LD block in this snippet is the JobPosting
script = soup.select_one('script[type="application/ld+json"]')
# print(script.contents[0])
# strict=False lets json.loads accept the raw newlines inside the "description" string
json_obj = json.loads(script.contents[0], strict=False)
print(json_obj['industry'])
print(json_obj['baseSalary']['value']['value'])
#### and here is a solution to find salary in actual html, not in script tag ###
salary_parent_div = soup.select_one('div.job-info')
# The salary is the text node right after <strong>Salary:</strong>
salary = salary_parent_div.find('strong', string='Salary:').next_sibling
print(salary.strip())
This will print in terminal:
['Academic/Education', 'Education']
76,000.00
$76,000.00 USD Per Year
You can drill further down into that JSON object to select more details about the job. BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
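As a rough sketch of what drilling down could look like: the full page in the question contains two ld+json scripts (an Organization block first, then the JobPosting), so it may be safer to pick the block by its "@type" than to take the first match. The helper names find_job_posting and summarize_job and the trimmed sample_html are hypothetical, made up for illustration; the field names come from the JSON shown in the question.

```python
import json
from bs4 import BeautifulSoup

# Trimmed stand-in for a job page (hypothetical sample, not the full page).
sample_html = '''
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "url": "https://www.higheredjobs.com"}
</script>
<script type="application/ld+json">
{"@context": "http://schema.org/", "@type": "JobPosting",
 "title": "Assistant Professor of Engineering F99507",
 "baseSalary": {"@type": "MonetaryAmount", "currency": "USD",
                "value": {"@type": "QuantitativeValue",
                          "value": "76,000.00", "unitText": "YEAR"}},
 "hiringOrganization": {"@type": "Organization",
                        "name": "McNeese State University"},
 "jobLocation": {"@type": "Place",
                 "address": {"@type": "PostalAddress",
                             "addressLocality": "Lake Charles",
                             "addressRegion": "LA",
                             "addressCountry": "US"}}}
</script>
'''


def find_job_posting(soup):
    """Return the first JSON-LD object whose @type is JobPosting, else None."""
    for script in soup.select('script[type="application/ld+json"]'):
        try:
            # strict=False tolerates raw newlines inside JSON strings
            data = json.loads(script.string or '', strict=False)
        except json.JSONDecodeError:
            continue  # skip blocks that are empty or not valid JSON
        if isinstance(data, dict) and data.get('@type') == 'JobPosting':
            return data
    return None


def summarize_job(job):
    """Flatten a few nested JobPosting fields into a plain dict."""
    base_salary = job.get('baseSalary', {})
    amount = base_salary.get('value', {})
    address = job.get('jobLocation', {}).get('address', {})
    raw_amount = amount.get('value')
    return {
        'title': job.get('title'),
        # "76,000.00" -> 76000.0; None when the posting lists no salary
        'salary': (float(str(raw_amount).replace(',', ''))
                   if raw_amount is not None else None),
        'salary_unit': amount.get('unitText'),
        'currency': base_salary.get('currency'),
        'employer': job.get('hiringOrganization', {}).get('name'),
        'city': address.get('addressLocality'),
        'state': address.get('addressRegion'),
    }


soup = BeautifulSoup(sample_html, 'html.parser')
job = find_job_posting(soup)
summary = summarize_job(job)
print(summary)
```

Using .get() with empty-dict defaults means a posting that omits baseSalary or jobLocation yields None values instead of raising KeyError, which is useful when looping over many subpages.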