I am trying to scrape the salary from each subpage. For one of the subpages, I created soup = BeautifulSoup(requests.get('url_of_job').text), copied the contents of soup to a Word file, and pasted below only the portion surrounding the salary. The full text exceeds the post length limit here.
soup =
``` </style> <script type="application/ld+json"> { "@context" : "https://schema.org", "@type" : "Organization", "url" : "https://www.higheredjobs.com", "logo" : "https://www.higheredjobs.com/assets/hej/img/HEJ_Logo_2c.png", "potentialAction" : { "@type" : "SearchAction", "target" : "https://www.higheredjobs.com/search/advance_action.cfm?Keyword={Keyword}", "query-input" : "required name=Keyword" } } </script> <script type="application/ld+json"> { "@context" : "http://schema.org/", "@type" : "JobPosting", "industry" : ["Academic/Education","Education"], "title" : "Assistant Professor of Engineering F99507", "description" : "<p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of Engineering.</strong></p> <p><strong>DEPARTMENT: </strong>Engineering and Computer Science&nbsp;</p> <p><strong>POSITION INFORMATION: </strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p> <p><strong>POSITION NUMBER: </strong>F99507</p> <p><strong>SALARY: </strong>$76,000</p> <p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering and Computer Science</p> <p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for functions related to teaching courses, advising students, and conducting research. 
The assistant professor will work closely with faculty and staff in the Department of Engineering and Computer Science.</p> <p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p> <p><u>Required</u>:&nbsp;</p> <ul> <li>Master of Engineering with 18 hours in Electrical Engineering</li> <li>Experience with Power Engineering</li> </ul> <p><u>Preferred</u>:</p> <ul> <li>PhD in Electrical Engineering</li> </ul> <p><strong>DEADLINE: </strong>Position will remain open until filled.&nbsp;</p> <p><strong>APPLICATION PROCESS AND MATERIALS: </strong>For the application process, applicants are required and <strong><span style="text-decoration: underline;">MUST</span></strong> complete an electronic application and upload required documents listed below:</p> <ul> <li>Letter of Application (Cover Letter)</li> <li>Resume/Vitae</li> <li>Three Professional References (include: name, phone number, and e-mail address)</li> <li>Unofficial Transcripts</li> </ul> <p><strong>TO APPLY FOR THIS VACANCY, </strong>click on the <strong>&ldquo;APPLY&rdquo;</strong> button at the top of advertisement to complete the electronic application, which can be used for this vacancy as well as future job opportunities. Applicants are responsible for checking the status of their application to determine where they are in the recruitment process. 
Further status message information is located under the Information section of the Current Job Opportunities page.</p> <p>ALL JOB OFFERS ARE CONTINGENT UPON THE SUCCESSFUL RESULT OF A CRIMINAL BACKGROUND CHECK AND RECEIPT OF TRANSCRIPT(S) IF APPLICABLE.</p> <p><strong>If you have questions regarding this recruitment, please contact the HR Liaison:</strong></p> <p>Kim Dronett at <a rel="noopener" href="kdronett@mcneese.edu" target="_blank">kdronett@mcneese.edu</a> or (337) 475-5413.</p>", "datePosted" : "2022-09-02 09:48:23.527", "validThrough" : "2023-03-01 23:59:59.9", "baseSalary" : { "@type" : "MonetaryAmount", "currency" : "USD", "value" : { "@type" : "QuantitativeValue", "value" : "76,000.00", "unitText" : "YEAR" } }, "hiringOrganization" : { "@type" : "Organization", "id" : "http://www.mcneese.edu", "url" : "http://www.mcneese.edu", "name" : "McNeese State University" }, "jobLocation" : { "@type" : "Place", "address" : { "@type" : "PostalAddress", "addressLocality" : "Lake Charles", "addressRegion" : "LA", "addressCountry" : "US" } }, "employmentType" : "FULL_TIME", "image" : "https://www.higheredjobs.com/assets/hej/img/HEJ_Logo_2c.png" } </script> <noscript> <style type="text/css"> @media (min-width: 768px) { /*menus work different at XS*/ HEADER ul.nav li.dropdown:hover > .arrow, HEADER ul.nav li.dropdown:hover > ul.dropdown-menu { display: block; } } </style> </noscript> <div class="col-6"> <h5>Resources</h5> <ul> <li><a href="/career/">Career Resources</a></li> <li><a href="/salary/">Salary Data</a></li> <li><a href="/career/resumes.cfm">Job Search Tips</a></li> <li><a href="/career/ResumeService.cfm">Resume/CV Writing<br/> Service</a></li> <li><a href="/articles/DiversityResources.cfm">Diversity Resources</a></li> <li><a href="/career/SiteListings.cfm">Search Firms</a></li> </ul> </div> </div> </li> </ul> </li> <li class="nav-item dropdown js-emplogin"> <a class="nav-link dropdown-toggle" data-toggle="dropdown" href="/employers/"> <div class="" 
id="jobAttrib"> <div class="job-info"> <strong>Type:</strong> Full-Time <br/> <strong>Salary:</strong> $76,000.00 USD Per Year <br/> <strong>Posted:</strong> 09/02/2022 <br/> <strong>Application Due:</strong> Open Until Filled <br/> <strong>Category:</strong> <a class="job-cat" href="/faculty/search.cfm?JobCat=117"> Electrical Engineering </a> <br/> </div> </div> </div> </div> <div id="jobDesc"> <p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of Engineering.</strong></p> <p><strong>DEPARTMENT: </strong>Engineering and Computer Science </p> <p><strong>POSITION INFORMATION: </strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p> <p><strong>POSITION NUMBER: </strong>F99507</p> <p><strong>SALARY: </strong>$76,000</p> <p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering and Computer Science</p> <p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for functions related to teaching courses, advising students, and conducting research. The assistant professor will work closely with faculty and staff in the Department of Engineering and Computer Science.</p> <p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p> <p><u>Required</u>: </p> <ul> <li>Master of Engineering with 18 hours in Electrical Engineering</li> <li>Experience with Power Engineering</li> </ul> <p><u>Preferred</u>:</p> <ul> <li>PhD in Electrical Engineering</li> </ul> <p><strong>DEADLINE: </strong>Position will remain open until filled. ```
My code:
'salary': soup.select_one('strong:-soup-contains("Salary:")').get_text(strip=True)
Current output:
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
Expected output:
'salary': '$76,000.00 USD Per Year'
Answer
Here is a solution which allows you to programmatically access that job’s info:
from bs4 import BeautifulSoup
import json

html = '''
<script type="application/ld+json"> { "@context" : "http://schema.org/", "@type" : "JobPosting", "industry" : ["Academic/Education","Education"], "title" : "Assistant Professor of Engineering F99507", "description" : "<p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of Engineering.</strong></p> <p><strong>DEPARTMENT: </strong>Engineering and Computer Science&nbsp;</p> <p><strong>POSITION INFORMATION: </strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p> <p><strong>POSITION NUMBER: </strong>F99507</p> <p><strong>SALARY: </strong>$76,000</p> <p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering and Computer Science</p> <p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for functions related to teaching courses, advising students, and conducting research. 
The assistant professor will work closely with faculty and staff in the Department of Engineering and Computer Science.</p> <p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p> <p><u>Required</u>:&nbsp;</p> <ul> <li>Master of Engineering with 18 hours in Electrical Engineering</li> <li>Experience with Power Engineering</li> </ul> <p><u>Preferred</u>:</p> <ul> <li>PhD in Electrical Engineering</li> </ul> <p><strong>DEADLINE: </strong>Position will remain open until filled.&nbsp;</p> <p><strong>APPLICATION PROCESS AND MATERIALS: </strong>For the application process, applicants are required and <strong><span style="text-decoration: underline;">MUST</span></strong> complete an electronic application and upload required documents listed below:</p> <ul> <li>Letter of Application (Cover Letter)</li> <li>Resume/Vitae</li> <li>Three Professional References (include: name, phone number, and e-mail address)</li> <li>Unofficial Transcripts</li> </ul> <p><strong>TO APPLY FOR THIS VACANCY, </strong>click on the <strong>&ldquo;APPLY&rdquo;</strong> button at the top of advertisement to complete the electronic application, which can be used for this vacancy as well as future job opportunities. Applicants are responsible for checking the status of their application to determine where they are in the recruitment process. 
Further status message information is located under the Information section of the Current Job Opportunities page.</p> <p>ALL JOB OFFERS ARE CONTINGENT UPON THE SUCCESSFUL RESULT OF A CRIMINAL BACKGROUND CHECK AND RECEIPT OF TRANSCRIPT(S) IF APPLICABLE.</p> <p><strong>If you have questions regarding this recruitment, please contact the HR Liaison:</strong></p> <p>Kim Dronett at <a rel="noopener" href="kdronett@mcneese.edu" target="_blank">kdronett@mcneese.edu</a> or (337) 475-5413.</p>", "datePosted" : "2022-09-02 09:48:23.527", "validThrough" : "2023-03-01 23:59:59.9", "baseSalary" : { "@type" : "MonetaryAmount", "currency" : "USD", "value" : { "@type" : "QuantitativeValue", "value" : "76,000.00", "unitText" : "YEAR" } }, "hiringOrganization" : { "@type" : "Organization", "id" : "http://www.mcneese.edu", "url" : "http://www.mcneese.edu", "name" : "McNeese State University" }, "jobLocation" : { "@type" : "Place", "address" : { "@type" : "PostalAddress", "addressLocality" : "Lake Charles", "addressRegion" : "LA", "addressCountry" : "US" } }, "employmentType" : "FULL_TIME", "image" : "https://www.higheredjobs.com/assets/hej/img/HEJ_Logo_2c.png" } </script> <noscript> <style type="text/css"> @media (min-width: 768px) { /*menus work different at XS*/ HEADER ul.nav li.dropdown:hover > .arrow, HEADER ul.nav li.dropdown:hover > ul.dropdown-menu { display: block; } } </style> </noscript> <div class="col-6"> <h5>Resources</h5> <ul> <li><a href="/career/">Career Resources</a></li> <li><a href="/salary/">Salary Data</a></li> <li><a href="/career/resumes.cfm">Job Search Tips</a></li> <li><a href="/career/ResumeService.cfm">Resume/CV Writing<br/> Service</a></li> <li><a href="/articles/DiversityResources.cfm">Diversity Resources</a></li> <li><a href="/career/SiteListings.cfm">Search Firms</a></li> </ul> </div> </div> </li> </ul> </li> <li class="nav-item dropdown js-emplogin"> <a class="nav-link dropdown-toggle" data-toggle="dropdown" href="/employers/"> <div class="" 
id="jobAttrib"> <div class="job-info"> <strong>Type:</strong> Full-Time <br/> <strong>Salary:</strong> $76,000.00 USD Per Year <br/> <strong>Posted:</strong> 09/02/2022 <br/> <strong>Application Due:</strong> Open Until Filled <br/> <strong>Category:</strong> <a class="job-cat" href="/faculty/search.cfm?JobCat=117"> Electrical Engineering </a> <br/> </div> </div> </div> </div> <div id="jobDesc"> <p><strong><u>MCNEESE STATE UNIVERSITY</u></strong><strong> invites applicants for the position of Assistant Professor of Engineering.</strong></p> <p><strong>DEPARTMENT: </strong>Engineering and Computer Science </p> <p><strong>POSITION INFORMATION: </strong>This is a full-time, 9-month, unclassified, tenure-track position. The appointment begins in January 2023.</p> <p><strong>POSITION NUMBER: </strong>F99507</p> <p><strong>SALARY: </strong>$76,000</p> <p><strong>REPORTING AUTHORITY: </strong>Department Head of Engineering and Computer Science</p> <p><strong>POSITION DUTIES/RESPONSIBILITIES: </strong>The Assistant Professor of Engineering is responsible for functions related to teaching courses, advising students, and conducting research. The assistant professor will work closely with faculty and staff in the Department of Engineering and Computer Science.</p> <p><strong>REQUIRED/PREFERRED QUALIFICATIONS: </strong></p> <p><u>Required</u>: </p> <ul> <li>Master of Engineering with 18 hours in Electrical Engineering</li> <li>Experience with Power Engineering</li> </ul> <p><u>Preferred</u>:</p> <ul> <li>PhD in Electrical Engineering</li> </ul> <p><strong>DEADLINE: </strong>Position will remain open until filled. 
'''

soup = BeautifulSoup(html, 'html.parser')
script = soup.select_one('script[type="application/ld+json"]')
# print(script.contents[0])
json_obj = json.loads(script.contents[0], strict=False)
print(json_obj['industry'])
print(json_obj['baseSalary']['value']['value'])

# And here is a solution to find the salary in the actual HTML, not in the script tag
salary_parent_div = soup.select_one('div.job-info')
salary = salary_parent_div.find('strong', string='Salary:').next_sibling
print(salary.strip())
This will print the following in the terminal:
['Academic/Education', 'Education']
76,000.00
$76,000.00 USD Per Year
Of course, you can drill down into that JSON object and select more details about the job.
BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
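One caveat when running this against the full page rather than the trimmed snippet: in the soup dump from the question, the first ld+json script is the "Organization" block, so select_one would likely return that one instead of the "JobPosting" block. A minimal sketch of a more defensive lookup follows. The function names (find_job_posting, salary_from_posting, job_posting_from_html) are hypothetical helpers made up for illustration, and the sketch assumes the page keeps the schema.org structure shown above.

```python
import json


def find_job_posting(ld_json_texts):
    """Return the first JSON-LD object whose @type is JobPosting, else None."""
    for text in ld_json_texts:
        try:
            # strict=False tolerates raw control characters (e.g. newlines)
            # inside string values, as in the answer above.
            data = json.loads(text, strict=False)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(data, dict) and data.get("@type") == "JobPosting":
            return data
    return None


def salary_from_posting(posting):
    """Return (amount, currency, unit) from baseSalary, or None if missing."""
    base = posting.get("baseSalary") or {}
    value = base.get("value") or {}
    raw = value.get("value")
    if raw is None:
        return None
    # Drop thousands separators: "76,000.00" -> 76000.0
    amount = float(str(raw).replace(",", ""))
    return amount, base.get("currency"), value.get("unitText")


def job_posting_from_html(html):
    """Collect every ld+json script in the page and pick the JobPosting one."""
    # Imported here so the two helpers above need only the standard library.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    scripts = soup.select('script[type="application/ld+json"]')
    # .string is the script's text when it has a single text child, else None.
    return find_job_posting(s.string or "" for s in scripts)
```

With the page text in hand, calling salary_from_posting(job_posting_from_html(html)) should give a tuple such as (76000.0, 'USD', 'YEAR') for the posting above, provided the JSON in the script tag parses. If it does not, the div.job-info lookup from the answer remains a fallback.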