Hi guys, I am trying to scrape/crawl this JSON-based site using Scrapy/BeautifulSoup:
https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb
I wrote the code below to read/fetch the JSON from the website:

```python
import json

from bs4 import BeautifulSoup

# `response` is the page object passed to the Scrapy callback
website_text = response.body.decode("utf-8")
jobs_soup = BeautifulSoup(website_text.replace("<", " <"), "html.parser")
script_tag = jobs_soup.find('script', {"type": 'application/ld+json'}).text
data = json.loads(script_tag, strict=False)
```
But it keeps raising this error:

```
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
If anyone knows how to fix this, please help; it would be very helpful.
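For context, `json.loads` reports "Expecting value: line 1 column 1 (char 0)" when the very first character it looks at cannot start a JSON value, and the most common trigger is an empty string. That suggests `script_tag` may have been empty rather than malformed. As far as I know, the BeautifulSoup documentation says that since version 4.9.0 the contents of `<script>` tags are generally not treated as "text" with `html.parser` or `lxml`, so `.text` on the script tag can come back empty; treat that as a likely cause to check, not a confirmed one. A minimal stdlib-only check, where `describe_json_error` is a hypothetical helper:

```python
import json


def describe_json_error(text: str) -> str:
    """Hypothetical helper: return 'ok' or the JSONDecodeError message for text."""
    try:
        json.loads(text, strict=False)
    except json.JSONDecodeError as exc:
        return str(exc)
    return "ok"


# An empty string fails at the very first character.
print(describe_json_error(""))  # Expecting value: line 1 column 1 (char 0)
# Leading non-JSON text such as HTML fails at the same position.
print(describe_json_error("<p>not json</p>"))  # Expecting value: line 1 column 1 (char 0)
# strict=False tolerates raw control characters (e.g. newlines) inside strings.
print(describe_json_error('{"description": "line one\nline two"}'))  # ok
```

If the first call's message matches yours, print `repr(script_tag)` before parsing to confirm whether it is empty.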
Answer
The JSON inside the `<script>` tag isn't valid, so `json` cannot decode it with its default settings. A quick-and-dirty fix is to rewrite the `"description"` value with `re.sub` so that it becomes a properly escaped JSON string (also, use `html5lib` as the BeautifulSoup parser):
```python
import json
import re

import requests
from bs4 import BeautifulSoup

url = "https://pk.profdir.com/jobs-for-angular-developer-lahore-punjab-cddb"

# html5lib must be installed separately (pip install html5lib)
soup = BeautifulSoup(requests.get(url).content, "html5lib")
data = soup.select_one('script[type="application/ld+json"]').contents[0]

# fix "broken" description: re-encode its raw value as a valid JSON string
data = re.sub(
    r'(?<="description" : )"(.*?)"(?=,\s+")',
    lambda g: json.dumps(g.group(1)),
    data,
    flags=re.S,
)

data = json.loads(data)
print(json.dumps(data, indent=4))
```
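To see what the `re.sub` step does without hitting the live site, here is a minimal sketch on a made-up `raw` string (not the site's actual markup) whose description contains unescaped double quotes and a raw newline. The pattern captures everything between the opening quote and the first `"` that is followed by a comma, whitespace and another quote; `json.dumps` then re-emits that text with quotes, backslashes and newlines escaped. `repair_description` is a hypothetical name.

```python
import json
import re

# Same pattern as the answer: the "description" value ends at the first quote
# that is followed by a comma, whitespace and the next key's opening quote.
DESCRIPTION_RE = re.compile(r'(?<="description" : )"(.*?)"(?=,\s+")', flags=re.S)


def repair_description(data: str) -> str:
    """Hypothetical helper: re-escape the "description" value so json can parse it."""
    # json.dumps adds the surrounding quotes and escapes ", \ and newlines.
    return DESCRIPTION_RE.sub(lambda m: json.dumps(m.group(1)), data)


# Made-up sample: the unescaped quotes around "intro" make this invalid JSON.
raw = '{"title" : "dev", "description" : "<p class="intro">Line one\nLine two</p>", "value" : "cddb"}'

try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("before repair:", exc)

fixed = json.loads(repair_description(raw))
print(fixed["description"])
```

This is exactly as fragile as the original fix: it relies on the page writing `"description" : ` with that spacing and on `description` not being the last key in the object.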
Prints:
```
{
    "@context": "http://schema.org/",
    "@type": "JobPosting",
    "title": "angular-developer",
    "description": "<p>Designing and developing user interfaces using Angular best practices\n</p><p>\n</p><p>Adapting interface for modern internet applications using the latest front-end technologies\n</p><p>\n</p><p>Developing product analysis tasks and optimizing the user experience\n</p><p>\n</p><p>Proficiency in Angular, HTML, CSS, and JavaScript for rapid prototyping.\n</p><p>\n</p><p>Integration of APIs and RESTful Services.\n</p><p>\n</p><p>Creating Maintaining Mobile and Website Responsive Design and Mobile website.\n</p><p>\n</p><p>Developing Across Browsers\n</p><p>\n</p><p>Creating tools that improve site interaction regardless of the browser.\n</p><p>\n</p><p>Managing software workflow.\n</p><p>\n</p><p>Following SEO best practices Fixing bugs and testing for usability\n</p><p>\n</p><p>Conducting performance tests\n</p><p>\n</p><p>Consulting with the design team\n</p><p>\n</p><p>Ensuring high performance of applications and providing support\n</p><p>\n</p><p>\n</p><p>Job Requirements:\n</p><p>\n</p><p>\n</p><p>Expert knowledge of HTML5, CSS3\n</p><p>\n</p><p>Strong knowledge of JavaScript\n</p><p>\n</p><p>Experience in JS frameworks Angular\n</p><p>\n</p><p>Familiarity with Software version control systems e.g., Git\n</p><p>\n</p><p>Experience in Node.js\n</p><p>\n</p><p>Having knowledge of AWS environment is a plus\n</p><p>\n</p><p>AlienVault experience is a plus\n</p><p>\n</p><p>Jira Cloud experience is a plus\n</p><p>\n</p><p>Knowledge of CSS Pre-processor technologies including SASS\n</p><p>\n</p><p>Able to quickly transform visual designs into accurate HTML/CSS\n</p><p>\n</p><p>Ability to write high-performance, reusable code for UI components\n</p><p>\n</p><p>Strong understanding of security and performance fundamentals required\n</p><p>\n</p><p>Familiarity with the whole web stack, including protocols and web server optimization techniques\n</p><p>\n</p><p>Great communication skills You'll be interacting with Product and Development teams\n</p><p>\n</p><p>Experience in Grunt, Rollup, or Webpack is a plus\n</p><p>\n</p><p>Good Technical skills, Communication skills, General problem-solving skills, and Coding skills\n</p><p>\n</p><p>Package: Negotiable</p>",
    "identifier": {
        "@type": "PropertyValue",
        "name": "TTS",
        "value": "cddb"
    },
    "datePosted": "2022-02-18T00:00",
    "validThrough": "2022-05-19T00:00",
    "employmentType": "permanent<br>full time",
    "hiringOrganization": {
        "@type": "Organization",
        "name": "TTS",
        "sameAs": "https://pk.profdir.com/companies/tts-ebfu",
        "logo": "https://pk.profdir.com/apple-icon.png"
    },
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "R Block DHA Phase 2",
            "addressLocality": "Lahore",
            "addressRegion": "Punjab",
            "postalCode": "53720",
            "addressCountry": "PK"
        }
    },
    "baseSalary": {
        "@type": "MonetaryAmount",
        "currency": "PKR",
        "value": {
            "@type": "QuantitativeValue",
            "value": "70000",
            "unitText": "MONTH"
        }
    }
}
```
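If installing `html5lib` is not an option, the extraction step can also be sketched with the standard library's `html.parser.HTMLParser`, which treats `<script>` bodies as raw CDATA and hands them to `handle_data`. This is an alternative to the BeautifulSoup approach, not what the answer uses; `LdJsonCollector` is a hypothetical class, and the inline `html` is a trimmed sample built from values in the output above rather than the site's real markup (which would still need the `re.sub` repair before parsing). Once parsed, the JobPosting is a plain dict, so fields like the salary are ordinary key lookups.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Hypothetical helper: collect the body of every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_ld_json = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld_json = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append rather than assign.
        if self._in_ld_json:
            self.blocks[-1] += data


# Trimmed sample page (values taken from the output above, markup made up).
html = """<html><head>
<script type="application/ld+json">
{ "@type" : "JobPosting", "title" : "angular-developer",
  "baseSalary" : { "@type" : "MonetaryAmount", "currency" : "PKR",
    "value" : { "@type" : "QuantitativeValue", "value" : "70000", "unitText" : "MONTH" } } }
</script></head><body></body></html>"""

collector = LdJsonCollector()
collector.feed(html)
collector.close()

job = json.loads(collector.blocks[0], strict=False)
salary = job["baseSalary"]["value"]
print(job["title"], salary["value"], job["baseSalary"]["currency"], salary["unitText"])
# angular-developer 70000 PKR MONTH
```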