I am fetching the HTML source of many pages from one website. I need to convert it into a JSON object and combine it with other elements in a JSON document. I have seen many questions on the same topic, but none of them were helpful.
My code:
url = "https://totalhash.cymru.com/analysis/?1ce201cf28c6dd738fd4e65da55242822111bd9f"
htmlContent = requests.get(url, verify=False)
data = htmlContent.text
print("data",data)
jsonD = json.dumps(htmlContent.text)
jsonL = json.loads(jsonD)
ContentUrl='{ "url" : "'+str(urls)+'" ,'+"\n"+' "uid" : "'+str(uniqueID)+'" ,\n"page_content" : "'+jsonL+'" , \n"date" : "'+finalDate+'"}'
The code above gives me a unicode object; however, when I paste that output into JSONLint it reports invalid JSON. Can somebody help me understand how I can convert the complete HTML into a JSON object?
Answer
jsonD = json.dumps(htmlContent.text) converts the raw HTML content into a JSON string representation. jsonL = json.loads(jsonD) parses the JSON string back into a regular string/unicode object. This results in a no-op, as any escaping done by dumps() is reverted by loads(). jsonL contains the same data as htmlContent.text.
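The round trip can be checked with a small sketch. The html value below is a made-up stand-in for htmlContent.text, not the real page, so the snippet runs without a network request:

```python
import json

# Hypothetical sample standing in for htmlContent.text.
html = '<a href="https://example.com">He said "hi"</a>\n<p>done</p>'

encoded = json.dumps(html)     # JSON string literal: quotes and newlines escaped
decoded = json.loads(encoded)  # parsing undoes that escaping again

print(encoded != html)  # True: the encoded form carries the escaping
print(decoded == html)  # True: after loads() we are back where we started
```

Only the encoded form is safe to embed in a JSON document; the decoded value still contains raw double quotes and newlines.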
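Use json.dumps to generate your final JSON instead of building it by hand. String concatenation does not escape the double quotes and newlines inside the HTML, which is why the result fails validation: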
ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})
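As a self-contained sketch of the difference, the snippet below builds one record with json.dumps and one by concatenation, then tries to parse both. The helper name build_record, the sample HTML, the URL, the uid, and the date are all assumptions made for illustration; in the real script the HTML would come from requests.

```python
import json

def build_record(url, uid, page_content, date):
    """Serialize one page record; json.dumps takes care of all escaping."""
    return json.dumps({
        "url": str(url),
        "uid": str(uid),
        "page_content": page_content,
        "date": date,
    })

# Hypothetical HTML containing the characters that break hand-built JSON.
html = '<div class="x">line one\nline "two"</div>'

record = build_record("https://example.com/page", 42, html, "2017-01-01")
parsed = json.loads(record)  # parses cleanly and returns the original HTML

# The concatenation approach leaves the inner quotes unescaped.
hand_built = '{ "page_content" : "' + html + '" }'
try:
    json.loads(hand_built)
    hand_built_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    hand_built_ok = False

print(parsed["page_content"] == html)  # True
print(hand_built_ok)                   # False
```

Keeping the record as a dict until the last moment also makes it easy to merge in other fields before serializing.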