<script type="text/javascript">
/**
 * Define SVG path for target icon
 */
var targetSVG = "M9,0C4.029,0,0,4.029,0,9s4.029,9,9,9s9-4.029,9-9S13.971,0,9,0z M9,15.93 c-3.83,0-6.93-3.1-6.93-6.93S5.17,2.07,9,2.07s6.93,3.1,6.93,6.93S12.83,15.93,9,15.93 M12.5,9c0,1.933-1.567,3.5-3.5,3.5S5.5,10.933,5.5,9S7.067,5.5,9,5.5 S12.5,7.067,12.5,9z";

/**
 * Create the map
 */
var i=1;
var countrydataprovider = {
  "map": "indiaLow",
  "getAreasFromMap": true,
  "theme": "none",
  "imagesSettings": {
    "rollOverColor": "#089282",
    "rollOverScale": 3,
    "labelPosition": "middle",
    "labelFontSize": 8,
    "labelColor": "#fff",
    "selectedScale": 3,
    "selectedColor": "#089282",
    "color": "#13564e"
  },
  "images": [
    {
      "imageURL": "nowcast_marker/map-marker-icon-png-green.png",
      "width": 20,
      "height": 20,
      "description": "<p>No Warning </br></br> Time of issue: 2022-10-07</br>1005 Hrs</br> Valid upto: 1305 Hrs </p>",
      "zoomLevel": 5,
      "scale": 0.5,
      "title": "Bapatla",
      "latitude": "15.905897",
      "longitude": "80.471587"
    },
I want to extract the data in the “images” subsection of this script. This is the code I have written so far, but I could not get any further. Could anybody please help?
import requests  # HTTP client used to fetch the page
from bs4 import BeautifulSoup  # HTML parser

url = "https://mausam.imd.gov.in/imd_latest/contents/stationwise-nowcast-warning.php"
html = requests.get(url).content  # raw bytes of the response body
soup = BeautifulSoup(html, 'html.parser')  # parsed document tree
a = soup.find('script', attrs={'type': 'text/javascript'})  # first matching <script> tag
Answer
You are on the right track; you just need to dissect the contents of that tag further to get what you need. Here is one way of obtaining that data:
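import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
import json

url = 'https://mausam.imd.gov.in/imd_latest/contents/stationwise-nowcast-warning.php'
# Take the text of the first <script type="text/javascript"> tag and keep
# what lies between '"images": [' and the next ']'.
script_w_data = bs(requests.get(url).text, 'html.parser').select_one('script[type="text/javascript"]').text.split('"images": [')[1].split(']')[0]
# Put the brackets back so the fragment parses as a JSON array.
obj = json.loads('[' + script_w_data + ']')
# One row per image entry, one column per key.
df = pd.json_normalize(obj)
print(df)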
Result in terminal:
                                           imageURL  width  height                                        description  zoomLevel  scale       title   latitude  longitude
0      nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5     Bapatla  15.905897  80.471587
1      nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5       Eluru   16.71066   81.09524
2     nowcast_marker/map-marker-icon-png-yellow.png     20      20  <p>Light rain: < 5 mm/hr</br> Light Thundersto...          5    0.5  Gannavaram  16.540171  80.801249
3      nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5      Guntur  16.306652   80.43654
4      nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5    Kakinada  16.945181  82.238647
...                                             ...    ...     ...                                                ...        ...    ...         ...        ...        ...
1115   nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5      Namrup      27.12      95.18
1116   nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5      Nazira      26.54      94.44
1117   nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5       Moreh    24.2475    94.3045
1118   nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5     Moirang    24.5028    93.7768
1119   nowcast_marker/map-marker-icon-png-green.png     20      20  <p>No Warning </br></br> Time of issue: 2022-1...          5    0.5   Jhandutta    31.3702    76.6369

1120 rows × 9 columns
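One caveat with the snippet above: split(']')[0] cuts at the first closing bracket, so the data would be truncated if a description ever contained a ']'. A minimal sketch of a more tolerant variant, assuming the array is still valid JSON as it was when this answer was written, finds the opening bracket and lets json.JSONDecoder.raw_decode locate the matching end. The helper name extract_images and the sample string (including the bracketed text in its description) are made up for illustration.

```python
import json
import re


def extract_images(script_text):
    """Return the list assigned to the "images" key inside the script text."""
    match = re.search(r'"images"\s*:\s*\[', script_text)
    if match is None:
        raise ValueError('no "images" array found in the script text')
    start = match.end() - 1  # position of the opening '['
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it, so brackets inside strings are handled correctly.
    images, _end = json.JSONDecoder().raw_decode(script_text, start)
    return images


# Hypothetical stand-in for the real <script> text; the "[sample]" part
# is invented to show that a ']' inside a string does no harm.
sample = '''var countrydataprovider = { "map": "indiaLow", "images": [
  { "title": "Bapatla", "description": "<p>No Warning [sample]</p>",
    "latitude": "15.905897", "longitude": "80.471587" }
], "areas": [] };'''

images = extract_images(sample)
print(images[0]["title"])
```

With the real page you would pass the text of the script tag instead of sample, then hand the returned list to pd.json_normalize as before.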
See pandas documentation at https://pandas.pydata.org/docs/
Also BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/
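If you want to filter or plot the stations afterwards, it may help to tidy the records first. The sketch below assumes the coordinates arrive as strings, as in the first record shown in the question, and that the marker colour can be read off the file name in imageURL. The names tidy_record and marker_colour are hypothetical, and the two sample records reuse values from the output above.

```python
import re


def tidy_record(record):
    """Return a copy with float coordinates and the marker colour pulled out."""
    colour = re.search(r"png-(\w+)\.png$", record["imageURL"])
    return {
        **record,
        "latitude": float(record["latitude"]),
        "longitude": float(record["longitude"]),
        # None when the file name does not follow the assumed pattern
        "marker_colour": colour.group(1) if colour else None,
    }


# Two records copied from the result above (other keys left out for brevity).
records = [
    {"imageURL": "nowcast_marker/map-marker-icon-png-green.png",
     "title": "Bapatla", "latitude": "15.905897", "longitude": "80.471587"},
    {"imageURL": "nowcast_marker/map-marker-icon-png-yellow.png",
     "title": "Gannavaram", "latitude": "16.540171", "longitude": "80.801249"},
]

tidy = [tidy_record(r) for r in records]
# Stations whose marker is anything other than green.
flagged = [r["title"] for r in tidy if r["marker_colour"] != "green"]
print(flagged)  # ['Gannavaram']
```

The tidied list can be passed to pd.json_normalize in the same way as the raw one.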