
Format COVID vaccine data extracted from a website into a pandas DataFrame

I am trying to load the “Vaccine data” from the URL below into a pandas DataFrame:

https://www.mygov.in/sites/default/files/covid/vaccine/covid_vaccine_timeline.json

Here is the parent website:

https://www.mygov.in/

Sample of the JSON (truncated):

{"vaccine_data":[{"day":"2021-03-01","india_dose1":12256337,"india_dose2":2597799,"india_total_doses":14854136,"india_last_dose1":null,"india_last_dose2":null,"india_last_total_doses":null,"vacc_st_data":[{"st_name":"Andaman and Nicobar","state_id":"1","covid_state_name":"Andaman and Nicobar","covid_state_id":"35","dose1":"6581","dose2":"2556","total_doses":"9137","last_dose1":"","last_dose2":"","last_total_doses":""},{"st_name":"Andhra Pradesh","state_id":"2","covid_state_name":"Andhra Pradesh","covid_state_id":"28","dose1":"541202","dose2":"142431","total_doses":"683633","last_dose1":"","last_dose2":"","last_total_doses":""},{"st_name":"Arunachal Pradesh","state_id":"3","covid_state_name":"Arunachal Pradesh","covid_state_id":"12","dose1":"27572","dose2":"7309","total_doses":"34881","last_dose1":"","last_dose2":"","last_total_doses":""},{"st_name":"Assam","state_id":"4","covid_state_name":"Assam","covid_state_id":"18","dose1":"201640","dose2":"29159","total_doses":"230799","last_dose1":"","last_dose2":"","last_total_doses":""},{"st_name":"Bihar","state_id":"4","covid_state_name":"Bihar","covid_state_id":"10","dose1":"562270","dose2":"81079","total_doses":"643349","last_dose1":"","last_dose2":"","last_total_doses":""},{"st_name":"Chandigarh","state_id":"6","covid_state_name":"Chandigarh","covid_state_id":"4","dose1":"22424","dose2":"1899","total_doses":"24323","last_dose1":"","last_dose2":"","last_total_doses":""},

test = pd.read_json("/Users/dsg281/Downloads/vacin.json")

I am trying to get the data into my DataFrame in the following format:

[Image: desired output format]


Answer

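import pandas as pd
import requests

url = "https://www.mygov.in/sites/default/files/covid/vaccine/covid_vaccine_timeline.json"
data = requests.get(url).json()  # fetch and parse the JSON once

# The original loop overwrote df on every pass, so only the last day survived.
# record_path flattens every day's "vacc_st_data" list into one row per state
# per day; meta copies the parent record's "day" onto each of those rows.
df = pd.json_normalize(data["vaccine_data"], record_path="vacc_st_data", meta=["day"])
print(df)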
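Since the desired layout was only shown as an image, the following is a minimal offline sketch rather than a definitive answer. It runs the same flattening on a trimmed copy of the sample JSON from the question, then converts the dose counts (which arrive as strings) to numbers. The names `sample` and `flatten_vaccine_data` are hypothetical, introduced only for illustration, and the choice of columns to convert is an assumption.

```python
import pandas as pd

# Trimmed-down sample with the same shape as the feed; the values are taken
# from the sample JSON in the question (only a few keys are kept).
sample = {
    "vaccine_data": [
        {
            "day": "2021-03-01",
            "india_dose1": 12256337,
            "vacc_st_data": [
                {"st_name": "Andaman and Nicobar", "dose1": "6581",
                 "dose2": "2556", "total_doses": "9137"},
                {"st_name": "Assam", "dose1": "201640",
                 "dose2": "29159", "total_doses": "230799"},
            ],
        }
    ]
}


def flatten_vaccine_data(payload):
    """Hypothetical helper: one row per state per day, with numeric doses."""
    df = pd.json_normalize(
        payload["vaccine_data"], record_path="vacc_st_data", meta=["day"]
    )
    # Dose counts are strings in the feed; values that cannot be parsed
    # become NaN because of errors="coerce".
    dose_cols = ["dose1", "dose2", "total_doses"]
    df[dose_cols] = df[dose_cols].apply(pd.to_numeric, errors="coerce")
    df["day"] = pd.to_datetime(df["day"])
    return df


df = flatten_vaccine_data(sample)
print(df)
```

With the live feed, the parsed response can be passed to the same helper in place of `sample`. If you already downloaded the file, loading it with `json.load` and passing the resulting dict should work the same way.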
    
User contributions licensed under: CC BY-SA