
How do I adjust this nested loop to store the output of different URL requests in separate databases or .csv files?

So I’m working on a simple project, but apparently I’m stuck at the first step. Basically, I’m requesting .json files from a public GitHub repository: 7 different files, which I aim to download and convert to 7 differently named databases.

I tried to use this nested loop to create 7 different CSV files. The only problem is that it gives me 7 differently named CSV files with the same content (the content from the last URL). I think it has something to do with the way I store the data from the JSON output in the variable data. How could I solve this problem?

import pandas as pd
import datetime
import re, json, requests #this is needed to import the data from the github repository

naz_l_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-andamento-nazionale-latest.json'
naz_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-andamento-nazionale.json'
reg_l_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni-latest.json'
reg_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json'
prov_l_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-province-latest.json'
prov_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-province.json'
news_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-note.json'

list_of_url= [naz_l_url,naz_url, reg_l_url,reg_url,prov_url,prov_l_url,news_url]
csv_names = ['01','02','03','04','05','06','07']

for i in list_of_url:
    resp = requests.get(i)
    data = pd.read_json(resp.text, convert_dates=True)
    for x in csv_names:
        data.to_csv(f"{x}_df.csv")

I want to try two different ways: one with the loop giving me CSV files, and another with the loop giving me pandas DataFrames. But first I need to solve the problem of the loop giving me the same output.


Answer

The problem is that you are iterating over the full list of names for each URL you download. Note how for x in csv_names is inside the for i in list_of_url loop: for every URL you fetch, the same DataFrame is written out under all 7 names, so after the last URL every file contains that last URL’s data.

Where the problem comes from

Python uses indentation levels to determine when you are in and out of a loop (where other languages might use curly braces, begin/end, or do/end). I’d recommend you brush up on this topic, for example with Concept of Indentation in Python; you can also see the official documentation about Compound statements.
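In other words, each URL needs to be paired with exactly one output name. As a minimal sketch (reusing the list_of_url and csv_names lists from your question), zip gives you that pairing with no inner loop left to overwrite the files:

for url, name in zip(list_of_url, csv_names):
    # one request and one CSV per URL, each paired with exactly one name
    resp = requests.get(url)
    data = pd.read_json(resp.text, convert_dates=True)
    data.to_csv(f"{name}_df.csv")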

Proposed solution

I’d recommend you derive the file names from the URLs instead of using a separate list of names, and do something like this:

import pandas as pd
import requests  # needed to download the data from the GitHub repository
from urllib.parse import urlparse

naz_l_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-andamento-nazionale-latest.json'
naz_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-andamento-nazionale.json'
reg_l_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni-latest.json'
reg_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json'
prov_l_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-province-latest.json'
prov_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-province.json'
news_url = 'https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-note.json'

list_of_url= [naz_l_url,naz_url, reg_l_url,reg_url,prov_url,prov_l_url,news_url]
csv_names = ['01','02','03','04','05','06','07']

for url in list_of_url:
    resp = requests.get(url)
    data = pd.read_json(resp.text, convert_dates=True)
    # here is where you DON'T want to have a nested `for` loop
    file_name = urlparse(url).path.split('/')[-1].replace('json', 'csv')
    data.to_csv(file_name)
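You also mentioned wanting a version where the loop gives you DataFrames rather than CSV files. A minimal sketch of that (assuming the same list_of_url as above) is to collect each DataFrame in a dict keyed by the name derived from the URL:

dataframes = {}
for url in list_of_url:
    resp = requests.get(url)
    # key each DataFrame by the file name without the .json extension
    name = urlparse(url).path.split('/')[-1].replace('.json', '')
    dataframes[name] = pd.read_json(resp.text, convert_dates=True)

# e.g. dataframes['dpc-covid19-ita-note'] holds the notes data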
User contributions licensed under: CC BY-SA