How to continue iterating a loop when a condition is not met

I am trying to get the ID for each movie name. To do that, I need to check whether the URL works. If it does not, I need to append the movie name to the list instead of the ID.

import json
import urllib.request as ur

import pandas as pd
from numpy import nan

# With these names the loop below runs without errors
data = {'id': [nan, nan, nan, nan],
        'movie_name': ['captain-fantastic', 'passengers', 'transformers', 'guardians-of-the-galaxy-vol-2']}

# Same names, except the last one ('vol.2'), which triggers the 404
data2 = {'id': [nan, nan, nan, nan],
         'movie_name': ['captain-fantastic', 'passengers', 'transformers', 'guardians-of-the-galaxy-vol.2']}

dfa = pd.DataFrame(data)
dfa2 = pd.DataFrame(data2)

movie_buff_uuid = []

# Fetch the JSON page for each movie and collect its uuid
for i in dfa["movie_name"]:
    url = ur.urlopen("https://www.moviebuff.com/" + str(i) + ".json")
    d = json.loads(url.read())['uuid']
    movie_buff_uuid.append(d)

print(movie_buff_uuid)

If I pass data2 to the loop above, I get this error: urllib.error.HTTPError: HTTP Error 404: Not Found. To work around it, I tried this:

movie_buff_uuid= []

for i in dfa["movie_name"]:
    url = ur.urlopen("https://www.moviebuff.com/"+str(i)+".json")
    if url.getcode() == 404:
       d = json.loads(url.read())['uuid']
    else:
        d = i

movie_buff_uuid.append(d)
     
print(movie_buff_uuid)

Expected output:

['f8379c86-1307-4b22-b175-5000284ef6b9', '8f0c611a-4356-454d-a6d6-aac437519540', '7cd2dffa-cb31-4897-a7b0-30dcaee66104', 'guardians-of-the-galaxy-vol.2']

Any ideas would be appreciated.

Answer

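As dominik-air said, you’re getting a 404 response when the file doesn’t exist. However, Python’s built-in urllib raises an HTTPError when it gets a 404, so urlopen never returns a response whose status code you could check with getcode() (unlike, for example, the justly popular requests library, which returns the response and leaves the status check to you).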

In Python we generally deal with this using try/except flow (EAFP: easier to ask forgiveness than permission).
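
As a minimal sketch of that flow, here is the pattern on its own, with no network access. fake_urlopen is a hypothetical stand-in for urlopen that always fails with a 404, and the example URL is made up; the point is only that the error arrives as an exception carrying the status code in its code attribute.

```python
from urllib.error import HTTPError


def fake_urlopen(url):
    # Hypothetical stand-in for urllib.request.urlopen: always reports a 404,
    # the same way the real function does for a missing page.
    raise HTTPError(url, 404, "Not Found", None, None)


try:
    # EAFP: just attempt the request instead of checking first
    fake_urlopen("https://example.invalid/missing.json")
    outcome = "ok"
except HTTPError as err:
    # The status code is available on the exception itself
    outcome = f"failed with status {err.code}"

print(outcome)
```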

Putting it all together:

import json
from time import sleep
from urllib.error import HTTPError
from urllib.request import urlopen

movie_buff_uuid = []
movies = ['captain-fantastic', 'passengers', 'transformers', 'guardians-of-the-galaxy-vol-2']

for movie in movies:
    try:
        # urlopen raises HTTPError on a 404, which jumps straight to the except branch
        with urlopen(f"https://www.moviebuff.com/{movie}.json") as response:
            uuid = json.loads(response.read())['uuid']
        movie_buff_uuid.append(uuid)
    except HTTPError:
        # The request failed (e.g. 404): keep the movie name in place of the uuid
        movie_buff_uuid.append(movie)
    sleep(5)  # let's avoid hitting the server too heavily

(You don’t need dataframes in the first part either :) )
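
If you want to reuse this, one possible variation is to wrap the loop in a function. The sketch below is an assumption-laden variant, not part of the original answer: collect_uuids, its opener and delay parameters, and fake_opener are all names invented here. It falls back to the movie name only on a 404 and re-raises any other HTTP error, and the injectable opener lets you try it without touching the real site.

```python
import io
import json
from time import sleep
from urllib.error import HTTPError
from urllib.request import urlopen


def collect_uuids(movies, opener=urlopen, delay=5):
    """Return one entry per movie: its uuid, or the movie name on a 404."""
    results = []
    for movie in movies:
        url = f"https://www.moviebuff.com/{movie}.json"
        try:
            with opener(url) as response:
                results.append(json.loads(response.read())["uuid"])
        except HTTPError as err:
            if err.code != 404:
                raise  # other failures (e.g. 500) should not be hidden
            results.append(movie)
        if delay:
            sleep(delay)  # be polite to the server between requests
    return results


def fake_opener(url):
    # Hypothetical offline replacement for urlopen, used only for this demo
    if "vol.2" in url:
        raise HTTPError(url, 404, "Not Found", None, None)
    return io.BytesIO(json.dumps({"uuid": "fake-uuid"}).encode())


demo = collect_uuids(["passengers", "guardians-of-the-galaxy-vol.2"],
                     opener=fake_opener, delay=0)
print(demo)
```

Calling collect_uuids(movies) with the defaults would use the real urlopen and the 5-second pause from the answer above.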

Answer