I am trying to get the id for each movie name. To do that I need to check whether the URL works; if it does not, I need to append the movie name to the list instead.
```python
import json
import urllib.request as ur

import pandas as pd
from numpy import nan  # `nan` was used without being imported

data = {'id': [nan, nan, nan, nan],
        'movie_name': ['captain-fantastic', 'passengers', 'transformers', 'guardians-of-the-galaxy-vol-2']}

data2 = {'id': [nan, nan, nan, nan],
         'movie_name': ['captain-fantastic', 'passengers', 'transformers', 'guardians-of-the-galaxy-vol.2']}

dfa = pd.DataFrame(data)
dfa2 = pd.DataFrame(data2)

movie_buff_uuid = []

for i in dfa["movie_name"]:
    url = ur.urlopen("https://www.moviebuff.com/" + str(i) + ".json")
    d = json.loads(url.read())['uuid']
    movie_buff_uuid.append(d)

print(movie_buff_uuid)
```
If I pass data2 to the loop above, I get this error: `urllib.error.HTTPError: HTTP Error 404: Not Found`. To overcome this error, I have tried this:
```python
movie_buff_uuid = []

for i in dfa["movie_name"]:
    url = ur.urlopen("https://www.moviebuff.com/"+str(i)+".json")
    if url.getcode() == 404:
        d = json.loads(url.read())['uuid']
    else:
        d = i

    movie_buff_uuid.append(d)

print(movie_buff_uuid)
```
Expected output:
```
['f8379c86-1307-4b22-b175-5000284ef6b9', '8f0c611a-4356-454d-a6d6-aac437519540', '7cd2dffa-cb31-4897-a7b0-30dcaee66104', 'guardians-of-the-galaxy-vol.2']
```
Any idea would be appreciated
Answer
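As dominik-air said, you're getting a 404 response when the file doesn't exist. However, Python's built-in urllib raises an `HTTPError` when it gets this (unlike, for example, the justly popular requests library). That is why the `getcode()` check in your attempt is never reached: `urlopen` raises before it returns a response.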
In Python we generally deal with this using try/except flow (EAFP: "easier to ask forgiveness than permission").
Putting it all together:
```python
import json
from time import sleep
from urllib.error import HTTPError
from urllib.request import urlopen

movie_buff_uuid = []
movies = ['captain-fantastic', 'passengers', 'transformers','guardians-of-the-galaxy-vol-2']

for movie in movies:
    try:
        url = urlopen(f"https://www.moviebuff.com/{movie}.json")
        uuid = json.loads(url.read())['uuid']
        movie_buff_uuid.append(uuid)
    except HTTPError:
        movie_buff_uuid.append(movie)
    sleep(5) # let's avoid hitting the server too heavily
```
(You don’t need dataframes in the first part either :) )
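
If you want to reuse the lookup, one possible variation is to wrap it in a small function. This is only a sketch: the name `fetch_uuid_or_name` and the `opener` parameter are made up for illustration (they are not part of urllib), and falling back to the name only on a 404 while re-raising other HTTP errors is a design choice you may or may not want.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_uuid_or_name(movie, opener=urlopen):
    """Return the movie's uuid, or the movie name itself if the page is a 404.

    `opener` is a hypothetical injection point (it defaults to urlopen) so the
    function can be tried out without touching the network.
    """
    url = f"https://www.moviebuff.com/{movie}.json"
    try:
        with opener(url) as response:
            return json.loads(response.read())["uuid"]
    except HTTPError as err:
        if err.code == 404:
            return movie  # page does not exist: fall back to the name
        raise  # other HTTP errors (e.g. 500) are not silently swallowed
```

With the question's data this could be called as `movie_buff_uuid = [fetch_uuid_or_name(m) for m in dfa2["movie_name"]]`, ideally still with a `sleep` between requests as in the loop above.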