I am writing code that is supposed to open a URL, find the 3rd link on the page, and repeat this process 3 times (each time with the new URL).
I wrote a loop (below), but each iteration seems to start over with the original URL.
Can someone help me fix my code?

    import urllib.request, urllib.parse, urllib.error
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    #blank list
    l = []

    #starting url
    url = input('Enter URL: ')
    if len(url) < 1:
        url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'

    #loop
    for _ in range(4):
        html = urllib.request.urlopen(url).read() #open url
        soup = BeautifulSoup(html, 'html.parser') #parse through BeautifulSoup
        tags = soup('a') #extract tags

        for tag in tags:
            url = tag.get('href', None) #extract links from tags
            l.append(url) #add the links to a list
        url = l[2:3] #slice the list to extract the 3rd url
        url = ' '.join(str(e) for e in url) #change the type to string
        print(url)

Current output:

    http://py4e-data.dr-chuck.net/known_by_Montgomery.html
    http://py4e-data.dr-chuck.net/known_by_Montgomery.html
    http://py4e-data.dr-chuck.net/known_by_Montgomery.html
    http://py4e-data.dr-chuck.net/known_by_Montgomery.html

Desired output:

    http://py4e-data.dr-chuck.net/known_by_Montgomery.html
    http://py4e-data.dr-chuck.net/known_by_Mhairade.html
    http://py4e-data.dr-chuck.net/known_by_Butchi.html
    http://py4e-data.dr-chuck.net/known_by_Anayah.html

Answer
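You need to create the empty list inside the loop. In your code, l is created once before the loop, so every iteration appends the new page's links after the old ones, and l[2:3] always returns the third link of the first page. The following code works: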

    import urllib.request, urllib.parse, urllib.error
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    #blank list (now created inside the loop)
    # l = []

    #starting url
    url = input('Enter URL: ')
    if len(url) < 1:
        url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'

    #loop
    for _ in range(4):
        l = [] #start every page with an empty list
        html = urllib.request.urlopen(url).read() #open url
        soup = BeautifulSoup(html, 'html.parser') #parse through BeautifulSoup
        tags = soup('a') #extract tags

        for tag in tags:
            url = tag.get('href', None) #extract links from tags
            l.append(url) #add the links to a list
        url = l[2:3] #slice the list to extract the 3rd url
        url = ' '.join(str(e) for e in url) #change the type to string
        print(url)

Result in terminal:

    http://py4e-data.dr-chuck.net/known_by_Montgomery.html
    http://py4e-data.dr-chuck.net/known_by_Mhairade.html
    http://py4e-data.dr-chuck.net/known_by_Butchi.html
    http://py4e-data.dr-chuck.net/known_by_Anayah.html
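
To see why the list has to be reset, here is a small sketch that reproduces both behaviours without any network access. The `PAGES` dictionary and the two function names are made up for illustration and stand in for the real pages and the BeautifulSoup parsing; only the handling of the list matches the code above.

```python
# Hypothetical stand-in for the web: each key is a "page", each value is the
# list of links found on that page (all names are invented for this example).
PAGES = {
    'start': ['a', 'b', 'c'],
    'c': ['d', 'e', 'f'],
    'f': ['g', 'h', 'i'],
}


def follow_shared_list(url, hops):
    """Buggy variant: one list shared by every iteration."""
    links = []                      # created once, so it keeps growing
    visited = []
    for _ in range(hops):
        links.extend(PAGES[url])    # new links land after the earlier ones
        url = links[2]              # still the third link of the first page
        visited.append(url)
    return visited


def follow_fresh_list(url, hops):
    """Fixed variant: a new list for every page."""
    visited = []
    for _ in range(hops):
        links = list(PAGES[url])    # only the current page's links
        url = links[2]              # third link of the current page
        visited.append(url)
    return visited


print(follow_shared_list('start', 3))  # ['c', 'c', 'c']
print(follow_fresh_list('start', 3))   # ['c', 'f', 'i']
```

Note that the sketch indexes with `links[2]`, which returns the string directly, so the slice-and-join step is not needed. It raises an `IndexError` if a page has fewer than three links, so you may want to check the length first.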