I am trying to collect the sizes of the homepages of a list of sites using multiprocessing. Following is the code:
```python
import time
import urllib.request
from multiprocessing import Pool, TimeoutError

start = time.time()


def sitesize(url):
    for url in sites:
        with urllib.request.urlopen(url) as u:
            page = u.read()
            print(url, len(page))


sites = [
    'https://www.yahoo.com',
    'http://www.cnn.com',
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.perl.org',
    'http://www.cisco.com',
    'http://www.facebook.com',
    'http://www.twitter.com',
    'http://arstechnica.com',
    'http://www.reuters.com',
    'http://www.abcnews.com',
    'http://www.cnbc.com',
]

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        for result in pool.imap_unordered(sitesize, sites):
            print(result)

    print(f'Time taken : {time.time() - start}')
```
I have a Windows 10 laptop running Python 3.9. I am not using a venv.
This code goes into a loop: it executes 4 times and takes 4 times longer.
What is the error here? Can someone help?
Thanks in advance
Sachin
Answer
I think you misunderstood how `pool.imap_unordered` works: the provided function is called with one value from `sites` at a time, whereas in your case you completely discard the provided `url` and instead loop over all values in the `sites` list yourself.
You should simply do:
```python
def sitesize(url):
    with urllib.request.urlopen(url) as u:
        page = u.read()
        print(url, len(page))
```
See the doc.
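One further note: `sitesize` as written prints inside the worker and returns `None`, so the `for result in pool.imap_unordered(...)` loop in the parent just prints `None` for each site. Returning the value instead lets the parent collect it. Here is a minimal, network-free sketch of that pattern (the `item_size` function and the string-length stand-in for page size are illustrative, not from the original post):

```python
from multiprocessing import Pool


def item_size(item):
    # Called once per element of the iterable passed to imap_unordered;
    # return the result instead of printing it inside the worker.
    return item, len(item)


if __name__ == '__main__':
    urls = ['https://www.yahoo.com', 'http://www.cnn.com', 'http://www.python.org']
    with Pool(processes=2) as pool:
        # Each input is processed exactly once; results arrive in
        # completion order, so sort them for a stable display.
        results = sorted(pool.imap_unordered(item_size, urls))
    print(results)
```

On Windows, keeping the pool creation under the `if __name__ == '__main__':` guard is required, since child processes re-import the module.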