I have a web scraping function that fetches data from 190 URLs. To speed it up I used concurrent.futures.ThreadPoolExecutor. I save the data to a SQL Server database. I need to repeat this whole process every 3 minutes from 9 AM to 4 PM. But when I put it inside a while loop or a scheduler, the concurrent.futures code does not work: there is no error and no output.
    # required libraries
    import concurrent.futures
    import time

    import requests

    urls = []

    def data_fetched(url):
        # data fetching
        # operations on data
        # data saving to SQL Server
        return ''

    while True:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            executor.map(data_fetched, urls)
        time.sleep(60)
I want to repeat all of this every 3 minutes; the code here shows the intended flow. Please help me schedule it.
    import time
    from datetime import datetime as dt, timedelta

    start = dt.strptime("09:15:00", "%H:%M:%S")
    end = dt.strptime("15:30:00", "%H:%M:%S")

    # gap between runs, in minutes
    min_gap = 3

    # compute the run times as "HH:MM" strings, min_gap minutes apart
    arr = [(start + timedelta(minutes=min_gap * i)).strftime("%H:%M")
           for i in range(int((end - start).total_seconds() / 60.0 / min_gap))]

    while True:
        now = dt.now()  # gets current datetime
        weekno = now.weekday()  # 0 = Monday ... 5 = Saturday, 6 = Sunday
        # zero-padded, minute resolution, so it can match the entries in arr
        current_time = now.strftime("%H:%M")
        # checks if current time is in the list of run times
        if weekno < 5 and current_time in arr:
            print('data_loaded')
        time.sleep(60)
So inside this while loop I want to call that function using concurrent.futures.
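The weekday and time-of-day check described above can also be written without building a list of time strings. The following is only a minimal sketch: the name `in_window` and the two constants are hypothetical, and the window bounds are assumed from the `start` and `end` values in the snippet above.

```python
from datetime import datetime, time as dtime

# Assumed window bounds, copied from the start/end values in the question.
WINDOW_START = dtime(9, 15)
WINDOW_END = dtime(15, 30)


def in_window(now: datetime) -> bool:
    """Return True on Monday-Friday between WINDOW_START and WINDOW_END (inclusive)."""
    is_weekday = now.weekday() < 5  # 5 = Saturday, 6 = Sunday
    return is_weekday and WINDOW_START <= now.time() <= WINDOW_END
```

A loop or scheduled job could call `in_window(datetime.now())` first and skip the run when it returns False.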
Answer
You can create a separate function that runs data_fetched() over the URLs and schedule that function. I assume your urls variable contains the list of URLs and is not an empty list.
    import concurrent.futures
    import time

    import requests
    from schedule import every, repeat, run_pending

    urls = []

    def data_fetched(url):
        # data fetching
        # operations on data
        # data saving to SQL Server
        return ''

    @repeat(every(3).minutes)
    def execute_script():
        with concurrent.futures.ThreadPoolExecutor() as executor:
            executor.map(data_fetched, urls)

    while True:
        run_pending()
        time.sleep(1)
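One possible cause of the "no error and no output" symptom is that `executor.map` returns a lazy iterator: an exception raised inside `data_fetched` only surfaces when the results are actually iterated. The sketch below is one way to make failures visible, not necessarily the cause in your case. `run_batch` and `fake_fetch` are hypothetical names, and `fake_fetch` is only a stand-in for the real `data_fetched`.

```python
import concurrent.futures


def run_batch(fetch, urls, max_workers=8):
    """Run fetch(url) for every URL in a thread pool; return (results, errors)."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its URL so failures can be attributed.
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()  # re-raises any exception from fetch
            except Exception as exc:
                errors[url] = exc
    return results, errors


def fake_fetch(url):
    # Hypothetical stand-in for data_fetched(); fails for one URL on purpose.
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return len(url)


results, errors = run_batch(fake_fetch, ["http://a.example", "http://bad.example"])
for url, exc in errors.items():
    print(f"{url} failed: {exc!r}")
```

Inside `execute_script()` you could call `run_batch(data_fetched, urls)` and log the returned errors, so that a failing request or database write no longer disappears silently.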