I am very new to web scraping and to Python in general. I am working on a project that requires me to scrape data from a website that refreshes its data every 10 minutes. I was able to scrape the data for the current 10-minute window, but once the data refreshes, the previous data is no longer valid. I need help with three things:
There is a timestamp input at the top of the page. How can I change the time in that input so that I fetch data only for that particular time period?
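A time input like this is usually part of an HTML form. If the form is submitted with GET, the selected time ends up in the page's URL as a query parameter. You can check this by changing the time in the browser and watching the address bar or the Network tab of the developer tools. Under that assumption, a minimal sketch is to build the URL yourself and pass it to `pd.read_html`. The parameter name `time`, the format string, and the helper `build_snapshot_url` are hypothetical placeholders to replace with whatever the real form sends. If the form uses POST instead, this approach does not apply as-is.

```python
from datetime import datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_snapshot_url(base_url: str, when: datetime,
                       param: str = "time",
                       fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Return base_url with the (hypothetical) time parameter set to `when`.

    ASSUMPTION: the page's time input is sent as a GET query parameter
    named `param`, formatted with `fmt`. Check the real name and format
    in the browser's developer tools.
    """
    parts = urlsplit(base_url)
    # Keep any query parameters the URL already has, then set/overwrite ours.
    query = dict(parse_qsl(parts.query))
    query[param] = when.strftime(fmt)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Once the real parameter name and format are known, something like `pd.read_html(build_snapshot_url(URL1, some_time))` would fetch the tables for that time, provided the site actually serves past windows through that input.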
My current code is:
```python
from datetime import datetime
from pathlib import Path

import pandas as pd

URL1 = "URL.com"

# read_html returns a list with one DataFrame per <table> on the page
tables1 = pd.read_html(URL1)
print("There are", len(tables1), "tables")

# The table I need is the ninth one on the page (index 8)
PartUsage = pd.DataFrame(tables1[8])

# The table has no timestamp of its own, so tag it with the scrape time
now = datetime.now()
PartUsage["Date"] = now
PartUsage.set_index("Date", inplace=True)

filepath = Path('Path.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
PartUsage.to_csv(filepath)
```
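As written, `to_csv` overwrites `Path.csv` on every run, so only the latest 10-minute window is ever kept. If the goal is to build up a history across runs, one option is to append instead. The sketch below assumes the same table layout as above; `append_snapshot` is a hypothetical helper name, and it writes the CSV header only when the file does not exist yet.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def append_snapshot(table: pd.DataFrame, filepath: Path, when: datetime) -> None:
    """Append one scraped table to a CSV, tagged with the time `when`.

    The header is written only when the file does not exist yet, so
    repeated runs accumulate rows instead of overwriting the file.
    """
    snapshot = table.copy()          # don't modify the caller's DataFrame
    snapshot["Date"] = when
    snapshot = snapshot.set_index("Date")
    filepath.parent.mkdir(parents=True, exist_ok=True)
    write_header = not filepath.exists()
    snapshot.to_csv(filepath, mode="a", header=write_header)
```

Calling `append_snapshot(tables1[8], Path('Path.csv'), datetime.now())` on each run would then add the new rows under the existing ones. This assumes the table's columns stay the same between refreshes; if they change, the appended rows will no longer line up with the header.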
I added the timestamp myself because the table I need does not contain one. How can I link this timestamp to the page's time input, so that it is used as the input?
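One way to link the two is to derive a single timestamp for the data window first, then use that same value both as the input sent to the site and as the `Date` label, instead of `datetime.now()`. The sketch below assumes the windows start on the hour and every 10 minutes after it (14:00, 14:10, ...), and that the site accepts past times in its input. `floor_to_slot` and `previous_slots` are hypothetical helper names.

```python
from datetime import datetime, timedelta
from typing import List


def floor_to_slot(when: datetime, minutes: int = 10) -> datetime:
    """Round `when` down to the start of its `minutes`-long window.

    ASSUMPTION: windows start on the hour and every `minutes` after it.
    """
    discard = timedelta(minutes=when.minute % minutes,
                        seconds=when.second,
                        microseconds=when.microsecond)
    return when - discard


def previous_slots(when: datetime, count: int, minutes: int = 10) -> List[datetime]:
    """Return the start times of the `count` most recent windows, newest first."""
    start = floor_to_slot(when, minutes)
    return [start - timedelta(minutes=minutes * k) for k in range(count)]
```

For example, `slot = floor_to_slot(datetime.now())` gives the start of the current window. Formatting `slot` with `strftime` in whatever format the input expects would give the input value, and `PartUsage["Date"] = slot` would label the rows with the window they belong to rather than the moment the script happened to run. `previous_slots` lists earlier windows, which could be used to request ones that were missed.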
This is company-specific data, so I cannot share the link or any further details. Any help will be appreciated. Thank you.
Answer
You can use cron for this. Cron is a utility that runs scripts on a fixed schedule. For convenience, you can also deploy it in a Docker container that runs automatically. You can find more about cron here: How do I get a Cron like scheduler in Python?
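With cron itself, a crontab entry such as `*/10 * * * * /usr/bin/python3 /path/to/scrape.py` would run the script every 10 minutes (both paths are placeholders). As the title of the linked question suggests, the scheduling can also be done inside Python. A minimal sketch of that alternative, a plain loop rather than cron, sleeps until the next 10-minute boundary and then calls the scraping code. `seconds_until_next_slot`, `run_every_slot`, and the `job` argument are hypothetical names.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_slot(now: datetime, minutes: int = 10) -> float:
    """Seconds from `now` until the next multiple of `minutes` past the hour."""
    slot_start = now.replace(minute=now.minute - now.minute % minutes,
                             second=0, microsecond=0)
    next_slot = slot_start + timedelta(minutes=minutes)
    return (next_slot - now).total_seconds()


def run_every_slot(job, minutes: int = 10) -> None:
    """Call `job()` at the start of every window, forever (stop with Ctrl+C)."""
    while True:
        time.sleep(seconds_until_next_slot(datetime.now(), minutes))
        job()
```

Because the sleep time is recomputed from the clock on every iteration, the runs stay aligned to the 10-minute boundaries even though each scrape takes some time. Unlike cron, though, the loop stops if the script crashes or the machine restarts, which is one reason to prefer cron or an auto-restarting container for long-running collection.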