
Scraping data from website that refreshes every 10 minutes in python

I am very new to web scraping and to Python in general. I am working on a project that requires me to scrape data from a website that refreshes its data every 10 minutes. I was able to scrape the data for the current 10-minute window, but once the page refreshes, the previous data is no longer available. I need help with 3 things:

  1. There is a timestamp input at the top of the website. How can I change the time in that input so that I only fetch data for that particular time period?

  2. My current code is –

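    from datetime import datetime
    from pathlib import Path
    
    import pandas as pd
    
    URL1 = "URL.com"
    
    # read_html returns a list with one DataFrame per <table> found on the page
    tables1 = pd.read_html(URL1)
    
    print("There are:", len(tables1), "tables")
    
    # The table of interest is the ninth one on the page (index 8)
    PartUsage = tables1[8]
    
    # The table has no timestamp of its own, so tag every row with the scrape time
    now = datetime.now()
    PartUsage["Date"] = now
    PartUsage.set_index("Date", inplace=True)
    
    # Create the output folder if needed, then write the CSV (overwrites on each run)
    filepath = Path("Path.csv")
    filepath.parent.mkdir(parents=True, exist_ok=True)
    PartUsage.to_csv(filepath)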

I added the timestamp myself because the required table does not contain one. How can I link this timestamp to the timestamp input on the page, so that it is used as the input?

This is company-specific data, so I cannot provide the link or any further details. Any help will be appreciated. Thank you.


Answer

You can use cron for this. Cron is a utility that runs scripts on a fixed schedule, so it can launch your scraper every 10 minutes. For convenience, you can also deploy the script in a Docker container that starts automatically. For more on cron-style scheduling from Python, see: How do I get a Cron like scheduler in Python?
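If cron is not available, a plain Python loop can approximate the same schedule. The following is only a minimal sketch, not a definitive implementation: the helper names `next_run_time`, `append_snapshot` and `run_forever` are made up for illustration, and it assumes the interval divides 60 evenly, that the table has no existing `Date` column, and that the wanted table is still at index 8 as in the question. It also appends each snapshot to the CSV instead of overwriting it, so earlier 10-minute windows are kept.

```python
import time
from datetime import datetime, timedelta
from pathlib import Path

import pandas as pd


def next_run_time(now, interval_minutes=10):
    """Return the next clock boundary after `now` (e.g. 12:03 -> 12:10).

    Assumes interval_minutes divides 60 evenly.
    """
    floored = now.replace(
        minute=now.minute - now.minute % interval_minutes,
        second=0,
        microsecond=0,
    )
    return floored + timedelta(minutes=interval_minutes)


def append_snapshot(table, csv_path, timestamp):
    """Append one timestamped snapshot to csv_path, writing the header only once."""
    snapshot = table.copy()                  # leave the caller's DataFrame untouched
    snapshot.insert(0, "Date", timestamp)    # same timestamp on every row
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists()     # header only for a brand-new file
    snapshot.to_csv(csv_path, mode="a", header=write_header, index=False)


def run_forever(url, csv_path, table_index=8, interval_minutes=10):
    """Scrape, append, then sleep until the next interval boundary (hypothetical helper)."""
    while True:
        scraped_at = datetime.now()
        tables = pd.read_html(url)           # one DataFrame per <table> on the page
        append_snapshot(tables[table_index], csv_path, scraped_at)
        wake_at = next_run_time(datetime.now(), interval_minutes)
        time.sleep(max(0.0, (wake_at - datetime.now()).total_seconds()))
```

Calling `run_forever("URL.com", "Path.csv")` would then keep collecting snapshots until the process is stopped. With cron, the loop is unnecessary: the script only needs to scrape once and call `append_snapshot`, and cron takes care of running it every 10 minutes.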

User contributions licensed under: CC BY-SA