I am new to python and I m learning by doing.
At this moment, my code is running quite slow and it seems to take longer and longer by each time I run it.
The idea is to download an employee list as CSV, then to check the location of each Employee ID by running it trough a specific page then writing it to an excel file.
We have around 600 associates on site each day and I need to find their location and to keep refreshing it each 2-4 minutes.
EDIT:
For everyone to have a better understanding, I have a CSV file ( TOT.CSV ) that contains Employee ID’s, Names and other information of the associates that I have on site.
In order to get their location, I need to run each employee ID from that CSV file trough https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId= 1 by 1 and at the same time to write it in another CSV file ( Location.csv ). Right now, it does in about 10 minutes and I want to understand if the way I did it is the best possible way, or if there is something else that I could try.
My code looks like this:
# GET EMPLOYEE ID FROM THE CSV data = read_csv("Z:\_Tracker\Dump\attendance\TOT.csv") # converting column data to list TOT_employeeID = data['Employee ID'].tolist() # Clean the Location Sheet with open("Z:\_Tracker\Dump\attendance\Location.csv", "w") as f: pass print("Previous Location data cleared ... ") # go through EACH employee ID to find out location for x in TOT_employeeID: driver.get( "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId=" + x) print("Getting Location data for EmployeeID: " + x) locData = driver.find_element(By.TAG_NAME, 'body').text aaData = str(locData) realLoc = aaData.split('"') # write to excel with open("Z:\_Tracker\Dump\attendance\Location.csv", "a") as f: writer = csv.writer(f) writer.writerow(realLoc) time.sleep(5) print("Employee Location data downloaded...")
Is there a way I can do this faster?
Thank you in advance!
Regards, Alex
Advertisement
Answer
Something like this.
import concurrent.futures def process_data(data: pd.DataFrame) -> None: associates = data['Employee ID'].unique() with concurrent.futures.ProcessPoolExecutor() as executer: executer.map(get_location, associates) def get_location(associate: str) -> None: driver.get( "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?" f"employeeId={associate}") print(f"Getting Location data for EmployeeID: {associate}") realLoc = str(driver.find_element(By.TAG_NAME, 'body').text).split('"') with open("Z:\_Tracker\Dump\attendance\Location.csv", "a") as f: writer = csv.writer(f) writer.writerow(realLoc) if __name__ == "__main__": data = read_csv("Z:\_Tracker\Dump\attendance\TOT.csv") process_data(data)