Skip to content
Advertisement

Python chunks write to excel

I am new to python and I m learning by doing.

At this moment, my code is running quite slow and it seems to take longer and longer by each time I run it.

The idea is to download an employee list as CSV, then to check the location of each Employee ID by running it trough a specific page then writing it to an excel file.

We have around 600 associates on site each day and I need to find their location and to keep refreshing it each 2-4 minutes.

EDIT:

For everyone to have a better understanding, I have a CSV file ( TOT.CSV ) that contains Employee ID’s, Names and other information of the associates that I have on site.

In order to get their location, I need to run each employee ID from that CSV file trough https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId= 1 by 1 and at the same time to write it in another CSV file ( Location.csv ). Right now, it does in about 10 minutes and I want to understand if the way I did it is the best possible way, or if there is something else that I could try.

My code looks like this:

# GET EMPLOYEE ID FROM THE CSV


data = read_csv("Z:\_Tracker\Dump\attendance\TOT.csv")

# converting column data to list
TOT_employeeID = data['Employee ID'].tolist()


# Clean the Location Sheet


with open("Z:\_Tracker\Dump\attendance\Location.csv", "w") as f:
    pass

print("Previous Location data cleared ... ")


# go through EACH employee ID to find out location


for x in TOT_employeeID:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId=" + x)
    print("Getting Location data for EmployeeID: " + x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = aaData.split('"')

    # write to excel
    with open("Z:\_Tracker\Dump\attendance\Location.csv",
              "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)

time.sleep(5)
print("Employee Location data downloaded...")

Is there a way I can do this faster?

Thank you in advance!

Regards, Alex

Advertisement

Answer

Something like this.

import concurrent.futures


def process_data(data: pd.DataFrame) -> None:
    associates = data['Employee ID'].unique()
    with concurrent.futures.ProcessPoolExecutor() as executer:
        executer.map(get_location, associates)


def get_location(associate: str) -> None:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?"
        f"employeeId={associate}")
    print(f"Getting Location data for EmployeeID: {associate}")
    realLoc = str(driver.find_element(By.TAG_NAME, 'body').text).split('"')

    with open("Z:\_Tracker\Dump\attendance\Location.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)


if __name__ == "__main__":
    data = read_csv("Z:\_Tracker\Dump\attendance\TOT.csv")
    process_data(data)
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement