I am working on a timetable system and need all the free slots (where the student has no lectures). Right now it prints the entire timetable out. I just need to store all the free slots somewhere. They are showing up as “NaN” on the time table. here is my code.
from bs4 import BeautifulSoup import pandas as pd import requests import time import natsort as ns from selenium import webdriver from selenium.webdriver.common.keys import Keys from selenium.webdriver.common.by import By from selenium.webdriver.chrome.service import Service from webdriver_manager.chrome import ChromeDriverManager from selenium.webdriver.support.ui import Select s = Service(ChromeDriverManager().install()) driver = webdriver.Chrome(service=s) driver.maximize_window() #opens up website, probably not needed. driver.implicitly_wait(30) driver.get('https://opentimetable.dcu.ie/') select = Select(driver.find_element_by_tag_name("select")) select.select_by_visible_text("Programmes of Study") search = driver.find_element_by_id("textSearch") search.send_keys("CASE2") checkbox = driver.find_element_by_xpath('.//input[following-sibling::div[contains(text(), "CASE2")]]') # it works but it is harder to remeber checkbox.click() time.sleep(3) html = driver.find_element_by_id("week-pdf-content").get_attribute('outerHTML') df2 = pd.read_html(html)[0] #trying to print free slots x = 0 if df2[x] == "NaN": print(df[x]) x+=1 print(df2.to_string()) # to_string() to display all columns without `...`
Expected output would be all the free times printing out. e.g Monday : 11-12, 13-14, 16-17. for each day.
Here is the current output As you can see it is just printing out the timetable. I would like to store all the times where it says NaN
Advertisement
Answer
Note: This is just an impulse to adapt, you should then be able to make fine tuning on your own.
Based on your update, I now have an idea of what the output should be. To simply output the information in a similar structure, you can iterate over each column of the dataframe if you have set the first column, that keeps the time, as index.
df = df2.set_index('Unnamed: 0') #or df = df2.set_index(list(df2.columns[[0]])) for column in df: print(f'{column}:{", ".join(df[df[column].isna()].index.drop_duplicates().to_list())}')
Output
Mon:8:00, 9:00, 13:00, 18:00, 19:00, 20:00, 21:00 Tue:8:00, 13:00, 14:00, 15:00, 17:00, 18:00, 19:00, 20:00, 21:00 Wed:8:00, 13:00, 15:00, 17:00, 18:00, 19:00, 20:00, 21:00 Thu:8:00, 11:00, 13:00, 14:00, 15:00, 18:00, 19:00, 20:00, 21:00 Fri:8:00, 9:00, 10:00, 11:00, 12:00, 13:00, 14:00, 15:00, 16:00, 17:00, 18:00, 19:00, 20:00, 21:00
To store these infromation in dict:
data = [] for column in df: data.append({column:df[df[column].isna()].index.drop_duplicates().to_list()})
Output
[{'Mon': ['8:00', '9:00', '13:00', '18:00', '19:00', '20:00', '21:00']}, {'Tue': ['8:00', '13:00', '14:00', '15:00', '17:00', '18:00', '19:00', '20:00', '21:00']}, {'Wed': ['8:00', '13:00', '15:00', '17:00', '18:00', '19:00', '20:00', '21:00']}, {'Thu': ['8:00', '11:00', '13:00', '14:00', '15:00', '18:00', '19:00', '20:00', '21:00']}, {'Fri': ['8:00', '9:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00']}]