Description of the situation: I have a script that scrolls inside a frame in order to extract information from a list.
<ul> <li> </li> <li> </li> <li> </li> <li> </li> <li> </li> ... </ul>
The list holds about 30 items. When scrolling, no new <li> </li> items are added; the existing ones are only updated, so the DOM does not grow.
Explaining the problem:
When the script scrolls, it must extract all the <li> </li> elements again on every iteration, because their contents are renewed.
Here is the scrolling and extraction logic I use:
import time
from typing import List

from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement

SCROLL_PAUSE_TIME = 5

# Get scroll height
last_height = driver.execute_script("return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
all_msgs_loaded = False
while not all_msgs_loaded:
    # Re-read the list items on every pass (this is the line that makes RAM grow)
    li_elements: List[WebElement] = self._driver.find_elements(By.XPATH, "//li[@data-tid='pane-item']")
    driver.execute_script("document.querySelector('li[data-tid=\"pane-item\"]').scrollIntoView();")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.querySelector('div[data-tid=\"pane-list-viewport\"]').scrollHeight;")
    if new_height == last_height:
        all_msgs_loaded = True
    last_height = new_height
On each iteration li_elements receives about 30 WebElements. If I comment out the line with find_elements, the script runs for hours without any increase in RAM consumption. Note that I do not store anything at runtime, and consumption does not increase anywhere else.
Another way I tried to get li_elements is through self._driver.execute_script().
Example:
li_elements = self._driver.execute_script(
    "return document.querySelectorAll('li[data-tid=\"pane-item\"]');",
    WebDriverWait(self._edge_driver, 20).until(
        EC.visibility_of_element_located((By.XPATH, "//li[@data-tid='pane-item']"))
    ),
)
Both methods return the same elements, and the RAM increase is the same. RAM grows indefinitely until Task Manager kills the process on its own for safety.
I analyzed the internals of these functions, but I did not find anything that could fill up the RAM.
Another option would be find_elements_by_css_selector(), but internally it just calls find_elements().
I also tried different combinations with sleep(), but nothing helps; RAM usage does not decrease.
Can you please explain what is actually happening? I do not understand why RAM consumption keeps increasing.
Can you tell me if there is another way to extract the elements without consuming RAM?
Answer
Try getting just what you need instead of the full element:
lis = driver.execute_script("""
    return [...document.querySelectorAll('li[data-tid="pane-item"]')].map(li => li.innerText)
""")
I can't tell what you're doing with them, but if you're adding the elements to a big list and there are enough of them, you will hit a RAM limit.
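To show how the text-only approach could fit into the scrolling loop from the question, here is a minimal sketch. It is not a confirmed fix: it only illustrates returning plain strings from JavaScript, so that no WebElement handles are created on each pass, and keeping just the previous batch in memory. The names `iter_item_texts`, `EXTRACT_TEXT_JS` and `SCROLL_JS` are made up for this example. Scrolling the last item into view and stopping when a pass returns the same texts as the previous one are both assumptions about how the list behaves.

```python
import time
from typing import Any, Iterator, List

# Returns plain strings instead of elements, so the driver does not have to
# hand back (and keep track of) a WebElement for every <li>.
EXTRACT_TEXT_JS = """
return [...document.querySelectorAll('li[data-tid="pane-item"]')]
    .map(li => li.innerText);
"""

# Assumption: scrolling the last rendered item into view advances the list.
SCROLL_JS = """
const items = document.querySelectorAll('li[data-tid="pane-item"]');
if (items.length) { items[items.length - 1].scrollIntoView(); }
"""


def iter_item_texts(driver: Any, pause_seconds: float = 5.0) -> Iterator[str]:
    """Yield item texts one by one while scrolling.

    `driver` is anything with an `execute_script(script)` method, for
    example a Selenium WebDriver. Only the previous batch is kept, so
    memory use on the Python side stays bounded.
    """
    previous: List[str] = []
    while True:
        current = driver.execute_script(EXTRACT_TEXT_JS) or []
        # Assumed stop condition: scrolling no longer changes the list.
        if current == previous:
            return
        seen = set(previous)
        for text in current:
            # Skip texts already yielded from the previous batch.
            if text not in seen:
                yield text
        previous = current
        driver.execute_script(SCROLL_JS)
        time.sleep(pause_seconds)
```

You would consume it with something like `for text in iter_item_texts(driver): handle(text)`, where `handle` is your own processing. Because deduplication only looks at the previous batch, two different items with identical text in consecutive batches would be reported once; if that matters, return a unique attribute from the script instead of `innerText`.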