CUDA Error: out of memory – Python process utilizes all GPU memory

Even after rebooting the machine, more than 95% of the GPU memory is used by a python3 process (the system-wide interpreter). Note that the memory stays allocated even when no training scripts are running, and I have never used Keras/TensorFlow in the system environment, only inside a venv or in a Docker container.

UPDATE: The last activity was the execution of an NN test script with the following configuration:

tensorflow==1.14.0 Keras==2.0.3

After rebooting in recovery mode, I tried running nvidia-smi -r, but it didn’t solve the issue.

Answer

By default, TensorFlow allocates GPU memory for the lifetime of the process, not the lifetime of the session object, so the memory can linger much longer than the object that triggered the allocation. That is why memory is still in use after you stop the program. Setting gpu_options.allow_growth = True is flexible in many cases, but it still lets the runtime allocate as much GPU memory as it needs; it only avoids reserving everything up front.
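
For reference, a minimal sketch of the allow_growth setup, assuming the tf.Session API from TF 1.x (matching the tensorflow==1.14.0 above; TF 2.x moves these under tf.compat.v1):

    import tensorflow as tf

    # Start with a small allocation and grow on demand,
    # instead of reserving almost the whole GPU up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)

Note that allow_growth only delays allocation; TensorFlow still does not release GPU memory back to the device until the process exits.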

To prevent tf.Session from using all of your GPU memory, you can cap the total amount the process may allocate by replacing gpu_options.allow_growth = True with a fixed per-process memory fraction (let’s use 50%, since your program seems to be able to use a lot of memory), set at session creation like this:

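A minimal sketch of that configuration, again assuming the TF 1.x tf.Session API:

    import tensorflow as tf

    # Cap this process at roughly 50% of the GPU's memory.
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Since the question uses Keras with the TensorFlow backend, the same capped session can be registered for Keras models with keras.backend.set_session(sess) (available in Keras 2.x).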

This should stop you from reaching the upper limit and cap usage at roughly 2 GB (since it looks like you have a 4 GB GPU).

User contributions licensed under: CC BY-SA