
(TensorFlow) Stuck at Epoch 1 during model.fit()

I’ve been trying to get TensorFlow 2.8.0 working with my GPU (GeForce GTX 1650 Ti) on Windows. TensorFlow detects the GPU, but any model I build gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel hangs and restarts (I’ve tried both Jupyter Notebook and Spyder).

Following TensorFlow’s website, I downloaded the matching cuDNN and CUDA versions, and I verified both (along with TensorFlow’s detection of my GPU) by running the various commands:

CUDA (Supposed to be 11.2)

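One way to run this check from Python is sketched below. It assumes nvcc is on PATH and that its output contains a "release X.Y" line. The helper names parse_nvcc_release and installed_cuda_release are made up for this example.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release (e.g. '11.2') found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version`; return the release string, or None if nvcc cannot be run."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # nvcc is not on PATH (assumption: the CUDA bin folder was never added to it)
        return None
    return parse_nvcc_release(result.stdout)
```

For the setup described here, installed_cuda_release() should return '11.2'. A result of None means nvcc could not be run or its output did not match the pattern.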

cuDNN (Supposed to be 8.1)

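cuDNN 8 records its version as CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in cudnn_version.h. A minimal sketch for reading them follows. The function names are hypothetical, and the header location is an assumption that depends on where the cuDNN files were copied.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract 'MAJOR.MINOR.PATCH' from the text of cudnn_version.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def cudnn_version_from_header(header_path):
    """Read a cudnn_version.h file; return its version string, or None if unreadable."""
    path = Path(header_path)
    if not path.is_file():
        return None
    return parse_cudnn_version(path.read_text(errors="ignore"))


# Hypothetical location, assuming the header was copied next to the CUDA headers:
# cudnn_version_from_header(
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include\cudnn_version.h"
# )
```

A result starting with 8.1 would match the version this post expects.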

GPU Checks

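A sketch of the TensorFlow-side checks, collected into one function. The name tensorflow_gpu_report is made up for this example. The build-info keys can differ between builds, so they are read with .get().

```python
def tensorflow_gpu_report():
    """Return what TensorFlow reports about its CUDA build and the GPUs it can see.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    build = tf.sysconfig.get_build_info()  # dict of build settings; keys vary by build
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "build_cuda_version": build.get("cuda_version"),
        "build_cudnn_version": build.get("cudnn_version"),
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }
```

A non-empty "gpus" list only shows that the device is visible. As this post illustrates, training can still hang even though the GPU is detected.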

When I then try to fit any model, it hangs as described above. Surprisingly, although it can’t run code such as the example in TensorFlow’s CNN tutorial, it does work when I run the code from this Stack Overflow question, which looks almost the same as every other snippet that failed.
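To reproduce the hang without taking down the notebook kernel, a small training run can be launched in a child process with a timeout. This is only a sketch: run_with_timeout and SMOKE_TEST are names invented for this example, and the tiny random-data model is an assumption, not the tutorial code.

```python
import subprocess
import sys

# Tiny convolutional model on random data; the Conv2D layer is there on purpose
# so that the run touches the convolution path on the GPU.
SMOKE_TEST = """
import numpy as np
import tensorflow as tf
x = np.random.rand(64, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(64,))
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x, y, epochs=1, verbose=0)
"""


def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process; return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the hang does not linger
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"
```

Calling run_with_timeout(SMOKE_TEST, 300) would return 'timeout' if the fit call hangs the way described here, while the calling kernel stays alive. The 300-second limit is an arbitrary choice.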

Can someone help me with this issue? I’ve spent the past couple of hours testing TensorFlow with every snippet I came across, and the only one that does not get stuck at Epoch 1 is the code from the link above.

(I’ve also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1', and everything seems to work fine.)
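For reference, a sketch of that CPU-only switch. The environment variable generally has to be set before TensorFlow initializes its GPU devices, so setting it before the import is the safe order. The helper name hide_gpus_from_tensorflow is hypothetical.

```python
import os
import sys


def hide_gpus_from_tensorflow():
    """Set CUDA_VISIBLE_DEVICES=-1 so CUDA-based libraries see no GPU.

    Returns True if TensorFlow had not been imported yet, False if it was already
    imported (in that case the setting may not take effect until a restart).
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return not already_imported
```

If this returns False, restarting the kernel and calling it before importing TensorFlow is the safer route.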


Answer

Update (Solution)

It seems the suggestions from this post helped. I copied every cudnn*.dll file from the bin subfolder of the unzipped cuDNN archive (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin).

I had initially misinterpreted the instruction to copy all cudnn*.dll files as copying only cudnn64_8.dll, rather than every file matching that pattern.
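The copy step can also be scripted, which avoids missing a file. This is a minimal sketch under the assumption that both folders exist and are writable (the CUDA folder under Program Files typically needs an elevated prompt). copy_cudnn_dlls and missing_cudnn_dlls are hypothetical helper names.

```python
import shutil
from pathlib import Path


def missing_cudnn_dlls(cudnn_bin, cuda_bin):
    """List cudnn*.dll names present in cudnn_bin but absent from cuda_bin."""
    src_names = {p.name for p in Path(cudnn_bin).glob("cudnn*.dll")}
    dst_names = {p.name for p in Path(cuda_bin).glob("cudnn*.dll")}
    return sorted(src_names - dst_names)


def copy_cudnn_dlls(cudnn_bin, cuda_bin):
    """Copy every cudnn*.dll from cudnn_bin into cuda_bin; return the copied names."""
    src, dst = Path(cudnn_bin), Path(cuda_bin)
    copied = []
    for dll in sorted(src.glob("cudnn*.dll")):
        shutil.copy2(dll, dst / dll.name)  # overwrites an existing file of the same name
        copied.append(dll.name)
    return copied
```

Running missing_cudnn_dlls first on the two folders named above would have shown that only cudnn64_8.dll had been copied.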

User contributions licensed under: CC BY-SA