I am building a neural network in Keras with multiple LSTM, Permute and Dense layers.
It seems LSTM is GPU-unfriendly, so I did some research and used

    with tf.device('/cpu:0'):
        out = LSTM(cells)(inp)

But as I understand it, the with statement is a try...finally block that ensures clean-up code is executed. I don't know whether the following mixed CPU/GPU code works. Will it speed up training?

    with tf.device('/cpu:0'):
        out = LSTM(cells)(inp)
    with tf.device('/gpu:0'):
        out = Permute(some_shape)(out)
    with tf.device('/cpu:0'):
        out = LSTM(cells)(out)
    with tf.device('/gpu:0'):
        out = Dense(output_size)(out)

As the TensorFlow documentation explains, tf.device is a context manager that switches the default device to the one passed as its argument, for the duration of the block it creates. So this code should run everything placed under the '/cpu:0' device on the CPU and the rest on the GPU.
Whether it will speed up your training is hard to answer because it depends on the machine you use, but I don't expect the computations to be faster: each change of device forces data to be copied between GPU RAM and machine RAM. This could even slow your computations down.
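
As a rough illustration of the copying concern, the sketch below counts how many times consecutive layers sit on different devices in the layout from the question. The helper `count_device_switches` is hypothetical and only counts boundaries. It says nothing about actual timings, which would have to be measured on your own machine.

```python
def count_device_switches(placements):
    """Count adjacent layer pairs that are placed on different devices."""
    return sum(1 for prev, cur in zip(placements, placements[1:]) if prev != cur)


# Layer order from the question: LSTM, Permute, LSTM, Dense.
mixed = ["/cpu:0", "/gpu:0", "/cpu:0", "/gpu:0"]
single = ["/gpu:0"] * 4  # everything left on one device

print(count_device_switches(mixed))   # 3
print(count_device_switches(single))  # 0
```

Each boundary counted here is a point where activations would have to move between host and GPU memory, which is why the mixed layout may end up slower rather than faster.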