I have an autoencoder that takes an image as an input and produces a new image as an output.
The input image (1x1024x1024x3) is split into patches (1024x32x32x3) before being fed to the network.
Once I have the output, also a batch of patches size 1024x32x32x3, I want to be able to reconstruct a 1024x1024x3 image. I thought I had this sussed by simply reshaping, but here’s what happened.
First, the image as read by TensorFlow:
I patched the image with the following code
```python
patch_size = [1, 32, 32, 1]
patches = tf.extract_image_patches([image], patch_size, patch_size, [1, 1, 1, 1], 'VALID')
patches = tf.reshape(patches, [1024, 32, 32, 3])
```
Here are a couple of patches from this image:
But it’s when I reshape this patch data back into an image that things go pear-shaped.
```python
reconstructed = tf.reshape(patches, [1, 1024, 1024, 3])
converted = tf.image.convert_image_dtype(reconstructed, tf.uint8)
encoded = tf.image.encode_png(converted)
```
In this example, no processing has been done between patching and reconstructing. I have made a version of the code you can use to test this behaviour. To use it, run the following:
```shell
echo "/path/to/test-image.png" > inputs.txt
mkdir images
python3 image_test.py inputs.txt images
```
For each input image, the code saves one input image, one output image, and one patch image per patch (1024 of them), so comment out the lines that create the input and output images if you only want to save the patches.
Somebody, please explain what happened :(
Answer
Use Update #2 below. First, a small example for your task (TF 1.0):
Consider an image of size (4, 4, 1), converted to patches of size (4, 2, 2, 1) and reconstructed back into the original image.
```python
import tensorflow as tf

image = tf.constant([[[1],  [2],  [3],  [4]],
                     [[5],  [6],  [7],  [8]],
                     [[9],  [10], [11], [12]],
                     [[13], [14], [15], [16]]])

patch_size = [1, 2, 2, 1]
patches = tf.extract_image_patches([image], patch_size, patch_size, [1, 1, 1, 1], 'VALID')
patches = tf.reshape(patches, [4, 2, 2, 1])

reconstructed = tf.reshape(patches, [1, 4, 4, 1])
rec_new = tf.space_to_depth(reconstructed, 2)
rec_new = tf.reshape(rec_new, [4, 4, 1])

sess = tf.Session()
I, P, R_n = sess.run([image, patches, rec_new])
print(I)
print(I.shape)
print(P.shape)
print(R_n)
print(R_n.shape)
```
Output:
```
[[[ 1][ 2][ 3][ 4]]
 [[ 5][ 6][ 7][ 8]]
 [[ 9][10][11][12]]
 [[13][14][15][16]]]
(4, 4, 1)
(4, 2, 2, 1)
[[[ 1][ 2][ 3][ 4]]
 [[ 5][ 6][ 7][ 8]]
 [[ 9][10][11][12]]
 [[13][14][15][16]]]
(4, 4, 1)
```
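To see why the plain reshape in the question scrambles the image, here is a small NumPy sketch (my own illustration, not part of the original answer) of the same 4×4 example. A flat reshape reads the patch pixels in patch-major order, interleaving rows from different patches; the axes have to be reordered first.

```python
import numpy as np

# The same 4x4 example in NumPy: 2x2 patches in row-major patch order.
image = np.arange(1, 17).reshape(4, 4)
patches = np.array([image[i:i + 2, j:j + 2]
                    for i in range(0, 4, 2)
                    for j in range(0, 4, 2)])      # shape (4, 2, 2)

# Naive reshape: rows from different patches get interleaved.
naive = patches.reshape(4, 4)
print(naive)

# Reordering the axes (patch row, row in patch, patch col, col in patch)
# before flattening recovers the original image.
correct = patches.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
print(np.array_equal(correct, image))  # True
```

The `space_to_depth` call in the answer above performs essentially this reordering for the special case where the number of patches per row equals the patch size.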
#Update – for 3 channels (debugging..): works only when p = sqrt(h)
```python
import tensorflow as tf
import numpy as np

c = 3
h = 1024
p = 32

image = tf.random_normal([h, h, c])
patch_size = [1, p, p, 1]
patches = tf.extract_image_patches([image], patch_size, patch_size, [1, 1, 1, 1], 'VALID')
patches = tf.reshape(patches, [h, p, p, c])

reconstructed = tf.reshape(patches, [1, h, h, c])
rec_new = tf.space_to_depth(reconstructed, p)
rec_new = tf.reshape(rec_new, [h, h, c])

sess = tf.Session()
I, P, R_n = sess.run([image, patches, rec_new])
print(I.shape)
print(P.shape)
print(R_n.shape)
err = np.sum((R_n - I)**2)
print(err)
```
Output :
```
(1024, 1024, 3)
(1024, 32, 32, 3)
(1024, 1024, 3)
0.0
```
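For reference, the p = sqrt(h) restriction disappears if the reordering is done explicitly with reshape/transpose. This NumPy sketch (my own addition, not from the answer) works for any p that divides h and uses the question's sizes:

```python
import numpy as np

c, h, p = 3, 1024, 32
n = h // p                      # patches per row/column

image = np.random.rand(h, h, c)

# Split into n*n patches of shape (p, p, c), row-major patch order.
patches = (image.reshape(n, p, n, p, c)
                .transpose(0, 2, 1, 3, 4)
                .reshape(n * n, p, p, c))

# Invert: undo the transpose, then collapse back to (h, h, c).
reconstructed = (patches.reshape(n, n, p, p, c)
                        .transpose(0, 2, 1, 3, 4)
                        .reshape(h, h, c))

print(np.sum((reconstructed - image) ** 2))  # 0.0
```

The reconstruction is an exact inverse of the split, so the error is exactly zero.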
#Update 2
Reconstructing directly from the output of extract_image_patches seems difficult. The code below uses other functions (space_to_batch_nd and batch_to_space_nd) to extract the patches and then reverses the process to reconstruct the image, which is easier.
```python
import tensorflow as tf
import numpy as np

c = 3
h = 1024
p = 128

image = tf.random_normal([1, h, h, c])

# Image to patches conversion
pad = [[0, 0], [0, 0]]
patches = tf.space_to_batch_nd(image, [p, p], pad)
patches = tf.split(patches, p * p, 0)
patches = tf.stack(patches, 3)
patches = tf.reshape(patches, [(h // p)**2, p, p, c])

# Do processing on patches here

# Reverse the process to reconstruct the image from the patches
patches_proc = tf.reshape(patches, [1, h // p, h // p, p * p, c])
patches_proc = tf.split(patches_proc, p * p, 3)
patches_proc = tf.stack(patches_proc, axis=0)
patches_proc = tf.reshape(patches_proc, [p * p, h // p, h // p, c])

reconstructed = tf.batch_to_space_nd(patches_proc, [p, p], pad)

sess = tf.Session()
I, P, R_n = sess.run([image, patches, reconstructed])
print(I.shape)
print(P.shape)
print(R_n.shape)
err = np.sum((R_n - I)**2)
print(err)
```

(Note the integer divisions `h // p`; under Python 3, `h / p` would produce floats and break the reshapes.)
Output:
```
(1, 1024, 1024, 3)
(64, 128, 128, 3)
(1, 1024, 1024, 3)
0.0
```
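To make the space_to_batch_nd / batch_to_space_nd pair less opaque, here is a NumPy sketch of what (per my reading of the TF docs) the two ops do: block offsets are moved into the batch dimension and back. This is my own illustration, using smaller sizes than the answer's for brevity; the assertion only checks that the round trip is lossless.

```python
import numpy as np

c, h, p = 3, 256, 32
n = h // p

image = np.random.rand(1, h, h, c)

# space_to_batch_nd with block [p, p]: the p*p block offsets become
# the leading (batch) dimension, leaving an (n, n) spatial grid.
s2b = (image.reshape(1, n, p, n, p, c)
            .transpose(2, 4, 0, 1, 3, 5)
            .reshape(p * p, n, n, c))

# batch_to_space_nd moves the block offsets back into the spatial dims.
b2s = (s2b.reshape(p, p, 1, n, n, c)
          .transpose(2, 3, 0, 4, 1, 5)
          .reshape(1, h, h, c))

print(np.array_equal(b2s, image))  # True
```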
You can find other useful tensor transformation functions here: https://www.tensorflow.org/api_guides/python/array_ops