
tflearn / tensorflow does not learn xor

The following code was written to learn the XOR function, but about half of the time the network does not learn, and the loss stays the same after every epoch.

train_f = [[0, 0], [0, 1], [1, 0], [1, 1]]
train_c = [[0], [1], [1], [0]]
test_f = train_f
test_c = train_c

import tensorflow as tf
import tflearn

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_xor = [[0.], [1.], [1.], [0.]]

# Graph definition
with tf.Graph().as_default():
    # Build the network
    net = tflearn.input_data(shape=[None, 2])
    # Two relu hidden layers and a sigmoid output layer
    net = tflearn.fully_connected(net, 2, activation='relu')
    net = tflearn.fully_connected(net, 2, activation='relu')
    net = tflearn.fully_connected(net, 1, activation='sigmoid')
    regressor = tflearn.regression(net, optimizer='adam', learning_rate=0.005, loss='mean_square')

    # Training
    m = tflearn.DNN(regressor)
    m.fit(X, Y_xor, n_epoch=256, snapshot_epoch=False)

    # Testing
    print("Testing XOR operator")
    print("0 xor 0:", m.predict([[0., 0.]]))
    print("0 xor 1:", m.predict([[0., 1.]]))
    print("1 xor 0:", m.predict([[1., 0.]]))
    print("1 xor 1:", m.predict([[1., 1.]]))

Sometimes I get correct results like this:

Testing XOR operator
0 xor 0: [[0.1487255096435547]]
0 xor 1: [[0.9297153949737549]]
1 xor 0: [[0.9354135394096375]]
1 xor 1: [[0.1487255096435547]]

But often this:

Testing XOR operator
0 xor 0: [[0.4999997615814209]]
0 xor 1: [[0.5000002384185791]]
1 xor 0: [[0.4999997615814209]]
1 xor 1: [[0.5000001788139343]]

My 2x2x1 network should be able to perform XOR, and there is even some evidence suggesting that such a network should always converge: http://www.ncbi.nlm.nih.gov/pubmed/12662805

I have also tried changing the relu layers to sigmoid, running 2048 iterations, and using 4x4x1 and 6x6x1 networks, but the same problem still occurs sometimes.

Could there be something wrong with how the weights are initialized? How do I use tflearn to get a neural net to learn the XOR function?


Answer

I’ve decided to add another answer: I’ve done some more research and have some substantially different advice to provide.

After skimming this paper, it dawned on me that the reason you’re not seeing convergence might have to do with the initial weights. The paper specifically references work by Hirose et al. (Hirose, Yamashita, and Hijiya 1991) which found that initialization with a limited range of weights results in a very low probability of convergence. The “sweet spot” for reliable convergence seemed to be initial weights in a range between 0.5 and 1 on average.

It turns out that tflearn defaults to truncated normal initialization with a stddev of 0.02, so the weights start out in a very limited range. I’ve found that I can get reasonably reliable results using random uniform initialization between -1.0 and 1.0.
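
For illustration, here’s what that change looks like if you keep your original relu architecture and only widen the initialization (just a sketch, not something I benchmarked; the complete version I actually tested follows below):

# Sketch only: the question's original layers, with the default truncated
# normal (stddev 0.02) weights swapped for a wider uniform initializer.
winit = tflearn.initializations.uniform(minval=-1.0, maxval=1.0)
net = tflearn.input_data(shape=[None, 2])
net = tflearn.fully_connected(net, 2, activation='relu', weights_init=winit)
net = tflearn.fully_connected(net, 2, activation='relu', weights_init=winit)
net = tflearn.fully_connected(net, 1, activation='sigmoid', weights_init=winit)
regressor = tflearn.regression(net, optimizer='adam', learning_rate=0.005, loss='mean_square')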

Also, incidentally, it turns out that you’ve added a third layer. XOR requires only one hidden layer, so you can remove the second one. Here’s the code that works for me:

import tensorflow as tf
import tflearn

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_xor = [[0.], [1.], [1.], [0.]]

# Graph definition
with tf.Graph().as_default():
    winit = tflearn.initializations.uniform(minval=-1.0, maxval=1.0)
    net = tflearn.input_data(shape=[None, 2])
    net = tflearn.fully_connected(net, 2, activation='sigmoid', weights_init=winit)
    net = tflearn.fully_connected(net, 1, activation='sigmoid', weights_init=winit)
    regressor = tflearn.regression(net, optimizer='sgd', learning_rate=2., loss='mean_square')

    # Training
    m = tflearn.DNN(regressor)
    m.fit(X, Y_xor, n_epoch=10000, snapshot_epoch=False) 

    # Testing
    print("Testing XOR operator")
    print("0 xor 0:", m.predict([[0., 0.]]))
    print("0 xor 1:", m.predict([[0., 1.]]))
    print("1 xor 0:", m.predict([[1., 0.]]))
    print("1 xor 1:", m.predict([[1., 1.]]))

Note that I am using mean square error. To my surprise, it seems to work best for this problem. Cross-entropy seems to cause the optimizer to languish in relatively flat regions of the problem space. I would have expected the opposite; maybe someone better versed in the mathematics will be able to better explain that.
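
If you want to try that comparison yourself, the only change needed is the loss argument in the regression layer; 'binary_crossentropy' is the tflearn objective I’d reach for here (a sketch, not the exact line I ran):

# Replace the regression line in the code above with this to train with
# cross-entropy instead of mean square error.
regressor = tflearn.regression(net, optimizer='sgd', learning_rate=2., loss='binary_crossentropy')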
