
tflearn / tensorflow does not learn xor

The following code was written to learn the XOR function, but about half of the time the network does not learn, and the loss stays the same after every epoch.

train_f = [[0, 0], [0, 1], [1, 0], [1, 1]]
train_c = [[0], [1], [1], [0]]
test_f = train_f
test_c = train_c

import tensorflow as tf
import tflearn

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_xor = [[0.], [1.], [1.], [0.]]

# Graph definition
with tf.Graph().as_default():
    # Build the network
    net = tflearn.input_data(shape=[None, 2])
    # Two relu hidden layers and a sigmoid output layer
    net = tflearn.fully_connected(net, 2, activation='relu')
    net = tflearn.fully_connected(net, 2, activation='relu')
    net = tflearn.fully_connected(net, 1, activation='sigmoid')
    regressor = tflearn.regression(net, optimizer='adam', learning_rate=0.005, loss='mean_square')

    # Training
    m = tflearn.DNN(regressor)
    m.fit(X, Y_xor, n_epoch=256, snapshot_epoch=False)

    # Testing
    print("Testing XOR operator")
    print("0 xor 0:", m.predict([[0., 0.]]))
    print("0 xor 1:", m.predict([[0., 1.]]))
    print("1 xor 0:", m.predict([[1., 0.]]))
    print("1 xor 1:", m.predict([[1., 1.]]))

Sometimes I get correct results like this:

Testing XOR operator
0 xor 0: [[0.1487255096435547]]
0 xor 1: [[0.9297153949737549]]
1 xor 0: [[0.9354135394096375]]
1 xor 1: [[0.1487255096435547]]

But often this:

Testing XOR operator
0 xor 0: [[0.4999997615814209]]
0 xor 1: [[0.5000002384185791]]
1 xor 0: [[0.4999997615814209]]
1 xor 1: [[0.5000001788139343]]

My 2x2x1 network should be able to perform XOR, and there is even some evidence suggesting that such a network should always converge: http://www.ncbi.nlm.nih.gov/pubmed/12662805

I have also tried changing the relu layers to sigmoid, running 2048 iterations, and using 4x4x1 and 6x6x1 networks, but the same problem still occurs sometimes.

Could there be something wrong with how the weights are initialized? How do I use tflearn to get a neural net to learn the XOR function?


Answer

I’ve decided to add another answer: I’ve done some more research and have some substantially different advice to provide.

After skimming this paper, it dawned on me that the reason you’re not seeing convergence might have to do with the initial weights. The paper specifically references work by Hirose et al. (Hirose, Yamashita, and Hijiya 1991) which found that initialization with a limited range of weights results in a very low probability of convergence. The “sweet spot” for reliable convergence seemed to be initial weights in a range between 0.5 and 1 on average.

It turns out that tflearn defaults to truncated normal initialization with a stddev of 0.02, so the weights start out in a very limited range. I’ve found that I can get reasonably reliable results using random uniform initialization between -1.0 and 1.0.
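
For illustration, here’s what that change looks like if you keep your original relu architecture and only widen the initialization (just a sketch, not something I benchmarked; the complete version I actually tested follows below):

# Sketch only: the question's original layers, with the default truncated
# normal (stddev 0.02) weights swapped for a wider uniform initializer.
winit = tflearn.initializations.uniform(minval=-1.0, maxval=1.0)
net = tflearn.input_data(shape=[None, 2])
net = tflearn.fully_connected(net, 2, activation='relu', weights_init=winit)
net = tflearn.fully_connected(net, 2, activation='relu', weights_init=winit)
net = tflearn.fully_connected(net, 1, activation='sigmoid', weights_init=winit)
regressor = tflearn.regression(net, optimizer='adam', learning_rate=0.005, loss='mean_square')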

Also, incidentally, it turns out that you’ve added a third layer. XOR requires only one hidden layer, so you can remove the second one. Here’s the code that works for me:

import tensorflow as tf
import tflearn

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
Y_xor = [[0.], [1.], [1.], [0.]]

# Graph definition
with tf.Graph().as_default():
    winit = tflearn.initializations.uniform(minval=-1.0, maxval=1.0)
    net = tflearn.input_data(shape=[None, 2])
    net = tflearn.fully_connected(net, 2, activation='sigmoid', weights_init=winit)
    net = tflearn.fully_connected(net, 1, activation='sigmoid', weights_init=winit)
    regressor = tflearn.regression(net, optimizer='sgd', learning_rate=2., loss='mean_square')

    # Training
    m = tflearn.DNN(regressor)
    m.fit(X, Y_xor, n_epoch=10000, snapshot_epoch=False) 

    # Testing
    print("Testing XOR operator")
    print("0 xor 0:", m.predict([[0., 0.]]))
    print("0 xor 1:", m.predict([[0., 1.]]))
    print("1 xor 0:", m.predict([[1., 0.]]))
    print("1 xor 1:", m.predict([[1., 1.]]))

Note that I am using mean square error. To my surprise, it seems to work best for this problem. Cross-entropy seems to cause the optimizer to languish in relatively flat regions of the problem space. I would have expected the opposite; maybe someone better versed in the mathematics will be able to better explain that.
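
If you want to try that comparison yourself, the only change needed is the loss argument in the regression layer; 'binary_crossentropy' is the tflearn objective I’d reach for here (a sketch, not the exact line I ran):

# Replace the regression line in the code above with this to train with
# cross-entropy instead of mean square error.
regressor = tflearn.regression(net, optimizer='sgd', learning_rate=2., loss='binary_crossentropy')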
