I have been trying to build a simple neural network myself (3 layers) to classify the MNIST dataset. I referenced some code online and wrote some parts on my own. The code runs without any errors, but something is wrong with the learning process: the prediction results seem essentially "random". Training the network and then using it to predict the same image gives me a different result every time. Could someone please give me some hints about where I went wrong?
import pandas as pd
import numpy as np
from PIL import Image
import os

np.set_printoptions(formatter={'float_kind': '{:f}'.format})

def init_setup():
    # three layers perception
    w1 = np.random.randn(10, 784) - 0.8
    b1 = np.random.rand(10, 1) - 0.8
    # second layer
    w2 = np.random.randn(10, 10) - 0.8
    b2 = np.random.randn(10, 1) - 0.8
    # third layer
    w3 = np.random.randn(10, 10) - 0.8
    b3 = np.random.randn(10, 1) - 0.8
    return w1, b1, w2, b2, w3, b3

def activate(A):
    # use ReLU function as the activation function
    Z = np.maximum(0, A)
    return Z

def softmax(Z):
    return np.exp(Z) / np.sum(np.exp(Z))

def forward_propagation(A, w1, b1, w2, b2, w3, b3):
    # input A: (784,1) -> A1: (10,1) -> A2: (10,1) -> prob: (10,1)
    z1 = w1 @ A + b1
    A1 = activate(z1)
    z2 = w2 @ A1 + b2
    A2 = activate(z2)
    z3 = w3 @ A2 + b3
    prob = softmax(z3)
    return z1, A1, z2, A2, z3, prob

def one_hot(Y: np.ndarray) -> np.ndarray:
    one_hot = np.zeros((10, 1)).astype(int)
    one_hot[Y] = 1
    return one_hot

def back_propagation(A, z1, A1: np.ndarray, z2, A2: np.ndarray, z3, prob, w1, w2: np.ndarray, w3, Y: np.ndarray, lr: float):
    m = 1 / Y.size
    dz3 = prob - Y
    # print('loss ', np.sum(dz3))
    dw3 = m * dz3 @ A2.T
    db3 = dz3
    dz2 = ReLU_deriv(z2) * w3.T @ dz3
    dw2 = dz2 @ A1.T
    db2 = dz2
    dz1 = ReLU_deriv(z1) * w2.T @ dz2
    dw1 = dz1 @ A.T
    db1 = dz1
    return db1, dw1, dw2, db2, dw3, db3

def ReLU_deriv(Z):
    Z[Z > 0] = 1
    Z[Z <= 0] = 0
    return Z

def step(lr, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3):
    w1 = w1 - lr * dw1
    b1 = b1 - lr * db1
    w2 = w2 - lr * dw2
    b2 = b2 - lr * db2
    w3 = w3 - lr * dw3
    b3 = b3 - lr * db3
    return w1, b1, w2, b2, w3, b3
Putting the functions together:
def learn():
    lr = 0.00002
    w1, b1, w2, b2, w3, b3 = init_setup()
    # read the data from a csv file
    df = pd.read_csv('data.csv')
    # Shuffle the data
    df = df.sample(frac=1).reset_index(drop=True)
    for epoch in range(0, 5):
        lr = lr / 10
        for _, row in df.iterrows():
            A = row.values[1:]
            A = A.reshape(784, 1)
            Y = int(row.values[0])
            Y = one_hot(Y)
            z1, A1, z2, A2, z3, prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
            db1, dw1, dw2, db2, dw3, db3 = back_propagation(A, z1, A1, z2, A2, z3, prob, w1, w2, w3, Y, lr)
            w1, b1, w2, b2, w3, b3 = step(lr, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3)
    return w1, b1, w2, b2, w3, b3

optimize_params = learn()
w1, b1, w2, b2, w3, b3 = optimize_params
img = Image.open(r'C:UsersDesktopMNIST - JPG - training216.jpg')
A = np.asarray(img)
A = A.reshape(-1, 1)
z1, A1, z2, A2, z3, prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
print(prob)
print(np.argmax(prob))
Running the code three times, the results look like this:
Run 1:

>>>[[0.020815]
 [0.019490]
 [0.113170]
 [0.051033]
 [0.107867]
 [0.009533]
 [0.148977]
 [0.333544]
 [0.147408]
 [0.048163]]
>>>7

Run 2:

>>>[[0.025916]
 [0.031197]
 [0.006868]
 [0.426709]
 [0.043123]
 [0.001528]
 [0.080894]
 [0.273520]
 [0.049245]
 [0.060999]]
>>>3

Run 3:

>>>[[0.161880]
 [0.104364]
 [0.093192]
 [0.041726]
 [0.062953]
 [0.324685]
 [0.102557]
 [0.043415]
 [0.009269]
 [0.055960]]
>>>5
Running the same code three times, I get three largely different results. I know there is randomness in neural networks, but shouldn't the parameters be roughly the same after the learning process? Could anyone please give me some hints or suggestions about where I went wrong in the learning process, or what causes the randomness in the results?
Answer
Assuming the code itself is correct, I would increase the learning rate and the number of epochs. You even decrease the learning rate every epoch (lr=lr/10), so it feels like the model doesn't have time to converge (to actually learn). For starters, I would fix the learning rate at 0.001 and increase the number of epochs to maybe 25. If your results get better, you can start fiddling around from there.
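As a rough illustration of that suggestion, the training loop could look something like the sketch below. It is only a minimal sketch that reuses your existing init_setup, forward_propagation, back_propagation, one_hot and step functions and assumes the same data.csv layout (first column = label, remaining 784 columns = pixels); the fixed learning rate of 0.001 and the 25 epochs are just starting values to experiment with, not tuned settings.

def learn(lr=0.001, epochs=25):
    # keep the learning rate fixed instead of dividing it by 10 every epoch
    w1, b1, w2, b2, w3, b3 = init_setup()

    # read and shuffle the training data (same CSV layout as in the question)
    df = pd.read_csv('data.csv')
    df = df.sample(frac=1).reset_index(drop=True)

    for epoch in range(epochs):
        for _, row in df.iterrows():
            A = row.values[1:].reshape(784, 1)
            Y = one_hot(int(row.values[0]))

            z1, A1, z2, A2, z3, prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
            db1, dw1, dw2, db2, dw3, db3 = back_propagation(
                A, z1, A1, z2, A2, z3, prob, w1, w2, w3, Y, lr)
            w1, b1, w2, b2, w3, b3 = step(
                lr, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3)

    return w1, b1, w2, b2, w3, b3

With more epochs and a learning rate that doesn't shrink away, the weights should settle into similar values across runs (up to the randomness of the initialization), instead of staying close to their random starting point.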