i’m training an AlexNet on footage captured while playing an emulated NES game (an F1 racer), so that later on the net can play the game by itself.
now while i’m capturing the training data, the background of the game changes heavily in terms of gray pixel values (e.g. from light yellow to black for the same areas). is there a function (in cv2 perhaps?) or an algorithm that lets me compare the pictures by pixel values (in specific regions, if possible)?
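something like this is roughly what i imagine, e.g. cv2.absdiff on a cropped region (just a sketch, the function name and the region layout are made up for the example, and it assumes two already-captured grayscale frames):

import cv2

def region_diff(frame_a, frame_b, region):
    # region = (x, y, w, h), a hand-picked area like the sky/background
    x, y, w, h = region
    roi_a = frame_a[y:y+h, x:x+w]
    roi_b = frame_b[y:y+h, x:x+w]
    diff = cv2.absdiff(roi_a, roi_b)   # per-pixel absolute difference
    return diff.mean()                 # 0 = identical, higher = more change

# e.g. score = region_diff(prev_frame, cur_frame, (0, 0, 160, 30))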
maybe i’m completely wrong and that variation even helps the net overfit less; some hints on that would be great, as said i’m not even sure if it is real noise, i’ll have to test. so far i only convert the frames to gray, resize them to 160*120 and balance out the amount of frames per desired output (mostly going forward), roughly as sketched below.
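the balancing step just caps every output class at the size of the rarest one, roughly like this (a sketch, assuming training_data is a list of [frame, one_hot_output] pairs):

import random
from collections import defaultdict

def balance(training_data):
    # group samples by their output so "forward" doesn't dominate
    by_output = defaultdict(list)
    for frame, output in training_data:
        by_output[tuple(output)].append([frame, output])
    # cut every class down to the size of the smallest one
    smallest = min(len(samples) for samples in by_output.values())
    balanced = []
    for samples in by_output.values():
        random.shuffle(samples)
        balanced.extend(samples[:smallest])
    random.shuffle(balanced)
    return balanced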
TensorBoard showed me that after 220 of 1700 training steps the net stopped gaining accuracy (~75%), and the loss also stopped decreasing.
picture examples:
Answer
i’m processing the image now as follows:
import cv2
import numpy as np
# grab_screen: custom screen-capture helper (not part of cv2)

screen = grab_screen(region=(100, 100, 348, 324))         # capture the game window
processed_img = cv2.cvtColor(screen, cv2.COLOR_BGR2GRAY)  # drop colour
processed_img = cv2.Canny(processed_img, threshold1=200, threshold2=300)  # keep only edges
kernel = np.ones((2, 2), np.uint8)
processed_img = cv2.dilate(processed_img, kernel, iterations=1)  # thicken the edges a bit
processed_img = processed_img[120:248, :]                  # crop to the track area
processed_img = cv2.resize(processed_img, (160, 60))
this gives me a pretty good result already.
original picture (from stream):
old image processing (only rgb2gray):
new image after processing:
training result:
orange line … training with the old images
blue line … training with the newly processed images