Represent a video as a 2D Array where each column represents a frame – OpenCV and Python

My goal is to transform a video into a 2D matrix X in which each column vector represents one frame. The matrix therefore has the shape: X.shape = (number of features per frame, total number of frames)

I need this form because I want to apply different ML algorithms to X. To get X I proceed as follows:

  1. Load the video in Python with the OpenCV library and save all frames.

  2. Loop {

a) Convert the frame (a 3D array with dimensions height, width, depth = 3 for RGB) into a 1D vector x

b) Append the vector x to the matrix X

}
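
Step a) can be sketched as follows, assuming the frame is already a NumPy array of shape (height, width, 3). `flatten_frame` is a hypothetical name used only for illustration; the `image_to_vector` helper used further down is not shown in the question and may work differently.

```python
import numpy as np


def flatten_frame(frame):
    """Return the pixels of a (height, width, 3) frame as one 1D vector."""
    # reshape(-1) lets NumPy infer the length: height * width * 3
    return np.asarray(frame).reshape(-1)


# Tiny synthetic frame standing in for a decoded video frame: 2 x 3 pixels, 3 channels
frame = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
x = flatten_frame(frame)
print(x.shape)  # (18,)

# The original frame can be recovered by reshaping back
restored = x.reshape(frame.shape)
```

Because the flattening is a plain reshape, any column of X can later be turned back into an image with the inverse reshape.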

For step 2 b) I use

video_matrix = np.column_stack((video_matrix, frame_vector))

This operation takes about 0.5 s for a 640×320 frame. For a short 3-minute video (8000 frames), computing X takes almost 150 minutes. Is there a way to make it faster?

Code for the first part:

import os

import cv2

video = cv2.VideoCapture('path/video.mp4')
if not os.path.exists('data'):
    os.makedirs('data')

counter = 0
while True:
    # reading from frame 
    ret,frame = video.read() 
  
    if ret: 
        # if video is still left continue creating images 
        name = './data/frame' + str(counter) + '.jpg'
        #print ('Creating...' + name) 
        
        # writing the extracted images 
        cv2.imwrite(name, frame) 
  
        # increasing counter so that it will 
        # show how many frames are created 
        counter += 1
    else: 
        break
   
# Release all space and windows once done 
video.release() 
cv2.destroyAllWindows()

And the second part, which is too slow:

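import numpy as np
from PIL import Image

# width, height and image_to_vector are not shown in the question
video_matrix = np.zeros(width * height * 3)  # initialize 1D array which will become the 2D array; first column will be deleted at the end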

for i in range(counter): # loops over the total amount of frames
    
    current_frame = np.asarray(Image.open('./data/frame'+str(i)+'.jpg')) # 3D-array = current frame
    frame_vector = image_to_vector(current_frame) #convert frame into a 1D array
    video_matrix = np.column_stack((video_matrix, frame_vector)) # append frame x to a matrix X that will represent the video
    
video_matrix = np.delete(video_matrix, 0, 1) # delete the initialized zero column

Answer

Do not repeatedly append single frames to your accumulated data. That costs O(n^2) overall, so the program runs ever slower the more frames it has already read. NumPy cannot enlarge an array in place; each append has to create a new array and copy everything accumulated so far, so the copying effort increases with every additional frame.
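
To make the quadratic growth concrete, here is a rough count of how many array elements get copied, using the numbers from the question (8000 frames of 640×320×3 values). It is a back-of-the-envelope model, not a benchmark: it assumes that appending the k-th frame copies all k columns into a fresh array, and it ignores the extra zero column the question starts with. The function names are made up for illustration.

```python
def elements_copied_by_repeated_append(n_frames, frame_size):
    """Appending frame k builds a new array with k columns, copying k * frame_size elements."""
    return sum(k * frame_size for k in range(1, n_frames + 1))


def elements_copied_by_single_conversion(n_frames, frame_size):
    """Converting a list of frames once copies every frame exactly one time."""
    return n_frames * frame_size


frame_size = 640 * 320 * 3  # values per frame, as in the question
n_frames = 8000

repeated = elements_copied_by_repeated_append(n_frames, frame_size)
single = elements_copied_by_single_conversion(n_frames, frame_size)

print(repeated)           # 19663257600000
print(single)             # 4915200000
print(repeated / single)  # 4000.5
```

Under this model the repeated append moves about (n + 1) / 2 times as much data as a single conversion, which is why the slowdown gets worse with every frame.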

Instead, append each frame to a Python list. When you are done reading the video, convert the whole list into a NumPy array once.
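
A minimal sketch of that approach, assuming every frame is a NumPy array of the same shape. `frames_to_matrix` is a hypothetical helper name, and the random frames only stand in for what `video.read()` would return, so that the example runs without a video file.

```python
import numpy as np


def frames_to_matrix(frames):
    """Flatten each frame and return a 2D array with one column per frame."""
    columns = []
    for frame in frames:
        # Appending to a Python list is cheap; nothing already stored is copied
        columns.append(np.asarray(frame).reshape(-1))
    # One allocation and one copy for the whole video
    return np.stack(columns, axis=1)


# Synthetic stand-in for decoded frames: 5 frames of 4 x 6 pixels with 3 channels
rng = np.random.default_rng(0)
demo_frames = [
    rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8) for _ in range(5)
]

X = frames_to_matrix(demo_frames)
print(X.shape)  # (72, 5)
print(X.dtype)  # uint8
```

With OpenCV, the frames could be appended directly inside the `video.read()` loop from the question, which would also avoid writing every frame to a JPEG and reading it back. Keeping the `uint8` dtype is worth considering: 8000 frames of 640×320×3 values take about 4.9 GB as `uint8` and eight times that as `float64`.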
