- Context: I work with satellite images that I filter to transform to arrays of 1s and 0s, based on the presence of snow (0 for snow, 1 for non-snow). My code creates an array of
NaNs
, searches for each snow pixel if at least one of the neighbor is non-snow (in a cross patter, cells painted red in the picture below), and inputs “1” in thenan
array. Once I do that for my entire matrix I end up with lines where a line cell = 1, rest are nans. - Problem: I end up with a matrix with several lines inside. What I count as a line is at least two cell equal to 1, in the direct neighborhoods. Meaning that for each line cell, if any of the 8 surrounding cells has a
1
inside, they are forming a line (figure below shows the boundary between snow (purple) and non-snow cells (yellow). - What I have: I wrote an algorithm that counts the amount of cells in a line and records its starting/ending cells (see figure below, amount of cells through which the red line passes) so I can filter my lines by size at the end.
- What I want: my code works but is extremely slow. I coded it the best way I could but O was wondering if there was a way to be more efficient ?
Ps: Sorry about the clanky explanation, it is hard for me to explain clearly. The code will show you how it works, and the figures generated should make it clearer.
Some code to generate a “lines” matrix:
import numpy as np import rasterio as rio import numpy as np import glob import matplotlib.pyplot as plt import pandas as pd # Create an array with a line of 1s (not a straight line) array = np.array([[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]]) array = array.reshape((1,array.shape[0],array.shape[1])) # Create a dataframe that will store the start and endpoint of the lines, as well as their length templines = pd.DataFrame(columns=['i','Rstart','Cstart','Rend','Cend','Length']) # Create a 2nd dataframe that will be used to snip together the lines that end and start at the same point lines = templines.copy() # Create the straight and tilted cross search patterns crossr = [-1, 0, 0, 1] crossc = [0, -1, 1, 0] xr = [-1, -1, 1, 1] xc = [-1, 1, -1, 1] # Constrain our search to an Area of Interest (AOI), useful when dealing with bigger arrays. r1 = 0 r2 = array.shape[1] c1 = 0 c2 = array.shape[2] # Initialize our search indexes i = 0 j = r1 k = c1 # Create an empty array which will be filled with 1s everytime a pixel is on the line (ensures our programe is replicating the lines' paths) test = np.zeros((r2-r1, c2-c1)) # Start the search while i < array.shape[0]: j = r1 while j < r2: k = c1 while k < c2: # If we find a boundary pixel, start searching its neighbors' values if array[i, j, k] == 1: # We initialize a counter. If exactly 2 cells in the square centered on our cell are 1s, we are at the start of a line counter = 0 for c in range(j-1, j+2): for cc in range(k-1, k+2): if array[i, c, cc] == 1: counter +=1 # If the pixel is the start of a line (meaning that on a 3x3 cells matrix, 2 cells are equal to 1) if counter == 2: # Gather the indexes of the next line cell # Vertical Cross search for c in range(0, len(crossr)): if array[i,j+crossr[c], k+crossc[c]] == 1: jj = j+crossr[c] kk = k+crossc[c] test[jj-r1, kk-c1] = 1 break # Tilted cross search elif array[i,j+xr[c], k+xc[c]] == 1: jj = j+xr[c] kk = k+xc[c] test[jj-r1, kk-c1] = 1 break # Save the indexes of the previous cell in line source = [j,k] source = np.vstack((source,[jj,kk])) linelength = 1 # When we find an extremity of a line, we "travel along" the line by finding the next neighboring cell while jj > r1 and jj < r2: while kk > c1 and kk < c2: flag = 0 for c in range(0, len(crossr)): # If we find a cell = 1, with different indexes than the previous searching cell if array[i,jj+crossr[c], kk+crossc[c]] == 1 and [jj+crossr[c],kk+crossc[c]] not in source.tolist(): source = np.vstack((source,[jj,kk])) jj = jj+crossr[c] kk = kk+crossc[c] test[jj-r1, kk-c1] = 1 linelength += 1 flag = 1 break # Tilted cross search elif array[i,jj+xr[c], kk+xc[c]] == 1 and [jj+xr[c],kk+xc[c]] not in source.tolist(): source = np.vstack((source,[jj,kk])) jj = jj+xr[c] kk = kk+xc[c] test[jj-r1, kk-c1] = 1 linelength += 1 flag = 1 break # If there is no other cell on the line, flag is 0 so we stop the search # We gather the information of that line in our dataframe if flag == 0: print('End of Line') templines.loc[len(templines.index)] = [i, j, k, jj, kk, linelength] break if flag == 0: break if flag == 0: break k += 1 j += 1 i += 1 # We snip together lines that are ending and starting at the same point for m in range(len(templines)): for n in range(m+1, len(templines)): if templines.Rend[m] == templines.Rstart[n] and templines.Cend[m] == templines.Cstart[n]: lines.loc[len(lines.index)] = [templines.i[m], templines.Rstart[m], templines.Cstart[m], templines.Rend[n], templines.Cend[n], templines.Length[m]+templines.Length[n]] # We check that the program "traveled" along the line plt.figure() plt.imshow(array[0,:,:]) plt.figure() plt.imshow(test[:,:])
Advertisement
Answer
There’s an idea in image processing, which is find to a group of pixels which is contiguous, or a connected component. Once you break the image up into connected components, you can find the size of each component, and filter out small ones.
There’s a fast way of doing this in the scipy package, called scipy.ndimage.label
which you could apply like this:
import numpy as np import matplotlib.pyplot as plt import scipy.ndimage # array = ... # array is assumed to be 2D here array[np.isnan(array)] = 0 # diagonally adjacent pixels count as the same feature structure = [ [1,1,1], [1,1,1], [1,1,1] ] array_labelled, num_features = scipy.ndimage.label(array, structure) size_for_feature = {} for feature_num in range(1, num_features + 1): size_for_feature[feature_num] = (array_labelled == feature_num).sum() # Filter out features which are too small threshold = 10 size_for_feature = { feature_num: size for feature_num, size in size_for_feature.items() if size > threshold } array_out = np.zeros_like(array) for feature_num in size_for_feature: array_out[array_labelled == feature_num] = 1 plt.imshow(array_out)
Output:
This solution is not exactly the thing you asked for – but it’s similar enough that it may work for your application.