- Context: I work with satellite images that I filter to transform to arrays of 1s and 0s, based on the presence of snow (0 for snow, 1 for non-snow). My code creates an array of
NaNs
, searches for each snow pixel if at least one of the neighbor is non-snow (in a cross patter, cells painted red in the picture below), and inputs “1” in thenan
array. Once I do that for my entire matrix I end up with lines where a line cell = 1, rest are nans. - Problem: I end up with a matrix with several lines inside. What I count as a line is at least two cell equal to 1, in the direct neighborhoods. Meaning that for each line cell, if any of the 8 surrounding cells has a
1
inside, they are forming a line (figure below shows the boundary between snow (purple) and non-snow cells (yellow). - What I have: I wrote an algorithm that counts the amount of cells in a line and records its starting/ending cells (see figure below, amount of cells through which the red line passes) so I can filter my lines by size at the end.
- What I want: my code works but is extremely slow. I coded it the best way I could but O was wondering if there was a way to be more efficient ?
Ps: Sorry about the clanky explanation, it is hard for me to explain clearly. The code will show you how it works, and the figures generated should make it clearer.
Some code to generate a “lines” matrix:
JavaScript
x
182
182
1
import numpy as np
2
import rasterio as rio
3
import numpy as np
4
import glob
5
import matplotlib.pyplot as plt
6
import pandas as pd
7
8
# Create an array with a line of 1s (not a straight line)
9
array = np.array([[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
10
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
11
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
12
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
13
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
14
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
15
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
16
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
17
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
18
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
19
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan,
20
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
21
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan,
22
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
23
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, np.nan, np.nan,
24
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
25
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan,
26
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
27
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan,
28
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
29
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, 1., np.nan, np.nan,
30
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
31
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, 1., np.nan, np.nan, np.nan,
32
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
33
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
34
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
35
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
36
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
37
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan, 1., np.nan, np.nan,
38
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
39
[np.nan, np.nan, np.nan, np.nan, 1., 1., np.nan, np.nan, np.nan, 1., np.nan, 1., np.nan,
40
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
41
[np.nan, np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., np.nan, 1., np.nan, np.nan,
42
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
43
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan,
44
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
45
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1., np.nan, np.nan, np.nan,
46
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
47
[np.nan, np.nan, np.nan, np.nan, 1., 1., 1., 1., 1., np.nan, np.nan, np.nan, np.nan,
48
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
49
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
50
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]])
51
array = array.reshape((1,array.shape[0],array.shape[1]))
52
53
# Create a dataframe that will store the start and endpoint of the lines, as well as their length
54
templines = pd.DataFrame(columns=['i','Rstart','Cstart','Rend','Cend','Length'])
55
# Create a 2nd dataframe that will be used to snip together the lines that end and start at the same point
56
lines = templines.copy()
57
58
# Create the straight and tilted cross search patterns
59
crossr = [-1, 0, 0, 1]
60
crossc = [0, -1, 1, 0]
61
xr = [-1, -1, 1, 1]
62
xc = [-1, 1, -1, 1]
63
64
# Constrain our search to an Area of Interest (AOI), useful when dealing with bigger arrays.
65
r1 = 0
66
r2 = array.shape[1]
67
c1 = 0
68
c2 = array.shape[2]
69
70
# Initialize our search indexes
71
i = 0
72
j = r1
73
k = c1
74
75
# Create an empty array which will be filled with 1s everytime a pixel is on the line (ensures our programe is replicating the lines' paths)
76
test = np.zeros((r2-r1, c2-c1))
77
78
# Start the search
79
while i < array.shape[0]:
80
j = r1
81
while j < r2:
82
k = c1
83
while k < c2:
84
85
# If we find a boundary pixel, start searching its neighbors' values
86
if array[i, j, k] == 1:
87
88
# We initialize a counter. If exactly 2 cells in the square centered on our cell are 1s, we are at the start of a line
89
counter = 0
90
for c in range(j-1, j+2):
91
for cc in range(k-1, k+2):
92
if array[i, c, cc] == 1:
93
counter +=1
94
95
# If the pixel is the start of a line (meaning that on a 3x3 cells matrix, 2 cells are equal to 1)
96
if counter == 2:
97
98
# Gather the indexes of the next line cell
99
# Vertical Cross search
100
for c in range(0, len(crossr)):
101
if array[i,j+crossr[c], k+crossc[c]] == 1:
102
jj = j+crossr[c]
103
kk = k+crossc[c]
104
test[jj-r1, kk-c1] = 1
105
break
106
107
# Tilted cross search
108
elif array[i,j+xr[c], k+xc[c]] == 1:
109
jj = j+xr[c]
110
kk = k+xc[c]
111
test[jj-r1, kk-c1] = 1
112
break
113
114
# Save the indexes of the previous cell in line
115
source = [j,k]
116
source = np.vstack((source,[jj,kk]))
117
118
linelength = 1
119
120
# When we find an extremity of a line, we "travel along" the line by finding the next neighboring cell
121
while jj > r1 and jj < r2:
122
while kk > c1 and kk < c2:
123
124
flag = 0
125
for c in range(0, len(crossr)):
126
127
# If we find a cell = 1, with different indexes than the previous searching cell
128
if array[i,jj+crossr[c], kk+crossc[c]] == 1 and [jj+crossr[c],kk+crossc[c]] not in source.tolist():
129
source = np.vstack((source,[jj,kk]))
130
jj = jj+crossr[c]
131
kk = kk+crossc[c]
132
test[jj-r1, kk-c1] = 1
133
linelength += 1
134
flag = 1
135
break
136
137
# Tilted cross search
138
elif array[i,jj+xr[c], kk+xc[c]] == 1 and [jj+xr[c],kk+xc[c]] not in source.tolist():
139
source = np.vstack((source,[jj,kk]))
140
jj = jj+xr[c]
141
kk = kk+xc[c]
142
test[jj-r1, kk-c1] = 1
143
linelength += 1
144
flag = 1
145
break
146
147
# If there is no other cell on the line, flag is 0 so we stop the search
148
149
# We gather the information of that line in our dataframe
150
if flag == 0:
151
print('End of Line')
152
templines.loc[len(templines.index)] = [i, j, k, jj, kk, linelength]
153
break
154
155
if flag == 0:
156
break
157
158
if flag == 0:
159
break
160
161
k += 1
162
j += 1
163
i += 1
164
165
166
167
# We snip together lines that are ending and starting at the same point
168
for m in range(len(templines)):
169
for n in range(m+1, len(templines)):
170
171
if templines.Rend[m] == templines.Rstart[n] and templines.Cend[m] == templines.Cstart[n]:
172
lines.loc[len(lines.index)] = [templines.i[m], templines.Rstart[m], templines.Cstart[m], templines.Rend[n],
173
templines.Cend[n], templines.Length[m]+templines.Length[n]]
174
175
176
# We check that the program "traveled" along the line
177
plt.figure()
178
plt.imshow(array[0,:,:])
179
plt.figure()
180
plt.imshow(test[:,:])
181
182
Advertisement
Answer
There’s an idea in image processing, which is find to a group of pixels which is contiguous, or a connected component. Once you break the image up into connected components, you can find the size of each component, and filter out small ones.
There’s a fast way of doing this in the scipy package, called scipy.ndimage.label
which you could apply like this:
JavaScript
1
35
35
1
import numpy as np
2
import matplotlib.pyplot as plt
3
import scipy.ndimage
4
5
# array = ...
6
7
# array is assumed to be 2D here
8
9
array[np.isnan(array)] = 0
10
11
# diagonally adjacent pixels count as the same feature
12
structure = [
13
[1,1,1],
14
[1,1,1],
15
[1,1,1]
16
]
17
array_labelled, num_features = scipy.ndimage.label(array, structure)
18
19
size_for_feature = {}
20
for feature_num in range(1, num_features + 1):
21
size_for_feature[feature_num] = (array_labelled == feature_num).sum()
22
23
# Filter out features which are too small
24
threshold = 10
25
size_for_feature = {
26
feature_num: size
27
for feature_num, size in size_for_feature.items()
28
if size > threshold
29
}
30
31
array_out = np.zeros_like(array)
32
for feature_num in size_for_feature:
33
array_out[array_labelled == feature_num] = 1
34
plt.imshow(array_out)
35
Output:
This solution is not exactly the thing you asked for – but it’s similar enough that it may work for your application.