Creating DataFrame of groups by pixel values in Python (number, size, etc.)

I have the following data (simple representation of black particles on a white filter):

data = [
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 255, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 255, 255, 255, 0, 0],
    [0, 0, 0, 0, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 255, 0, 0],
    [0, 255, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0, 255, 0, 0]
]

And I have counted the number of particles (groups) and assigned them each a number using the following code:

import numpy as np
from skimage import measure

arr = np.array(data)
groups, group_count = measure.label(arr > 0, return_num=True, connectivity=1)
print('Groups: \n', groups)

With the Output:

Groups: 
 [[0 0 0 1 1 1 0 0]
 [0 2 0 1 1 1 0 0]
 [0 0 0 1 1 1 0 0]
 [0 0 0 0 1 0 0 0]
 [0 3 3 0 0 4 0 0]
 [0 3 0 0 0 4 0 0]
 [0 0 0 0 0 4 0 0]
 [0 0 0 0 0 4 0 0]]

I then have four (4) particles (groups) of different sizes.

I am looking to create a DataFrame representing each particle. Like this:

[Image: DataFrame of particles (groups)]

Any help is much appreciated!

Answer

There should be a more elegant approach, but here is what I have come up with:

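import pandas as pd

# Count how many pixels carry each label (label 0 is the background).
customDict = {}
for group in groups:      # each row of the labelled array
  for value in group:     # each pixel label in that row
    if str(value) not in customDict:
      customDict[str(value)] = [0]
    customDict[str(value)][0] += 1

# Build one row per label, then name the columns.
df = pd.DataFrame.from_dict(customDict, orient="index").reset_index()
df.rename(columns={"index": "particle #", 0: "size"}, inplace=True)
# Drop the background row by its label rather than by its position.
df = df[df["particle #"] != "0"]
df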

Output

  particle #  size
1          1    10
2          2     1
3          3     3
4          4     4
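
As for the more elegant approach, one possibility is to let NumPy do the counting instead of looping over every pixel. The following is only a sketch: it hardcodes the labelled array from the question's output so that it runs without scikit-image, and the name `is_particle` is made up for illustration. Unlike the loop version, it keeps the particle numbers as integers rather than strings.

```python
import numpy as np
import pandas as pd

# Labelled array copied from the question's output (0 = background).
groups = np.array([
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 2, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 3, 3, 0, 0, 4, 0, 0],
    [0, 3, 0, 0, 0, 4, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
])

# np.unique returns the sorted distinct labels and how often each occurs.
labels, counts = np.unique(groups, return_counts=True)

# Keep only real particles by masking out the background label 0.
is_particle = labels != 0
df = pd.DataFrame({
    "particle #": labels[is_particle],
    "size": counts[is_particle],
})
print(df)
```

If you need more than the size of each particle (bounding box, centroid and so on), `skimage.measure.regionprops_table` may be worth a look, since it returns per-label measurements as a dict that can be passed to `pd.DataFrame`.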