Part 4 of 9 in the Introduction to Image Processing series

Blob Detection in Action

What are blobs and why are they important?

Cyril Benedict Lugod
8 min read, Jun 12, 2023

In image processing, blob detection and connected component analysis are fundamental techniques for identifying and analyzing distinct objects or regions within an image. In this blog post, we will explore three popular blob detection methods: Laplacian of Gaussian (LoG), Difference of Gaussian (DoG), and Determinant of Hessian (DoH). We will also use the regionprops function from the scikit-image library for connected component analysis. Along the way, I will provide code snippets and sample implementations to help you understand these techniques better.

How many candies are there? (Photo by Prof. Benjur Borja)

Blob Detection

What even is a blob in the first place?

Blobs are defined as bright objects against a dark background, or vice versa.

By that definition, bright objects on an equally bright background may not register as blobs, and the detection algorithms may perform poorly on such images. It is therefore important to increase the contrast between the objects and the background. One way to achieve this is binarization.

Binarization involves converting the RGB image to grayscale and then applying a threshold. In the implementation below, pixels darker than the threshold become pure white (foreground) and pixels brighter than it become pure black (background).

from skimage.io import imread, imshow
from skimage.color import rgb2gray

im = rgb2gray(imread('blobs.png'))
imshow(im);

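We used the rgb2gray function of scikit-image to convert the RGB image to grayscale. It returns a floating-point image with values between 0 and 1, which is why a threshold such as 0.5 makes sense later on.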
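Under the hood, rgb2gray computes a weighted sum of the three channels. According to scikit-image's documentation, the weights are 0.2125 (R), 0.7154 (G) and 0.0721 (B), applied after 8-bit input has been rescaled to [0, 1]. A minimal check on three pure-colour pixels (the pixel values are made up for illustration):

```python
import numpy as np
from skimage.color import rgb2gray

# One pure-red, one pure-green and one pure-blue 8-bit pixel
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# rgb2gray rescales uint8 to [0, 1], then takes the weighted channel sum
gray = rgb2gray(pixels)
print(gray)  # each value equals the weight of the corresponding channel
```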

Gray-scaling still does not maximize the contrast between the candies and the background

In order to convert this grayscale image to a binary image, we need to set a threshold. For this exercise, let us assume a threshold of 0.5.

im_bw = (im < 0.5) * 1
im_mask = im < 0.5
imshow(im_bw, cmap='gray');

im_bw and im_mask encode the same mask: the former as integers (0 and 1), the latter as booleans (True or False). We will need both of them for the following implementations.
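A tiny made-up example makes the relationship concrete: the comparison produces a boolean mask, and multiplying it by 1 turns it into 0/1 integers. Pixels darker than the 0.5 threshold become foreground (True or 1). The patch values and variable names are hypothetical, chosen so they do not overwrite `im`, `im_bw` or `im_mask`.

```python
import numpy as np

# Hypothetical 2x3 grayscale patch with values in [0, 1]
patch = np.array([[0.1, 0.4, 0.9],
                  [0.6, 0.2, 0.8]])

patch_mask = patch < 0.5        # booleans: True where the pixel is darker than 0.5
patch_bw = (patch < 0.5) * 1    # the same mask as integers 0 and 1

print(patch_mask)
print(patch_bw)
```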

Now, those are definitely blobs!

Before we proceed any further, I want to make it clear that we will not be discussing the math behind these three blob detection methods. We will simply use the predefined functions from scikit-image, since our focus here is on applying them to image processing. You can find a myriad of resources online if you want to learn more about the nitty-gritty of these three methods.

Laplacian of Gaussian (LoG)

This method involves convolving an image with the Laplacian of a Gaussian filter at multiple scales. The resulting image highlights regions with significant intensity variations, which correspond to blobs in the original image.
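The multi-scale part matters because the LoG response depends on how well the filter scale matches the blob size. For a bright disk of radius r, the scale-normalised response (minus sigma squared times the LoG) at the centre peaks at sigma = r / sqrt(2), which is why the plotting code converts each detected sigma into a radius by multiplying by sqrt(2). Below is a minimal check using scipy.ndimage.gaussian_laplace rather than scikit-image; the image size, disk radius and sigma grid are arbitrary assumptions.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic image: a bright disk of radius 10 px centred in a dark 121x121 frame
radius = 10
yy, xx = np.mgrid[:121, :121]
disk_image = ((yy - 60) ** 2 + (xx - 60) ** 2 <= radius ** 2).astype(float)

sigmas = np.arange(2.0, 15.0, 0.1)
# Scale-normalised LoG at the disk centre: the sigma**2 factor makes scales comparable,
# and the sign flip turns the (negative) response of a bright blob into a peak
responses = [-(s ** 2) * ndi.gaussian_laplace(disk_image, s)[60, 60] for s in sigmas]

best_sigma = sigmas[np.argmax(responses)]
print(f"best sigma: {best_sigma:.1f}, radius / sqrt(2): {radius / np.sqrt(2):.2f}")
```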

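import matplotlib.pyplot as plt
import numpy as np
from skimage.feature import blob_log

# Each row of blobs is (row, col, sigma) for one detected blob
blobs = blob_log(im_mask, max_sigma=30, num_sigma=10, threshold=0.1)

fig, ax = plt.subplots()
ax.imshow(im_bw, cmap='gray')
for blob in blobs:
    y, x, sigma = blob
    # For LoG, the blob radius is approximately sigma * sqrt(2)
    ax.add_patch(plt.Circle((x, y), sigma * np.sqrt(2), color='y', fill=False))
plt.show()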
Laplacian of Gaussian (LoG)

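This method appears to over-segment our candies, splitting them into more blobs than there are objects. Elongated shapes in particular tend to be broken up into chains of much smaller blobs: each detection is a single circle at a single scale, so a long object is matched best by several small circles along its length.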

Difference of Gaussian (DoG)

This method involves computing the difference between two Gaussian-filtered versions of an image at different scales. This technique emphasizes regions with significant intensity changes, which are indicative of blobs.
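Why does subtracting two blurs find blobs? For a scale ratio k close to 1, the difference G(k*sigma) - G(sigma) is approximately (k - 1) * sigma**2 times the Laplacian of G(sigma), so DoG behaves like a cheaper, scale-normalised LoG. A minimal numerical check with scipy.ndimage on a synthetic disk; k = 1.1 is an assumption chosen so the approximation is tight (blob_dog's default sigma_ratio is 1.6, which trades some accuracy for fewer scales).

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic image: one bright disk (radius 10 px) on a dark background
yy, xx = np.mgrid[:101, :101]
disk_image = ((yy - 50) ** 2 + (xx - 50) ** 2 <= 10 ** 2).astype(float)

sigma, k = 5.0, 1.1  # k is the ratio between the two Gaussian scales

# Difference of Gaussian: blur at two nearby scales and subtract
dog = ndi.gaussian_filter(disk_image, k * sigma) - ndi.gaussian_filter(disk_image, sigma)

# The scaled Laplacian of Gaussian that DoG approximates: (k - 1) * sigma**2 * LoG
log_scaled = (k - 1) * sigma ** 2 * ndi.gaussian_laplace(disk_image, sigma)

corr = np.corrcoef(dog.ravel(), log_scaled.ravel())[0, 1]
print(f"correlation between DoG and scaled LoG: {corr:.4f}")
```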

from skimage.feature import blob_dog

# Each row of blobs is (row, col, sigma) for one detected blob
blobs = blob_dog(im_mask, max_sigma=30, threshold=.1)

fig, ax = plt.subplots()
ax.imshow(im_bw, cmap='gray')
for blob in blobs:
    y, x, sigma = blob
    # As with LoG, the blob radius is approximately sigma * sqrt(2)
    ax.add_patch(plt.Circle((x, y), sigma * np.sqrt(2), color='g', fill=False))
plt.show()
Difference of Gaussian (DoG)

Comparing DoG with LoG shows an evident improvement in blob detection. Large objects are covered by fewer, bigger blobs instead of being chopped up into several small ones. However, as with LoG, elongated objects remain awkwardly subdivided into numerous blobs.

Determinant of Hessian (DoH)

Lastly, this method calculates the determinant of the Hessian matrix for each pixel in an image. The Hessian matrix describes the local curvature of intensity variations. Peaks in the determinant map represent potential blob locations.
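For an image smoothed at scale sigma, the Hessian at each pixel is the 2x2 matrix of second derivatives [[Lxx, Lxy], [Lxy, Lyy]], and its determinant is Lxx * Lyy - Lxy**2. The sketch below computes a scale-normalised DoH map (multiplied by sigma**4) with scipy's Gaussian derivative filters on a synthetic disk. According to its documentation, scikit-image's blob_doh approximates these derivatives with box filters over an integral image, so its values will differ; this is only meant to show the quantity being maximized. The disk size and sigma are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic image: a bright disk of radius 10 px centred at (50, 50)
yy, xx = np.mgrid[:101, :101]
disk_image = ((yy - 50) ** 2 + (xx - 50) ** 2 <= 10 ** 2).astype(float)

sigma = 7.0
# Second derivatives of the Gaussian-smoothed image (axis 0 = rows/y, axis 1 = cols/x)
Lyy = ndi.gaussian_filter(disk_image, sigma, order=(2, 0))
Lxx = ndi.gaussian_filter(disk_image, sigma, order=(0, 2))
Lxy = ndi.gaussian_filter(disk_image, sigma, order=(1, 1))

# Scale-normalised determinant of the Hessian at every pixel
doh = sigma ** 4 * (Lxx * Lyy - Lxy ** 2)

peak = np.unravel_index(np.argmax(doh), doh.shape)
print("strongest DoH response at (row, col):", peak)
```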

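from skimage.feature import blob_doh

# Each row of blobs is (row, col, sigma) for one detected blob
blobs = blob_doh(im_mask, max_sigma=30, threshold=.01)

fig, ax = plt.subplots()
ax.imshow(im_bw, cmap='gray')
for blob in blobs:
    y, x, sigma = blob
    # Unlike LoG and DoG, scikit-image documents the DoH blob radius as approximately sigma itself
    ax.add_patch(plt.Circle((x, y), sigma, color='r', fill=False))
plt.show()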
Determinant of Hessian (DoH)

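While this method did not break the large objects down into tiny blobs, its circles extend beyond the objects being detected. It handled elongated objects better, at the expense of less precise detection on the more regularly shaped objects in the image. Keep in mind that scikit-image's documentation gives sigma itself, not sigma * sqrt(2), as the approximate radius of a blob_doh detection; circles drawn with the sqrt(2) factor used for LoG and DoG will look about 40% larger than the detected scale.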

Comparison of the three blob detection methods

Comparing the three blob detection methods shows that each approximates objects differently, so each may be best for certain use cases but not advisable for others. However, for all their individual strengths and weaknesses, every blob they returned is a circle. This is one of the greatest weaknesses of these techniques, since objects in practical applications of blob detection are rarely perfectly circular.
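To see this limitation concretely, here is a minimal sketch that runs all three detectors on a synthetic ellipse, reusing the parameters from the snippets above. The ellipse dimensions are arbitrary assumptions, and the number of circles printed may vary with the scikit-image version; the point is that every detection is a (row, col, sigma) triple, with no way to express elongation or orientation.

```python
import numpy as np
from skimage.draw import ellipse
from skimage.feature import blob_dog, blob_doh, blob_log

# Synthetic image with one elongated object (semi-axes 8 px and 40 px)
ellipse_image = np.zeros((100, 160))
rr, cc = ellipse(50, 80, 8, 40, shape=ellipse_image.shape)
ellipse_image[rr, cc] = 1.0

detections = {
    "LoG": blob_log(ellipse_image, max_sigma=30, num_sigma=10, threshold=0.1),
    "DoG": blob_dog(ellipse_image, max_sigma=30, threshold=0.1),
    "DoH": blob_doh(ellipse_image, max_sigma=30, threshold=0.01),
}
for name, found in detections.items():
    # Each row is (row, col, sigma): a centre plus one scale, i.e. a circle
    print(name, "->", len(found), "circle(s)")
```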

Connected Components

An alternative method of blob detection is to use the concept of connected components. In the context of binary images, where each pixel is either foreground (an object) or background, connected components are formed by grouping adjacent foreground pixels together. Two pixels are considered adjacent if they share a common edge or corner (8-connectivity); under the stricter 4-connectivity, only a shared edge counts. The connected components can vary in size, shape, and arrangement, representing individual objects or regions of interest.
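In scikit-image's label function, the connectivity argument chooses between these two notions of adjacency: connectivity=1 counts only shared edges and connectivity=2 also counts shared corners, which is the default for 2D images. A small toy grid (made up for illustration) where the choice changes the component count:

```python
import numpy as np
from skimage.measure import label

# Two 2x2 squares that touch only at a corner, between pixels (1, 1) and (2, 2)
grid = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1]])

_, n_edge = label(grid, connectivity=1, return_num=True)    # shared edges only
_, n_corner = label(grid, connectivity=2, return_num=True)  # shared edges or corners
print(n_edge, n_corner)
```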

The regionprops function from the scikit-image library provides a convenient way to extract properties of connected components once they have been labeled, for example with the label function from the same module.
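A minimal sketch on a toy mask with two rectangles, printing a few of the available properties: area in pixels, centroid as (row, col), and bbox as (min_row, min_col, max_row, max_col) with exclusive maxima. The mask and variable names are hypothetical.

```python
import numpy as np
from skimage.measure import label, regionprops

toy_binary = np.zeros((7, 9), dtype=bool)
toy_binary[1:3, 1:4] = True   # 2x3 rectangle
toy_binary[3:6, 5:8] = True   # 3x3 square, not touching the rectangle

# regionprops measures a labeled image, not the raw binary mask
toy_props = regionprops(label(toy_binary))
for region in toy_props:
    print(region.label, region.area, region.centroid, region.bbox)
```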

For our sample implementation of regionprops, let us use a different image with a more practical and real-life application: counting red blood cells.

Counting red blood cells is typically performed manually by skilled laboratory technicians.

Since we want to detect blobs, the objects in the image must first meet the definition of what a blob is.

Blobs are defined as bright objects against a dark background, or vice versa.

Do you know what time it is? It’s time to binarize our image!

import skimage.io as io
import skimage.color as color
from skimage.filters import threshold_otsu

image = io.imread('rbc.jpeg')


gray_image = color.rgb2gray(image)
threshold = threshold_otsu(gray_image)
binary_image = gray_image < threshold
imshow(binary_image, cmap='gray');
Some cells seem to be overlapping with each other in this image

In the LoG, DoG, and DoH implementations, you might remember that we set the binarization threshold to 0.5 more or less arbitrarily. For this implementation, that ad hoc choice is replaced by Otsu’s method.

Otsu’s method, also known as Otsu’s thresholding or Otsu’s algorithm, is a popular technique in image processing for automatic threshold selection. It aims to find the threshold value that best separates foreground objects from the background in a grayscale image. It does so by maximizing the variance between the two classes of pixels, which is equivalent to minimizing the variance within each class, and it works best when the intensity histogram is bimodal, with foreground and background forming two distinct peaks. This approach eliminates the need for manual thresholding and provides an efficient way to segment images into foreground and background regions.
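To make the criterion concrete, here is a minimal from-scratch sketch of Otsu's method on a histogram, compared against scikit-image's threshold_otsu. The bimodal sample data and the otsu_threshold helper are assumptions for illustration: the helper tries every split between histogram bins and keeps the one with the largest between-class variance w0 * w1 * (mu0 - mu1)**2. The two results should agree to within about one histogram bin.

```python
import numpy as np
from skimage.filters import threshold_otsu

def otsu_threshold(values, nbins=256):
    """Hypothetical helper: return the bin centre that maximizes between-class variance."""
    counts, edges = np.histogram(np.ravel(values), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2

    # Pixel counts of the lower class (bins 0..i) and the upper class (bins i..end)
    w0 = np.cumsum(counts)
    w1 = np.cumsum(counts[::-1])[::-1]

    # Mean intensity of each class for every candidate split
    mu0 = np.cumsum(counts * centers) / w0
    mu1 = (np.cumsum((counts * centers)[::-1]) / w1[::-1])[::-1]

    # Between-class variance when splitting after bin i
    between = w0[:-1] * w1[1:] * (mu0[:-1] - mu1[1:]) ** 2
    return centers[np.argmax(between)]

# Hypothetical bimodal pixel values clustered around 0.3 and 0.7
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(0.3, 0.05, 5000), rng.normal(0.7, 0.05, 5000)])

t_ours = otsu_threshold(pixels)
t_skimage = threshold_otsu(pixels)
print(f"from scratch: {t_ours:.3f}, threshold_otsu: {t_skimage:.3f}")
```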

Since some cells appear to overlap with each other, using this image without further preprocessing would make any statistic about the cells’ perimeter, area, and other geometric properties unreliable. Hence, we apply morphological opening through scikit-image’s binary_opening function, which was discussed in earlier posts in this series. Opening is an erosion followed by a dilation: it removes thin bridges and small specks while roughly preserving the size of larger objects.
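A toy check of that behavior: two squares joined by a one-pixel bridge form a single connected component, and opening with a small disk footprint breaks the bridge. The shapes and the disk(1) footprint are assumptions chosen to keep the example small (the post itself uses disk(3) on the RBC image).

```python
import numpy as np
from skimage.measure import label
from skimage.morphology import binary_dilation, binary_erosion, binary_opening, disk

# Toy mask: two 5x5 squares joined by a one-pixel-thick bridge
toy = np.zeros((9, 15), dtype=bool)
toy[2:7, 1:6] = True    # left "cell"
toy[2:7, 9:14] = True   # right "cell"
toy[4, 6:9] = True      # thin bridge between them

footprint = disk(1)  # 3x3 cross-shaped footprint
opened = binary_opening(toy, footprint)

# Opening is erosion followed by dilation with the same footprint
manual = binary_dilation(binary_erosion(toy, footprint), footprint)

print("components before:", label(toy).max())
print("components after:", label(opened).max())
print("matches erosion + dilation:", np.array_equal(opened, manual))
```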

from skimage.morphology import binary_opening, disk

# Opening with a disk-shaped footprint removes thin bridges between touching cells
footprint = disk(3)
opened_image = binary_opening(binary_image, footprint)
imshow(opened_image, cmap='gray');
The gaps between cells have widened

While we cannot realistically separate all of the cells in the image without distorting the shape and size of some of them, we can plot a histogram of the component areas to see how the sizes of these preprocessed cells are distributed.

import matplotlib.pyplot as plt
import numpy as np
from skimage.measure import label, regionprops

# Label connected components (8-connectivity by default) and measure each one
labeled_components = label(opened_image)
component_props = regionprops(labeled_components)

# Count the components (this includes any small noise specks)
num_cells = len(component_props)
print(f"Number of cells: {num_cells}")

# Calculate the mean area of the components, in pixels
areas = [region.area for region in component_props]
mean_area = np.mean(areas)

# Plot a histogram of component sizes
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(areas, bins=30)
ax.set_xlabel("Area (pixels)")
ax.set_ylabel("Frequency")
plt.show()

Number of cells: 371

The image contains numerous noise objects, which we can ignore by thresholding on area

From the histogram, a tall peak is visible near zero area. These are most likely noise specks produced when we binarized the image, although some of them may be genuine blood components that are much smaller than a red blood cell (RBC), such as platelets. At the far right of the distribution, we may be seeing clumps of RBCs that our preprocessing, including the opening we performed previously, failed to separate. Ignoring these potential outliers, the bulk of the distribution appears to center around 600–700 pixels. Keep in mind that the count of 371 above includes all of these small components, and that mean_area, computed over every component, is pulled below that range by them.
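One way to keep the noise from skewing the statistics is to drop components below a minimum area before counting and averaging. Below is a minimal sketch on a toy mask; summarize_areas and the min_area cutoff of 20 pixels are hypothetical choices for illustration, and a sensible cutoff for the RBC image would have to be read off its histogram.

```python
import numpy as np
from skimage.measure import label, regionprops

def summarize_areas(binary, min_area):
    """Hypothetical helper: count components and average their area, with and without tiny ones."""
    areas = np.array([r.area for r in regionprops(label(binary))])
    kept = areas[areas >= min_area]
    return len(areas), len(kept), areas.mean(), kept.mean()

# Toy mask: two 10x10 "cells" plus three single-pixel specks
specks = np.zeros((40, 40), dtype=bool)
specks[2:12, 2:12] = True
specks[20:30, 20:30] = True
specks[35, 5] = specks[5, 35] = specks[35, 35] = True

n_all, n_kept, mean_all, mean_kept = summarize_areas(specks, min_area=20)
print(f"{n_all} components, mean area {mean_all:.1f}; "
      f"{n_kept} kept, mean area {mean_kept:.1f}")
```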

Now that we have labeled the components and measured them with regionprops, we can filter them by area to flag potentially irregular cells. For the following implementation, we will highlight components that are 10%, 30%, and 90% larger than the mean area, respectively.

# Components whose area exceeds t * mean_area are considered irregular
thresholds = [1.1, 1.3, 1.9]

for t in thresholds:
    highlighted_image = np.copy(image)
    for region in component_props:
        if region.area > t * mean_area:
            # Paint every pixel of the component red (assumes an RGB image)
            rows, cols = region.coords[:, 0], region.coords[:, 1]
            highlighted_image[rows, cols] = [255, 0, 0]

    plt.imshow(highlighted_image)
    plt.axis("off")
    plt.show()
Highlighted cells are 10% larger than average
Highlighted cells are 30% larger than average
Highlighted cells are 90% larger than average
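The per-region loop in the highlighting code above can also be written as a single mask operation: collect the labels of the oversized components and use np.isin to mark all of their pixels at once. A minimal sketch on a toy image; highlight_large is a hypothetical helper, not part of scikit-image, and the toy mask and grey image are made up.

```python
import numpy as np
from skimage.measure import label, regionprops

def highlight_large(rgb, labels, props, factor, mean_area, color=(255, 0, 0)):
    """Hypothetical helper: recolour every component larger than factor * mean_area."""
    large = [r.label for r in props if r.area > factor * mean_area]
    mask = np.isin(labels, large)  # True on every pixel of a flagged component
    out = rgb.copy()
    out[mask] = color
    return out

# Toy example: a 20-pixel and a 4-pixel component on a grey 8x8 RGB image
toy_binary = np.zeros((8, 8), dtype=bool)
toy_binary[0:4, 0:5] = True
toy_binary[6:8, 6:8] = True
toy_labels = label(toy_binary)
toy_props = regionprops(toy_labels)
toy_mean_area = np.mean([r.area for r in toy_props])  # (20 + 4) / 2 = 12
toy_rgb = np.full((8, 8, 3), 128, dtype=np.uint8)

result = highlight_large(toy_rgb, toy_labels, toy_props, factor=1.1, mean_area=toy_mean_area)
print("red pixels:", np.all(result == (255, 0, 0), axis=-1).sum())
```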

A major benefit of using regionprops over the three previous blob detection techniques is the multitude of properties it exposes, such as perimeter, area, bounding box, and centroid, among many others. An exhaustive list is available in the scikit-image documentation for regionprops.
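Some of these properties describe shape directly, which addresses the circular-blob limitation of LoG, DoG, and DoH. For example, eccentricity is 0 for a circle and approaches 1 for a very elongated region, and solidity is the region's area divided by the area of its convex hull. A minimal sketch using regionprops_table on a synthetic disk and ellipse; the shapes and sizes are arbitrary assumptions.

```python
import numpy as np
from skimage import draw
from skimage.measure import label, regionprops_table

# Synthetic mask: a round object on the left and an elongated one on the right
shapes = np.zeros((80, 160), dtype=bool)
rr, cc = draw.disk((40, 40), 15, shape=shapes.shape)
shapes[rr, cc] = True
rr, cc = draw.ellipse(40, 110, 8, 35, shape=shapes.shape)
shapes[rr, cc] = True

# regionprops_table returns a dict of arrays, one entry per requested property
table = regionprops_table(
    label(shapes),
    properties=("label", "area", "eccentricity", "solidity"),
)
for lab, area, ecc, sol in zip(table["label"], table["area"],
                               table["eccentricity"], table["solidity"]):
    print(f"label {lab}: area={area:.0f}, eccentricity={ecc:.2f}, solidity={sol:.2f}")
```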

Conclusion

Blob detection methods such as Laplacian of Gaussian (LoG), Difference of Gaussian (DoG), and Determinant of Hessian (DoH) provide valuable techniques for identifying objects in images, each with its own strengths and weaknesses. Connected component analysis, through functions like label and regionprops, is an even more robust alternative: it labels each region of interest and measures its actual shape instead of approximating it with a circle.

Mastering blob detection and connected component techniques empowers us to extract meaningful information from images, enabling data-driven decisions across fields like biomedical research, industrial automation, agriculture, environmental monitoring, and surveillance. These techniques are invaluable for identifying and analyzing objects of interest, aiding in disease diagnosis, quality control, object tracking, anomaly detection, and more.


Cyril Benedict Lugod

Data Scientist | MS Data Science @ Asian Institute of Management