Part 7 of 9 of the Introduction to Image Processing series
Image Processing in Machine Learning
Machine Learning in Action
Finally, we will see how image processing complements machine learning!
In this blog post, we will classify photos of leaves using both traditional machine learning (ML) and deep learning (DL) techniques.
There are five types of leaves, which we will label Plants A, B, C, D, and E.
Traditional Machine Learning
Traditional ML techniques will need the following steps:
- Read and clean the images
- Segment objects of interest
- Extract features from the objects
- Train an ML model
Reading and Cleaning
The first step involves reading the images from the folders where they are stored and cleaning them so that they are properly binarized, which is important for the subsequent steps.
This will be done for all leaves across all five classes but for demonstration purposes, I will show how it was done for one set of leaf images belonging to Plant A.
from skimage import io
import matplotlib.pyplot as plt

# Read a raw leaf image and display it
image_raw = io.imread('leaves/plantA_1.jpg')
fig, ax = plt.subplots()
ax.imshow(image_raw, cmap='gray')
fig.show()
Binarization is straightforward using vanilla thresholding.
from skimage.color import rgb2gray
from skimage import util

# Convert to grayscale (ignoring any alpha channel), threshold, and invert so the leaves become the foreground
gray_leaves = rgb2gray(image_raw[:, :, :3])
binary_leaves = util.invert(gray_leaves > 0.5)
plt.figure()
plt.imshow(binary_leaves, cmap='gray')
plt.axis('off')
plt.show()
Segmentation
Segmentation is performed by labeling connected components with the label function; the resulting regions can then be measured with the regionprops function, which has already been discussed in prior blog posts.
from skimage.measure import label

# Label connected components so each leaf gets its own integer label
label_leaves = label(binary_leaves)
plt.figure()
plt.imshow(label_leaves);
These regionprops properties will be used to differentiate the leaves for the ML model:
- Area
- Perimeter
- Eccentricity
- Solidity
- Extent
Area represents the total surface area occupied by a leaf. Leaves with different shapes and sizes will have varying areas. For example, round leaves tend to have larger areas compared to oblong or serrated leaves, which may have smaller or more irregular areas.
Perimeter refers to the length of the outer boundary of a leaf. Leaves with different shapes will have different perimeters. For instance, round leaves typically have shorter perimeters compared to oblong or serrated leaves, which may have longer and more intricate perimeters.
Eccentricity measures how elongated an object is. In the context of leaf classification, it can help differentiate between oblong and round-shaped leaves. Leaves with higher eccentricity values tend to be more oblong, while leaves with lower eccentricity values appear more round.
Solidity describes the compactness of a leaf shape, representing the ratio of the leaf area to the area of its convex hull. The convex hull is the smallest convex polygon that completely encloses the leaf shape. Leaves with different shapes, such as serrated or round, will have varying solidity values. Serrated leaves may have lower solidity due to their irregularities, while round leaves will typically have higher solidity.
Extent is the ratio of the leaf area to the total bounding box area. It represents how much of the bounding box is occupied by the leaf. Different leaf shapes will have distinct extent values. For example, broad, regularly shaped leaves tend to have higher extent values than narrow or irregular leaves, since they fill a larger portion of their bounding box.
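To make these definitions concrete, here is a quick sketch that prints the five properties for the largest labeled region from above; solidity and extent can also be recomputed directly from their definitions as ratios:
from skimage.measure import regionprops

# Inspect the five shape descriptors of the largest labeled region
largest = max(regionprops(label_leaves), key=lambda region: region.area)
min_row, min_col, max_row, max_col = largest.bbox
print(f'area: {largest.area}, perimeter: {largest.perimeter:.1f}, eccentricity: {largest.eccentricity:.3f}')
print(f'solidity: {largest.solidity:.3f} (= area / convex hull area = {largest.area / largest.convex_area:.3f})')
print(f'extent: {largest.extent:.3f} (= area / bounding box area = {largest.area / ((max_row - min_row) * (max_col - min_col)):.3f})')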
Object Feature Extraction
Features for the ML model will be derived from the five regionprops properties mentioned earlier.
def get_class(fpath):
    '''
    Extracts the class of the leaves from the filepath,
    e.g. 'leaves/plantA_1.jpg' -> 'plantA'.
    '''
    return fpath.split('/')[1].split('.')[0].split('_')[0]
import os
import pandas as pd
from skimage.measure import regionprops
from tqdm import tqdm

leaves_data = []
folder_path = 'leaves'
for filename in tqdm(os.listdir(folder_path)):
    file_path = os.path.join(folder_path, filename)
    if os.path.isfile(file_path):
        image_raw = io.imread(file_path)
        gray_leaves = rgb2gray(image_raw[:, :, :3])
        binary_leaves = util.invert(gray_leaves > 0.5)
        label_leaves = label(binary_leaves)
        raw_props = regionprops(label_leaves)[1:]  # remove the background class
        clean_props = [prop for prop in raw_props if prop.area > 1000]  # just the leaves, remove specks
        for prop in clean_props:
            leaves_data.append({'area': prop.area,
                                'perim': prop.perimeter,
                                'ecc': prop.eccentricity,
                                'solid': prop.solidity,
                                'extent': prop.extent,
                                'label': get_class(file_path)})

df_leaves = pd.DataFrame(data=leaves_data)
display(df_leaves)
Training the ML Model
Since the counts for each of the five classes are fairly balanced, the model can already be trained from this data without the need for resampling.
We will explore three different traditional ML techniques representing similarity-based learning (kNN), error-based learning (Logistic Regression), and information-based learning (Gradient Boosting Method).
The target accuracy should be at least 1.25 x PCC. The Proportional Chance Criterion (PCC) is the probability of classifying objects correctly by chance alone. In our case, the PCC is around 0.20, so 1.25 x PCC is 0.25.
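As a quick sanity check, the PCC can be computed directly from the class proportions in df_leaves as the sum of the squared class proportions. A minimal sketch:
# Proportional Chance Criterion = sum of squared class proportions
class_props = df_leaves['label'].value_counts(normalize=True)
pcc = (class_props ** 2).sum()
print(f'PCC: {pcc:.3f}, 1.25 x PCC: {1.25 * pcc:.3f}')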
from sklearn.model_selection import train_test_split

X = df_leaves.drop(['label'], axis=1)
y = df_leaves['label']
X_trainval, X_hold, y_trainval, y_hold = train_test_split(X, y, test_size=0.1, random_state=69)
The trainval (train + validation) and holdout (test) split was set at 0.9–0.1.
First, we can try using kNN on these features.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Scale the features, then tune the number of neighbors with cross-validation
pipeline = Pipeline(steps=[('scl', StandardScaler()),
                           ('model', KNeighborsClassifier())])
param_grid = {'model__n_neighbors': list(range(5, 31, 5))}
scoring = 'accuracy'
cv = 3
grid_search = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring=scoring,
                           cv=cv,
                           n_jobs=-1,
                           verbose=1,
                           return_train_score=True)
grid_search.fit(X_trainval, y_trainval)
val_acc = grid_search.best_score_
train_acc = grid_search.cv_results_['mean_train_score'][grid_search.best_index_]
hold_acc = grid_search.score(X_hold, y_hold)
print(f'\nKNN Classifier\n\nTrain score: {train_acc:.3f}\nVal score: {val_acc:.3f}\n\nTest score: {hold_acc:.3f}')
We can also try using Logistic Regression.
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline(steps=[('scl', StandardScaler()),
                           ('model', LogisticRegression())])
param_grid = {'model__C': [0.1, 1, 5, 10, 100, 1000],
              'model__penalty': ['l2'],
              'model__solver': ['liblinear'],
              'model__random_state': [69]}
scoring = 'accuracy'
cv = 3
grid_search = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring=scoring,
                           cv=cv,
                           n_jobs=-1,
                           verbose=1,
                           return_train_score=True)
grid_search.fit(X_trainval, y_trainval)
val_acc = grid_search.best_score_
train_acc = grid_search.cv_results_['mean_train_score'][grid_search.best_index_]
hold_acc = grid_search.score(X_hold, y_hold)
print(f'\nLogistic Regression\n\nTrain score: {train_acc:.3f}\nVal score: {val_acc:.3f}\n\nTest score: {hold_acc:.3f}')
Lastly, we can try an ensemble tree-based model (Gradient Boosting Method).
from sklearn.ensemble import GradientBoostingClassifier

pipeline = Pipeline(steps=[('scl', StandardScaler()),
                           ('model', GradientBoostingClassifier())])
param_grid = {'model__learning_rate': [0.001],
              'model__max_features': [3, 4, 5],
              'model__max_depth': [10, 20],
              'model__random_state': [69]}
scoring = 'accuracy'
cv = 3
grid_search = GridSearchCV(estimator=pipeline,
                           param_grid=param_grid,
                           scoring=scoring,
                           cv=cv,
                           n_jobs=-1,
                           verbose=1,
                           return_train_score=True)
grid_search.fit(X_trainval, y_trainval)
val_acc = grid_search.best_score_
train_acc = grid_search.cv_results_['mean_train_score'][grid_search.best_index_]
hold_acc = grid_search.score(X_hold, y_hold)
print(f'GBM\n\nTrain score: {train_acc:.3f}\nVal score: {val_acc:.3f}\n\nTest score: {hold_acc:.3f}')
All three traditional ML techniques were able to beat the 1.25 x PCC threshold. GBM had the highest accuracy, followed by Logistic Regression and kNN.
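Since GBM performed best, it is also worth checking which of the five shape features it relied on most. A minimal sketch, reusing the fitted GBM grid_search from the cell above:
# Feature importances of the best gradient boosting model
best_gbm = grid_search.best_estimator_.named_steps['model']
for name, importance in zip(X.columns, best_gbm.feature_importances_):
    print(f'{name}: {importance:.3f}')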
While traditional ML may have sufficed, machine learning on images is now typically done using deep learning techniques such as convolutional neural networks (CNNs).
Deep Learning
While traditional ML techniques needed four different steps for classification, only one step is required for deep learning techniques:
- Train an ML model
Yes, there is no need to binarize and segment the leaf images. You can simply feed them into the deep learning pipeline.
For this case, we will perform feature extraction on a pretrained VGG-19 model. We skipped over the creation of the datasets and dataloaders since that is already part of the deep learning realm and beyond the image processing focus of this post.
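For completeness, here is one minimal way the image_datasets, dataloaders, and class_names used below might be set up with torchvision. The leaves_split folder layout (with train/validation/test subfolders per class) is an assumption for illustration, not necessarily the exact setup used:
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed layout: leaves_split/{train,validation,test}/<class_name>/*.jpg
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),               # VGG-19 expects 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet normalization statistics
                         [0.229, 0.224, 0.225]),
])
image_datasets = {split: datasets.ImageFolder(f'leaves_split/{split}', transform=data_transforms)
                  for split in ['train', 'validation', 'test']}
dataloaders = {split: DataLoader(image_datasets[split], batch_size=32, shuffle=(split == 'train'))
               for split in ['train', 'validation', 'test']}
class_names = image_datasets['train'].classes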
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load train, validation, and test datasets
train_data = image_datasets['train']
val_data = image_datasets['validation']
test_data = image_datasets['test']

# Define data loaders
train_loader = dataloaders['train']
valid_loader = dataloaders['validation']
test_loader = dataloaders['test']

# Load the pretrained VGG19 model
model = models.vgg19(pretrained=True)  # deprecated in newer torchvision in favor of the weights= argument

# Freeze the parameters of the pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Modify the last fully connected layer to match the number of classes
num_classes = len(class_names)
model.classifier[6] = nn.Linear(4096, num_classes)

# Move the model to the appropriate device
model = model.to(device)

# Define loss function and optimizer
# (only the new final layer has requires_grad=True, so only it gets updated)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 50
best_valid_loss = float('inf')
for epoch in range(num_epochs):
    train_loss = 0.0
    valid_loss = 0.0

    # Training
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * images.size(0)

    # Validation
    model.eval()
    with torch.no_grad():
        for images, labels in valid_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            valid_loss += loss.item() * images.size(0)

    train_loss = train_loss / len(train_loader.dataset)
    valid_loss = valid_loss / len(valid_loader.dataset)
    print(f"Epoch: {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Valid Loss: {valid_loss:.4f}")

    # Save the best model based on validation loss
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'best_leaves_model_vgg19.pt')
# Test the model
model.load_state_dict(torch.load('best_leaves_model_vgg19.pt'))
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
After training the VGG-19 for 50 epochs, this model gave the highest accuracy of all the techniques, surpassing the traditional ML models discussed earlier.
Looking at the test set, the misidentified leaves were between Plants C and E. These two leaves are very similar in shape based on their binarized representations. This is possibly one pitfall of my current implementation: I used the binarized, segmented leaves from the regionprops step earlier for my training and test data instead of raw cropped images of leaves. For cropping raw images of leaves, other pretrained models could be used. If we were to work with the raw (non-binarized) images, we might even be able to classify all the leaf photos correctly, since information about the texture and vein structure would still be present and the model could learn from it during training.
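One way to verify which classes are being confused is to tabulate a confusion matrix over the test predictions. A minimal sketch, reusing the trained model, test_loader, and class_names from above:
from sklearn.metrics import confusion_matrix

# Collect predictions and true labels over the whole test set
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images.to(device))
        all_preds.extend(outputs.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())

print(class_names)                              # class ordering of the rows/columns
print(confusion_matrix(all_labels, all_preds))  # rows: true classes, columns: predictions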
Conclusion
In our class, we learned about the application of image processing in machine learning, specifically in the classification of leaf photos. We explored both traditional machine learning (ML) techniques and deep learning techniques to classify five different types of leaves. The traditional ML approach involved several steps, including reading and cleaning the images, segmenting the objects of interest (leaves), extracting features such as area, perimeter, eccentricity, solidity, and extent, and training ML models like kNN, logistic regression, and gradient boosting. These models achieved accuracies higher than the Proportional Chance Criterion (PCC), which is a baseline measure for random predictions proportional to the class distribution.
However, we realized that deep learning techniques, such as using a pretrained VGG-19 model, achieved even higher accuracy without the need for explicit image processing steps like binarization and segmentation. We understood that working with raw images could capture more information about leaf texture and vein structures, leading to potentially better classification performance.
Image processing is vital in machine learning as it prepares and enhances raw image data, extracts relevant features, leverages domain-specific knowledge, and enables the use of powerful deep learning techniques for accurate classification. It plays a crucial role in transforming visual information into meaningful numerical representations and unlocks insights in diverse domains.