Which machine learning algorithm does your computer vision project need?

Published on: February 9, 2022

Recent advances in machine learning have given computer vision algorithms the ability to perform tasks with ever-increasing speed and accuracy. But with a growing number of machine learning techniques, it can be hard to figure out which algorithm will deliver the best results. Will you go for a traditional machine learning algorithm, or do you need a deep learning network to accomplish your computer vision task? Let’s have a look at the possibilities.

But first, the basics. What is machine learning, and how does it differ from deep learning? We could define the two as follows:

  • Machine learning algorithms are able to observe, analyze and learn directly from data. They build a model based on sample data (or training data) in order to make predictions or decisions about other data they have never seen before.
  • Deep learning is a subset of machine learning which uses neural networks to make intelligent decisions. The design of artificial neural networks was inspired by functional principles of the human brain. Because these networks stack many layers that learn increasingly abstract representations of the data, deep learning often outperforms traditional machine learning models on complex tasks.

Let’s have a look at the pros and cons of both types of machine learning.

When to use traditional machine learning

In traditional machine learning, there usually is a manual (human) process of feature engineering. This is the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set. Efficient feature engineering is not an easy task and might be even more important than the choice of the algorithm itself. The input data for traditional algorithms must be presented in a structured way, in tabular or vector representations.
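
As a rough illustration of manual feature engineering, the sketch below turns a small grayscale image into a fixed-length feature vector. The function name `extract_features`, the choice of features (mean, standard deviation and a four-bin intensity histogram) and the toy images are assumptions made for this example, not a recommended feature set.

```python
from statistics import mean, pstdev


def extract_features(image, bins=4, max_value=255):
    """Turn a grayscale image (list of rows of 0..max_value ints) into a flat vector."""
    pixels = [p for row in image for p in row]

    # Normalized intensity histogram: fraction of pixels falling into each bin.
    hist = [0] * bins
    for p in pixels:
        index = min(p * bins // (max_value + 1), bins - 1)
        hist[index] += 1
    hist = [count / len(pixels) for count in hist]

    # Two global statistics followed by the histogram: a structured, tabular-style row.
    return [mean(pixels) / max_value, pstdev(pixels) / max_value] + hist


# Hypothetical 2x2 images, invented for this example.
dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 250]]

dark_features = extract_features(dark)
bright_features = extract_features(bright)
```

Each image, whatever its content, ends up as one row of six numbers, which is the kind of vector representation a traditional algorithm expects as input.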

Traditional machine learning makes sense when:

  • The problem is constrained and well-defined.
  • The variability in the data is low.
  • The amount of available data is limited.

Traditional algorithms can perform classification and regression tasks, but are poorly suited to more complicated ones, such as multiple object detection, object segmentation or object tracking in video. Most of the widely used traditional algorithms are implemented in scikit-learn, a free machine learning library for Python.

  • Classification algorithms classify data into distinct categorical classes, such as a quality (for example, in the left figure above we predict if a robot is hungry or not). Examples of classification algorithms are Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Trees and Random Forest.
  • Regression algorithms are used to predict continuous variables, such as a quantity (for example, in the right figure above we predict the number of carrots a robot is able to eat). Examples of regression algorithms are Linear Regression, Support Vector Machines, Decision Trees and Random Forest.

As you might have noticed, some of the algorithms can perform both tasks, but they require different training targets and objectives (“loss functions”). Spoiler alert: we have an article coming up on traditional machine learning algorithms for machine vision that will give you more details about these algorithms.
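
To make the difference between the two tasks concrete, here is a minimal, dependency-free sketch of K-Nearest Neighbors used both ways. K-Nearest Neighbors can also be used for regression, and it is chosen here only because it fits in a few lines. In a real project you would more likely reach for the scikit-learn implementations. The function names and the toy robot data are invented for this example.

```python
from collections import Counter


def k_nearest(train_x, query, k):
    """Return the indices of the k training points closest to the query."""
    distances = [
        (sum((a - b) ** 2 for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
    ]
    return [i for _, i in sorted(distances)[:k]]


def knn_classify(train_x, train_y, query, k=3):
    """Classification: majority vote over the labels of the nearest neighbors."""
    neighbors = k_nearest(train_x, query, k)
    return Counter(train_y[i] for i in neighbors).most_common(1)[0][0]


def knn_regress(train_x, train_y, query, k=3):
    """Regression: average of the numeric targets of the nearest neighbors."""
    neighbors = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in neighbors) / k


# Hypothetical data: one feature per robot (hours since its last meal).
hours = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]]
is_hungry = ["no", "no", "no", "yes", "yes", "yes"]   # categorical target
carrots = [0.5, 1.0, 1.5, 4.0, 4.5, 5.0]              # continuous target

predicted_class = knn_classify(hours, is_hungry, [8.5])
predicted_carrots = knn_regress(hours, carrots, [8.5])
```

The neighbor search is identical in both cases. Only the target type and the way the neighbors' targets are combined change: a vote for a class label, an average for a quantity.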

When to use deep learning

Modern machine learning algorithms have advanced significantly, benefiting from improvements in computing power, memory capacity, and optics performance. Deep learning is probably one of the most impressive examples. Compared to traditional computer vision algorithms, deep learning makes it possible to achieve greater accuracy in extremely complex tasks such as image classification, object detection, semantic segmentation, and Simultaneous Localization and Mapping (SLAM).

In contrast to traditional machine learning, deep learning does not need human intervention for feature engineering. Instead, deep learning algorithms learn all transformations by themselves by working directly with the source data (images and video). This is one of the reasons why deep learning is very popular in computer vision applications. New research and network architectures are being published continuously.

So, which deep learning algorithm do you need for your computer vision task? Let’s look at the possibilities for some of the most common industrial problems:

  • Image classification aims to predict the class of one object in an image and assign a corresponding class label to it. The number of classes depends on the problem, but is typically higher than two. The network predicts the probability of each class and assigns a label corresponding to the class with the highest probability. Usually, the ResNet(-based) architecture is a good choice for this task.
  • Object detection refers to detecting objects in the image and putting a bounding box around each detected object in combination with a class label. Several objects of different classes can be detected in one image. The task provides more information than image classification, because it also provides spatial information about the location of the object. Two widely used algorithms for this task are “You Only Look Once” (YOLO) and the “Single Shot MultiBox Detector” (SSD). Both names highlight the fact that the algorithm finds all objects within an image in one pass. This makes the execution of the model fast enough to deploy it in real time. Multiple versions of YOLO with slightly different architectures are available: YOLO v1, v2, v3, v4 and v5.
  • Semantic segmentation refers to marking each pixel of the image with the label of the corresponding class. As a result, the user receives accurate information about the shape of the object and its location in the image. Usually, autoencoder-like (encoder-decoder) networks with symmetric downsampling and upsampling parts are employed for this task (e.g. SegNet). U-Net and its modifications are a special type of such a network with skip connections between the two parts, and they have replaced many other types of segmentation networks thanks to their simplicity and efficiency.
  • Instance segmentation differs from semantic segmentation, as it assigns a different label for different object instances of the same category. In this case, you can not only segment the objects in the image, but also distinguish between instances by drawing a separate contour around each of them. Mask R-CNN is a straightforward and efficient instance segmentation approach.
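
The tasks above differ mainly in what the output looks like. The sketch below illustrates two of them under simplifying assumptions: `classify` turns raw network scores into class probabilities and picks the most probable label, and `instances_from_mask` shows how an instance map differs from a semantic mask. The flood fill used here is only a stand-in to produce an instance map from a toy mask. It is not how Mask R-CNN works, and it cannot separate objects that touch each other. All names, scores and masks are invented for this example.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, class_names):
    """Image classification output: the label with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


def instances_from_mask(mask, target_class):
    """Give each 4-connected region of target_class its own instance id (1, 2, ...)."""
    height, width = len(mask), len(mask[0])
    instance_ids = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != target_class or instance_ids[r][c]:
                continue
            count += 1
            instance_ids[r][c] = count
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == target_class
                            and not instance_ids[ny][nx]):
                        instance_ids[ny][nx] = count
                        stack.append((ny, nx))
    return instance_ids, count


# Hypothetical raw scores for one image and three classes.
label, probability = classify([2.0, 0.5, -1.0], ["bolt", "nut", "washer"])

# Semantic mask: every pixel carries a class label (1 = bolt, 0 = background).
semantic_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
instance_map, instance_count = instances_from_mask(semantic_mask, target_class=1)
```

In the semantic mask both bolts share the label 1, while in the instance map the left bolt is marked 1 and the right bolt 2, so they can be counted and outlined separately.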

Spoiler alert: we have an article coming up on best practices for employing a deep learning algorithm for industrial applications, which will give you more insights into how to achieve the best performance for your algorithm.

The choice is yours

There is such a wealth of machine learning algorithms out there that choosing the right one for your computer vision application can feel overwhelming. The industry has seen great improvements, especially in the area of deep learning. But the question of whether you need a traditional machine learning algorithm or a deep learning technique can only be answered based on the type of problem you want to solve and the results you expect to achieve. In this article, we have briefly discussed the difference between traditional algorithms and deep learning, but if you’re still not sure which approach you need to follow, there’s definitely a Kapernikov expert who can help you decide.

    Maksim Markov

    As a consultant, Maksim is now enjoying his work days with a wide range of machine vision and machine learning projects. The variety of work appeals to him: “Up to now, I ha ...