How to Know Which Machine Learning Algorithms to Use: Techniques in Machine Learning

Yura Velichko

5 May 2022

01 What Is Supervised Learning?

02 What Is Unsupervised Learning?

03 What Is Reinforcement Learning Theory?

04 Choosing the Right Approach

Machine learning powers driverless vehicles, weather forecasts, medical research, and voice recognition, and it can get complex quickly. This article breaks machine learning algorithms into three main branches, from models that require full human control to those that (almost) don’t need us at all, and explains the main rules governing them. Let’s start, so you can figure out which technique is right for your project.

What Is Supervised Learning?

Supervised machine learning is a technique that uses labeled data to train models. Labeled data means that the correct output for each input is already known to you. All the model needs to do is connect the inputs to the outputs. The most used algorithms of this type are regressions (linear and logistic), plus:

decision trees;
Naive Bayes;
SVM;
random forest;
neural networks.

A lot of predictive modeling techniques in machine learning are also supervised.

How Does Supervised Learning Work?

[Image: photo 1]

Model training is the core process in all supervised machine learning methods. During the training phase, labeled datasets enter the system and help it connect input values to output values. After that, test data enters the algorithm. The test data is labeled too, but the labels are hidden from the algorithm and used only to measure its accuracy. If your model can distinguish squares from triangles on the test data set (going by the picture above), you can move on: your model makes accurate predictions.
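To make the train-then-test flow concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy "shapes" dataset, the single noisy feature, and the nearest-average classifier are not taken from any particular library or project.

```python
import random

random.seed(0)

# Hypothetical toy data: each shape is (measured_corners, label).
# A little measurement noise keeps the examples from being identical.
def make_examples(n):
    examples = []
    for _ in range(n):
        label = random.choice(["square", "triangle"])
        corners = 4 if label == "square" else 3
        examples.append((corners + random.gauss(0, 0.2), label))
    return examples

data = make_examples(200)
train, test = data[:150], data[150:]  # hold out 25% of the data for testing

def fit_centroids(examples):
    """Training: average the feature value seen for each label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Prediction: pick the label whose average is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(train)

# Testing: the model only sees the feature; the label is used for scoring.
correct = sum(predict(centroids, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The important part is the split: the model never sees the test labels during training, so the accuracy it reports reflects how well it handles new data.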

Training data must be cleaned and balanced before it’s presented to the model. Duplicates and low-quality data that doesn’t fit the predefined labels will skew the model, and its accuracy will drop. Low-quality data can cause a model to miss the relationships between the input and output variables; this is called underfitting. High accuracy on the training set, on the other hand, is not always a positive indicator; often, it’s a sign of overfitting. That’s when the algorithm clings to the features and data you’ve fed it so closely that it starts looking for exact copies of them in the test data, failing to generalize and recognize patterns.
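The difference between underfitting, overfitting, and a model that generalizes can be sketched with three tiny "models". The study-hours data and all function names below are hypothetical, chosen only to show the pattern: compare accuracy on the training set with accuracy on unseen test data.

```python
# Hypothetical labeled data: (hours_studied, passed_exam).
train = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
test = [(0, False), (4, False), (5, True), (9, True)]

def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

# Overfit model: memorizes the training rows and only recognizes exact copies.
memory = dict(train)

def memorizer(x):
    return memory.get(x, False)  # unseen input falls back to an arbitrary default

# Underfit model: ignores the input entirely.
def always_fail(x):
    return False

# A simple rule that captures the pattern: a threshold halfway between
# the largest failing value and the smallest passing value.
threshold = (max(x for x, y in train if not y) + min(x for x, y in train if y)) / 2

def threshold_rule(x):
    return x > threshold

for name, model in [("memorizer", memorizer),
                    ("always_fail", always_fail),
                    ("threshold_rule", threshold_rule)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer is perfect on the training set but drops on the test set, which is the overfitting signature described above. The rule that learned a general pattern holds up on both.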

A supervised machine learning approach is applied to build regression and classification algorithms.

Regression-based models are meant to figure out numerical relationships between the input and output data. For instance, given the square footage and zip codes of houses, a regression model can forecast real estate prices based on historical data for similar houses. Regression algorithms can also be used to estimate product demand, expected sales volume, and so on. They are a good fit for tasks that predict a number from historical data.
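As a rough illustration of the house price example, the sketch below fits a straight line with ordinary least squares. The sales figures are invented, and only one feature (square footage) is used to keep the arithmetic visible; a real model would use more features and a library implementation.

```python
# Hypothetical historical sales: (square_feet, price_in_thousands).
sales = [(800, 165), (1000, 195), (1200, 245), (1500, 295), (2000, 400)]

def fit_line(points):
    """Ordinary least squares for one feature: price = slope * sqft + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sales)

def predict_price(square_feet):
    return slope * square_feet + intercept

print(f"estimated price for 1800 sq ft: {predict_price(1800):.1f}k")
```

The output is a number rather than a category, which is what separates regression from classification.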

Classification maps inputs into a given number of classes or categories, so instead of a number, we’re predicting a category. The model learns the categories from labeled data. This type of algorithm can be used for categorizing customer feedback as negative or positive and for filtering spam email. Banks also use classification when they decide whether or not to give customers credit: they classify “good” and “bad” cases within the credit history and weigh them against each other. That’s a simplified breakdown of a decision tree, another algorithm in the classification segment of supervised machine learning.
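The credit example can be sketched as the smallest possible decision tree: a single split, sometimes called a decision stump. The credit histories and the "missed payments" feature below are hypothetical, and a real credit model would weigh many more factors.

```python
# Hypothetical credit histories: (missed_payments, outcome).
history = [(0, "good"), (1, "good"), (0, "good"), (2, "good"),
           (3, "bad"), (5, "bad"), (4, "bad"), (1, "good"), (6, "bad")]

def classify(threshold, missed_payments):
    """One decision node: more missed payments than the threshold means 'bad'."""
    return "bad" if missed_payments > threshold else "good"

def fit_stump(examples):
    """Try each observed value as a threshold and keep the one with fewest errors."""
    best_threshold, best_errors = None, len(examples) + 1
    for threshold in sorted({x for x, _ in examples}):
        errors = sum(classify(threshold, x) != label for x, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

threshold = fit_stump(history)
print("learned threshold:", threshold)
print("applicant with 1 missed payment:", classify(threshold, 1))
print("applicant with 4 missed payments:", classify(threshold, 4))
```

A full decision tree repeats this search at every node, splitting each group again on whichever feature separates the labels best.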

Advantages of Supervised Learning

Here are some of the advantages of using supervised learning:

Great for forecasting based on historical data;
Solves computation challenges in statistics & research;
You provide training data — you have the most control over the training process.

Examples of Supervised Machine Learning

Here are some of the most popular use cases of this machine learning technique:

Spam Filtering. Recognizes junk mail, such as emails with suspicious links or phishing letters that are designed to look like they come from credible companies (e.g. PayPal, LinkedIn) and contain malware or links to fake login pages, and sends it to spam based on specific keywords and threat markers.
Object Recognition. A Face ID system uses the front camera to detect a user’s face, matches what the camera sees against the owner’s previously stored facial features, and automatically unlocks the device.
Fraud Detection. Suspicious activity from a user’s account? Block access. It’s a classic move in all popular social networks.
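To show how keyword-based spam filtering might work, here is a small sketch using Naive Bayes, one of the supervised algorithms listed earlier. It is an illustrative choice, not a description of how any real mail provider filters spam, and the training messages are made up.

```python
import math
from collections import Counter

# Hypothetical labeled messages.
training = [
    ("verify your account now", "spam"),
    ("click this link to claim your prize", "spam"),
    ("urgent verify your password", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
    ("your invoice for the project is attached", "ham"),
]

def train_naive_bayes(messages):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary

def classify(model, text):
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior plus log likelihood of each known word (add-one smoothing).
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word in vocabulary:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                )
        scores[label] = score
    return max(scores, key=scores.get)

model = train_naive_bayes(training)
print(classify(model, "verify your prize now"))
print(classify(model, "meeting notes for the project"))
```

Words that appeared mostly in spam during training push a new message toward the spam label, which is the "keywords and markers" idea in its simplest form.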

The accuracy, heterogeneity, linearity, and redundancy of the data should also be analyzed before selecting a supervised learning algorithm.

What Is Unsupervised Learning?

Unsupervised learning uses unlabeled data to train models. Unlabeled data means that there are no fixed output variables. The model learns from data, figuring out what exactly you’ve given to it on its own by discovering features, patterns, and behaviors in the data. Unsupervised learning is used for clustering, feature learning, and dimensionality reduction. The most commonly used unsupervised learning algorithms are:

k-means;
hierarchical clustering;
mixture models;
OPTICS;
autoencoders;
self-organizing maps.

How Does Unsupervised Learning Work?

[Image: photo 2]

This learning technique uses machine learning algorithms to identify patterns in data sets containing data points that are not classified or labeled. The algorithms are allowed to classify, label, or group the data points contained within the data sets on their own.

In unsupervised learning, an AI system groups information according to differences and similarities. The algorithms analyze the underlying structure of the data sets by extracting useful features or information from them. For instance, an algorithm may be given a dataset of animal images. It groups the animals according to features like fur, ears, tail, etc., without being told which species is which. Unsupervised learning is the basis for many data mining techniques in machine learning.
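Grouping by similarity can be sketched with k-means, the first algorithm in the list above. The animal measurements are invented, and starting from the first k points is a simplifying assumption; library implementations choose starting centroids more carefully.

```python
import math

# Hypothetical unlabeled measurements: (weight_kg, ear_length_cm).
animals = [(4.0, 6.5), (30.0, 12.0), (4.5, 7.0),
           (28.0, 11.0), (3.8, 6.0), (32.0, 12.5)]

def k_means(points, k, iterations=10):
    # Assumption: use the first k points as the starting centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = k_means(animals, k=2)
for centroid, cluster in zip(centroids, clusters):
    print("centroid:", centroid, "members:", cluster)
```

No labels are involved at any point. The algorithm only knows that some animals sit close together in feature space, and it is up to you to decide what each group means.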

Benefits of Unsupervised Learning

Here are some of the advantages of using unsupervised learning:

Extracts useful insights from raw data sets;
Uncovers hidden patterns;
Can use a variety of algorithms to discover the relationships or differences between various data sets and points.

Examples of Unsupervised Machine Learning

Here are some of the most common uses for unsupervised learning:

Dimensionality Reduction. Unsupervised algorithms can reduce the number of features or variables within a dataset. This removes noisy data and lets you focus on the information relevant to your objective. For example, the technique is used to clean up noisy images.
Customer Segmentation. For more efficient marketing, companies use unsupervised learning algorithms like DBSCAN and k-means to divide customers into groups based on their behavior and habits.
Clustering Anomaly Detection. Unsupervised algorithms can identify unusual data points in datasets. One example is deep learning algorithms that flag out-of-pattern features (abnormalities) on X-rays for tuberculosis screening.
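As a simple stand-in for anomaly detection, the sketch below flags values that sit far from the rest of the data using a z-score. This is a basic statistical check rather than a clustering or deep learning method, and the transaction amounts and the cutoff of 2 standard deviations are assumptions chosen for the example.

```python
import statistics

# Hypothetical daily transaction amounts; no labels say which are unusual.
amounts = [50, 52, 49, 51, 48, 50, 53, 120]

def find_anomalies(values, cutoff=2.0):
    """Flag values more than `cutoff` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / spread > cutoff]

anomalies = find_anomalies(amounts)
print("anomalies:", anomalies)
```

The idea carries over to clustering-based approaches: points that lie far from every cluster center are treated as candidates for a closer look.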

These examples are only scratching the surface of unsupervised learning capabilities.

What Is Reinforcement Learning Theory?

Reinforcement learning trains machines to take suitable actions and maximize rewards in a given situation. It uses an agent and an environment to produce actions and rewards. The agent has a start state and an end state, but there might be different paths for reaching the end state (like in a maze). Widely used algorithms of this type include:

Q-learning;
Deep Q-Networks;
SARSA.

There are no predefined target variables in this learning technique.

How Does Reinforcement Learning Work?

[Image: photo 3]

The agent can perceive and interpret its environment, take actions, and learn through trial and error. It learns to act on the environment and sense its state. The goal of the model is to “survive” the conditions you’ve thrown it into and stick to “rewarding” behavior as much as possible. Autonomous cars, for example, learn not to drive over people via reinforcement learning. They are thrown into simulations of a city and learn to stop at red lights, stay off the pavement, and so on, avoiding the negatives (e.g. collisions) and seeking the positives (e.g. reaching the destination without collisions or traffic violations).
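The trial-and-error loop can be sketched with tabular Q-learning on a tiny environment. The five-cell corridor, the reward of 1 at the goal, and the learning parameters are all assumptions for illustration. A driving simulator is vastly more complex, but the agent, action, reward cycle is the same.

```python
import random

random.seed(0)

# Hypothetical environment: a corridor of 5 cells. The agent starts in
# cell 0 and earns a reward of 1 when it reaches cell 4, ending the episode.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left, step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Q-table: the agent's estimate of how good each action is in each state.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose_action(state):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(200):
    state = 0
    while state != GOAL:
        action = choose_action(state)
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# The learned policy: the best action in every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Nobody tells the agent that "right" is the correct direction. It discovers that by bumping around the corridor, and the reward gradually propagates backward from the goal to the starting cell.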

Reinforcement learning is widely used in robotics and is gaining ground fast: some robotic vacuum cleaners learn via reinforcement learning, video games employ it, and so on.

Reinforcement Learning — Benefits

Here are some of the advantages of using reinforcement learning:

runs in real time, which lets the agent keep exploring for better solutions and exploiting what it has learned;
learns on its own;
doesn’t require large labeled datasets;
[bonus] gives the impression of “real” artificial intelligence.

Examples of Reinforcement in Machine Learning

Applications of reinforcement learning aren’t limited to automobiles and games, though. Here’s what else these models can do.

Text Data Mining. Salesforce, a cloud computing company, used reinforcement learning along with an advanced contextual text generation model to develop a system that can produce readable summaries of texts.
Healthcare. Reinforcement learning is used in the healthcare industry for clinical trials, medication dosing, and optimization of treatment policies.
Trade Execution. Financial companies like JPMorgan already implement reinforcement learning solutions. The company announced that it would start using a robot to execute large trading orders.
Robotics. Reinforcement learning is a good instrument for solving high-dimensional control problems. For instance, Google applied AI from DeepMind to cut the energy its data centers use for cooling by up to 40%.

Reinforcement learning’s reliance on environment exploration is one of the deployment barriers to this type of machine learning — tests are often pretty expensive and time-consuming. But we’re sure we’ll be seeing more of it in 2022.

Choosing the Right Approach

Each machine learning technique has its strong points and shortcomings — you make a choice based on what you need your model to accomplish.

If we compare supervised and unsupervised machine learning, unsupervised algorithms generally can’t handle tasks of the same complexity as supervised ones. Supervised models are more reliable because their outputs are predictable. An unsupervised learning system can figure out on its own how to sort data, but it might also add undesired categories to the output.

Reinforcement learning, on the other hand, covers a family of algorithms for learning sequences of actions, while supervised and unsupervised learning are mostly used in an input-output manner. Which machine learning technique suits you best depends on your company’s objectives. We can help you make that choice and pick the right solution for your particular situation. Our company provides custom AI software development services to fulfill your business needs and has extensive knowledge and experience in creating machine learning solutions for various projects. Contact us to discuss your AI-related idea.