A Brief Introduction to Machine Learning

If you’ve ever been shopping online and the timing of the promotions seems to match up with your intent to buy, or there are tempting offers to add a few extra items to your basket, then you’ve likely experienced the results of Machine Learning (ML) first-hand.

But not everything is about making money! ML has been used in a wide range of fields, from healthcare, where it assists with diagnosis and allows for earlier intervention, to ecology, where it helps us understand biodiversity, extinction and species distribution within ecosystems.

However, the results from Machine Learning are only as good as the human-made model, including aspects such as the available data, how it is interpreted and the feedback loops in place.

It can also come with high costs and ethical considerations; where machine learning helps automate tasks, it can result in job loss.

Even if you don’t employ it directly, ML’s broad application across fields makes it a rich topic to explore from a multi-disciplinary perspective. Terms like regression, clustering and downstream tasks can feel like barriers at first, so this series starts with the basics, leaving the complexity for later.

How does machine learning differ from traditional programming?

Traditional programming usually relies on a solution that is designed by the programmer to solve a problem. Machine Learning, on the other hand, can rely on data to learn a solution.
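As a rough illustration of that difference, the sketch below solves the same toy problem both ways. The task (deciding whether a temperature counts as "hot"), the function names and the example data are all invented for this sketch.

```python
# Traditional programming: the programmer writes the rule by hand.
def is_hot_handwritten(temperature_c):
    return temperature_c >= 25  # threshold chosen by the programmer


# Machine learning in miniature: the rule is derived from labelled examples.
def learn_threshold(examples):
    """Return the candidate threshold that misclassifies the fewest examples.

    `examples` is a list of (temperature_c, is_hot) pairs.
    """
    candidates = sorted(temp for temp, _ in examples)

    def errors(threshold):
        # Count examples where the rule "temp >= threshold" disagrees with the label.
        return sum((temp >= threshold) != label for temp, label in examples)

    return min(candidates, key=errors)


# Hypothetical data: (temperature in Celsius, did people call it hot?)
examples = [(12, False), (18, False), (21, False), (24, True), (27, True), (31, True)]

learned_threshold = learn_threshold(examples)


def is_hot_learned(temperature_c):
    return temperature_c >= learned_threshold
```

With this data the learned threshold is 24 rather than the programmer's guess of 25. Change the examples and the rule changes with them, without anyone editing the code.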

How does the machine learn?

To begin with, ML often relies on datasets that are used as training data. In simple terms, the data are labelled and the features that help describe them are identified.

There are various methods for learning that determine how the data are used and how the labels and features will be generated.

What is data labelling?

Labelling is a way to give data context in order for the machine to learn.

Adding one or more meaningful labels allows a model to categorise the data and understand its relationship to the features that describe it.

What are features?

Features are the attributes or characteristics of the data that the machine uses to help determine which label should be applied.

The data can be either discrete or continuous, providing categorical or numerical information.
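To make labels and features more concrete, here is a small sketch of how a labelled dataset might look in code. The fruit records, the field names and the helper functions are hypothetical and exist only for this illustration.

```python
LABEL_KEY = "fruit"  # the field we want the machine to learn to predict

# Invented records: every field except the label is a feature.
records = [
    {"weight_g": 182.5, "seed_count": 5, "colour": "red", "fruit": "apple"},
    {"weight_g": 120.0, "seed_count": 0, "colour": "yellow", "fruit": "banana"},
    {"weight_g": 140.3, "seed_count": 8, "colour": "orange", "fruit": "orange"},
]


def split_features_and_label(record):
    """Separate the descriptive features from the label."""
    features = {key: value for key, value in record.items() if key != LABEL_KEY}
    return features, record[LABEL_KEY]


def describe_feature(value):
    """Classify a feature value as categorical, discrete or continuous."""
    if isinstance(value, str):
        return "categorical"
    if isinstance(value, int):
        return "discrete numerical"
    return "continuous numerical"


features, label = split_features_and_label(records[0])
feature_kinds = {name: describe_feature(value) for name, value in features.items()}
```

Here the weight is continuous, the seed count is discrete and the colour is categorical, while the fruit name is the label.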

Learning methods

There are three main types of learning that can be employed by a machine:

  • Supervised
  • Unsupervised
  • Reinforcement

What is supervised learning?

The supervised learning approach relies on providing the machine with training material in the form of labelled datasets, which serve as examples for the algorithm to detect patterns and identify the relationship between input data and the expected outputs.
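One of the simplest ways to see this in action is a nearest-neighbour classifier, sketched below. It is only one of many supervised techniques, and the training data and the `predict` function are made up for this example.

```python
import math

# Labelled training data: ((weight_g, diameter_cm), label). Values are invented.
training_data = [
    ((150.0, 7.0), "apple"),
    ((170.0, 7.5), "apple"),
    ((10.0, 2.0), "grape"),
    ((8.0, 1.8), "grape"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label


print(predict((160.0, 7.2)))  # closest to the apple examples
print(predict((9.0, 2.1)))    # closest to the grape examples
```

The labelled examples do all the work: the function never contains a rule about what makes an apple an apple.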

What is unsupervised learning?

Unsupervised learning relies on data that has not been labelled, so the algorithm has no knowledge or context surrounding the output and is expected to come to its own conclusions.

Some of the key methods for unsupervised learning are clustering, dimensionality reduction, and association rule learning.

Unsupervised learning is useful for identifying patterns within data that may not have been recognised before.

Additionally, it can be used to create clusters: groups of data points that are grouped based on similarity.
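Clustering can be sketched with k-means, one common algorithm for the job. The version below works on plain numbers only, and the basket totals and starting centres are assumptions chosen by hand so that the run is repeatable.

```python
def k_means_1d(values, centres, iterations=10):
    """Group numbers around the given starting centres using k-means."""
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centre.
        clusters = [[] for _ in centres]
        for value in values:
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centres = [
            sum(cluster) / len(cluster) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
    return centres, clusters


# Invented, unlabelled data: the value of six shopping baskets.
basket_totals = [12, 15, 14, 80, 85, 90]

centres, clusters = k_means_1d(basket_totals, centres=[0.0, 100.0])
print(clusters)  # [[12, 15, 14], [80, 85, 90]]
```

No labels were supplied, yet the algorithm separates the small baskets from the large ones purely on similarity.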

What is reinforcement learning?

Reinforcement learning is different in that it uses an intelligent agent that interacts with its environment. The agent can take actions, some of which earn it a reward.

Through trial and error, the agent learns to optimise its behaviour to maximise the reward it receives, in the most efficient manner that it can find.
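A minimal sketch of this trial-and-error loop is the so-called multi-armed bandit, where an agent repeatedly picks one of several actions and learns which pays best. The epsilon-greedy strategy used here is one simple approach among many, and the reward probabilities, step count and function name are assumptions made for the example.

```python
import random


def run_bandit(reward_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; returns its reward estimates and action counts."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    estimates = [0.0] * len(reward_probabilities)  # estimated reward per action
    counts = [0] * len(reward_probabilities)       # times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(estimates))
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(len(estimates)), key=lambda a: estimates[a])

        # The environment pays a reward of 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probabilities[action] else 0.0

        # Update the running average reward for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Two actions: the second pays out far more often, but the agent is not told that.
estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told which action is better. It discovers this from the rewards it collects, and ends up choosing the more rewarding action most of the time.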

What is a machine learning pipeline?

A machine learning pipeline describes the steps involved in preparing a machine learning model for deployment.

It begins by gathering data and preparing them for the model to create a training dataset, a testing dataset and a validation dataset. This includes data hygiene, such as removing errors and ensuring there are no gaps, and generating the features and labels.
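The preparation step might look something like the sketch below, which drops incomplete rows and then splits what remains into three datasets. The 60/20/20 split, the function names and the sample rows are assumptions for illustration; the right proportions depend on the project.

```python
import random


def clean(rows):
    """Data hygiene: drop rows that contain a missing value (None)."""
    return [row for row in rows if None not in row]


def split_dataset(rows, train_fraction=0.6, validation_fraction=0.2, seed=42):
    """Shuffle the rows, then cut them into training, validation and testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    train_end = round(len(rows) * train_fraction)
    validation_end = train_end + round(len(rows) * validation_fraction)
    return rows[:train_end], rows[train_end:validation_end], rows[validation_end:]


# Invented rows of (id, measurement, label), plus one row with a gap.
raw_rows = [(i, i * 2.0, "even" if i % 2 == 0 else "odd") for i in range(10)]
raw_rows.append((10, None, "even"))

rows = clean(raw_rows)
train_rows, validation_rows, test_rows = split_dataset(rows)
print(len(train_rows), len(validation_rows), len(test_rows))  # 6 2 2
```

Shuffling before splitting helps avoid each dataset ending up with only one kind of example, for instance if the original rows were sorted.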

Once the data are prepared, it is necessary to identify the kind of model required for the problem that needs solving.

Once the model has been chosen, the algorithm is prepared and fed data from the training dataset so that it can analyse the relationship between the features and labels.

Once the results from the training dataset are deemed accurate enough, the testing dataset is employed to check that the accuracy is repeatable on data the model has not seen.
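The train-then-test step can be sketched as follows. The model here is a deliberately tiny cut-off rule, and the datasets (hours studied paired with whether an exam was passed) are invented; a real pipeline would use a proper library and far more data.

```python
def train_threshold_model(dataset):
    """Learn a cut-off from (value, label) pairs; predict True at or above it."""
    candidates = sorted(value for value, _ in dataset)
    best = min(
        candidates,
        key=lambda t: sum((value >= t) != label for value, label in dataset),
    )
    return lambda value: value >= best


def accuracy(model, dataset):
    """Fraction of examples for which the model predicts the correct label."""
    correct = sum(model(value) == label for value, label in dataset)
    return correct / len(dataset)


# Invented data: (hours studied, passed the exam?)
training_set = [(1, False), (2, False), (3, False), (6, True), (7, True), (9, True)]
testing_set = [(0, False), (4, False), (5, True), (8, True)]

model = train_threshold_model(training_set)
training_accuracy = accuracy(model, training_set)
testing_accuracy = accuracy(model, testing_set)
print(training_accuracy, testing_accuracy)  # 1.0 0.75
```

The model is perfect on the data it trained on but makes a mistake on the testing data, which is exactly why the accuracy needs to be checked on examples held back from training.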

In reality, numerous models might be tested against a variety of requirements. Accuracy is important, but efficiency in terms of computational power can make for a more cost-effective solution in the long run.

Once testing is completed, the model can be deployed, where it starts making predictions using real-world data. Ideally, results are monitored over time to ensure the model performs as expected, and revisions are made as required.

Why machine learning?

It is human nature to recognise patterns, create context and make predictions, which, in a world that gets more complex by the day, is increasingly difficult.

Machine learning is an attempt to replicate that process mechanically. It can certainly process large amounts of data and spot patterns humans can’t; however, it can also exacerbate any human bias in the model or data.

Understanding it can tell you a lot about decision making, human error and the tools we use, and even the importance of AI governance as the results of ML begin to have real-world impact.