Machine Learning


Sing along with Sam Ball as he walks you through the basics of Machine Learning. In this module you will learn why Machine Learning is important and take a detailed look at the types of algorithm that are used in this exciting field. This module will add some serious firepower to your bursting Python toolbox.

Module Coordinator

Sam Ball




Introduction to Machine Learning



Machine learning is a relatively new area of programming concerned with building models from known data and then using those models on new data to predict outcomes. Machine Learning is closely linked with statistics, econometrics and optimisation, but we will be calling it machine learning for now. In this lesson we learn about the two main types of Machine Learning algorithms: supervised and unsupervised.



K Nearest Neighbours



K Nearest Neighbours is a supervised machine learning algorithm used for classification. It takes in labelled data, and then classifies a new point based on the training points that are most similar to it. K Nearest Neighbours (KNN) is used heavily in search engines to recommend similar results, and is also used in face recognition software to identify a person by their facial features. By the end of this lesson you should be confident in your understanding of how the algorithm works, and be able to apply it to some simple examples.
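
As a rough illustration of the idea, here is a minimal sketch of KNN in plain Python. It assumes "most similar" means smallest Euclidean distance and that the new point takes the majority label among its k closest training points. The function name knn_classify and the toy data are made up for this example and are not taken from the lesson itself.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Distance from the new point to every labelled training point
    # (Euclidean distance is an assumption; other similarity measures work too).
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest training points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two groups of labelled 2D points.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(train_points, train_labels, (1.2, 1.4)))  # red
print(knn_classify(train_points, train_labels, (6.8, 6.9)))  # blue
```

Note that there is no separate training step in this sketch: the labelled data is simply stored and searched each time a new point is classified.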



K Means Clustering



K Means clustering is an unsupervised learning algorithm, which takes unlabelled data and categorises it into groups, or clusters. The obvious application is finding clusters of points in space (for example, electron clouds), but the abstract nature of the algorithm gives it much wider applications. For example, K Means clustering is used in market analysis to find groups of similar customers based on spending metrics such as money spent on a shopping website. By the end of this guide you should be confident with the K Means clustering algorithm and how to apply it to real data, and understand the limitations of the algorithm and how to deal with them.
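
To make the grouping step concrete, here is a minimal sketch of K Means in plain Python. It assumes the common formulation where each point is assigned to its nearest centroid and each centroid is then moved to the mean of its points, repeating until nothing changes. The names k_means and nearest_centroid, the toy data and the hand-picked starting centroids are all assumptions for this example.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, initial_centroids, max_iterations=100):
    """Group unlabelled points into len(initial_centroids) clusters."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # A centroid with no points stays where it is.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for centroid, cluster in zip(centroids, clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the clusters are stable
        centroids = new_centroids
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Hypothetical unlabelled 2D points and hand-picked starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
centroids, labels = k_means(points, initial_centroids=[(1.0, 1.0), (6.0, 6.0)])

print(labels)  # [0, 0, 0, 1, 1, 1]
```

The starting centroids are passed in explicitly so the sketch gives the same answer every time. In practice they are often chosen at random, and a poor choice can lead to a different final grouping.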



Support Vector Machines



A support vector machine is a supervised algorithm that takes a number of categorised data points, each with a number of attributes, and finds a boundary that separates the categories. We will be looking at the application to classification, which looks very similar to K Nearest Neighbours but works better when the categories are clearly separated. We can then use the model to classify new data based on its characteristics. Support vector machines are used in a wide variety of applications, including cancer diagnosis, which we'll have a look at later in this guide. By the end of this guide you should be comfortable explaining the theory behind support vector machines as well as applying them to some simple problems.
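
As a rough sketch of the idea, the plain Python below trains a linear support vector machine on two categories labelled -1 and +1. It assumes a straight-line boundary and uses simple sub-gradient descent on the hinge loss, which is one of several ways to fit such a model and is not necessarily the method used later in this guide. The names train_linear_svm and predict, the parameter values and the toy data are all made up for this example.

```python
def score(weights, bias, x):
    """Signed value showing which side of the boundary x falls on."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(points, labels, learning_rate=0.01, regularisation=0.01, epochs=1000):
    """Fit a linear boundary; labels must be +1 or -1."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * score(weights, bias, x) < 1:
                # Point is misclassified or inside the margin:
                # shift the boundary away from it.
                weights = [
                    w - learning_rate * (regularisation * w - y * xi)
                    for w, xi in zip(weights, x)
                ]
                bias += learning_rate * y
            else:
                # Point is safely classified: only shrink the weights slightly,
                # which favours a wider margin.
                weights = [w - learning_rate * regularisation * w for w in weights]
    return weights, bias


def predict(weights, bias, x):
    """Classify x as +1 or -1 depending on its side of the boundary."""
    return 1 if score(weights, bias, x) >= 0 else -1


# Hypothetical toy data: two clearly separated categories of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = [-1, -1, -1, 1, 1, 1]

weights, bias = train_linear_svm(points, labels)
print(predict(weights, bias, (1.2, 1.4)))
print(predict(weights, bias, (6.8, 6.9)))
```

Unlike the K Nearest Neighbours approach, the trained model here is just the weights and the bias, so classifying a new point does not require looking back at the training data.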