Machine learning is the future for many automation tasks. Enabling computers to learn will increase their ability to do complex tasks which currently require human interaction.It allows us to use computers to automate tasks that are hard or time consuming to manually create a program for. For example, writing a program that can label objects in a picture is very time consuming as you would have to create a set of rules to identify each object separately. Using machine learning we can just let a computer find them for us. This although, similarly time consuming, is much cheaper as a computer only needs electricity to function. Another major advantage of using a computer to do such tasks is that we can run a lot of computers in parallel 24/7.
The core objective
Machine learning is generalising from experience, it is a form of artificial intelligence. It should be able to find patterns in data with which it can later predict new data.
It can be categorised by the type of learning algorithm and the type of model. Typically a machine learning model requires a large set of data. This is divided into a training set and a test set to measure the performance of the ML model.
A framework can either support all types of learning and models or just a few. The frameworks are not limited to any programming language, although some languages may be more popular than others.
Supervised
Supervised learning is a category of machine learning where the data set is labeled. The goal is to accurately label unseen inputs.
Unsupervised
Unsupervised learning is a category of machine learning where the data set is not labeled. Here the goal is to group similar data
together. One of the difficulties here is that the amount of groups is often unknown.
Reinforcement learning
Here we train the model by rewarding it for correct actions. The model is often seen as an computer agent in a virtualized environment. The agent has a limited set of actions it can take.
What is natural language processing?
Natural language processing (NLP) is the study of interactions between computers and human (natural) language. It can be used to create useful data points for machine learning. One of the most useful products of NLP is machine translation, the translation of text from one language into another.
Our Project
We are using a combination of NLP and supervised learning to classify text data into different groups. One of the biggest
challenges is gathering the necessary data on which we can train our model.
Once we have gathered the necessary data, we need to convert the text into something the computer can work with. One such method is tokenization, each word is transformed into a token. To simplify the process punctuation some words, like ‘a’ and ‘the’, are discarded. Once transformed we can train our model.