Unit 18 Artificial Intelligence (AI)

A summary of the key specification points (not those relating to graphs) written by ChatGPT-3 can be found here.

Files and Resources

Specification

Show understanding of how graphs can be used to aid Artificial Intelligence (AI)
- Purpose and structure of a graph
- Use A* and Dijkstra’s algorithms to perform searches on a graph
  - Candidates will not be required to write algorithms to set up, access, or perform searches on graphs
Show understanding of how artificial neural networks have helped with machine learning
Show understanding of Deep Learning, Machine Learning and Reinforcement Learning and the reasons for using these methods.
- Understand machine learning categories, including supervised learning, unsupervised learning
Show understanding of back propagation of errors and regression methods in machine learning

If the video does not play above, follow this hyperlink: https://www.youtube.com/watch?v=1fkV5rB13jQ

Shortest Path Algorithms

In summary

There are two algorithms with which you need to be familiar: Dijkstra and A*.

Dijkstra finds the shortest path from a given node to all the remaining nodes. A* looks to only find the shortest path between two given nodes. It is more optimised than Dijkstra as we often look for a specific path.

Detail

Dijkstra’s shortest path is an algorithm which finds the shortest path from a source/node. Similar to Prim’s algorithm to find the minimum spanning tree, we always choose the most optimal local solution. We keep an array, or any data structure, of distances where all lengths are infinite. From the starting node, we would set that node to visited and go through its neighbouring nodes, updating its new values in the distance data structure if needed (if the new path is shorter than existing path to that node). Then, going through the distance array, we find the node, closest to the current tree, and repeat until all nodes have been visited.

It's big O is: O(|E| log |V|). E is the number of edges, V the number of vertices.

The textbooks go into sufficient detail on these algorithms and along with the lesson slides, should provide all the help you required.

Animation of A* (taken from Wikipedia)

Useful Websites

A* pseudocode example

create OpenSet list # containing only the starting node

create ClosedSet list # it should be empty

while (the destination node has not been reached):

select the node with the lowest f score in the open list

# if multiple nodes have same f score, go by lowest h score, else pick from tied nodes at random

if (this node is our destination node) :

Goal achieved, determine path by stepping backwards through previous nodes

if not:

put the current node in the closed list and look at all of its neighbours

for (each neighbour of the current node):

if (neighbour has lower g value than current and is in the closed list) :

replace the neighbour with the new, lower, g value

current node is now the neighbour's parent

else if (current g value is lower and this neighbour is in the open list ) :

replace the neighbour with the new, lower, g value

change the neighbour's parent to our current node

else if this neighbour is not in both lists:

add it to the open list and set its g

You can find the VB.NET (Python will be done at some point) base program on my GitHub page: GitCoder001 here. You can then clone this to begin building your A* solving algorithm. The program is fully documented in the comments.

Side note: Most Optimal Algorithm Discussion

An excellent answer was posted to StackExchange in regard to BFS (breadth first search), DFS (depth first search), Dijkstra and A*. The link to the full answer is here.

I recently made a project to solve a given maze using different pathfinding algorithms. I did this by importing a black and white maze image, and making each junction a node. I tried solving this using DFS, BFS, Dijkstra and A*, but noticed that surprisingly DFS gave me the shortest running time. My question then is, does it ever make sense to use a more advanced algorithm such as Dijkstra or A* on a perfect maze(one that only has one solution)? Or do those algorithms only make sense in a maze with multiple solutions? I researched this online, and found that a lot of people like using A* for this sort of problem, but I don’t understand how that’s better, at least for a perfect maze.

This is an interesting question. Let's explore it to see why you might be seeing what you're seeing.

Of the four algorithms you've mentioned - BFS, DFS, Dijkstra's and A* - three of them (BFS, Dijkstra's, and A*) are designed to find shortest paths in structures where there are multiple different paths available and you're interested in finding the shortest. In that sense, running BFS, Dijkstra's, and A* will all, in some sense, incur a cost overhead because you're paying for something you aren't using. Dijkstra's algorithm, in particular, should perform no better than BFS in this case. Taking any step will cost you the same amount, and the cost of maintaining a priority queue or some other structure to find the lowest-cost node in the frontier will likely cost more than a standard queue. In that sense, we can probably rule out Dijkstra's as a candidate for the fastest algorithm here.

That leaves us BFS, A*, and DFS. Let's first compare BFS and DFS. The advantage of DFS is that it's theoretically fast (linear time) and the memory access patterns involved in running DFS (maintaining the top of a stack and probing places near the most-recently-visited spot) plays well with caches. The advantage of BFS is that it will stop searching as soon as it finds the shortest path, with the drawback being that memory accesses are more scattered and play less well with caches.

Let's make a quick geometric argument here. BFS expands outward from the starting location by exploring paths of progressively longer and longer lengths. In that sense, you can imagine that the regions searched by BFS will form something that vaguely approximates a circle centered on the start location. The radius of this circle will be equal to the length of the shortest path found. In that sense, if there were no obstacles, you'd expect BFS to visit some constant fraction of the total spaces in the maze before finding the exit, and with obstacles present it's likely to explore most, if not all, of the spaces. DFS stops as soon as it finds the exit, and it's likely to explore lots of dead ends along the way, so there's similarly a good chance that it'll explore a large fraction of the maze cells. Given the choice between the two, my bet is that DFS would be slightly faster, since generally speaking the constant factor for DFS is lower than BFS.

Then there's DFS versus A*. That's a harder one to analyze a priori. DFS is generally speaking a much faster algorithm than A* because of the associated overhead of maintaining distances in A*, but A* tends to search in directions that are much more likely to get you to the destination. It would probably depend on the shape of the maze. If the maze was constructed in a way that has a lot of long, twisty passageways, then A* would probably do better because it would avoid going the wrong direction until it absolutely had to, where DFS might spend lots of effort descending the wrong way. But you'd have to look at the balance between those factors to be sure.

There's one other issue and that's how the maze itself was generated. There are many different maze generation algorithms - you can use Kruskal's algorithm, DFS, Prim's algorithm, or Wilson's algorithm, for example, to generate mazes. Mazes made with DFS tend to have fewer, longer corridors, while Kruskal's algorithm and Prim's algorithm tend to make many shorter corridors. It may be the case that DFS tends to do better in some of those cases than others, while A* may do better or worse as well. So perhaps the difference between A* and DFS has to do with the maze shape in addition to their own implementation details.

So overall, I'm not surprised to hear that your DFS was the fastest maze-solving algorithm mostly due to DFS's simplicity and good cache locality compared with the other algorithms. The fact that it's beating A* is likely due to the overhead associated with A* not being worth the savings in spaces explored. But to get more data, perhaps you should look at the following:

What fraction of the maze, on average, does each algorithm explore before finding the shortest path?
How long are the shortest paths in the mazes?
How were the mazes generated, and do the answers to the above questions depend on the algorithm used?

Dijkstra's Shortest Path Algorithm Videos

There are several videos for each algorithm, as all are explained in different ways - one may be more suitable for you than another.

A* Videos

General Graphing Videos

What is AI?

Sections from below have been taken from TowardsDataScience and SearchEnterpriseAI

AI has grown to offer many different benefits across industries like healthcare, retail, manufacturing, banking and many more.

Artificial Intelligence, Machine Learning, Deep Learning, Data Science are popular terms in this era. Knowing what it is and the difference between them is more crucial than ever. Although these terms might be closely related there are differences between them see the image below to visualise it.

Here is a video that (after a couple of minutes) looks at neural networks in a very clear way

Humans have long been obsessed with creating AI ever since the question, “Can machines think?”, was posed by Alan Turing in 1950. AI enables the machine to think, that is without any human intervention the machine will be able to take its own decision. It is a broad area of computer science that makes machines seem like they have human intelligence. So it’s not only programming a computer to drive a car by obeying traffic signals but it’s when that program also learns to exhibit the signs of human-like road rage.

Types of Artificial Intelligence System

AI systems are classified by their ability to imitate human behaviours, the hardware they use to do so, their applications in the real world and the theory of mind. Using these features for comparison, all systems of artificial intelligence actual and hypothetical fall into one of three types:

ANI: Artificial Narrow Intelligence

Artificial Narrow intelligence is also known as weak AI and it is the only type of AI that exists in our world today. Narrow AI is goal oriented and is programmed to perform a single task and is very intelligent in completing the specific task that it is programmed to do. Some examples of ANI are Siri, Auto pilot in an airplane, chat bots, self driving cars etc.

Narrow AI systems are not conscious, sentient or driven by emotions as humans are; they use information from a specific dataset and do not perform any task that is outside of the single task that they are designed to perform.

AGI: Artificial General Intelligence

Artificial General Intelligence, also referred to as strong AI (the Watson textbook classes this as separate - which it isn't) is a concept in which machines exhibit human intelligence. In this the machines have the ability to learn, understand and act in a way that is indistinguishable from a human in a given situation. The General AI does not currently exist but has been used in many sci-fi Hollywood movies in which humans interact with machines that are conscious, driven by emotions and self-aware.

Using strong AI we can have the ability to build machines that can think, strategize and perform multiple tasks under uncertain conditions. They can integrate their prior knowledge in decision-making to come up with innovative, creative and unconventional solutions.

ASI: Artificial Super Intelligence

I am sure you remember Arnold Schwarzenegger’s “The Terminator” where a machine's cognizance superseded human intelligence in all aspects. Artificial Super Intelligence is a hypothetical AI where machines will be capable of exhibiting intelligence that surpasses that of the brightest humans. In this type of AI, apart from having multifaceted intelligence of human beings, machines will have greater problem-solving and decision-making capabilities that will be far superior to human beings. It is the type of AI that will have a great impact on humanity and may lead to the extinction of the human race from the planet.

What is Machine Learning?

Machine Learning is a subset of Artificial Intelligence that uses statistical learning algorithms to build systems that have the ability to automatically learn and improve from experiences without being explicitly programmed.

Most of us use machine learning in our day-to-day lives when we use services like recommendation systems on Netflix, YouTube, Spotify; search engines like Google; voice assistants like Google home and Amazon Alexa. In Machine Learning we train the algorithm by providing it with a lot of data and allowing it to learn more about the processed information.

Machine learning fuels all sorts of automated tasks that span across multiple industries, from data security firms that hunt down malware to finance professionals who want alerts for favourable trades. The AI algorithms are programmed to constantly be learning in a way that simulates as a virtual personal assistant—something that they do quite well.

Machine learning involves a lot of complex math and programming that, at the end of the day, serves a mechanical function the same way a flashlight, a car, or a computer screen does. When we say something is capable of “machine learning”, it means it’s something that performs a function with the data given to it and gets progressively better over time. It's like if you had a flashlight that turned on whenever you said “it's dark,” so it would recognize different phrases containing the word "dark."

ML algorithms (Deep Learning, Linear Regression, Clustering, etc) can be broadly classified into four categories: Supervised, Unsupervised, Reinforcement learning and Semi-Supervised learning.

Supervised Learning

In supervised learning we have input variables (x) and an output variable (Y) and we use an algorithm to learn the mapping from input to output. In other words, a supervised learning algorithm takes a known set of input dataset and its known responses to the data (output) to learn the regression/classification model. A learning algorithm then trains a model to generate a prediction for the response to new data or the test datasets.

Supervised learning requires labelled data, that is data that has been given to it within a known category. E.g., pictures of cats or dogs, etc. Data labelling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn from it. For example, labels might indicate whether a photo contains a bird or car, which words were uttered in an audio recording, or if an x-ray contains a tumour.

Today, most practical machine learning models utilise supervised learning, which applies an algorithm to map one input to one output. For supervised learning to work, you need a labelled set of data that the model can learn from to make correct decisions. Data labelling typically starts by asking humans to make judgments about a given piece of unlabelled data. For example, labellers may be asked to tag all the images in a dataset where “does the photo contain a bird” is true. The tagging can be as rough as a simple yes/no or as granular as identifying the specific pixels in the image associated with the bird. The machine learning model uses human-provided labels to learn the underlying patterns in a process called "model training." The result is a trained model that can be used to make predictions on new data.

In machine learning, a properly labelled dataset that you use as the objective standard to train and assess a given model is often called “ground truth.” The accuracy of your trained model will depend on the accuracy of your ground truth, so spending the time and resources to ensure highly accurate data labelling is essential.

Unsupervised Learning

Unsupervised Learning is used when we do not have labelled data. Its main focus is to learn more about the data by inferring patterns in the dataset without reference to the known outputs. It is called unsupervised because the algorithms are left on their own to group the unsorted information by finding similarities, differences and patterns in the data. Unsupervised learning is mostly performed as a part of exploratory data analysis. It is most commonly used to find clusters of data and for dimensionality reduction. This type of ML is useful for Big Data applications, where extremely large sets of structured and unstructured data that cannot be handled with traditional methods. Big data analytics can make sense of the data by uncovering trends and patterns. Machine learning can accelerate this process with the help of decision-making algorithms. It can categorize the incoming data, recognize patterns and translate the data into insights helpful for business operations.

Reinforcement Learning

In simple terms, reinforcement learning can be explained as learning by continuously interacting with the environment. It is a type of machine learning algorithm in which an agent learns from an interactive environment in a trial and error way by continuously using feedback from its previous actions and experiences. The reinforcement learning uses rewards and punishments, the agents receive rewards for performing correct actions and penalties for doing it incorrectly. Just as with humans, the algorithms develop traits that favour actions leading to rewards and steer away from those which may have penalties.

Reinforcement learning differs from the supervised learning in that in supervised learning the training data has the answer key with it so the model is trained with the correct answer itself whereas in reinforcement learning, there is no answer but the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.

Semi-Supervised (Active) Learning

Semi-supervised learning is an approach to machine learning that combines a small amount of labelled data with a large amount of unlabelled data during training. Semi-supervised learning falls between unsupervised learning (with no labelled training data) and supervised learning (with only labelled training data). It is a special instance of weak supervision.

Unlabelled data, when used in conjunction with a small amount of labelled data, can produce considerable improvement in learning accuracy. The acquisition of labelled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labelling process thus may render large, fully labelled training sets infeasible, whereas acquisition of unlabelled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning.

What is Deep Learning? Downsides?

Deep learning is a machine learning technique that is inspired by the way a human brain filters information, it is basically learning from examples. It helps a computer model to filter the input data through layers to predict and classify information. Since deep learning processes information in a similar manner as a human brain does, it is mostly used in applications that people generally do. It is the key technology behind driver-less cars, that enables them to recognize a stop sign and to distinguish between a pedestrian and lamp post. Most of the deep learning methods use neural network architectures, so they are often referred to as deep neural networks.

ML refers to an AI system that can self-learn based on the algorithm. Systems that get smarter and smarter over time without human intervention is ML. Deep Learning (DL) is a machine learning (ML) applied to large data sets. Most AI work involves ML because intelligent behaviour requires considerable knowledge

There are lots of confusion and overblown expectations about deep learning and in many cases relatively simpler models more suitable than their deep learning counterparts. Deep learning takes the concept of adaptive AI behaviour a step further from machine learning using unstructured data available. It deals with such concepts as deep neural networks, machine translation, bioinformatics and many more.

The problems most, especially small, businesses are facing do not really require such complex and sophisticated methods. Deep learning is also inherently slow. For example, it can take several weeks of just training using multiple GPUs. Deep learning also demands huge sample sizes to train on. Deep neural networks often take hundreds of thousands or even more samples to achieve high performance. Many problems require so-called labelled datasets (each sample is annotated with an expected value). Such labelling is time-consuming and may often require to do it manually. Deep neural networks are often considered black boxes whose inner operations are not really interpretable. But interpretability is important because it can give new insights into relationships between numerous variables and expected outcomes. It also increases the trust of the people who use it. This ability to explain solutions is inherent to many simpler methods, in particular, linear ones, where the direct relationship between parameters can be analysed.

Deep Learning is basically mimicking the human brain, it can also be defined as a multi neural network architecture containing a large number of parameters and layers. The three fundamental network architectures are as listed below:

Convolutional Neural Networks (CNN)

Convolutional Neural Network (CNN) is basically an artificial neural network that is most widely used in the field of Computer Vision for analysing and classifying images. It is a deep learning algorithm that takes the input image and assigns weights/biases to various aspects or objects in the image, so that it can differentiate one from the other. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers. The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex.

Recurrent Neural Networks

Recurrent Neural Networks is a type of neural network architecture that is used in sequence prediction problems and is heavily used in the field of Natural Language Processing. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far.

Recursive Neural Networks

“A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order.”

A recursive neural network is more like a hierarchical network where there is really no time aspect to the input sequence but the input has to be processed hierarchically in a tree fashion. Here is an example of how a recursive neural network looks. It shows the way to learn a parse tree of a sentence by recursively taking the output of the operation performed on a smaller chunk of the text.

How our brains work

The human brain is a complex network containing around 90 billion cells called neurons that communicate with each other via connections called synapses. The brain communicates by sending signals that travel through the neuron to the synapses. When the message reaches the end of a neuron, called the synapse, it is converted into a chemical signal called a neurotransmitter. The neurotransmitter that travels across the synaptic gap to the neuron on the other side, which turns the neurotransmitter back into a signal and sends it down the line.

The number of these synaptic connections is mind-boggling, and researchers estimate that neurons interconnect at a hundred trillion to one thousand trillion synapses. Neurons have specialized appendages called dendrites and axons. Dendrites bring information to the center of the neuron and axons take information away from the center. The neuron collects multiple incoming signals through several extremely long arms called dendritic trees that form branches.

Every thought, experience, physical sensation and feeling triggers thousands of neurons, which form a neural network. When we repeat an experience, the brain learns to trigger the same neurons each time, by strengthening the synapses, according to Hebb’s axiom.

Hebb’s Axiom

In 1949 Canadian neuropsychologist Donald Hebb suggested that learning occurs by strengthening the synapses, with neurons functioning as merely the computational elements. Dr. Hebb’s axiom has remained the widely held assumption in the field of neuroscience and gave rise to what is known as Hebb’s axiom,

“neurons that fire together wire together.”

A phrase that Donald Hebb coined in 1949 and had remained a central tenet of neuroscience.

Wiring neurons together to create a neural network is beneficial in that it helps us to store and recall information – and thereby learn – in an efficient way. For example, a neural network helps you to remember the name of a new acquaintance by creating connections. However, the neural network can be stubborn when we try to rewire neurons to respond to a familiar situation in a different way, giving rise to expressions like ‘you can’t teach an old dog new tricks.’. Further research has pointed to learning developing more in the synapses but this is irrelevant for this topic.

What is an Artificial Neural Network?

We talked above about different types of neural network. Artificial neural networks are key to deep learning (supervised learning) and mimic the structure of the human brain. The extract below is from an excellent article published by Bernard Marr and you can read the full article here.

Artificial neural networks (ANN) give machines the ability to process data similar to the human brain and make decisions or take actions based on the data. While there’s still more to develop before machines have similar imaginations and reasoning power as humans, ANNs help machines complete and learn from the tasks they perform.

What else can artificial neural networks do?

Artificial neural networks are a main component of machine learning and they are designed to spot patterns in data. This makes ANNs an optimal solution for classifying (sorting data into predetermined categories), clustering (finding like characteristics among data and pulling that data together into categories) and making predictions from data (such as helping determine infection rates for COVID, the next catastrophic weather event or box-office smash). In everyday life, ANNs are powering the “watch next” feature of YouTube videos, creating realistic CGI faces, helping detect fraud, giving us the ability to chat with chatbots and more. In fact, there are probably not many tasks an artificial neural network can’t do as long as it’s trained to do it.

How do artificial neural networks work?

Ultimately, ANNs try to replicate how our human brains process information and make decisions. While ANNs are based on mathematical theory created in the 1940s, it wasn’t until the last couple of decades that it became a focus for artificial intelligence. When back propagation was developed to help these networks learn and adjust actions based on outcomes its development and adoption really began to accelerate.

When a human brain receives an input, it processes it through a series of neurons. Different neurons of the human brain are responsible for processing different aspects of input in a hierarchical fashion. ANNs try to replicate this through artificial neurons called units that are arranged in layers and connected to each other to create a web-like structure.

ANNs have an input layer and output layer. Between these two layers there are other hidden layers that perform the mathematical computations that help determine the decision or action the machine should take. Ultimately, these hidden layers are in place to transform the input data into something the output unit can use.

The data is processed by each hidden layer and then moves on to the next based on connections that are weighted. Think of this process as an assembly line in a factory—raw materials as the input and different stops on the conveyor belt to add an element to the product equate to the hidden layers of an ANN that processes the data until you get to the output. Based on what the machine learns about the data when processed by one layer, it determines how to move it through to the next, more senior layer based on the value it receives when evaluated. Based on the complexity of the issue at hand, it can continue to process through more senior units until delivered to the output layer.

Before an ANN can be fully deployed, it must be trained. This training involves comparing an outcome a machine gets with the human-provided description of what outcome is expected. If these don’t match, the machine uses this feedback and goes back to adjust the weights of the layers (called backpropagation). These new learning rules are applied and help guide the neural networks on future processing.

To illustrate how this works for the human brain, consider how humans might learn how to shoot a basketball so they score more baskets. Over time and with experience, different techniques are tried to improve the odds the shot will make it in the basket—bending legs less or more, adjusting the hand position, shooting force, the angle of the shot, use of backboard, etc. When a shot doesn’t make it in, the brain adjusts based on this feedback and tries something else. Over time, there is enough learning to improve the outcome so that more balls make it through the net than get rejected.

Types of artificial neural networks

There are several types of artificial neural networks including the feedforward neural network, recurrent neural network and a variety of others. The network you use is based on the data set you have to train it with as well as the task you want to accomplish.

A feedforward neural network, the most basic type of neural network, can only process data from input to output in one direction. This is what is used for supervised machine learning when you already know what outcome you want the network to achieve. It’s the basis for many commercial applications such as machine vision. A recurrent neural network has data flow in multiple directions and is widely used for more complex tasks. Use cases for recurrent neural networks include document generation and real-time language translation.

Future of ANNs

While ANNs can tackle most tasks if they are allowed to train for it, the biggest obstacle to overcome is the amount of time it takes to train ANNs and the computing power required for a complex task. In addition, it’s impossible for humans to fully understand what happens in the hidden layers of an artificial neural network. Although researchers are actively working on this, there is still a lot to learn even though we’ve come so far in helping machines think and act like a human.

Summary

Hidden layers are where the processing is done. The front and end take the input and final outputs, but the processing is carried out on the layers. For very complex problems, more layers are required. More layers also improves the accuracy of the result and enables the neural network to learn and make decisions on its own.

How does an Artificial Neural Network operate?

Neurons are the fundamental units of the brain which contains Dendrites, Axons, Synapses, etc. Dendrites act as a receiver whereas axon acts as a transmitter of signals to and from other neurons. In case of a human brain, the input comes from our senses like ear, nose, eyes etc., and gets processed by our brain.

In an artificial neural network (ANN), inputs (I1, I2, I3..In) are independent variables whereas the output(s) are dependent variables. The synapses are the weights assigned to each input neurons. An independent variable is one that can represent any value and do not have to be related to other inputs. A dependent variable is one whose value is calculated based on the inputs and a given function.

By changing the weights, neural networks learn which signal is important and these are the things that get adjusted during the process of learning.

Step 1: Once the input neurons(independent variables) are chosen its weights are assigned.

Step 2: Now the activation function is applied to the summation of inputs and weights . Depending on the activation function, neurons learn which signal needs to be passed.

Step 3: Output (y) is generated and the cost function is calculated for the actual and predicted output. Based on the feedback from this cost function, all the weights get reassigned and again the whole process repeats until we get better feedback. This whole process is known as model training.

Note: There may be many layers between input and output neurons and these intermediate layers are known as hidden layers. Neural networks depend on the number of hidden layers, so be very calculative and selective while selecting hidden layers.

Deeper Look at an ANN Through Example

The following is an extract from a Towards Data Science article

The following section goes into detail surrounding the function and mathematics of an artificial neural network. This is beyond what you need to know for the exam but does help to clarify how ANNs operate.

To make things clearer, lets understand ANN using a simple example: A bank wants to assess whether to approve a loan application to a customer, so, it wants to predict whether a customer is likely to default on the loan. It has data like the table on the right

So, we have to predict Column X. A prediction closer to 1 indicates that the customer has more chances to default.

Lets try to create an Artificial Neural Network architecture loosely based on the structure of a neuron using this example below:

In general, a simple ANN architecture for the above example could be:

Key Points related to the architecture:

The network architecture has an input layer, hidden layer (there can be more than one) and the output layer. It is also called MLP (Multi Layer Perceptron) because of the multiple layers.
The hidden layer can be seen as a “distillation layer” that distils some of the important patterns from the inputs and passes it onto the next layer to see. It makes the network faster and efficient by identifying only the important information from the inputs leaving out the redundant information
The activation function serves two notable purposes:
- It captures non-linear relationship between the inputs
- It helps convert the input into a more useful output.
- In the above example, the activation function used is sigmoid:

O1 = 1 / (1+exp(-F))

Where F = W1*X1 + W2*X2 + W3*X3

Sigmoid activation function creates an output with values between 0 and 1. There can be other activation functions like Tanh, softmax and RELU.

Similarly, the hidden layer leads to the final prediction at the output layer:

O3 = 1 / (1+exp(-F 1))

Where F 1= W7*H1 + W8*H2

Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates that there is a higher indication of customer defaulting.

The weights W are the importance associated with the inputs. If W1 is 0.56 and W2 is 0.92, then there is higher importance attached to X2: Debt Ratio than X1: Age, in predicting H1.
The above network architecture is called “feed-forward network”, as you can see that input signals are flowing in only one direction (from inputs to outputs). We can also create “feedback networks where signals flow in both directions.
A good model with high accuracy gives predictions that are very close to the actual values. So, in the table above, Column X values should be very close to Column W values. The error in prediction is the difference between column W and column X:
The key to get a good model with accurate predictions is to find “optimal values of W — weights” that minimizes the prediction error. This is achieved by backpropagation and this makes ANN a learning algorithm because by learning from the errors, the model is improved.
The most common method of optimisation algorithm is called gradient descent, where, iteratively different values of W are used and prediction errors assessed. So, to get the optimal W, the values of W are changed in small amounts and the impact on prediction errors assessed. Finally, those values of W are chosen as optimal, where with further changes in W, errors are not reducing further.

Backpropagation

Most deep neural networks are feed-forward, meaning they flow in one direction only from input to output. However, you can also train your model through backpropagation; that is, move in opposite direction from output to input. Backpropagation allows us to calculate and attribute the error associated with each neuron, allowing us to adjust and fit the algorithm appropriately.

The Watson book and PowerPoint slides (L8) go into better depth on this, but the overall aim of backpropagation is to reduce the error rate. That is, we compare the outputs of the model against the expected output. We then use this knowledge to tweak the network's weightings to reduce the errors to an acceptable limit. In complex systems, it is not always possible to reduce this to zero.

Derivatives (calculus) are used to determine the appropriate weights for each neuron. The 'back' part is in reference to feeding backwards along the path from outputs to improve the model. Backpropagation, is essentially the chain rule of calculus applied to computational graphs.

There are two types of backpropagation networks.

Static backpropagation
Recurrent backpropagation

Static backpropagation

In this network, mapping of a static input generates static output. Static classification problems like optical character recognition will be a suitable domain for static backpropagation.

Recurrent backpropagation

Recurrent backpropagation is conducted until a certain threshold is met. After the threshold, the error is calculated and propagated backward.

The difference between these two approaches is that static backpropagation is as fast as the mapping is static.

In conclusion, Neural network is a collection of connected units with input and output mechanism, each of the connections has an associated weight. Backpropagation is the “backward propagation of errors” and is useful to train neural networks. It is fast, easy to implement and simple. Backpropagation is very beneficial for deep neural networks working over error prone projects like speech or image recognition.

Regression

Some prediction problems require predicting both numeric values and a class label for the same input.

A simple approach is to develop both regression and classification predictive models on the same data and use the models sequentially.

An alternative and often more effective approach is to develop a single neural network model that can predict both a numeric and class label value from the same input. This is called a multi-output model and can be relatively easy to develop and evaluate using modern deep learning libraries such as Keras and TensorFlow.

It is common to develop a deep learning neural network model for a regression or classification problem, but on some predictive modelling tasks, we may want to develop a single model that can make both regression and classification predictions.

Regression refers to predictive modelling problems that involve predicting a numeric value given an input (e.g. selecting
Classification refers to predictive modelling problems that involve predicting a class label or probability of class labels for a given input.

Difference Between Classification and Regression in Machine Learning

There may be some problems where we want to predict both a numerical value and a classification value.

One approach to solving this problem is to develop a separate model for each prediction that is required.

The problem with this approach is that the predictions made by the separate models may diverge.

An alternate approach that can be used when using neural network models is to develop a single model capable of making separate predictions for a numeric and class output for the same input.

Regression is method dealing with linear dependencies, neural networks can deal with nonlinearities. So if your data will have some nonlinear dependencies, neural networks should perform better than regression.

Coding ANNs in Python (Keras)

This is extra information and beyond the specification

import keras

from keras.models import Sequential

from keras.layers import Dense

Keras is a Python library for deep learning that runs on the top of Theano or TensorFlow. The sequential model represents the linear stack of layers. The Dense model represents the number of neurons in each in each layer.

# Initialising the ANN by creating object of sequential.

classifier = Sequential()

# Adding the input layer and the first hidden layer

classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer

classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer

classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

Here, we are making two hidden layers, each of six neurons. We can then compile our model where epoch = 100 represents iteration to train our model.

# Compiling the ANN

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set

classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

The full code can be found here

Further information

The following web links give more information on the topics covered in this section and often go far beyond the specification

Large Language Models (ChatGPT, PALM, CALM, LaMDA, etc)

Large language models are a type of artificial intelligence that are trained to generate human-like text. They are trained on vast amounts of data, such as books, articles, and websites, in order to learn the patterns and structures of natural language.

Once trained, the model can then generate new text that is similar to the text it was trained on. For example, if a large language model was trained on a dataset of news articles, it might be able to generate a new article on a similar topic that reads like it was written by a human.

To generate text, the model processes an input prompt and predicts the next word in the sequence based on the words that came before it. The model uses this process to generate a sequence of words that form a coherent piece of text.

Large language models are able to generate human-like text because they have learned the patterns and structures of language from a large amount of data. They can also understand and respond to context and generate text that is appropriate for the given situation.