During the summer holiday I studied deep learning and neural networks. In this post I am going to summarise my thoughts on this summer's work on AI. (28-30 August 2019, © George Dong)
AI, short for Artificial Intelligence, is a branch of study and practice concerned with machines (computers) gaining human-like abilities such as face recognition, language translation and the control of other systems.
“Artificial”: In school, we learned that artificial means man-made and is the opposite of natural. As I lifted my head and looked out of the window, I could see many things: trees, grass and flowers, a wooden board fence on one side, meshed steel wires covered in plastic on the other, and my laptop stuck in the lower part of my view. I am sure you can identify the natural and the man-made (artificial) things in this list. Artificial objects: the computer, the steel wires and their plastic covering; natural: the wooden boards, the trees, grass and flowers.
“Intelligence”: this word has two meanings. The first is brain power, the ability to learn and to solve problems, as in IQ, intelligence quotient. The second is information and secrets, as in CIA, the Central Intelligence Agency. If a person learns new things fast and answers more questions correctly, we say he or she is clever, intelligent, brainy or even a genius, which means they have a higher level of intelligence.
Intelligence is something that living things have; machines are not living things, so they are not supposed to have intelligence. Try it! If you kick a dog, it will run away or bark at you. If you kick the tyre of a car or a bicycle, will it do anything? It will not, because it has no intelligence, no brain. If scientists and engineers somehow make machines clever - able to do things and/or learn new things - then we call that artificial intelligence, because it is made by humans. These machines can then sense their environment and make decisions about what to do. For example, in east London there is the DLR, the Docklands Light Railway, which started operation in the late 1980s. DLR trains don't need a driver (although you usually find one on the train anyway), because the control system has some intelligence: it can sense the signals and decide whether to drive or stop, open the doors or close them. In most cases like this we used to call them automatic-something: automatic doors, automatic elevators, automatic (aeroplane) pilots, and so on.
In the simplest terms, artificial intelligence means machines making their own decisions about what action to take. It has been around for some time!
How do these machines gain intelligence, or how do scientists and engineers make them intelligent?
Initially, electrical switches and various sensors were used to control operations, such as when to start and stop water flowing in a machine, when to turn the heater on and off, when to speed up and when to slow down. Even though these systems appear clever, since they know these things without humans telling them, they are not called AI. They are simply automatic systems, control systems. But I personally believe they are the forefathers of AI! (Bear in mind that there were originally two separate groups of people, one working on control systems and one on AI, and they wanted to keep things clear in the old days. Nowadays, scientists and engineers from a wide range of disciplines (subject areas) tend to work together so that they can work better.)
AI specifically refers to computers with programs stored in them, and to the abilities and knowledge of those programs. In this sense, the first wave of AI was expert systems and systems similar to them. In these systems, all the intelligence is hand-coded into the programs, and the programs are stored inside the computer to run, to drive it to make the right decisions and take the right actions. The software engineers work with experts in the field to translate the human knowledge into code, with a lot of if statements and loops! The experts provide the expertise: the methods to diagnose and solve problems. The software engineers call this the algorithm, the step-by-step actions to solve a problem, and they write it down in a special language called a programming language so that the computer can understand the program.
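To make this concrete, here is a toy sketch in Python of what hand-coded "expert" knowledge looks like; the rules and thresholds are entirely made up for illustration, not taken from any real expert system.

def diagnose(temperature_c, has_cough):
    # hand-coded "expert" rules: every decision is written out as if/else
    if temperature_c >= 38.0 and has_cough:
        return 'possible flu - see a doctor'
    elif temperature_c >= 38.0:
        return 'fever - rest and drink water'
    elif has_cough:
        return 'possible cold'
    else:
        return 'no obvious problem'

print(diagnose(38.5, True))   # the program only "knows" what was typed into it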
This type of AI was the majority in real life up to around 2012-2016, when another type of AI matured in many areas (though not all areas yet). In contrast to hand-coded knowledge, this new brand of AI gains knowledge by learning from past data, from examples. This is called machine learning. (A question in case I forget: does all machine learning use neural networks? Does sklearn use neural networks?)
We know what learning means, right? A baby learns to talk by mimicking and repeating after others. A student learns by answering questions and getting feedback from teachers. In machine learning the machine, or rather its program, behaves like a student and learns to answer questions. Its answers are marked by the program, which forces it to try again and again until its answer matches the correct answer, or is close enough. At that moment we say the machine has learned the knowledge, or in other words the model has been built. It can now be used to solve problems.
Wikipedia has a longer definition, but I am quoting two shorter ones below from source 1 (https://machinelearningmastery.com/what-is-machine-learning/):
The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.
Machine Learning is the training of a model from data against a generalised performance measure.
The beauty of machine learning is that the software engineers do not need to code the algorithm that solves the problem. Instead, the machine learns from samples and works out how to do it.
Automatic control of machines was the starting point of present-day AI. Robots and drones are further examples. Satnav, short for satellite navigation, found on most phones and in many cars, is another example of artificial intelligence.
Remember, machine learning is simply model building, or computer programming. So when you see model, agent, algorithm or machine learning in most publications, they mean the same or very similar things.
So why do people invest in AI? The main reason is for people, businesses and individuals to have a better life:
To do what people are doing and do it better
To do what people could not do
It cannot be ignored that there is distrust and fear of AI among the public. Those who are investing are governments, financiers and big corporations. Is that correct?
ML, especially deep machine learning, can do miracles: using computers to recognise faces, objects and text, for example.
Here we constrain the discussion to machine learning, because this is the new development that has caught worldwide attention since 2012.
There are three types of learning:
Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Or it could be said there are two types, supervised and unsupervised, to start with. I'll explain this later.
With supervised learning, the scientists and engineers prepare training data for the computer to learn from. Each sample consists of a pair: the input and the correct output. These two pieces of data act as the question and the correct answer. In the terminology of the field, the correct answer is called "the label". Once the program has worked out an answer, the correct answer is used to check whether it is right. The difference between the two answers is used to adjust the internal settings of the model until the difference is at a minimum. In simple terms, if the machine's answer is the same as the correct answer, then great, nothing inside the model needs to change. If not, the difference is used to judge how good the result is, and the model is asked to change a little and do the question again so that the difference gets smaller. Once the difference is small enough, we would say: well done, you are trained. Or "you have learned".
Have you seen something similar to this near you?
You can imagine a student working on a question while the teacher checks whether the answer is right. If not, the student tries again and again until it is close to perfect. In order to learn well, the student needs to do a variety of exercises this way until he or she is confident about the topic. In ML, the student and the teacher are both in the algorithm, in the program. The number of exercises needs to be sufficiently large to ensure the model learns well. Usually, a model will have to work on at least thousands, up to millions or even billions, of questions to complete training.
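As a small illustration, here is a sketch of supervised learning with scikit-learn; the iris dataset and the decision tree classifier are my own choices for the example (and note that a decision tree is not a neural network, which also touches on the earlier question about sklearn).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # inputs and their correct answers (labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                  # the "exercises with answers" stage
print(model.score(X_test, y_test))           # marked on questions it has not seen before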
With unsupervised learning, the correct answers are not provided by humans, and the program has to find a way of working them out by itself, somehow. Depending on how the correct answers are found, this could be further divided into unsupervised learning and reinforcement learning. We'll discuss this in another post. But have a think about it and see if you can find the correct answers when there is no teacher around you.
To summarise, the common steps for training a machine learning (neural network) model are as follows (a minimal code sketch follows the list):
First, convert all input into tensors - multi-dimensional arrays of numbers (see footnote for more information). Computers can only work on numbers, binary numbers to be exact! So we need to convert every piece of information into numbers, and then into binary numbers. By using tensors, large amounts of calculation can be processed quickly, which is necessary as neural network models have a large number of parameters (internal settings) to be learned.
Then the input is entered into the input layer, and its output is fed forward to the next layer.
This working and feeding process is repeated all the way to the output layer, where the final answer is obtained.
At this point, the correct answer is used to measure the difference between the two answers, known technically as the loss. The loss is used to work out the derivatives for backpropagation. This can be thought of as blame attribution, because the derivatives indicate which parameters caused more of the error, i.e. contributed more to the final loss. The word backpropagation is a specialist term, and you can read it as back plus propagation. Back means backwards; propagation means the movement of information. The information here is "how much each contributor (each parameter) has contributed to the final loss". That information is passed back to the contributors, and they make adjustments so that next time they will not make such a big error.
At this stage, the parameters are adjusted so that, hopefully, next time the model will do better.
Then different questions (inputs) are fed into the model. Some of the old questions will be fed in again, at random. In the end, the model will have "experienced" a wide range of questions. If the training dataset is large enough and the data are shuffled, the final model parameters will be the best fit for all the samples, producing the best answers across all the different questions.
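Here is that minimal sketch of the steps above, using PyTorch (one of the frameworks mentioned later in this post); the data and the network shape are made up purely for illustration.

import torch
from torch import nn

x = torch.randn(100, 3)                              # step 1: inputs as tensors
y = (x.sum(dim=1, keepdim=True) > 0).float()         # made-up "correct answers" (labels)

model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                               # measures the difference (the loss)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    pred = model(x)                                  # steps 2-3: feed forward through the layers
    loss = loss_fn(pred, y)                          # step 4: compare with the correct answers
    optimiser.zero_grad()
    loss.backward()                                  # backpropagation: pass the blame backwards
    optimiser.step()                                 # step 5: adjust the parameters a little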
That same principle works on all of us too. That is why we need to do many exercises, many different types of exercises, repeatedly, so that in the end we can be confident about solving any question thrown at us.
Until very recently, the machine learning models behind this new wave have all used neural networks, usually deep ones, and all have employed backpropagation to learn. That is, they all work in the ways we discussed above, with tweaks and twists of various types and magnitudes. However, a recent report claims that backpropagation is not necessary. I have not read it and cannot comment. Its main advantage would be the simplification of training.
No matter what happens with backpropagation, doing exercises and checking answers are the core of learning, and we cannot skip them.
Think of a pyramid: most people in the world will be users, who form the base of the pyramid, while the experts and specialists are at the top.
As a designer, you need to find applications to design - find problems and needs - such as:
Chatbot
Information extraction
Classification
Sentiment analysis
As a researcher, you need funding, because fundamental research requires funds.
Three types of deep learning neural networks, according to Brownlee (Deep Learning for Computer Vision):
CNN (convolutional neural network)
MLP (multilayer perceptron - fully connected layers)
RNN (recurrent neural network)
But the Transformer is different from these: it is neither an RNN nor a CNN, and relies on an attention mechanism instead.
Optical character recognition
Classification of image objects. This can be used in many different fields. For example, a production line can use it to pick out faulty products, e.g. a broken biscuit on the conveyor belt.
Object recognition
Face recognition - to identify people in pictures and videos. This can be used for crime prevention and public safety.
The good news is that neural network models now consistently match or outperform humans on many of these tasks.
TensorFlow, PyTorch, Theano - frameworks to build models
Ready-made models for fine-tuning - VGG, ResNet, Mask R-CNN, FaceNet, etc.
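For example, a ready-made model can be reused roughly like this in Keras; the 5-class output layer is a made-up placeholder for whatever task you are fine-tuning for.

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # keep the pretrained knowledge fixed

model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(5, activation='softmax'),   # new output layer for our own task
])
model.compile(optimizer='adam', loss='categorical_crossentropy')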
All these image processing models are based on CNNs - convolutional neural networks, not the American CNN cable news network.
Convolutional neural networks all share the same basic construct: convolution layers + pooling layers, followed by a flattening layer before the final classification layers. The convolution and pooling blocks may be repeated many times; ResNet101, for example, has 101 weight layers. It is amazing how these layers work together to pick up all the features from a picture and recognise them.
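A minimal sketch of that construct in Keras might look like this; the layer sizes and the 10-class output are arbitrary choices for illustration.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution
    layers.MaxPooling2D((2, 2)),                                            # pooling
    layers.Conv2D(64, (3, 3), activation='relu'),                           # the construct repeated
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                                       # flattening
    layers.Dense(10, activation='softmax'),                                 # final answer layer
])
model.summary()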
In a separate post I'll explain in more detail how CNNs work in principle, especially the way the convolution filters pick up features.
Loading image: cv2.imread(),
matplotlib.image.imread()
skimage.io.imread()
Skimage (scikit-image) is part of the SciPy ecosystem, alongside NumPy, pandas and Matplotlib
Loading label files: pandas.read_csv(), …
Built-in Python read() and readline() for text files
keras.models.load_model() / model.load_weights()
keras_model.save()
model.fit()
model.evaluate()
model.predict()
Matplotlib has imread() and can plot images as well as charts.
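Putting a few of these together, a loading-and-viewing sketch might look like this; the file names cat.jpg and labels.csv are hypothetical placeholders.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pandas as pd

img = mpimg.imread('cat.jpg')        # image as a NumPy array (height, width, channels)
print(img.shape)

labels = pd.read_csv('labels.csv')   # label file as a pandas DataFrame
print(labels.head())

plt.imshow(img)                      # Matplotlib plots images as well as charts
plt.show()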
5.1 Recognition
5.2 Motion analysis
5.3 Scene reconstruction
5.4 Image restoration
Natural Language Processing is the next field where machine learning is expected to make breakthroughs - what people often refer to as the "ImageNet moment" of NLP. The years 2012-2016 were when machine learning via neural networks caught up with, and in some tasks surpassed, human performance in computer vision; that period is often called the ImageNet moment.
My feeling is that current models alone will not bring that moment about, simply because language is far more complex than even face recognition, and the statistical models do not have enough understanding of the language; they are mostly mimicking.
NER (named entity recognition) and relation extraction - to collect information
Summarisation
Translation
QnA
NLU - natural language understanding - is hard, and the current predictive models are not ready for it yet.
To see more tasks in NLP, check out this website. (http://nlpprogress.com/)
BERT (a Transformer) seems to be the star model at the moment, compared with earlier models such as LSTM, BiLSTM and so on. All the state-of-the-art models are based on the Transformer, such as OpenAI's GPT-2 and models from Facebook.
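As a quick taste, a pretrained Transformer can be used through the Hugging Face transformers library (not mentioned elsewhere in this post; it downloads a model on first use), for example for the sentiment analysis task listed earlier.

from transformers import pipeline

classifier = pipeline('sentiment-analysis')           # loads a pretrained Transformer model
print(classifier('This summer project was great fun.'))
# roughly: [{'label': 'POSITIVE', 'score': 0.99...}]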
A chatbot requires the ability to identify intent - a classification problem. From the user's input, a word or a sentence, the bot needs to find out the key issue, the intention, and then direct the correct handler to respond. The handler may need to connect to other services to get the data it needs.
Word2vec
Sentence to vector
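Here is a toy sketch of that idea: sentences are turned into vectors and a classifier picks the intent. The phrases, the intents and the choice of scikit-learn are all invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

phrases = ['what is the weather today', 'will it rain tomorrow',
           'book a table for two', 'reserve a seat tonight']
intents = ['weather', 'weather', 'booking', 'booking']

bot = make_pipeline(TfidfVectorizer(), LogisticRegression())
bot.fit(phrases, intents)                        # learn intents from labelled examples
print(bot.predict(['is it going to rain']))      # expected: ['weather']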
Let's take NER as an example to see what Python skills are required.
Web page
from bs4 import BeautifulSoup
import requests

url = 'https://www.dataquest.io/blog/web-scraping-tutorial-python/'
page = requests.get(url)
if page.status_code == 200:
    print('download completed')
    print(page.content)                    # the raw HTML of the page

soup = BeautifulSoup(page.content, 'html.parser')
soup.prettify()                            # the HTML, nicely indented
list(soup.children)                        # top-level elements of the page
ps = soup.find_all('p')                    # all <p> paragraphs
ps[0].get_text()                           # plain text of the first paragraph
soup.find_all('p', class_='out-text')      # paragraphs with a given CSS class
soup.find_all(class_='out-text')           # any element with that class
Text file
with open('textfile.txt') as f:
    content = f.read()            # the whole file as one long string

# Or, line by line:
lines = []
with open('textfile.txt') as f:
    line = f.readline()
    while line != '':
        lines.append(line)
        line = f.readline()
# One file as a list of lines, which could be paragraphs
JSON file
import json

with open('jsonfile.json', 'r') as f:
    json_obj = json.load(f)          # file -> Python object (dict or list)

j_string = json.dumps(json_obj)      # Python object -> JSON string

with open('jsonfile.json', 'w') as f:
    json.dump(json_obj, f)           # pass the Python object, not a JSON string
spaCy provides a one-stop shop for tasks commonly used in any NLP project, including tokenisation, part-of-speech tagging, lemmatisation, dependency parsing and named entity recognition.
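A minimal spaCy NER sketch looks like this; it assumes the small English model has been installed first with python -m spacy download en_core_web_sm.

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('Google opened a new office in London in 2019.')
for ent in doc.ents:
    print(ent.text, ent.label_)      # e.g. Google ORG, London GPE, 2019 DATE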
Basic syntax (import, functions, …)
Basic built-in functions and data structures (int, float, boolean, string, list, dictionary, set, …)
pandas library (for computer vision and NLP - natural language processing)
NumPy library (for computer vision and NLP)
spaCy library (for NLP)
NLTK library (for NLP)
Linear algebra
Vectors, matrices, tensors
Matrix operations
Derivatives
Polynomials
Probability
Statistics
Brownlee, J. (2019). Machine Learning Mastery. Ebook.
Python documentation
pandas documentation
NumPy documentation
spaCy documentation
Matplotlib documentation
Brownlee, J. (2019). Deep Learning for Computer Vision. Ebook.