SLE Notes

SLE 1 - Turing Test 8 outcomes

8 Possible Alternatives To The Turing Test


The Turing Test, which is intended to detect human-like intelligence in a machine, is fundamentally flawed. But that doesn't mean it can't be improved or modified. Here are eight proposed alternatives that could help us distinguish bot from human.

[Image caption: Perhaps the best way to detect intelligence in a machine is to have it tell you when it's appropriate to laugh (see #2).]

Can digital computers think? In the 1950s, computer science pioneer Alan Turing asked this question another way: "Are there imaginable digital computers which would do well in the imitation game?" While Turing's original query speculated on a computer's ability to participate in a simple party game, the question today is widely interpreted as "Are there imaginable digital computers which could convincingly imitate a human participating in a conversation?" If such a computer is said to exist, the reasoning goes, then that computer may also be considered intelligent.

Turing's test has been the subject of much debate over the years. One of the biggest objections revolves around the assessment's heavy emphasis on natural language processing skills, which encompass a very narrow measure of intelligence. Another complaint, fueled by the 2014 Loebner Prize controversy, is that the test encourages deception as a means to achieving victory; the Russian chatbot Eugene Goostman "passed" the Turing Test by convincing one-in-three Loebner Prize judges that it was a 13-year-old non-native English-speaking Ukrainian boy. The bot used tricks, rather than bona fide intelligence, to win. That's clearly not what Turing intended.

In light of incidents like these, and in consideration of the test's inherent weaknesses, a number of thinkers have put forth ideas on how the Turing Test could be improved, modified, or replaced altogether.

1. Winograd Schema Challenge

Hector Levesque, a professor of Computer Science at the University of Toronto, says that chatbots are effective at fooling some judges into thinking they're human. But such a test, he says, merely reveals how easy it is to fool some humans — especially via short, text-based conversations.

To remedy this, Levesque devised the Winograd Schema Challenge (WSC), which he says is a superior alternative to the Turing Test. Named after Stanford University computer scientist Terry Winograd, the test presents a number of multiple-choice questions within a very specific format.  

Here are some examples:  

Q: The trophy would not fit in the brown suitcase because it was too big (small). What was too big (small)?  

Answer 0: the trophy

Answer 1: the suitcase

Q: The town councillors refused to give the demonstrators a permit because they feared (advocated) violence. Who feared (advocated) violence?   

Answer 0: the town councillors  

Answer 1: the angry demonstrators   


If the first question is posed with the word "big," the answer is "0: the trophy." If it is posed instead with the word "small," the answer is "1: the suitcase." 

The answer to the second question is similarly dependent upon whether the sentence incorporates the word "feared" or "advocated." 

The answers to these questions seem pretty simple, right? Sure – if you're a human. Answering correctly requires skills that remain elusive for computers, such as spatial and interpersonal reasoning, knowledge about the typical sizes of objects, how political protests unfold, and other types of commonsense reasoning.


2. The Marcus Test

NYU cognitive scientist Gary Marcus is an outspoken critic of the Turing Test in its current format. Along with computer scientists Manuela Veloso and Francesca Rossi, he recently chaired a workshop on the importance of thinking "Beyond the Turing Test." The event brought together a number of experts who came up with some interesting ideas, some of which appear on this list. Marcus himself has devised his own alternative, which I'm calling the Marcus Test.

Here's how he explained it to The New Yorker:  

[B]uild a computer program that can watch any arbitrary TV program or YouTube video and answer questions about its content — “Why did Russia invade Crimea?” or “Why did Walter White consider taking a hit out on Jessie?” Chatterbots like Goostman can hold a short conversation about TV, but only by bluffing. (When asked what “Cheers” was about, it responded, “How should I know, I haven’t watched the show.”) But no existing program — not Watson, not Goostman, not Siri — can currently come close to doing what any bright, real teenager can do: watch an episode of “The Simpsons,” and tell us when to laugh.  


Great idea! If a computer can truly detect and comprehend humor, sarcasm, and irony — and then explain it in a meaningful way — then there must be some serious cogitations going on inside its silicon skull. 


3. The Lovelace Test 2.0

Named in honor of Ada Lovelace, the world's first computer programmer, this test aims to detect an artificial intelligence by gauging its capacity for creativity. The test was originally developed in 2001 by Selmer Bringsjord and colleagues, who contended that, if an artificial agent could create a true work of art in a way that was inexplicable to its developer, there must be a human-like intelligence at work.


The Lovelace Test was recently upgraded by Georgia Tech professor Mark Riedl to remedy the ambiguity and subjectivity implicit in this approach.  

The basic rules of the Lovelace 2.0 Test of Artificial Creativity and Intelligence go like this:  The artificial agent passes if it develops a creative artifact from a subset of artistic genres deemed to require human-level intelligence and the artifact meets certain creative constraints given by a human evaluator.  


The human evaluator must determine that the object is a valid representative of the creative subset and that it meets the criteria. (The created artifact needs only meet these criteria; it does not need to have any aesthetic value.) A human referee must determine that the combination of the subset and criteria is not an impossible standard.

For example, the judge could ask the agent in question to create a jazz piece in the spirit of Dave Brubeck, or paint a Monet-like impressionist landscape. The judge will then have to decide how well the agent fared in this task given the requirements. So unlike the original test, the judges can work within a defined set of constraints, and without having to make value judgements. What's more, the test makes it possible to compare the relative intelligence of different agents.


4. The Construction Challenge

Charlie Ortiz, senior principal manager of AI at Nuance Communications, came up with this one. Formerly known as the Ikea Challenge, this test is an effort to create a physically embodied version of the Turing Test. A fundamental weakness of the Turing Test, says Ortiz, is that it focuses on verbal behavior while neglecting two important elements of intelligent behavior: perception and physical action. Computers subjected to the Turing Test, after all, don't have eyes or hands.


As Ortiz pointed out to io9, "These are significant limitations: the field of AI has always assigned great importance to the ability to perceive the world and to act upon it." Ortiz's Construction Challenge is a way to overcome this limitation. Here's how he described it to io9:

In the Construction Challenge, a set of regular competitions will be organized around robots that can build physical structures such as Ikea-like modular furniture or Lego structures. To do this, a robot entrant will have to process verbal instructions or descriptions of artifacts that must be built, manipulate physical components to create the intended structures, perceive the structures at various stages of construction, and answer questions or provide explanations during the construction.


A separate track will look at scenarios involving collaborative construction of such structures with a human agent. Another track will investigate the learning of commonsense knowledge about physical artifacts (as a child might) through the manipulation of toys, such as Lego blocks, while interacting with a human teacher.   The added benefit of creating such a challenge is that it could foster the development of robots that can succeed in many larger-scale construction tasks, including setting up camps, either on Earth or beyond. 


5. The Visual Turing Test

Like Ortiz's challenge, the Visual Turing Test is an effort to diminish the natural language bias implicit in Turing's original test. Computer scientists Michael Barclay and Antony Galton from the University of Exeter in the U.K. have developed a test that challenges a machine to mimic the visual abilities of humans. Humans and software were asked a simple question about a pictured scene: "Where is the coffee cup?" Each of the multiple-choice answers is technically correct, but some, Barclay and Galton note, can be considered more "correct" (i.e. more "human") than others. As Celeste Biever and Richard Fisher explain at New Scientist:

The ability to describe to someone else where an object is relative to other things sounds like a simple task. In fact, making that choice requires several nuanced and subjective judgements, including the relative size of objects, their uniqueness relative to other objects and their relevance in a particular situation. Humans do it intuitively, but machines struggle.

New Scientist has an interactive version of the test, which challenges you to identify "human" answers from those typical of a computer. You can take it for yourself here.


6. The Reverse Turing Test

What if we switched things around a bit, and rejigged the test such that the machine had to be capable of identifying a human? Such a "test" currently exists in the form of CAPTCHAs — those annoying anti-spam procedures. If the test-taker can accurately transcribe a series of wobbly characters, the computer knows it's dealing with a human. This verification technique has given rise to an arms race between CAPTCHA designers and the developers of CAPTCHA-busting bots, but this game of one-upmanship could conceivably lead to evaluative systems that are exceedingly good at distinguishing humans from machines. It's anyone's guess what such a system might look like in practice, but the case can be made that a machine's ability to recognize a human via a conversation is itself a reflection of intelligence.


7. Digital Dissection

We need more than behavioral tests to prove that a machine is intelligent; we also need to demonstrate that it contains the cognitive faculties required for human-like intelligence. In other words, we need some proof that it possesses the machine equivalent of a complex and dynamic brain (even if that brain amounts to a series of sophisticated algorithms). In order to accomplish this, we'll need to identify the machine equivalents of the neural correlates of consciousness (NCC). Such an understanding would, in theory, let us know whether we're dealing with a simulation (a "pretend" mind) or a bona fide emulation. This is all easier said than done; neuroscientists are still struggling to define NCCs in humans, and much about the human brain remains a mystery. As a viable alternative to the Turing Test, we'll have to set this one aside for now. But as a potential pathway towards the development of an artificial brain — and even artificial consciousness (AC) — it holds tremendous promise.


8. All of the Above

As shown by the work of Gary Marcus and others, the point of all this isn't necessarily to create a successor to the Turing Test, but rather a set of tests. Call it the Turing Olympics. By confronting an AI with a diverse set of challenges, judges stand a far better chance of distinguishing bot from human.

One Last Consideration: Revise the Rules of the Loebner Prize

All this being said, some experts believe the current limitations of the Turing Test have less to do with the test itself than with the ways in which it's conducted and judged. Writing in IEEE Spectrum, Lee Gomes explains:

Harvard's Stuart Shieber, for example, says that many of the problems associated with the test aren't the fault of Turing but instead the result of the rules for the Loebner Prize, under the auspices of which most Turing-style competitions have been conducted, including last summer's. Shieber says that Loebner competitions are tailor-made for chatbot victories because of the way they limit the conversation to a particular topic with a tight time limit and encourage nonspecialists to act as judges. He says that a full Turing test, with no time or subject limits, could do the job that Turing predicted it would, especially if the human administering the test was familiar with the standard suite of parlor tricks that programmers use to fool people.

Would these considerations constitute an improvement? Absolutely. But they still don't get around the bias toward natural language processing skills.

SLE 1.1 - Intro to Deep Learning

Introduction to Deep Learning

=============================

Introduction to Deep Learning with Alexander Amini and Ava Amini. Learn about the foundations of deep learning and AI through hands-on software labs.

Introduction to deep learning and AI at MIT

Deep learning has experienced a huge resurgence and many incredible successes over the past decade and is being used to generate new types of data

This course covers the fundamentals of building and teaching computers to learn tasks directly from data, through both lectures and hands-on software labs.

This course emphasizes deep learning techniques

Lectures cover deep learning foundations, software labs, and project competitions

Guest lectures from industry and academia on cutting-edge AI advancements

Perceptrons use three steps for forward propagation: multiply inputs with weights, add the results, and pass through a non-linear function.

Non-linear functions are necessary to capture non-linear patterns in data.

Common non-linear activation functions include sigmoid and relu.
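
A minimal sketch of the three-step perceptron forward pass described above, using NumPy with sigmoid and ReLU activations; the input, weight, and bias values are made-up illustrative numbers, not anything from the lecture.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0.0, z)

def perceptron_forward(x, w, b, activation=sigmoid):
    z = np.dot(w, x) + b        # step 1: multiply inputs with weights; step 2: add bias
    return activation(z)        # step 3: pass through a non-linear function

# Illustrative values only
x = np.array([1.0, 2.0])        # inputs
w = np.array([0.5, -0.3])       # weights
b = 0.1                         # bias
print(perceptron_forward(x, w, b))          # sigmoid output
print(perceptron_forward(x, w, b, relu))    # relu output
```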

Neural networks can be built by stacking perceptrons as layers

Perceptron has three steps: dot product, bias, and non-linearity

Layers are fully connected and can be stacked to form deep neural networks

Training a neural network involves finding the set of weights that result in the smallest loss function averaged across the entire dataset

The loss function measures the distance between predicted and true values; minimizing this function results in more accurate predictions

Finding the optimal weights involves minimizing the loss function across the entire dataset
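
A quick sketch of two common loss functions averaged over a dataset (mean squared error for regression, binary cross-entropy for classification); the label and prediction arrays below are made-up examples.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: average squared distance between prediction and truth
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Cross-entropy for 0/1 labels; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Illustrative values only
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mse_loss(y_true, y_pred))
print(binary_cross_entropy(y_true, y_pred))
```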

Gradient descent and back propagation are the core of training neural networks.

Gradient descent starts at a random weight location, computes the gradient and updates the weights to minimize the loss function.

Back propagation propagates gradients from output to input to determine how much a small change in a weight affects the loss function.
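
A toy gradient descent loop on a one-parameter quadratic loss, just to make the update rule concrete; in a real network the gradient for each weight would come from backpropagation rather than a hand-written derivative.

```python
# Toy example: minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
learning_rate = 0.1
w = 10.0  # start at a "random" weight location

for step in range(100):
    grad = 2.0 * (w - 3.0)          # gradient of the loss with respect to the weight
    w = w - learning_rate * grad    # step against the gradient to reduce the loss

print(w)  # converges toward 3.0, the minimum of the loss
```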

The learning rate affects the speed and accuracy of neural network convergence

The learning rate can be challenging to set and can have large consequences for neural network training

Adaptive learning rate algorithms can adjust the learning rate based on the neural network's landscape

Batching data into mini-batches can improve the accuracy of the gradient estimate and allow for a larger learning rate
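
A sketch of how a dataset might be shuffled and split into mini-batches each epoch; the data here is random filler and the gradient step is left as a commented placeholder.

```python
import numpy as np

# Fake dataset: 1,000 examples with 10 features each (illustrative only)
X = np.random.randn(1000, 10)
y = np.random.randn(1000, 1)

batch_size = 32
num_batches = len(X) // batch_size

for epoch in range(5):
    idx = np.random.permutation(len(X))            # reshuffle every epoch
    for b in range(num_batches):
        batch = idx[b * batch_size : (b + 1) * batch_size]
        X_mb, y_mb = X[batch], y[batch]
        # The gradient is estimated on the mini-batch only: noisier than the
        # full-dataset gradient but far cheaper, and in practice accurate
        # enough to permit a larger learning rate.
        # grad = compute_gradients(X_mb, y_mb)     # placeholder for backprop
        # weights -= learning_rate * grad
```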

Regularization techniques prevent overfitting in neural networks

Dropout randomly disables a fraction of neurons at each training iteration, effectively training an ensemble of different sub-networks and teaching the network to generalize better.

Early stopping monitors the network's performance on a held-out set during training and stops at the point where that performance starts to degrade, i.e. just before the network begins to overfit the training data.
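
A minimal tf.keras sketch combining both regularization techniques above: Dropout layers in a small classifier plus an EarlyStopping callback that watches the held-out loss. The data is random filler so the example runs end to end; layer sizes are arbitrary.

```python
import tensorflow as tf

# Random filler data, just so the example runs (not real data)
x_train = tf.random.normal((256, 10))
y_train = tf.cast(tf.random.uniform((256, 1)) > 0.5, tf.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dropout(0.5),   # randomly disable 50% of activations each training step
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when the held-out loss stops improving, and keep the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

model.fit(x_train, y_train, validation_split=0.2, epochs=50,
          callbacks=[early_stop], verbose=0)
```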

SLE 2 - Deep Generative Modeling

Deep Generative Modeling

========================

Learn about deep generative modeling, a subset of deep learning, that can generate new data instances based on learned patterns in unsupervised learning.

Generative modeling is a powerful subset of deep learning that can generate new data instances based on learned patterns.

Generative modeling falls under unsupervised learning, where we take data without labels and try to build a model that can understand the hidden underlying structure of that data.

Generative modeling takes two general forms: density estimation and sample generation. Density estimation involves training a model that learns an underlying probability distribution that describes where the data came from, while sample generation uses that model to generate new instances that are similar to the data that we've seen.

Generative modeling can be used to uncover the underlying features in a data set and encode it in an efficient way, which can be useful in tasks such as facial detection and outlier detection in autonomous cars.

Autoencoders are a type of generative model that use a low-dimensional latent space to efficiently encode and decode data.

Autoencoders map data into a low-dimensional latent space, which is an encoded representation of underlying features.

The goal is to train the model to learn these underlying features directly from unlabeled data; the low-dimensional latent space provides an efficient, compact encoding of the data.

The autoencoder builds a way to decode the latent variable vector back up to the original data space, generating a reconstructed output.

The network is trained by minimizing the distance between the input data and the reconstructed output, without requiring any labels.

The dimensionality of the latent space has a huge impact on the quality of the generated reconstructions and the efficiency of the encoding.
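
A minimal Keras-style autoencoder sketch for flattened 28x28 inputs, just to illustrate the encode-compress-decode structure and the label-free reconstruction loss; the layer sizes and the latent dimensionality of 2 are arbitrary choices for illustration.

```python
import tensorflow as tf

latent_dim = 2  # very low-dimensional latent space (arbitrary choice)

# Encoder: map the input down to the latent vector
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(latent_dim),
])

# Decoder: map the latent vector back up to the original data space
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])

inputs = tf.keras.Input(shape=(784,))
reconstruction = decoder(encoder(inputs))
autoencoder = tf.keras.Model(inputs, reconstruction)

# Reconstruction loss: distance between input and output, so no labels are needed
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)  # note: the input is also the target
```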

Variational autoencoders introduce randomness to generate new data instances.

VAEs replace deterministic layers with a random sampling operation.

VAEs define a probability distribution over latent variables to obtain a probabilistic representation of the latent space.

Regularization with a standard normal prior helps enforce continuity and completeness in the latent space of VAEs.

Regularization term captures distance between encoding of latent variables and prior distribution.

A standard normal prior encourages encodings to be distributed smoothly around the centre of the latent space and penalizes the network for clustering points in isolated regions.

VAEs use re-parametrization to enable end-to-end training

Re-parametrization moves the randomness into a separate noise term (z = mu + sigma * epsilon), so the sampling step becomes deterministic with respect to the network's outputs and gradients can flow through it during backpropagation
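
A sketch of the two pieces just described, assuming the encoder outputs a mean and a log-variance per latent dimension: the reparametrized sampling step and the KL-divergence regularization term against a standard normal prior.

```python
import tensorflow as tf

def sample_latent(z_mean, z_log_var):
    # Reparametrization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, 1).
    # The randomness lives in epsilon, so gradients can flow through mu and sigma.
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def kl_regularization(z_mean, z_log_var):
    # KL divergence between N(mu, sigma^2) and the standard normal prior N(0, 1),
    # summed over latent dimensions and averaged over the batch
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1
    )
    return tf.reduce_mean(kl)

# Total VAE loss (sketch): reconstruction_loss + beta * kl_regularization(...)
# with beta = 1 for a standard VAE and beta > 1 for a beta-VAE.
```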

VAEs use latent variables to capture meaningful features and encourage independence

Beta VAEs encourage disentanglement in latent variables

By introducing a weighting constant, beta, in VAEs, we can encourage greater disentanglement and efficiency in encoding.

Empirically, beta VAEs have been observed to disentangle features better than standard VAEs: varying a single latent variable changes one feature of the reconstruction while the others stay roughly constant.

GANs involve a generator and discriminator competing with each other.

The generator produces fake data to try and fool the discriminator.

The discriminator tries to distinguish between real and fake data.

Generative modeling using GANs allows for the creation of completely synthetic new instances of data.

GANs use adversarial training to build a network that can synthesize examples realistic enough to fool the best discriminator.

The generator component can be used to create new data instances by starting from random noise and learning a model that goes from random noise to the real data distribution.
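
A compressed sketch of one adversarial training step, assuming a generator that maps random noise to fake samples and a discriminator that outputs the probability that its input is real; the network sizes are arbitrary placeholders and real_batch would come from the training data.

```python
import tensorflow as tf

noise_dim, data_dim = 8, 2  # arbitrary illustrative sizes

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(noise_dim,)),
    tf.keras.layers.Dense(data_dim),
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(data_dim,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability the input is real
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)

def train_step(real_batch):
    noise = tf.random.normal((tf.shape(real_batch)[0], noise_dim))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_pred = discriminator(real_batch, training=True)
        fake_pred = discriminator(fake_batch, training=True)
        # Discriminator: call real data real (1) and generated data fake (0)
        d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                 bce(tf.zeros_like(fake_pred), fake_pred)
        # Generator: fool the discriminator into calling fakes real
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```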

Deep generative models and diffusion modeling are driving tools behind the tremendous advances in generative AI.

Deep generative models include latent variable models, autoencoders, and generative adversarial networks.

Diffusion models have the ability to imagine completely new objects and instances, making them the new frontier of generative AI.

SLE 3 - Crypt Arithmetic

SLE 4 - Deep Q Networks

Deep learning is a subset of machine learning that teaches algorithms to learn a task directly from raw data.

Deep learning allows for learning features from raw data.

Traditional machine learning algorithms rely on hand-engineered features.

Deep learning algorithms benefit from big data and modern hardware.

Activation functions introduce nonlinearities into the network, allowing neural networks to approximate complex functions.

Linear activation functions are limited in their ability to separate nonlinear data.

Nonlinearities in activation functions allow for more powerful neural networks.

Neural networks can be composed of multiple layers and can be used to solve real-world problems.

Neural networks consist of layers with different weights and biases.

TensorFlow provides pre-implemented layers and models for easy implementation.
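
A small example of the kind of pre-built layers mentioned above, using tf.keras to stack fully connected Dense layers into a model; the layer sizes, input shape, and class count are arbitrary.

```python
import tensorflow as tf

# A simple feed-forward network built from pre-implemented Dense layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),  # hidden layer 1
    tf.keras.layers.Dense(32, activation="relu"),                    # hidden layer 2
    tf.keras.layers.Dense(3, activation="softmax"),                  # 3-class output
])

# Each Dense layer manages its own weights and biases internally
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```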

Neural networks need to be trained to perform tasks

Loss function quantifies error and tells network when it's wrong

Optimal set of weights must be found to minimize total loss function

Gradient descent is used to update weights in neural networks

Gradient descent involves taking the gradient of the loss with respect to the weights, which gives the direction of maximum ascent, and then stepping in the opposite direction to decrease the loss

Back propagation is used to compute the gradient of the loss with respect to all weights in the network

Setting the learning rate can be challenging, but adaptive learning rate algorithms can help.

Adaptive learning rate algorithms change the learning rate during training to increase or decrease depending on optimization progress.

Batching data into mini-batches can improve the accuracy of the gradient estimate and allow for faster convergence.

Regularization techniques to prevent overfitting in neural networks

Dropout randomly sets some hidden neuron activations to zero during training to prevent memorization of training data

Early stopping identifies the point where the network starts to memorize training data and stops training to prevent overfitting


SLE 5 - Deep Reinforcement Learning

The speaker discusses the shift from using deep learning on fixed datasets to deep reinforcement learning, which allows agents to learn from their environment. This has implications for robotics and gameplay. The first three lectures covered supervised learning, while the fourth covered unsupervised learning.

Deep reinforcement learning allows agents to learn and act in environments without fixed datasets.

Reinforcement learning is different from supervised and unsupervised learning.

Agents are the central part of reinforcement learning algorithms and interact with environments through actions and observations.

Reinforcement learning involves maximizing rewards in an environment using a Q function.

Agents observe a state and take an action, receiving a reward in return.

The Q function predicts the expected total discounted reward for a given state-action pair, and the policy is to choose the action with the highest Q value.

Two classes of reinforcement learning algorithms: Q function learning and policy learning.

Q function learning involves learning the Q function directly and using it to determine policy.

Policy learning involves directly learning the policy without the intermediate Q function.

Q-learning with deep Q-networks can achieve superhuman performance on Atari games

Deep Q-networks take state as input and output Q-values for all possible actions

The target Q-value estimates the true return: the reward received plus the discounted maximum Q-value of the next state (the total reward from rolling out the episode and acting optimally thereafter)

Predicted Q-value is the output from the network, and the loss function is mean squared error
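
A sketch of the Q-learning loss just described, assuming a q_network that maps a batch of states to Q-values for every action and a periodically synced target_network used to compute the bootstrapped target; the sizes and discount factor are illustrative.

```python
import tensorflow as tf

num_actions, state_dim, gamma = 4, 8, 0.99  # illustrative sizes and discount factor

def make_q_network():
    # Takes a state as input and outputs one Q-value per possible action
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(num_actions),
    ])

q_network = make_q_network()
target_network = make_q_network()  # slowly updated copy, used for stable targets

def dqn_loss(states, actions, rewards, next_states, dones):
    # Predicted Q-value: the network's output for the action actually taken
    q_values = q_network(states)
    predicted = tf.reduce_sum(q_values * tf.one_hot(actions, num_actions), axis=1)

    # Target Q-value: reward plus discounted best Q-value of the next state
    # (zeroed out when the episode has ended)
    next_q = tf.reduce_max(target_network(next_states), axis=1)
    target = rewards + gamma * next_q * (1.0 - dones)

    # Mean squared error between target and predicted Q-values
    return tf.reduce_mean(tf.square(tf.stop_gradient(target) - predicted))
```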

Q-learning has downsides with complex actions

Q value learning is well-suited for deterministic, discrete action spaces.

Q value learning cannot effectively model or parameterize continuous action spaces.

Policy gradient methods directly optimize the policy, allowing for continuous action spaces and stochastic environments.

Policy gradients can be used to train self-driving cars.

Policy gradients involve penalizing actions that lead to undesirable events and increasing the probability of actions that lead to desirable events.

The output of policy gradient networks can be parameterized with a mean and variance, allowing for continuous action selection.

Policy gradients can be used to maximize total discounted return in reinforcement learning.

The loss function for training policy gradients is the negative log-likelihood of the selected action given the state, multiplied by the total discounted return received after taking that action.
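
A sketch of that policy-gradient (REINFORCE-style) loss for a discrete action space, assuming a policy_network that outputs action probabilities and discounted returns already computed from a rollout; the network size and action count are illustrative.

```python
import tensorflow as tf

num_actions, state_dim = 3, 8  # illustrative sizes

policy_network = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
    tf.keras.layers.Dense(num_actions, activation="softmax"),  # action probabilities
])

def policy_gradient_loss(states, actions, discounted_returns):
    # Probability the policy assigned to each action that was actually taken
    probs = policy_network(states)
    action_probs = tf.reduce_sum(probs * tf.one_hot(actions, num_actions), axis=1)

    # Negative log-likelihood weighted by the return: actions followed by high
    # returns become more likely, actions followed by low returns less likely
    log_likelihood = tf.math.log(action_probs + 1e-8)
    return -tf.reduce_mean(log_likelihood * discounted_returns)
```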

Training in simulation using photorealistic simulation engines like Vista can allow for safe deployment of reinforcement learning agents in the real world.

Reinforcement learning has been used to train full-scale autonomous vehicles and to beat human champions in Go.

Google DeepMind used reinforcement learning to train a neural network to imitate human moves in Go and then play against its own reinforcement learning agents, achieving superhuman capabilities.

AlphaZero used self-play to train neural networks entirely from scratch, without any need for pre-training with human experts.