Neural Networks and NLP Strategies - Part 2


By Joseph O'Connor and Brian Van der Horst

This is the second in a series of three articles aimed at updating the classic NLP model of strategies, or how we think human beings might think. In the last article, we pointed out that our current models of internal cognitive processes are based on the metaphor of the digital computer, which treats information in a lock-step, sequential procedure. Modern advances in neurobiology, neurocomputing and cognitive science have demonstrated that the brain is better described as a massively interconnected, parallel network of widely-distributed, decentralized, simultaneously operating constellations of processes.

Relax. We will endeavor to render this definition clear through examples and a review of some of the major developments in cognitive science.

Cognitive science is the interdisciplinary study of the mind which draws on neuroscience, linguistics, artificial intelligence, psychology, anthropology, cybernetics, and the philosophy of mind.1 As such, there is much that it shares with NLP, and much we can learn from its models and discoveries.

When we left off last time in the exciting story of modern cybernetics, John von Neumann had demonstrated in 1946 how to couple George Boole's digital logic with Alan Turing's syntactic calculating machine to turbo-charge ENIAC, the digital behemoth that ushered in the computer age.2

Actually, von Neumann was convinced that he was modelling the brain. His famous "First Draft" paper (written for the EDVAC, ENIAC's successor) spoke of neurons and biological metaphors. Though now credited as the father of modern computers, he felt they were simply a step towards creating "self-replicating automata" that would reproduce biological phenomena. It is interesting how our lives have been transformed by technology that was originally intended for a different use. Alexander Graham Bell thought the telephone would be a useful way to pipe music to people.

The following illustration of "Von Neumann Architecture" is the standard today for all digital computers, and was also the basis for the "TOTE" model conceived by Miller, Galanter, and Pribram, and adopted in NLP as the structure for studying thinking patterns, or strategies.

The result of the work of Miller, Galanter and Pribram in the 1950s was Plans and the Structure of Behavior. (The structure of DNA had been discovered only a few years before its 1960 publication.) This work also drew heavily on the research of Norbert Wiener, author of the landmark volume Cybernetics; of Warren McCulloch and Walter Pitts, who proposed the theory of neuronal communication as a threshold pattern of activity or inactivity; and of Claude Shannon, credited with creating the discipline of information theory.

Of interest to NLPers, Plans co-author George Miller dates the official recognition of cognitive science as September 11, 1956. On that day, at an MIT symposium on information theory, scientists Allen Newell and Herbert Simon demonstrated the first complete proof of a theorem carried out on a computer; they were followed by a young linguist named Noam Chomsky, who presented "Three Models of Language," which was to become the foundation for the NLP Meta Model; and then by Miller's own paper, "The Magical Number Seven, Plus or Minus Two."3

This period in cognitive science is often referred to as the cognitivism era, or the "cybernetics phase of cognitive science."4 It was characterized by the use of mathematical logic to understand the workings of the nervous system in terms of symbolic mental representation, the establishment of the metadiscipline of systems theory, the utilization of statistical signal/communications information theory, and the first examples of self-organizing systems. Thinking was defined during this period as information processing: rule-based manipulation of symbols. Mental processes were thought to be unknowable, unconscious processes.

During the late 50's and early 60's, another era of cognitive science began-- the study of emergent properties of systems and connectionism. Emergence is an important idea. It means that after a threshold point, a system suddenly acquires a property that could not be predicted from the sum of its parts. The property emerges. For example, each of your eyes is only capable of seeing in two dimensions. Put the two together and voila, 3-D vision. Who could predict that such a complexity of nerve cells could give rise to this category of consciousness?

Gregory Bateson used to say that wherever there was feedback and sufficient complexity, you would find mental properties. Neuroscientists had known about the interconnectedness of the neurons of the brain for decades. Instead of representing neural communication as straightforward cell-to-cell stimulus/response, the common model had become more like an interconnected star.

When people were oriented toward the single-neuron model, thinking about brain function produced models like the following, which you can find even today in many physiology texts:5

Connections in the visual pathways of mammals at the thalamic level:

Here the eye receives a light beam and sends a signal to a region in the thalamus called the lateral geniculate nucleus (LGN) and then to the visual cortex for processing.

When people like Karl Pribram (who wrote at least one paper with Chomsky) began studying the processes of vision in the primate brain, they found quite another story. The brain functions for vision were found to be massively interconnected with a constellation of regions that processed a patchwork of visual modalities such as shape, size, color, specular reflectance, 3-D orientation in space, distance, trajectory and rotation in concurrent sub-networks.

Neuroscientists also found that the LGN, embedded in this network, was sending the visual cortex a great variety of cognitive expectations and memories stored in the hypothalamus and the limbic system. In NLP we call these sub-modalities and beliefs. The following revised map of the visual pathways in the brain does not support the traditional view of sequential processing.

Connections in the visual pathways of mammals at the thalamic level, revised:

In the words of cognitive scientist and anatomist Francisco J. Varela, recent research indicates: "It is evident that 80 per cent of what any LGN listens to comes not from the retina but from the dense interconnectedness of other regions of the brain... Thus the behavior of the whole system resembles a cocktail party conversation much more than a chain of command." So whatever the brain looks at is really about 20% signals from the outside world and 80% pre-existing filters, memories and beliefs. Varela notes that this phenomenon is a uniform principle throughout the brain. The basic mechanism of recognition of a visual object or attribute "could be said to be the emergence of a global state among resonating neuronal ensembles."6

How did cognitive scientists begin to make sense of this symphony of brain signals? In his 1971 book, Languages of the Brain, Karl Pribram proposed a holographic theory of brain function. Just as a hologram takes form from the resonance of interacting light waves, so could cognition be described as emerging from a confluence of interacting brain waves.

This next phase of cognitive science and neurocomputing is generally referred to as the "connectionist" era. One of the prime theorists of this epoch was John Hopfield, a physicist at the California Institute of Technology who began to study the similarities and differences between biological processes and electronic computation.

Hopfield used the analogy of two gas molecules in a box. "They move around the box, and every once in a while they collide. If we put 10 or even 1,000 more molecules in the box, all we get is more collisions. But if we put a billion billion molecules in the box, suddenly there's a new phenomenon-- sound waves. Nothing in the behavior of the two molecules in the box, or ten or 1,000 molecules would suggest to you that a billion billion molecules would be able to produce sound waves. Sound waves are a collective phenomenon of a complex system."7

The brain, it was obvious to this theoretical physicist turned neuroscientist, was a biocomputer that performs collective computation. Hopfield did for neural networks what von Neumann did for digital computers: he demonstrated a mathematical model of neuron-like switches that could operate as a physical system. He also pointed out some major differences between biological and electronic computers:

           1. Brains can tolerate errors.

           2. Brains have a way of repairing and correcting themselves, or as we often term it, learning to learn.

           3. Brains can dissipate free energy, cool down, and settle into a state of energy relaxation, setting thresholds for less and more important information, instead of overheating and blowing up like some computers we have worked with.

           4. Brains are highly interconnected, and can do many things at the same time.

           5. Brains can dream. Memories can be re-worked to give weighted levels of importance.

           6. Brains can forget. Humans have long- and short-term memory; if they don't use some information recently or frequently enough, they push the irrelevant data back into the subconscious. Otherwise we would have to re-think our entire past to remember our favorite color.
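Hopfield's neuron-like switches can be sketched in a few lines of code. The following is our own minimal illustration in Python, not Hopfield's formulation: a pattern is stored with a Hebbian outer-product rule, and recall proceeds by repeated sign-threshold updates. It demonstrates the first property above-- the network tolerates errors, settling back into the stored pattern from a corrupted input.

```python
import numpy as np

def train(patterns):
    """Store +/-1 patterns with a Hebbian outer-product rule."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for p in patterns:
        p = np.asarray(p, dtype=float)
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)          # no neuron connects to itself
    return W / len(patterns)

def recall(W, state, steps=10):
    """Each neuron repeatedly takes the sign of its weighted input."""
    s = np.asarray(state, dtype=float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0               # break ties consistently
    return s

# Store one pattern, corrupt two of its bits, and let the network repair it.
stored = [1, -1, 1, -1, 1, -1, 1, -1]
W = train([stored])
noisy = list(stored)
noisy[0], noisy[3] = -noisy[0], -noisy[3]
restored = recall(W, noisy)           # settles back to the stored pattern
```

This is the sense in which memories in such a network are attractors: the system relaxes into the nearest stored state rather than looking an item up at a fixed address.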

Hopfield went on to produce micro-chips (with AT&T) that could begin to do much of the above. But the next contribution to neurocomputing that we would like to mention now is the adaptive resonance theory proposed by Stephen Grossberg of Boston University.

Grossberg first published a series of differential equations describing the mathematics of networks in 1961 and 1964. By the 1970s he was proposing a controversial unifying model of learning theory, neurology, and mathematics.

His model was based on the neurological ability of associative networks to seek and set quenching thresholds of perception and activity. Thresholds are continually being adjusted so that the brain can pay attention to whichever contrasts and comparisons in the world are important for a given outcome. For example, the ear's volume control lets us hear softer sounds at night, or tune out irrelevant noise in a street-corner conversation; human night vision adjusts to see more in less light.

In NLP terms, many of the changes possible in sub-modality interventions are a function of passing from one threshold of representation to another.

Grossberg was interested in how brains learn. How do sensations in short-term memory become long-term memory perceptions? How do incoming sensation and memory result in a stable consciousness?

According to Grossberg's theory, attention, or consciousness, is a resonance between raw sensation in short-term memory and myriad associations from long-term memory. Attention is a function of tunable filters that make neural networks sensitive to categories, or codes, of experience by adjusting the weights and thresholds of signals arriving at a given network.

Grossberg proposes that the eye tends to reach out for objects and experiences in the external world, by comparing old patterns of memories with what is seen. To explain this process, he has created a model of a cooperative/competitive hierarchy of functions that link sensory perceptions and the brain.

In his outstar learning theorem, Grossberg suggests that the brain sends out a pattern to the eye-- a sampling signal-- or to any of the organs of perception, which resonates with incoming neural activity in a network to create a short-term memory pattern, which in humans tends to fade in about 15 seconds if not committed to long-term memory.

Grossberg proposes a complementary instar category development theorem that explains how repeated, persistent, or intense patterns of experience are installed in long-term memory by readjusting the synaptic weights of neural networks, effecting lasting learning.

Another key parameter of adaptive resonance theory is vigilance, which sets the level of comparison, or mismatch between a particular long-term memory pattern and a short-term sensation from the outside world. Here is a diagram of Grossberg's Adaptive Resonance Theory model:8
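The core loop of the model can also be sketched computationally. What follows is our own rough illustration, loosely in the spirit of binary adaptive resonance rather than Grossberg's actual differential equations; the function names and the simple overlap measure are invented for the example. A binary input is compared against a stored long-term prototype; if the match exceeds the vigilance threshold the system resonates and the prototype is updated (an instar-style learning step), otherwise the system resets and leaves long-term memory untouched.

```python
def match(input_pattern, prototype):
    """Fraction of active input bits also present in the prototype."""
    overlap = sum(i and p for i, p in zip(input_pattern, prototype))
    active = sum(input_pattern)
    return overlap / active if active else 1.0

def resonate(input_pattern, prototype, vigilance):
    """Return (resonates, updated_prototype)."""
    if match(input_pattern, prototype) >= vigilance:
        # Instar-style update: the prototype moves toward the input
        # (for binary patterns, an AND of prototype and input).
        updated = [i and p for i, p in zip(input_pattern, prototype)]
        return True, updated
    return False, prototype           # mismatch -> reset, memory unchanged

proto = [1, 1, 1, 0, 0]
close = [1, 1, 0, 0, 0]               # all active bits match -> resonance
far   = [0, 0, 1, 1, 1]               # 1 of 3 active bits match -> reset
ok, proto_after = resonate(close, proto, vigilance=0.8)
rejected, _ = resonate(far, proto, vigilance=0.8)
```

Raising the vigilance parameter makes the system demand a finer match before it files an experience under an existing category-- the computational analogue of the mismatch/reset interactions described above.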

Here we can see some opportunities for revising the NLP strategies model. The "vigilance" component in this model could correspond to the choice point in the TOTE model, and creating a system of notation for sub-modalities could reflect the outstar/instar functions of learning, which we tend to represent as the first test in a strategy (outstar) and the final test (instar). If we include which meta-program distinctions create the mismatch/reset interactions, we will have even more predictive ability in a strategy model.

We are not trying to explain such NLP distinctions on a neurological level, but showing the isomorphic pattern. A simple example: one man has a belief that people are basically untrustworthy. He does not see any evidence to the contrary, although there are many occasions when it is available. These examples do not get into long-term memory to set a different pattern. There will, however, be a threshold, perhaps set by a meta-program. When he sees enough instances he may flip to the opposite belief, that people are basically trustworthy. Or one very intense example may also change his belief.

But before we begin suggesting specific applications of neurocomputing to NLP, let us take a look at the work of several scientists who come more from the artificial intelligence and linguistics side of the cognitive spectrum than from the neural network side.

This is the domain of the enactive era of cognitive science. Here, cognition is defined not as the representation of a pre-existing world by pre-existing patterns in the mind, but rather as the enactment of a world and a mind on the basis of the history of actions that a being performs in the world.

Thinking is now defined as the accumulated series of experiences of the individual that form a structural coupling between a network of multiple levels of interconnected, sensorimotor sub-networks and the sum of the actions performed during the evolution of the individual. Varela, Thompson and Rosch give a nifty example.9

"In a classic experiment, Held and Hein raised kittens in the dark and exposed them to light only under controlled conditions. A first group of animals was allowed to move about normally, but each of them was harnessed to a simple carriage and basket that contained a member of the second group of animals. The two groups therefore shared the same visual experience, but the second group was entirely passive. When the animals were released after a few weeks of this treatment, the first group of kittens behaved normally, but those who had been carried around behaved as if they were blind: they bumped into objects and fell over edges. This beautiful study supports the enactive view that objects are not seen by the visual extraction of features, but rather by the visual guidance of action."10

Linguists George Lakoff and Mark Johnson suggest what they term an experientialist approach to cognition, which complements the enactive paradigm. In their work, they feel "Meaningful conceptual structures arise from two sources: (1) from the structured nature of bodily and social experience and (2) from our capacity to project imaginatively from well-structured aspects of bodily and interactional experience to abstract conceptual structures. Rational thought is the application of very general cognitive processes-- focusing, scanning, superimposition, figure-ground reversal, etc.-- to such structures."11

Human beings tend to think in general cognitive structures and metaphors: "kinesthetic image schemas such as containers, part-whole, or source-path-goal patterns that originate in body experience," argues Johnson.12 When hundreds of color swatches were shown to various cultures around the world, nearly all people from all countries chose the same swatch as the "best" example of a given primary color. When you look at the "best" example of blue, suggests Lakoff, the neurons fire in a particular optimum pattern common to the brains of everyone. "The color red isn't a property of the outside world," says Lakoff. "It's a property of the mind."13

Another person suggesting that the mind has certain specific structural properties is artificial intelligence proponent Marvin Minsky, of MIT, who along with Seymour Papert criticizes the emergent theory of cognition put forth by connectionists. Minsky favors the proposition that the mind consists of many "agents" that chunk down problems of thinking into a microworld of agencies of cognition to form "The Society of Mind."

Then there is the school of psychoanalysis called object relations theory which tends to treat human beings as constellations of sub-identities. For Melanie Klein, the basic mental developmental process is described as the internalizing of a rich array of significant persons in one's life.

All in all, these approaches could be seen as isomorphic to the NLP "Parts Model," which gives those of us in the field a conceptual domain in which six-step reframing, v/k dissociation, and parts negotiation can work.

How well do neural network computers work? Neurocomputing is now used in everything from handwritten character recognition, to sonar detection, to stock trading and fraud protection at credit card companies. One of our favorite examples is Terrence Sejnowski's NETtalk, a neural-network simulation at Johns Hopkins University that uses only about 300 neurons with 18,000 electronically-weighted connections to read text and synthesize speech.

This is an example of how we might begin to think about strategies. A key element in neurocomputing today is not just interconnecting two tiers of neurons, but using hidden units, a third, intermediary level of neurons. As you can see from the chart below, hidden units use fewer connections between layers of information. In computers, this architecture tends to create emergent, representational pathways of meaning. Important choices are encoded as tests and choice-points in these units, much like traffic signals at busy intersections.

Sejnowski used 203 input neurons to scan a sample of English text, while 26 output neurons encode the phonemes for pronunciation. Eighty hidden units in the middle compute the relationships between them. The result is hauntingly effective. The system starts with a random set of values that sounds like a continuous, unintelligible wail. After a training session, the adjusted system groups sounds into words. Although it continues to mispronounce, its babble suddenly starts to sound more human: babylike, with familiar rhythms, and then meaningful sounds begin to emerge.

NETtalk learned to read a 100-word example text with 98% accuracy (about third-grade reading level) in 16 hours. Even more remarkable, it could read other, similar texts with almost the same accuracy.14
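The role of hidden units can be illustrated at toy scale. Below is our own sketch in Python (not Sejnowski's code) of a two-layer network trained by back-propagation on the exclusive-or problem, a mapping that no single tier of connections can compute; the hidden units build the intermediate representation that makes it solvable.

```python
import numpy as np

# Toy two-layer network with hidden units, trained on exclusive-or (XOR).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two input units, four hidden units, one output unit.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

lr, losses = 0.5, []
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                # hidden-unit activations
    out = sigmoid(h @ W2 + b2)              # output activations
    losses.append(float(((out - y) ** 2).mean()))
    # Back-propagate the error and adjust the connection weights.
    d_out = (out - y) * out * (1.0 - out)
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

After training, the error has shrunk and the hidden units hold a re-coding of the inputs-- a small instance of the emergent, representational pathways described above. NETtalk does the same with 203 input, 80 hidden, and 26 output units.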

A final example. Rodney Brooks at MIT constructs robots which approach the intelligence of insects. His approach is to find an escape from what he calls the "deception of AI," the tendency in artificial intelligence toward abstraction, toward factoring out perception and motor skills. His goal: "to build completely autonomous robots, mobile agents that co-exist in the world with humans, and are seen by those humans as intelligent beings in their own right." His key move is working toward this goal not with the usual decomposition of a system by function, but rather with a novel decomposition by activity. "Each activity, or behavior producing system individually connects sensing to action... An activity is a pattern of interactions with the world. Another name for our activities might well be skills, emphasizing that each activity can at least post facto be rationalized as pursuing some purpose."15

The following chart is an example of how Brooks' model breaks down each activity, or behavior-producing system, individually connecting sensing to action:

Brooks' approach is producing some fascinating results, and is an example of the enactive era in neurocomputing, in which allowing a well-designed system to proceed and explore the real world creates new abilities of skill and intelligence. Isn't that what we have been trying to do with strategies in NLP?
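Decomposition by activity can be caricatured in a few lines of code. In the sketch below-- our own illustration, in which the layer names, the sensor fields, and the fixed-priority arbitration that stands in for Brooks' actual suppression mechanism are all invented-- each layer connects sensing directly to action, and a more pressing layer overrides the ones beneath it.

```python
def avoid(sensors):
    """Safety layer: back away from imminent obstacles."""
    if sensors["obstacle_distance"] < 0.3:
        return "reverse"
    return None                      # nothing to avoid; defer

def seek_light(sensors):
    """Goal layer: head for a light source when one is visible."""
    if sensors["light_visible"]:
        return "forward"
    return None

def wander(sensors):
    """Default layer: move about when nothing more pressing applies."""
    return "turn-left" if sensors["random_bias"] < 0.5 else "turn-right"

# Layers in priority order: the first layer producing an action wins.
LAYERS = [avoid, seek_light, wander]

def act(sensors):
    for layer in LAYERS:
        action = layer(sensors)
        if action is not None:
            return action
```

Each function is a complete, if crude, pattern of interaction with the world; intelligence of a sort emerges from their interplay, with no central representation of the environment anywhere in the system.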

Brooks' model for building robots bears an uncanny resemblance to the "operational format" competence-acquisition model of Leslie Cameron-Bandler, David Gordon and Michael Lebeau, elegantly described in their woefully under-appreciated 1985 book, The Emprint Method. Here the authors chunk down competences in terms of time contexts, organized by outcomes, activities and evaluative tests. For nearly ten years, NLP has already had a model which can be used for creating interconnected, parallel-operating strategies! But few have taken the time to apply this work to strategy detection and design.

If we in NLP can begin to use the Emprint model with parallel and simultaneously interacting operative formats, and factor in the sub-modality and meta-program distinctions that indicate how people move from one threshold state to another, then we will have a model for strategies that will be closer to what science now knows about the brain and the mind.

It can even be easy, but not simple. Many of the strategies offered today are simplifications of mental processes, albeit "useful fictions." Memory, spelling, decision, motivation, and learning strategies as we now teach them could be linked together in networks to represent simultaneous processing. Meta-strategies like the behavior generator, the cores of reframing and other protocols, the popular Walt Disney strategy, and the modelling projects of various practitioners produce results, but often require a lot of hallucination to follow the essential steps. If you want to take a hypnotic approach to NLP, this is serviceable, but sometimes produces unpredictable results.

We believe that NLP can profit from a more rigorous approach to modelling cognition. Many of the tools already exist. We have a strategy notation and a system of elegant distinctions in meta-programs and sub-modalities that could be married with the Emprint method and recent cognitive science.

In the next article, we will review how strategies are currently represented in NLP, and offer our suggestions for how we can begin to practice and pragmatically apply the principles and methods we have presented here. But right now, you can begin to examine for yourself: what are the strategies you use that interconnect with others? What are the feed-forward and feed-back loops between memory, decisions, motivations and learning sequences? Take a look at The Emprint Method-- how could we use it for representing simultaneous, parallel processes? What are the emergent qualities of a well-formed strategy? These are some of the questions we will try to address in our next article.

"The truth is, the science of Nature has been already too long made only a work of the brain and the fancy: It is now high time that it should return to the plainness and soundness of observations on material and obvious things." Robert Hooke, Micrographia (1665)

Bibliography

Allman, William F., Apprentices of Wonder-- Inside the Neural Network Revolution, New York: Bantam, 1989.

Brooks, R. A., Intelligence without Representation, MIT Artificial Intelligence Report, 1987.

Cameron-Bandler, Leslie, David Gordon and Michael Lebeau, The Emprint Method, San Rafael, CA: Future Pace, 1985.

Evans, Christopher, The Micro Millennium, New York: Washington Square Press, 1981.

Gardner, Howard, The Mind's New Science: A History of the Cognitive Revolution, New York: Basic Books, 1985.

Lakoff, George, Cognitive Semantics, in Meaning and Mental Representations, ed. Umberto Eco et al., Bloomington: Indiana University Press, 1988.

Johnson, Mark, The Body in the Mind: The Bodily Basis of Imagination, Reason and Meaning, Chicago: University of Chicago Press, 1987.

Johnson, R. Colin, Cognizers-- Neural Networks and Machines That Think, New York: Wiley, 1988.

Miller, George A., Eugene Galanter and Karl H. Pribram, Plans and the Structure of Behavior, New York: Holt, Rinehart and Winston, 1960.

Pribram, Karl H., Languages of the Brain, Experimental Paradoxes and Principles in Neuropsychology, New York: Prentice-Hall, 1971.

Varela, Francisco J., Evan Thompson, and Eleanor Rosch, The Embodied Mind, Cambridge, MA: MIT Press, 1991.

Varela, Francisco J., Connaître les sciences cognitives-- tendances et perspectives, Paris: Editions du Seuil, 1988.

Footnotes

1 Gardner, The Mind's New Science

2 Evans, The Micro Millennium

3 Gardner, Ibid.

4 Varela, Thompson and Rosch, The Embodied Mind

5 Varela, Connaître les sciences cognitives

6 Varela, et al., The Embodied Mind

7 Allman, Apprentices of Wonder

8 Johnson, Cognizers-- Neural Networks and Machines That Think

9 An illustration of this experiment appears in Pribram's Languages of the Brain

10 Varela, et al., The Embodied Mind

11 Lakoff, Cognitive Semantics

12 Johnson, The Body in the Mind

13 Allman, Apprentices of Wonder

14 Johnson, Cognizers

15 Brooks, Intelligence without Representation