
From test@demedici.ssec.wisc.edu Wed Jun 7 12:29:35 2006

Date: Wed, 7 Jun 2006 12:24:55 -0500 (CDT)

From: Bill Hibbard <test@demedici.ssec.wisc.edu>

Reply-To: sl4@sl4.org

To: sl4@sl4.org

Cc: wta-talk@transhumanism.org, extropy-chat@lists.extropy.org,

agi@v2.listbox.com

Subject: Re: Two draft papers: AI and existential risk; heuristics and

biases

Eliezer,

> I don't think it

> inappropriate to cite a problem that is general to supervised learning

> and reinforcement, when your proposal is to, in general, use supervised

> learning and reinforcement. You can always appeal to a "different

> algorithm" or a "different implementation" that, in some unspecified

> way, doesn't have a problem.

But you are not demonstrating a general problem. You are

instead relying on specific examples (primitive neural

networks and systems that cannot distinguish a human from

a smiley) that fail trivially. You should be clear whether

you claim that reinforcement learning (RL) must inevitably

lead to:

1. A failure of intelligence.

or:

2. A failure of friendliness.

Your example of the US Army's primitive neural network

experiments is a failure of intelligence. Your statement

about smiley faces assumes a general success at intelligence

by the system, but an absurd failure of intelligence in the

part of the system that recognizes humans and their emotions,

leading to a failure of friendliness.

If your claim is that RL must lead to a failure of

intelligence, then you should cite and quote from Eric Baum's

What is Thought? (in my opinion, Baum deserves the Nobel

Prize in Economics for his experiments linking economic

principles with effective RL in multi-agent learning systems).

If your claim is that RL can succeed at intelligence but must

lead to a failure of friendliness, then it is reasonable to

cite and quote me. But please use my 2004 AAAI paper . . .

> If you are genuinely repudiating your old ideas ...

. . . use my 2004 AAAI paper because I do repudiate the

statement in my 2001 paper that recognition of humans and

their emotions should be hard-wired (i.e., static). That

is just the section of my 2001 paper that you quoted.

Not that I am sure that hard-wired recognition of humans and

their emotions inevitably leads to a failure of friendliness,

since the super-intelligence (SI) may understand that humans

would be happier if they could evolve to other physical forms

but still be recognized by the SI as humans, and decide to

modify itself (or build an improved replacement). But if this

is my scenario, then why not design continuing learning of

recognition of humans and their emotions into the system in

the first place? Hence my change of views.

I am sure you have not repudiated everything in CFAI, and I

have not repudiated everything in my earlier publications.

I continue to believe that RL is critical to achieving

intelligence with a feasible amount of computing resources,

and I continue to believe that collective long-term human

happiness should be the basic reinforcement value for SI.

But I now think that an SI should continue to learn recognition

of humans and their emotions via reinforcement, rather than

these recognitions being hard-wired as the result of supervised

learning. My recent writings have also refined my views about

how human happiness should be defined, and how the happiness of

many people should be combined into an overall reinforcement

value.

> I see no relevant difference between these two proposals, except that

> the paragraph you cite (presumably as a potential replacement) is much

> less clear to the outside academic reader.

If you see no difference between my earlier and later ideas,

then please use a scenario based on my later papers. That will

be a better demonstration of the strength of your arguments,

and be fairer to me.

Of course, it would be best to demonstrate your claim (either

that RL must lead to a failure of intelligence, or can succeed

at intelligence but must lead to a failure of friendliness) in

general. But if you cannot do that and must rely on a specific

example, then at least do not pick an example that fails for

trivial reasons.

As I wrote above, if you think RL must fail at intelligence,

you would do best to quote Eric Baum.

If you think RL can succeed at intelligence but must fail at

friendliness, but just want to demonstrate it for a specific

example, then use a scenario in which:

1. The SI recognizes humans and their emotions as accurately

as any human, and continually relearns that recognition as

humans evolve (for example, to become SIs themselves).

2. The SI values people after death at the maximally unhappy

value, in order to avoid motivating the SI to kill unhappy

people.

3. The SI combines the happiness of many people in a way (such

as by averaging) that does not motivate a simple numerical

increase (or decrease) in the number of people.

4. The SI weights unhappiness more strongly than happiness, so that

it focuses its efforts on helping unhappy people (a rough sketch

combining points 2 through 4 follows this list).

5. The SI develops models of all humans and what produces

long-term happiness in each of them.

6. The SI develops models of the interactions among humans

and how these interactions affect the happiness of each.
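
To make points 2 through 4 concrete, here is a minimal sketch of one
way the per-person happiness estimates might be combined into a single
reinforcement value. This is only an illustration, not code from any
of my papers; the -1 to +1 happiness scale, the particular weight, and
the function names are all assumptions:

  # Hypothetical sketch of points 2-4: happiness is assumed to be
  # estimated on a scale from -1.0 (maximally unhappy) to +1.0
  # (maximally happy).

  MAX_UNHAPPY = -1.0         # value assigned to people after death (point 2)
  UNHAPPINESS_WEIGHT = 3.0   # unhappiness counts more than happiness (point 4)

  def combined_reinforcement(happiness_by_person, deceased):
      """Average (rather than sum) the weighted values (point 3), so the
      result is not raised or lowered simply by changing the number of
      people."""
      weighted = []
      for person, happiness in happiness_by_person.items():
          if person in deceased:
              happiness = MAX_UNHAPPY                             # point 2
          weight = UNHAPPINESS_WEIGHT if happiness < 0 else 1.0   # point 4
          weighted.append(weight * happiness)
      return sum(weighted) / len(weighted) if weighted else 0.0

  # Example: two moderately happy people and one person who has died.
  print(combined_reinforcement({"a": 0.5, "b": 0.4, "c": 0.9},
                               deceased={"c"}))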

If you demonstrate a failure of friendliness against a weaker

scenario, all that really demonstrates is that you needed the

weak scenario in order to make your case. And it is unfair to

me. As I said, best would be a general demonstration, but if

you must pick an example, at least pick a strong example.

I do not pretend to have all the answers. Clearly, making RL work

will require solutions to a number of currently unsolved problems.

Jeff Hawkins' work on hierarchical temporal memory (HTM) is

interesting in this respect, given the interactions within the

human brain between the cortex (modeled by HTM) and lower brain

areas where RL has been observed (in my view RL is in a lower area

because it is fundamental, and the higher areas evolved to create

the simulation model of the world necessary to solve the credit

assignment problem for RL). Clearly RL is not the whole answer,

but I think Eric Baum has it right that it is critical to

intelligence.

I appreciate your offer to include my URL in your article,

where I can give my response. Please use this (and please proofread

carefully for typos in the final galleys):

http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html

If you take my suggestion and elevate your discussion to a general

explanation of why RL systems must fail, or at least use a strong

scenario, that will make my response friendlier, since I am happier

to be named as an advocate of RL than to be conflated with trivial

failure. I would prefer that you not use

the quote you were using from my 2001 paper, as I repudiate

supervised learning of hard-wired values. Please quote from and cite

my 2004 AAAI paper instead, since there is nothing in it

that I repudiate yet (but you will find more refined views in my

2005 on-line paper).

Bill

p.s., Although I receive digest messages from extropy-chat,

for some reason my recent posts to it have all bounced. Could

someone please forward this message to extropy-chat?