Metabiology

http://www.umcs.maine.edu/~chaitin/jack.html

[[[Former Title: Mathematics and Biology]]]

Mathematics, Biology and Metabiology

G. J. Chaitin, IBM Research

[gjchaitin_at_gmail.com]

In Memoriam Jacob T. "Jack" Schwartz (1930-2009)

[Lecture given at the Facultad de Ciencias Exactas of the University of Buenos Aires, 14 May 2009,

at the IBM T. J. Watson Research Center, Yorktown Heights, 5 June 2009,

and at the Frege Centre for Structural Sciences of the University of Jena, 20 June 2009.

The author thanks Veronica Becher, Pablo Jacovkis, Gustavo Stolovitzky, Olaf Dreyer, Robert Berwick,

David B. Searls, and other members of the Buenos Aires, IBM and Jena audiences for their helpful comments.]

Abstract

It would be nice to have a mathematical understanding of basic biological concepts

and to be able to prove that life must evolve in very general circumstances.

At present we are far from being able to do this.

But I'll discuss some partial steps in this direction

plus what I regard as a possible future line of attack.

Introduction: Goals of Our Theory

    • 200th anniversary of Darwin's birth, 150th anniversary of The Origin of Species.

    • The unreasonable effectiveness of mathematics in physics (Wigner) versus

      • The lack of effectiveness of mathematics in biology (Gelfand).

    • We wish to extract the fundamental mathematical ideas contained in biology.

    • We wish to prove theorems about extremely simple unrealistic models,

    • not run simulations of extremely complicated realistic models.

    • Our goal is not to realistically simulate biological evolution,

    • but to represent mathematically the fundamental biological principles of evolution

    • in such a manner that we can prove that evolution must take place.

    • This may be regarded as a toy model, but we do not see it as a toy,

    • we see it as a way to eliminate inessential distractions that only serve to

    • confuse the issues!

    • Theories are lies that help us to see the truth (Picasso).

    • Math is extremely single-minded and can only deal successfully with a single idea at a time

    • (Jack: The Pernicious Influence of Mathematics on Science).

    • If Darwin's theory of evolution is as fundamental, basic and general as most biologists think,

    • then it must be possible to extract some basic mathematical ideas from it.

    • Nothing makes sense in biology except in the light of evolution (Dobzhansky).

    • It is scandalous that we do not have a mathematical proof that evolution works!

    • I am a pure mathematician, not a biologist: I am trying to find the Platonic ideal of evolution,

    • the archetypical behavior, not the messy version that takes place in the real world!

    • The aim is proofs, not realistic simulations.

    • Another way our model differs from previous models:

    • Our goal is to understand biological creativity and the major transitions in evolution, not gradual changes.

    • Fisher-Wright population genetics studies changes in gene frequencies.

    • We are trying to model how new genes arise, and major changes such as

    • from single-celled to multi-cellular organisms.

The Nature of Biological Information

    • Algorithmic information theory has something to do with biology.

    • This theory is based on looking at the size in bits of the smallest program to calculate something.

    • This is like DNA, which is a kind of program for constructing and running an organism.

    • Program-size complexity is a better model for biological information than Shannon entropy

    • even though it ignores run-time and 9 months is already a long pregnancy let alone 90 years.

    • It is more or less the case that more complicated organisms have more DNA.

    • Each base in DNA = two bits of software.

    • Human = 3 × 109 bases = 6 × 109 bits of software.

    • Biology is not predictive: these bits of information are frozen accidents.

    • Physics is based on simple, elegant equations:

    • Theory of Everything on a T-shirt!

    • Biology is the domain of the complicated:

    • No simple equation for your spouse or an ecosystem.

    • Pure math is more like physics than like biology, right?

    • Wrong!

    • Pure math contains infinite irreducible complexity:

    • the bits of the halting probability Ω.

    • (Ω is like the DNA for pure math, because it gives in

    • compressed form the answer to all individual cases of

    • Turing's halting problem.)

DNA as Digital Software (Jack!), Software Organisms

When I visited Jack at the Cold Spring Harbor Laboratory where he was working on molecular biology,

he emphasized to me that I should just think of DNA as digital software.

Let's take this idea and run with it: How about modeling organisms as digital software?

How about studying random walks in software space?

Two important places where thinking of organisms as digital software is helpful:

    • Missing Intermediate Forms:

    • Small changes in software produce large changes in output!

    • (Amusing example: Humans and Chimps have almost the same DNA.)

    • Major Transitions in Evolution such as from

    • Single-Celled Organisms to Multi-Cellular Organisms:

    • Just make the main program into a subroutine!

A Toy Model of Evolution:

    • Population is a single software organism that computes a positive integer.

    • In order for them to evolve, we need to give our organisms something challenging to do.

    • Fitness of the organism is the size of the integer it computes: the bigger the better.

    • We make a random mutation on our current organism and throw the result away unless it computes a bigger positive integer.

    • If it computes a bigger integer, the mutated organism replaces our current organism.

    • We are doing hill-climbing in a biological way, not by exhaustive search.

    • For example,

    • N → N + N → N × N → NN → NNN.

      • Or

      • f(N) → f(N+1) → f2(N+1) → fN(N+1).

      • (With a specific function f and integer N.)

    • To avoid getting stuck on a local maximum, there must be a nonzero probability to go from any program to any program.

      • But most likely only a single bit is changed, deleted or inserted.

    • This random walk must be ergodic, that is, cover the entire space of possible programs.

    • We will tend to evolve concise algorithmic descriptions of extremely large numbers.

    • The fitness = size of the integer will grow faster than any computable function of the time

    • (the time = the total number of mutations that have been considered),

    • and will also grow faster than any computable function of the size of the program.

    • This shows that creativity is taking place.

    • If there were no creativity, if changes were made mechanically, the fitness would be a computable function of the time and size.

    • More precisely, in time growing exponentially in N the fitness

    • will be ≥ the Busy Beaver function of N bits, BB(N bits), which is

    • the biggest positive integer that can be named by a ≤ N-bit program,

    • and which grows super-exponentially.

    • Problem with this proof that evolution works:

    • It shows that this process is creative, but it doesn't show that it's cumulative.

    • The proof shows that evolution will progress, if necessary, by starting over again and again.

    • I want to show that this very same model will work cumulatively, not by starting over again and again.

    • And that large positive integers will be named using a recursive hierarchy of fast-growing function definitions.

    • I can't do this yet, but I think I can see how it might be done. [See the analogy below]

    • This could perhaps enable us to improve our estimate of the rate of evolution.

What Next? The Following Analogy U is to V as X is to Y:

    • Darwin's Theory of Evolution of Living Organisms:

        • Artificial Selection by Animal and Plant Breeders versus

        • Natural Selection, Survival of the Fittest.

    • Proposed Theory of Evolution of Software Organisms:

        • Evolution of Large Man-Made Software Projects, Maintainability versus

        • Evolvability of Large Randomly-Mutating Software Organisms.

    • Desirable Traits for Maintainability and Evolvability of Software:

    • Orthogonal Design, Levels of Abstraction, Encapsulation.

    • For both maintainability of man-made software and evolvability of mutating software;

    • it's important that a change needs to be made in only one limited location.

    • Otherwise, man-made software is unmaintainable, and

    • evolving software will require simultaneous nonlocal coordinated mutations,

    • which is extremely unlikely.

    • Examples:

      • Evolution of Fortran Language:

        • Language features never discarded — upward compatibility.

        • Language contains its entire history — too expensive to start over.

        • Ontogeny recapitulates phylogeny.

      • Development of a compiler for a new programming language:

        • I worked on such a project for IBM. Badly written code dies because it becomes

        • incomprehensible and it is no longer possible to fix bugs or to add new function.

Summary: What Does this Analogy Buy Us?

    • Ontogeny recapitulates phylogeny — Your Inner Fish!

    • [Ernst Haeckel, Jena]

      • The development of the embryo recapitulates the evolutionary history of the organism.

      • And as Fortran illustrates, large software projects contain their history.

  • Hierarchical structure of living organisms = software levels of abstraction.

    • Makes software easier to write, because programmer needs to

    • keep less information in his/her head at any given moment.

      • Makes software evolve better when subjected to random mutations,

      • because useful mutations can be small and localized,

      • instead of having to touch many places in the code.

Conclusion

    • I think this discussion makes a convincing case that

    • a theory of the evolution of randomly mutating software is possible,

    • and that random walks in software space are worth studying.

    • How biologically relevant such a theory may be remains to be seen.

    • But that it will be an interesting new field of mathematics is now plausible.

    • However, carrying out this research will require a great deal of work.

    • Maybe you would like to work on these problems?

    • I propose calling this new field metabiology: I hope that it will eventually develop into

    • a field parallel to biology, dealing with the random evolution of artificial software (computer programs)

    • rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems

    • or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.

    • Whether or not this happens, as the concepts of computation, information and complexity show,

    • mathematics is moving in a biological direction.

    • This trend will continue.

Note

[Our random walk model was inspired by the stimulating critique of Darwinian evolution in

D. Berlinski, The Devil's Delusion, Crown Forum, New York, 2008. See especially pp. 192-195.

In a nutshell, model = Jack (digital software) + Berlinski (random walk) + Busy Beaver Problem.

Our model is an attempt to answer Berlinski's criticisms.]

[19 July 2009]