[[[Former Title: Mathematics and Biology]]]
Mathematics, Biology and Metabiology
G. J. Chaitin, IBM Research
In Memoriam Jacob T. "Jack" Schwartz (1930-2009)[Lecture given at the Facultad de Ciencias Exactas of the University of Buenos Aires, 14 May 2009,
at the IBM T. J. Watson Research Center, Yorktown Heights, 5 June 2009,
and at the Frege Centre for Structural Sciences of the University of Jena, 20 June 2009.
The author thanks Veronica Becher, Pablo Jacovkis, Gustavo Stolovitzky, Olaf Dreyer, Robert Berwick,
David B. Searls, and other members of the Buenos Aires, IBM and Jena audiences for their helpful comments.]
AbstractIt would be nice to have a mathematical understanding of basic biological concepts
and to be able to prove that life must evolve in very general circumstances.
At present we are far from being able to do this.
But I'll discuss some partial steps in this direction
plus what I regard as a possible future line of attack.
Introduction: Goals of Our Theory
The Nature of Biological Information
- Algorithmic information theory has something to do with biology.
- This theory is based on looking at the size in bits of the smallest program to calculate something.
- This is like DNA, which is a kind of program for constructing and running an organism.
- Program-size complexity is a better model for biological information than Shannon entropy
even though it ignores run-time and 9 months is already a long pregnancy let alone 90 years.
- It is more or less the case that more complicated organisms have more DNA.
- Each base in DNA = two bits of software.
- Human = 3 × 109 bases = 6 × 109 bits of software.
- Biology is not predictive: these bits of information are frozen accidents.
- Physics is based on simple, elegant equations:
Theory of Everything on a T-shirt!
- Biology is the domain of the complicated:
No simple equation for your spouse or an ecosystem.
- Pure math is more like physics than like biology, right?
- Pure math contains infinite irreducible complexity:
the bits of the halting probability Ω.
- (Ω is like the DNA for pure math, because it gives in
compressed form the answer to all individual cases of
Turing's halting problem.)
DNA as Digital Software (Jack!), Software OrganismsWhen I visited Jack at the Cold Spring Harbor Laboratory where he was working on molecular biology,
he emphasized to me that I should just think of DNA as digital software.
Let's take this idea and run with it: How about modeling organisms as digital software?
How about studying random walks in software space?
Two important places where thinking of organisms as digital software is helpful:
- Missing Intermediate Forms:
Small changes in software produce large changes in output!
(Amusing example: Humans and Chimps have almost the same DNA.)
- Major Transitions in Evolution such as from
Single-Celled Organisms to Multi-Cellular Organisms:
Just make the main program into a subroutine!
- Population is a single software organism that computes a positive integer.
- In order for them to evolve, we need to give our organisms something challenging to do.
- Fitness of the organism is the size of the integer it computes: the bigger the better.
This is a version of the Busy Beaver Problem and is equivalent to Turing's Halting Problem.
These are problems requiring an unlimited amount of mathematical creativity.
There is no mechanical procedure for solving them.
- We make a random mutation on our current organism and throw the result away unless it computes a bigger positive integer.
- If it computes a bigger integer, the mutated organism replaces our current organism.
- We are doing hill-climbing in a biological way, not by exhaustive search.
- For example,
N → N + N → N × N → NN → NNN.
f(N) → f(N+1) → f2(N+1) → fN(N+1).
(With a specific function f and integer N.)
- To avoid getting stuck on a local maximum, there must be a nonzero probability to go from any program to any program.
But most likely only a single bit is changed, deleted or inserted.
- This random walk must be ergodic, that is, cover the entire space of possible programs.
- We will tend to evolve concise algorithmic descriptions of extremely large numbers.
- The fitness = size of the integer will grow faster than any computable function of the time
(the time = the total number of mutations that have been considered),
and will also grow faster than any computable function of the size of the program.
- This shows that creativity is taking place.
- If there were no creativity, if changes were made mechanically, the fitness would be a computable function of the time and size.
- More precisely, in time growing exponentially in N the fitness
will be ≥ the Busy Beaver function of N bits, BB(N bits), which is
the biggest positive integer that can be named by a ≤ N-bit program,
and which grows super-exponentially.
- Problem with this proof that evolution works:
It shows that this process is creative, but it doesn't show that it's cumulative.
- The proof shows that evolution will progress, if necessary, by starting over again and again.
- I want to show that this very same model will work cumulatively, not by starting over again and again.
And that large positive integers will be named using a recursive hierarchy of fast-growing function definitions.
- I can't do this yet, but I think I can see how it might be done. [See the analogy below]
- This could perhaps enable us to improve our estimate of the rate of evolution.
What Next? The Following Analogy U is to V as X is to Y:
- Darwin's Theory of Evolution of Living Organisms:
- Artificial Selection by Animal and Plant Breeders versus
- Natural Selection, Survival of the Fittest.
- Proposed Theory of Evolution of Software Organisms:
- Evolution of Large Man-Made Software Projects, Maintainability versus
- Evolvability of Large Randomly-Mutating Software Organisms.
- Desirable Traits for Maintainability and Evolvability of Software:
Orthogonal Design, Levels of Abstraction, Encapsulation.
- For both maintainability of man-made software and evolvability of mutating software;
it's important that a change needs to be made in only one limited location.
- Otherwise, man-made software is unmaintainable, and
evolving software will require simultaneous nonlocal coordinated mutations,
which is extremely unlikely.
- Evolution of Fortran Language:
Language features never discarded — upward compatibility.
Language contains its entire history — too expensive to start over.
Ontogeny recapitulates phylogeny.
- Development of a compiler for a new programming language:
I worked on such a project for IBM. Badly written code dies because it becomes
incomprehensible and it is no longer possible to fix bugs or to add new function.
Summary: What Does this Analogy Buy Us?
- Ontogeny recapitulates phylogeny — Your Inner Fish!
[Ernst Haeckel, Jena]
The development of the embryo recapitulates the evolutionary history of the organism.
And as Fortran illustrates, large software projects contain their history.
- Hierarchical structure of living organisms = software levels of abstraction.
Makes software easier to write, because programmer needs to
keep less information in his/her head at any given moment.
Makes software evolve better when subjected to random mutations,
because useful mutations can be small and localized,
instead of having to touch many places in the code.
- I think this discussion makes a convincing case that
a theory of the evolution of randomly mutating software is possible,
and that random walks in software space are worth studying.
- How biologically relevant such a theory may be remains to be seen.
- But that it will be an interesting new field of mathematics is now plausible.
However, carrying out this research will require a great deal of work.
Maybe you would like to work on these problems?
- I propose calling this new field metabiology: I hope that it will eventually develop into
a field parallel to biology, dealing with the random evolution of artificial software (computer programs)
rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems
or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.
- Whether or not this happens, as the concepts of computation, information and complexity show,
mathematics is moving in a biological direction.
This trend will continue.
Note[Our random walk model was inspired by the stimulating critique of Darwinian evolution in
D. Berlinski, The Devil's Delusion, Crown Forum, New York, 2008. See especially pp. 192-195.
In a nutshell, model = Jack (digital software) + Berlinski (random walk) + Busy Beaver Problem.
Our model is an attempt to answer Berlinski's criticisms.]
[19 July 2009]