Math‎ > ‎G J Chaitin‎ > ‎


[[[Former Title: Mathematical Philosophy — Philosophy Done Mathematically]]]

Metaphysics, Metamathematics and Metabiology 
A mathematical analysis of the scientific method, the axiomatic method, and Darwin's theory of evolution

G. J. Chaitin, IBM Research

In Memoriam Jacob T. "Jack" Schwartz (1930-2009)

[Outline of a course given August 2009 in the Programa de História das Ciências e das Técnicas e Epistemologia (HCTE) at the Universidade Federal do Rio de Janeiro (UFRJ), to be published as a book in Spanish by Midas (Valparaíso). This was a three-week course, with two two-hour lectures each week. There are links to handouts and also additional reading given for each topic.]
Sans les mathématiques on ne pénètre point au fond de la philosophie. 
Sans la philosophie on ne pénètre point au fond des mathématiques. 
Sans les deux on ne pénètre au fond de rien. — Leibniz 

[Without mathematics we cannot penetrate deeply into philosophy. 
Without philosophy we cannot penetrate deeply into mathematics. 
Without both we cannot penetrate deeply into anything.]

First Week: Metaphysics

Handout #1Handout #2Handout #3Handout #4

Lecture 1 — Epistemology: What is a Scientific Theory?

  • Leibniz's Analysis of the Scientific Method 
    Discours de métaphysique, 1686, sections V and VI:
    • Continuous model: experimental data consists of real numbers.
    • Theory is an equation passing through the data points.
    • There is always such an equation!
    • Is a good theory only if the equation is simple, not if is very complicated.
    • Hermann Weyl's 1932 formulation of this: 
      If arbitrarily complicated laws are permitted, then there is always a law, and the concept of law becomes vacuous!
    • So the concept of a scientific law subsumes the concept of complexity. 
      But as Weyl points out it is not clear how to measure the complexity of an equation!
  • The corresponding Algorithmic Information Theory (AIT) version: 
    [Solomonoff, Kolmogorov, Chaitin, 1960s]
    • Discrete model: theory and experimental data are now both finite strings of bits..
    • Theory is a program for calculating the observations and must be a compression. 
      In other words, the theory must have fewer bits than the data it explains.
    • Smallest ("elegant") program is best theory. 
      [Second week we will show that we cannot be sure we have the best theory.]
    • Measure complexity of theory/program in bits of software.
  • DNA/Software and Organism/Hardware:
    • DNA is software for calculating the organism.
    • Count bases instead of bits!
    • This gives us a very approximate way to measure the complexity of organisms, in terms of the amount of information in their DNA.
    • [Important to get biology into this course right away since the last week is on biology.]
  • Miracles and Induction/Prediction
    • Simpler world is more perfect; need for miracles indicates imperfections.
    • Simplest theory is best.
    • What to expect after a billion repetitions of 0? Or after a trillion repetitions of 01? Suddenly changing the pattern is more complicated.
  • Does Complexity Depend on Your Choice of Language?
    • Goodman: Green/Blue, Grue paradox
    • The Search for the Perfect Language
    • Optimal, Concise, Succinct Universal Turing Machines: U(πC p) = C(p)
  • Practical applications: Machine learning, Bayesian/non-parametric statistics

Lecture 2 — Ontology: The Labyrinth of the Continuum

  • Is the Universe Discrete? Quantum Gravity, Thermodynamics of Black Holes, and the Holographic Principle (Hawking, Bekenstein, Susskind, t'Hooft) suggest that a physical system can contain at most a finite number of bits of information, which in fact grows as the surface area of the boundary of the system, not as the volume of the physical system. [So if this is correct, then three-dimensional physical systems are in a sense actually two-dimensional, whence the term "holographic" principle.]
  • AIT's discrete model of scientific theories works better if the universe is discrete.
  • In addition to physical arguments against continuity and in favor of discreteness, there are also mathematical arguments, basically because real numbers contain an infinite amount of information. These mathematical arguments may be regarded as modern versions of Zeno's paradoxes (the paradox that Achilles cannot catch the tortoise, and of the immobile arrow in flight).
  • [Hermann Weyl, who was the first to discover the importance of Leibniz's ideas on complexity in the Discours de métaphysique, in the same 1932 book where he presents this, The Open World, gives a modern Zeno argument. If you really believe that time is infinitely divisible, then you can perform an infinite number of computations by performing the first step of the computation in 1/2 a second, the second step in 1/4 of a second, the third step in 1/8 of a second, and so forth and so on, so that an infinite number of steps are completed in exactly 1 second! And nobody thinks this is possible. I began studying Leibniz after reading Weyl's The Open World.]
  • Three [Four!] Proofs of the Existence of Uncomputable Real Numbers:
    1. Borel's 1927 know-it-all, oracle real: 
      You need to number all possible Yes/No questions and then you use the Nth digit of a real number to answer the Nth question.

      [Give Borel's simple original version in which each digit answers one question, and also the more complicated rigorous version in which the Nth digit tells us about the Nth string of symbols in the English alphabet (as given in my Conversations with a Mathematician). In the rigorous version of Borel's number, the Nth digit tells us if the Nth text is a syntactically valid English text, then if it is a Yes/No question, then if it has an answer ("Is the answer to this question No?" has no Yes/No answer), then whether the answer is Yes or No. This avoids the problem with Borel's original formulation of having to be able to enumerate all possible Yes/No questions (some of which don't even have an answer). It is much easier to enumerate all the possible finite texts that can be written using a finite set of characters.]

    2. Turing's 1936 proof: 
      (Cantor's diagonal method applied to the list of all computable reals) 
      By construction, the Nth digit of Turing's uncomputable real is different from the Nth digit of the Nth computable real.

      [Borel's 1927 oracle real is much more uncomputable than the uncomputable real exhibited by Turing in 1936 using Cantor's diagonal method! But Borel does not give a formal definition of what a computable real number is. That was first done by Turing in 1936. Borel and Turing unfortunately seemed unaware of each other's work.]

    3. Probabilistic Proof: 
      (Proof in the spirit of non-denumerability of the continuum argument 
      in Courant and Robbins, What is Mathematics?)

      Cover the Nth computable real with an interval of size ε/2N, giving a covering of all computable reals with total size ≤ ε × (1/2 + 1/4 + 1/8 + ...) = ε, which can be made as small as desired by taking ε sufficiently small.

      Reals are uncomputable with probability one!

      Most real numbers are unreal!

      [Indeed, Borel, Les nombres inaccesibles, 1952, points out that with probability one, a real cannot even be named uniquely (let alone computed).]

    4. Fourth Proof: 
      [We will encounter one more uncomputable real number next week, namely the halting probability Ω. It is maximally uncomputable, maximally unknowable.]

Additional Reading:

  • Nadler, The Best of All Possible Worlds: A Story of Philosophers, God and Evil, pp. 132-133
  • Smolin, Three Roads to Quantum Gravity
  • Chaitin, Thinking about Gödel and Turing: Essays on Complexity, 1970-2007 
    [Contains the first three handouts and much more besides.]

Second Week: Metamathematics

Continue using Handout #1Handout #2Handout #3Handout #4

Lecture 3 — Limitations of Formal Axiomatic Theories

  • Hilbert on formal axiomatic theories and his belief in a TOE (Theory of Everything) for pure mathematics: 
    Mechanical procedure for checking proofs. Absolute certainty! Truth is objective, not subjective.
  • Important Note: Running through all possible proofs and checking which ones are correct gives you a mechanical procedure for obtaining all the theorems in your theory! So to determine the truth of any assertion A, just calculate until you find a proof of A or a proof that not A! This goes back to the medieval search for the perfect language giving access to absolute truth (and magical powers!) recounted in Umberto Eco's book The Search for the Perfect Language, including Ramon Llull's mechanical combinations of concepts and Leibniz's characteristica universalis in which an error of reasoning is an error in calculation.
  • Let's measure the complexity of a formal axiomatic theory by the size in bits of the smallest program for generating all the theorems. An N-bit theory is one that requires an N-bit program to generate all its theorems.
  • You can't prove that a program is elegant!

    More precisely, you need an N-bit theory in order to be able to prove that an N-bit program is elegant.

    Therefore a formal axiomatic theory can only prove that finitely many programs are elegant.

    [An elegant program is the best theory for its output. More precisely, a program is elegant if no smaller program written in the same language produces the same output. For every possible output there is at least one elegant program, therefore there are infinitely many elegant programs.]

  • Proof: Consider the paradoxical program P that produces the same output as the first provably elegant program Q with the property that Q is larger in size, has more bits than P itself does. Q cannot exist, for otherwise we get a contradiction.
  • First corollary: The unsolvability of the Halting Problem.

    [Question of whether a self-contained, inputless program will calculate forever or will eventually stop.]

  • Second corollary: The Busy Beaver function grows faster than any computable function.

    [Also give direct proof of this. We will use the BB function next week.]

    [The BB function of N is the largest positive integer you can name using a program having N bits or less.]

  • Philosophical and practical consequences: No TOE (Theory of Everything) for pure mathematics, No absolute truth, No certainty, A "quasi-empirical" view of math [Lakatos; heuristic proofs; math is not equal to physics, but perhaps it is closer to physics than most people think], Experimental mathematics.
  • [Despite the fact that there can be no TOE for pure mathematics as Hilbert hoped, mathematicians remain enamored with formal proof. See the special issue on formal proof of the AMS Notices, December 2008.]
  • [Another ironical fact: Although there is no "perfect language" for mathematical reasoning (even though Zermelo-Fraenkel set theory is very popular), there are universal languages for computing, for calculating. Instead of formal systems enabling us to prove all truths, we have found formal languages for expressing all possible algorithms. This is where Hilbert's formalist dream has been most successful, not where he expected that it would be. It became computer technology instead of providing mathematics with absolute certainty. Indeed, I would argue that the best, most perfect programming languages are the concise, optimal universal Turing machines used in AIT, because the fact that U(πC p) = C(p) means that these languages are maximally expressive. Furthermore, viewed from the perspective of the Middle Ages, programming languages do give us the God-like power to breathe life into (some) inanimate matter: hardware ≈ body, software ≈ soul!]

Lecture 4 — The Halting Probability Ω

  • Go from Borel's 1927 know-it-all real, to a real whose Nth bit solves the halting problem for the Nth computer program (which is an extremely redundant oracle for the halting problem), to the halting probability Ω (which is an extremely concise oracle for the halting problem). Key idea: Given N programs, can tell which ones halt if know how many of them halt, and this is about log2 N bits of information, not N bits of information. As before, our universal Turing machine must have suitably concise, succinct programs: U(πC p) = C(p). Furthermore, now programs must be self-delimiting. In other words, programs are halves, quarters, eighths, etc. of the unit interval, so the total measure assigned to all programs is ≤ 1 and we can talk about the probability that programs do something.
  • If you could know the first N bits of the numerical value of Ω, that would enable you to solve the halting problem for all programs up to N bits in size.
  • Ω is not just uncomputable, it is maximally uncomputable, completely incompressible.
  • Ω shows that there is infinite irreducible complexity in pure mathematics.
  • You need an N-bit theory in order to be able to determine N bits of Ω.
  • The bits of Ω are necessary truths, but they appear to be contingent, accidental truths, they appear to be true for no reason, no reason simpler than themselves. Thus they refute Leibniz's principle of sufficient reason, which states that anything that's true is true for a reason.
  • Ω can be viewed pessimistically, as a limit to knowledge, but it can also be viewed optimistically, as a proof that mathematics is not mechanical and requires creativity. Indeed, in a sense Ω is concentrated, distilled, compressed mathematical creativity. And we can measure the power of a formal axiomatic theory by how many bits of Ω it can determine.

    [A similar measure of the power of a formal axiomatic theory: the size in bits of the smallest (self-contained, inputless) program for which it can neither be proved that it halts nor that it fails to halt. A formal axiomatic theory can easily verify that every program that halts does so. And it can also prove in infinitely many cases that programs will never halt. But there will always be infinitely many programs that fail to halt for which our formal axiomatic theory cannot enable us to prove this.]

  • Pure Mathematics versus Physics & Biology: 
    Math has infinite complexity and in this precise sense is therefore closer to biology than to physics, where there is still hope of a TOE, a simple set of equations for everything.

    [Even though in the previous lecture we suggested that perhaps math should be done a little bit more like we do theoretical physics.]

    [This leads us right into biology and Week 3, the attempt to give a mathematical proof that Darwin's theory of evolution works, that the complexity of organisms will increase.]

Additional Reading:

  • Tymoczko, New Directions in the Philosophy of Mathematics 
    [A defense of quasi-empiricism.]
  • Borwein and Devlin, The Computer as Crucible: An Introduction to Experimental Mathematics
  • Wolfram, A New Kind of Science 
    [A massive piece of experimental mathematics: the systematic exploration of the computational universe.]
  • Chaitin, Meta Math! The Quest for Ω 
    [A general reference for the first two weeks of this course, with more on the history of Ω.]

    Also in Portuguese:


  • Eco, The Search for the Perfect Language 
    [To be read together with Meta Math!: The prehistory of the quest for absolute knowledge.]

    Also in Portuguese:

Third Week: Metabiology

Handout #5Handout #6Handout #7Handout #8

Lecture 5 — Problems with Darwinism that may be solved by a Digital Software Organism Toy Model

  • Mathematics is very helpful in many areas of physics, but not too helpful in biology. Biology is terra incognita as far as mathematics is concerned. Can this be fixed?
  • Introduction: Summarize material on biology in Lectures 1 and 4.

    [DNA as digital software, measuring biological information, proof that math is closer to biology than to physics]

  • Our methodology: Metabiology is a very theoretical theory!

    [So was our analysis of the scientific method. But our analysis of the axiomatic method was fairly realistic.]

  • Our goal: To prove mathematically that Darwinian evolution works, that organisms will increase in complexity, by studying an enormously simplified toy model.

    [Picasso — theories are lies that help us to see the truth!]

  • [We are most definitely not trying to perform detailed simulations of actual biological systems so that we can predict their behavior as in systems biology.]
  • Main idea: To exploit the analogy between natural (DNA) and artificial (computer program) software. This seems helpful for dealing with the following Darwinian perplexities:
  • Missing Intermediate Forms — No problem, small change in program can produce large change in output.
  • The Major Transitions in Evolution — No problem, going from single-cell organisms to multi-cellular organisms is just the idea of making the main program into a subroutine.

    Furthermore, note the similar hierarchical structure of living organisms and of large pieces of human-made software.

  • Increase in Complexity — Could this be a kind of software bloat? Is it related to the fact that ontogeny (more or less) recapitulates phylogeny?

    In large software projects it is easier to add new stuff than to change old stuff.

    Large natural (DNA) and artificial (computer program) pieces of software contain their own history!

  • Emergence of Concise Encodings — Why is there DNA? This leads to greater evolvability if we are doing point (bit-level) mutations.

Lecture 6 — Evolution of Randomly Mutating Software, Random Walks in Software Space

  • Population of one organism, which is a program that calculates a positive integer
  • Try mutations at random, use oracle for halting problem to avoid ones that never halt, pick organism that produces the largest positive integer (BB problem!)
  • I can prove that the functional complexity will increase indefinitely. This is easy to do, too easy, in fact. On the positive side, the useful complexity increases; this is not just software bloat. We are not just getting the complexity to increase by adding junk to the program. But I do not know how to prove that ontogeny recapitulates phylogeny, as it were, even though I conjecture this is usually the case. I think we need to do that in order for this approach to have any biological relevance or significance. In my proof that the functional complexity will increase, evolution just starts over, it is not cumulative, which does not seem at all biological.
  • Behavior of model depends on choice of programming language for digital organisms, whether mutations are at a high-level or machine language bit level, and on the probability distribution of the possible mutations.
  • Some sort of model like this should work. There is much to be done! I think this randomly mutating software approach does make Darwinian evolution more understandable.
  • I envision metabiology as a field parallel to biology, dealing with the random evolution of artificial software (computer programs) rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.

Additional Reading:

  • Berlinski, The Devil's Delusion: Atheism and its Scientific Pretensions 
    [An incisive discussion of gaps in Darwin's theory.]
  • Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the Human Body 
    [How ontogeny recapitulates phylogeny.]

[Additional Things to Think About]

Can compare evolvability of two programming languages A, B with the same source of mutations using largest positive integer fitness function: Language A hasgreater evolvability than language B if with high probability random mutations make positive integer that is calculated grow at faster rate when using language A instead of language B.

A key point in evolutionary theory is the existence of paths leading from low complexity to high complexity organisms passing through a series of small evolutionary steps each of which is slightly advantageous. In other words, there must be a chain of plausible intermediate forms.

Example of an interesting path in software space: Organisms ON consist of a prefix followed by the first N bits of Ω (N = 1, 2, 3, ...) [reversed]. Prefix sees how long it takes the computable lower bounds on Ω to get the first N bits right. This is the smallest K such that (the sum of 1/2 raised to the size in bits of each program ≤ K bits in size that halts in ≤ K seconds) reproduces the N bits of Ω that we are given. These organisms ON are a very small mutation distance from each other [md(ON, ON+1) is small], and the integer each one calculates grows faster than any computable function of N. [Here we are using the programming language and mutation model in Handout #6.]

[[[Here is a better version of this. As before, organism ON consists of a prefix followed by the first N bits of Ω (N = 1, 2, 3, ...) [reversed]. This time the prefix π uses these bits of Ω in the standard way to determine each self-delimiting program |p| ≤ N that halts, and then the greatest positive integer K with program-size complexity H(K) ≤ N, which grows faster than any computable function of N. If we can protect this prefix π from mutations, there is fixed-size mutation-distance neighborhood {X: md(X, ON) ≤ C} of each of these organisms in which the fittest organism will be another organism in this sequence ON+C' that calculates a bigger positive integer. Assuming we have an oracle for the halting problem, this gives us a deterministic evolutionary algorithm instead of a random walk model as in Lecture 6 above. In effect, we are calculating Ω bit by bit.

This alternative model exhibits historicity, i.e., organisms contain their evolutionary history. But it is unsatisfactory from a biological point of view because there are no random mutations and because our software organisms do not exhibit the deeply-nested hierarchical structure that is such a conspicuous feature of biological organisms. Whether organisms can shield part of their DNA from mutations is also a relevant issue, although allowing this as an option is certainly possible when we are dealing with artificial software organisms. If DNA can be shielded when we have random mutations, then we cannot expect to prove that evolution occurs with probability one, only that it occurs with positive probability. [The random walk model in Lecture 6 evolves with probability one.]

Perhaps this example shows that evolution is too easy if we have an oracle for the halting problem? How about a version of my random-walk model for evolving programs which name large positive integers that works without an oracle? Maybe then we will get the desired deeply-nested hierarchical structure? Is there any connection with primitive recursive functions and subrecursive hierarchies? With work by Rózsa Péter? Must think about this.]]]

Call an infinite sequence of software organisms OK (K = 1, 2, 3, ...) an evolutionary history if the fitness function (positive integer that is output) grows and the mutation distance between successive organisms md(OK, OK+1) is bounded. Note that the optimal universal Turing machines used in AIT have the property that evolutionary histories are preserved when a fixed prefix πC is added at the beginning of each organism OK in the evolutionary history, because outputs are the same and mutation distances are still bounded. More precisely, a universal Turing machine U has the most evolutionary histories, because an evolutionary history for any Turing machine C will also be an evolutionary history for U. And any two universal Turing machines U and U' will have exactly the same set of evolutionary histories. [Again, here we are using the programming language and mutation model in Handout #6.]

[20 August 2009]