Author Archives: Michael Weiss

Set Theory Jottings 13. From Zermelo to ZFC: Everything’s a Set!

Prev TOC Next

Zermelo’s 1908 system differs from ZFC in three respects. (1) ZFC is a “pure” set theory, without atoms. (2) ZFC includes two additional axioms, Replacement and Foundation. (3) ZFC is a first-order theory. Zermelo’s axioms are not formalized in this way. In this and the next two posts I will go into details.

Everything’s a Set!

In Zermelo’s original conception, we begin with a foundation of so-called atoms (aka urelements), things like the natural numbers. We then build up the universe by forming sets of atoms, sets of sets, etc. Later people realized that with various “coding tricks”, you don’t need any atoms. We’ve already encountered the von Neumann finite ordinals: 0=∅, 1={0}, 2={0,1}, etc. So these can serve for the natural numbers.

Zermelo had no ordered pairs. Kuratowski came up with this trick: define the ordered pairx,y〉 to be the set {{x},{x,y}}. We can extract x and y from p=〈x,y〉={a,b} by noting that x is the only element of ab, and then y is the other element of ab, if ab has two elements. If ab has only one element, then of course y is that element. With Kuratowki’s ordered pairs on hand, we can deal with relations and functions without Zermelo’s awkward workarounds.

von Neumann defined ordinals by two properties. First:

A set X is transitive if baX implies bX. Equivalently, if aX implies aX.

Then:

An ordinal is a transitive set that is well-ordered under ∈.

We need transitivity, otherwise (for example) {5} would compete with {0} for the title of “1”.

I won’t go through the development of the ordinals in detail, but here are the first few steps. The precedessors of an ordinal are its elements: β<α iff β∈α. The successor of an ordinal α (denoted α+ in this post) is α∪{α}. If α is neither 0 nor a successor, it is a limit ordinal. By standard convention, λ always denotes a limit ordinal.

A finite ordinal (aka natural number) is defined as a non-limit ordinal (i.e., either a successor or 0), all of whose elements are also non-limit ordinals. If an ordinal is finite, so is its successor. The Axiom of Infinity says there is a set, say w, closed under α↦α+ and containing 0. Separation now says there is set, denoted ω, of all the finite ordinals in w. One proves that ω is the set of all finite ordinals.

Ω denotes the class of all ordinals. (Some authors use “On”.) Classes are not a formal part of ZFC, but we permit them as a “manner of speaking”. If φ(x) is a property of sets, we might talk about the class φ of all sets x satisfying φ(x), and write x∈φ to mean φ(x). For example, α∈Ω is just another way to say that α satisfies von Neumann’s two properties. We write V for the class of all sets, corresponding to the property x=x. Any set x is a class, corresponding to the property yx. A non-set class is called a proper class. After a fashion, proper classes are the descendents of Cantor’s “absolutely infinite multitudes”.

It is almost trivial to prove transfinite induction on Ω. Say φ(α) is a property of ordinals, and say we have this implication: If φ holds for all β<α, then it holds for α. Then φ holds for all ordinals. (Proof: If it doesn’t hold for α, then it doesn’t hold for some β<α. Since α is well-ordered, there is a least β<α for which it doesn’t hold. But then it holds for all γ<β, and hence for β.)

While Ω isn’t a well-ordered set, it shares enough properties to call it (informally) a well-ordered class. Namely, the ordinals are simply ordered: we will prove this in a later post. Any property φ of ordinals has a least ordinal satisfying it.

Definitions by transfinite recursion take a bit more fussing. I’ll leave the details to the textbooks, but basically: we have a “rule” determining a value sα whenever we are given all the values sβ with β<α. How are we given all these values? By a function fα with domain α, so fα(β)=sβ for all β<α. The “rule” is a predicate with two free variables, say ρ(x,y); we want ρ(fα,sα) to hold. The rule ρ should be “function-like”, i.e., for every x there is a unique y such that ρ(x,y) holds. You drop the function fα into the input hopper of ρ, and out pops a value sα. (If you’re wondering how s0 gets assigned, for α=0 the function is the empty function f0=∅.)

Using transfinite induction, one can show that for every α∈Ω there is a unique function gα with domain α+=α∪{α} that “obeys the recursion”. That is, for every β≤α, if fβ is the restriction of gα to β, then ρ(fβ,gα(β)). We say that we’ve defined gα by transfinite recursion up to and including the ordinal α.

Thus we have a function gα, with domain α+, for every ordinal α. For any β<α, gβ is the restriction of gα to β′. We can’t formally define a function with domain Ω, but we do have a “function-like” predicate γ(x,y) with “domain” Ω: γ(α,sα) holds if and only if gα(α)=sα.

Now that we have pairs, functions, and ordinals, we can populate our universe with all the usual flora and fauna. ℕ is the same as ω. The standard rigamarole then constructs integers, rationals, reals, Hilbert spaces,…

What about cardinalities? Frege and Russell both proposed “The cardinality of M is the set of all sets equivalent to M.” But a more elaborate version of Russell’s paradox kills this approach. Informally, we can talk about the class of all sets equivalent to M. We need to pick a representative from each such class.

von Neumann proposed: the first ordinal in the class. ℵ0 is just ω. ℵ1 is the first uncountable ordinal. 20 is whichever ordinal is the first one with cardinality c. The cardinality of M is thus the unique smallest ordinal μ such that M is equivalent to μ.

All this assumes that for any M, there is an ordinal equivalent to it. That follows from the well-ordering theorem, which uses the Axiom of Choice. If AC isn’t around, then you need a different idea. We’ll see one later.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

From Kepler to Ptolemy 18

Prev TOC Next

The Astronomia nova

The full title of the Astronomia nova is The New Astronomy, based upon causes, or celestial physics, treated by means of commentaries on the motions of the star Mars, from the observations of Tycho Brahe. This hits all the high spots: the treasure trove of Tycho’s observations, Kepler’s new physics, and the “battles with Mars”.

Historians have discovered that the Astronomia nova, although ostensibly a blow-by-blow account of Kepler’s process of discovery, in fact was (as Stephenson puts it) “one sustained argument”, written and rewritten carefully. So I will not initially deal with the topics in the same order.

Tycho’s Observations

The observational accuracy available to Ptolemy and Copernicus is generally taken to be about 10′, i.e., 10 arcminutes (Babb; Thoren, p.11), although errors of 40′ and more in his star catalog are frequent (Pedersen, p.252). Tycho’s observations rarely erred by more than 4′, and often were accurate to 1′ (Thoren, pp.11–12).

But just as crucial was Tycho’s systematic approach, following each planet throughout its orbit:

Previous astronomers had been clever, observing chiefly at critical moments, such as oppositions to the sun, when their observations gave clear answers to the questions they wanted to ask. Tycho’s less directed program proved its worth, naturally enough, when Kepler’s research reached the point where he had to ask questions that had not been thought of before.

Stephenson (p.51)

We saw in post 4 that with a suitable choice of parameters, Ptolemy’s equant speed law and Kepler’s area law give the same speeds at perihelion and aphelion, and give nearly the same times for the planet’s arrivals at the quadrants. We will see that the data for the octants played a critical role in Kepler’s discoveries.

Actual Orbits

Thanks to the Planetary Hypotheses, we know that Ptolemy regarded his planetary models as physically real. By the time of Copernicus, however, most astronomers subscribed to the viewpoint expressed in Osiander’s (in)famous preface to De revolutionibus:

…it is the duty of an astronomer to compose the history of the celestial motions through careful and expert study. Then he must conceive and devise the causes of these motions or hypotheses about them. Since he cannot in any way attain to the true causes, he will adopt whatever suppositions enable the motions to be computed correctly from the principles of geometry for the future as well as for the past. The present author has performed both these duties excellently. For these hypotheses need not be true nor even probable. On the contrary, if they provide a calculus consistent with the observations, that alone is enough.

Cardinal Bellarmine enunciated this viewpoint in a letter to Paolo Foscarini, warning him not to advocate the Copernican hypothesis as reality:

To say that on the supposition that the Earth moves and the Sun stands still all the appearances are saved better than on the assumption of eccentrics and epicycles, is to say very well—there is no danger in that, and it is sufficient for the mathematician: but to wish to affirm that in reality the Sun stands still in the center of the world, and that the Earth is located in the third heaven and revolves with great velocity about the Sun, is a thing in which there is much danger.

Historians later called this attitude instrumentalism, in contrast with realism.1

Kepler, like Copernicus (and Ptolemy and Peurbach) was a realist. The Astronomia nova includes, for the first time, a diagram of the actual orbit of Mars according to Ptolemy:

Actual Orbit of Mars in the Ptolemaic System

Donahue comments:

The figure on the facing page [see above] is a momentous diagram. Nothing like it had ever before been published. Astronomers had become so accustomed to thinking of celestial motions as compounded circular motions that it had apparently not occurred to anyone to consider the actual path traversed by a planet. Once the reality of the celestial spheres came into question, however, the actual path traversed came to be a matter of great interest.

Donahue (p.35)

Expanding on the last sentence: the Planetary Hypotheses provides a physical mechanism for Ptolemy’s geometrical models, as we’ve seen. For a number of reasons (among them the discovery that comets would barrel right through the spheres) this mechanism seemed more and more dubious. The instrumentalists didn’t lose any sleep over this.

But for the realist Kepler, this “pretzel orbit” (as he termed it) constituted prima facie evidence againt the Ptolemaic and Tychonic systems2.

The Astronomia nova puts it this way:

But, with arguments of the greatest certainty, Tycho Brahe has demolished the solidity of the orbs, which hitherto was able to serve these moving souls [motrices animae, the beings that kept the orbs rotating], blind as they were, as walking sticks for finding their appointed road; and hence the planets complete their courses in the pure aether, just like birds in the air.

The realist approach has a couple of consequences. First, the longitude and latitude models must mesh properly to form a single consistent three-dimensional orbital geometry.

Second, if crystal spheres don’t guide the planets, what does?

[1] Centuries earlier, the Arabic polymath Averroös (Ibn Rushd, 1126–1198) complained, “The astronomy of our time offers no truth, but only agrees with the calculations and not with what exists.” [Wikipedia, longer quote in “The Search for a Plenum Universe” in Gingerich], p.140]

[2] The Tychonic system kept the Earth motionless but had the planets revolve about the Sun. It also suffered from ‘pretzelosis’.

Prev TOC Next

Leave a comment

Filed under Astronomy, History

Set Theory Jottings 12. Zermelo on “definiteness”

Prev TOC Next

In the last post, I mentioned Zermelo’s 1929 paper “On the concept of definiteness in axiomatics”. By this time, people had suggested replacing “definite” with “definable in first-order logic”. Zermelo did not agree with this.

Three points, I think, explain Zermelo’s views.

First, Zermelo never really “got” formal logic. He didn’t grasp the distinction between the meta-theory and the object theory, nor that between syntax and semantics. His correspondence with Gödel shows this. Another example: in the 1929 paper, he complains that the inductive definition of a first-order formula “depends on the concept of finite number whose clarification, after all, is supposed to be one of set theory’s principal tasks.” The inductions of course belong to the meta-theory.

Second, Zermelo approached axiomatization in the spirit of Euclid rather than with the philosophy of formalism. The axioms assert mathematical truths. They are not the arbitrary rules of a game. Hilbert’s famous “tables, chairs, and beer mugs” remark expresses the need to rid the development of any reliance on visual intuition. In much the same way, Zermelo stressed the purely objective character of the Axiom of Choice.

Third, Zermelo found Skolem’s countable model of ZF unacceptable. Recall the resolution of Skolem’s paradox: the power set of ω is not the “true” power set. In a 1930 paper, Zermelo gave his final version of the axioms of set theory. In a footnote to the Separation Axiom, he writes:

Like the replacement function in [the Replacement Axiom], the propositional function 𝔣(x) can be completely arbitrary here, and all consequences of restricting it to a particular class of functions cease to apply from the present point of view. I shall consider elsewhere more thoroughly “the question of definiteness” in connection with my last contribution to this journal and with the critical “remarks” by Mr. Th. Skolem.

The implicit criticism: if you hobble the Separation Axiom by allowing only first-order definable properties, no wonder you get a countable power set!

The 1929 paper offered the following definition of “definite property”. Zermelo writes Dp to say that p is a definite property. Then (changing notation and rewording somewhat, except for the quoted parts):

  1. First, all fundamental relations are definite.”
  2. Definiteness is passed on to composite assertions as follows
    1. If Dp, then Dp).
    2. If Dp and Dq, then D(pq) and D(pq).
    3. If Df(x,y,z,…) “for all (permissible) combinations of values”, then D((∀x,y,z,…)f(x,y,z…)) and D((∃x,y,z,…)f(x,y,z…)).
    4. If DF(f) “for all definite functors f’’ then D(∀f F(f)) and D(∃f F(f)).
      Definiteness is passed on to the quantifications.”
  3. If P is the system of all definite properties, then “it has no proper subsystem P1’’ that contains all the fundamental relations and is closed under the compositions listed above.

In clause (II.4), the f ranges over properties (or “propositional functions”), so we have a second-order quantification. Furthermore, we have an implicit circularity: the scope of ∀f is restricted to definite properties, just what we’re in the midst of defining. But clause (III) is perhaps even worse: without a robust set theory, how are we to interpret the quantification over all subsystems of P? Skolem made both these points in his reply.

Zermelo’s 1930 paper “On boundary numbers and domains of sets: New investigations in the foundations of set theory”, gave (as I mentioned above) his final version of the axioms. This includes both Replacement and Foundation, but curiously not Choice—as an axiom. Zermelo writes:

We have not explicitly formulated the “axiom of choice” here because it differs in character from the other axioms […] However, we use it as a general logical principle upon which our entire investigation is based; in particular, it is on the basis of this principle that we shall assume in the following that every set is capable of being well-ordered.

Prev TOC Next

2 Comments

Filed under History, Logic, Set Theory

Set Theory Jottings 11. Zermelo to the Rescue! (Part 2)

Prev TOC Next

In 1908 Zermelo published his paper “Investigations in the foundations of set theory”. This contained the axiom system that eventually led to ZFC. Zermelo opens the paper with this rationale:

Set theory is that branch of mathematics whose task is to investigate mathematically the fundamental notions “number”, “order”, and “function” … At present, however, the very existence of this discipline seems to be threatened by certain contradictions, or “antinomies” [such as the Russell paradox]. …it no longer seems admissible today to assign to an arbitrary logically definable notion a “set”, or “class”, as its “extension”. Cantor’s original definition of a “set” as “a collection, gathered into a whole, of certain well-distinguished objects of our perception or our thought” therefore certainly requires some restriction … Under these circumstances there is at this point nothing left for us to do but to proceed in the opposite direction and, starting from “set theory” as it is historically given, to seek out the principles required for establishing the foundations of this mathematical discipline. In solving the problem we must, on the one hand, restrict these principles sufficiently to exclude all contradictions and, on the other, take them sufficiently wide to retain all that is valuable in this theory.1

Zermelo’s motivation is pragmatic, unlike the philosophical approach of Russell and Whitehead’s Principia Mathematica.

Axiomatization was “in the air” at this time, with people throwing out various suggestions. Moore (p.151) offers some examples. Julius König proposed two axioms: (1) There are mental processes satisfying the formal laws of logic. (2) The continuum ℝ, treated as the totality of all sequences of natural numbers, does not lead to a contradiction. Schoenflies opted for the Trichotomy of Cardinals, and wanted to hold on to the Principle of Comprehension. Cantor sent a letter to Hilbert with some principles that look a lot like Zermelo’s axioms, but this letter didn’t come to light until decades later.

Zermelo’s article stands out as the first published proposal with a full set of axioms, demonstrating that it could save some of Cantor’s Paradise, and recognizing that the Principle of Comprehension was kaput.

Zermelo’s eschews philosophy:

The further, more philosophical, question about the origin of these principles and the extent to which they are valid will not be discussed here. I have not yet even been able to prove rigorously that my axioms are “consistent”, though this is certainly very essential…

The paper, he says, develops the theory of equivalence in a manner “that avoids the formal use of cardinal numbers.” He promises a second part, dealing with well-ordering, but this never appeared.

After the introduction, Zermelo begins:

  1. Set theory is concerned with a “domain” 𝔅 of individuals, which we shall call simply “objects” and among which are the “sets”. …
  2. Certain “fundamental relations” of the form aεb obtain between the objects of the domain 𝔅. …An object b may be called a set if and—with a single exception (Axiom II)—only if it contains another object, a, as an element. [I will use the modern ∈ in place of Zermelo’s ε from now on.]
  3. [Definition of subset and disjoint]
  4. A question or assertion 𝔈 is said to be “definite” if the fundamental relations of the domain, by means of the axioms and the universally valid laws of logic, determine without arbitrariness whether it holds or not. Likewise a “propositional function” 𝔈(x), in which the variable term x ranges over all individuals of a class 𝔎, is said to be “definite” if it is definite for each single individual x of the class 𝔎. Thus the question whether ab or not is always definite, as is the question whether MN or not.

Note that item (1) allows for so-called urelements or atoms—things like integers. ZFC is a so-called “pure” set theory, without atoms.

Next come seven axioms, interlarded with extensive discussion.

Extensionality:
“Every set is determined by its elements.” In other words, if MN and NM, then M=N.
Elementary Sets:
The null set exists. Given any elements a and b of the domain, the sets {a} and {a,b} exist.
Separation:
To quote Zermelo: “Whenever the propositional function 𝔈(x) is definite for all elements of a set M, M possesses a subset M𝔈 containing as elements precisely those elements x of M for which 𝔈(x) is true.”

Zermelo notes that “sets may never be independently defined by means of this axiom but must always be separated as subsets from sets already given”, and that this prevents the Russell paradox and the like. Indeed, the Russell paradox is turned into a theorem: for any set M there is a subset M0 such that M0M. He also notes that “definiteness” precludes some semantic paradoxes, e.g., Richard’s paradox (see post 5).

Zermelo shows that Separation implies the existence of set differences MM1 (denoted MM1) and intersections MN (denoted [M,N]), and even ⋂XTX for a set of sets (which he denotes 𝔇T, for “Durchschnitt”).

Power Set:
For any set T, there is a set whose elements are precisely all of T’s subsets. He denotes the power set of T by 𝔘T (for “Untermengen”).
Union:
For any set T, there is a set whose elements are precisely the elements of the elements of T. In modern notation, ⋃XTX. Denoted 𝔖T, for “Summe”. He writes M+N for our MN.
Choice:
Given any set T of mutually disjoint nonempty sets, the union ⋃XTX contains a subset S such that SX is a singleton for each XT.

Zermelo adds, “We can also express this axiom by saying that it is always possible to choose a single element from each element M,N,R,… of T and to combine all the chosen elements, m,n,r,…, into a set” S.

The set of all S’s satisfying this condition (card(SX)=1 for all XT) Zermelo calls the product of the elements of T, denoted 𝔓T, or just MN for a pair of disjoint sets M and N.

Infinity:
There is a set Z containing the null set 0, and for each of its elements a, it also contains {a}.

This leads to the so-called “Zermelo finite ordinals”, 0, {0}, {{0}}, {{{0}}}, etc. Z contains all these, and using Separation, we can assume Z contains exactly these. The Zermelo finite ordinals have two drawbacks: (1) They don’t extend naturally into the infinite ordinals; (2) Each of them, except 0, contains exactly one element. The von Neumann ordinals removed both of these blemishes.

The rest of the paper develops the theory of equivalence from the axioms. I noted that Zermelo allows atoms. On the other hand, he does not have ordered pairs, and thus neither relations nor functions. This lack calls for some gymnastics. When M and N are disjoint, the set of all unordered pairs MN={{m,n}:mM, nN} substitutes for our M×N.

To define equivalence between sets M and N, he assumes first that M and N are disjoint. Using MN instead of M×N, he can define “bijection between M and N’’. If one exists, then M and N are “immediately equivalent”. Dropping the disjointness condition, he says M and N are “mediately equivalent” if there exists a third set that is disjoint from both and “immediately equivalent” to both. It takes a couple of pages to show that this definition makes sense.

Zermelo proves the Equivalence Theorem, that is the “(Cantor-Dedekind-Schröder)-Bernstein Theorem”. (A couple of decades later, he discovered that Dedekind had basically the same proof.) He gives detailed proofs of the basic facts about equivalence. He defines “M has lower cardinality than N’’ in the usual fashion (M injects into N but not vice versa) but avoids defining “cardinal number”, as he promised in the introduction. The paper crescendoes in a proof of J. König’s inequality, a generalization of Cantor’s 𝔪<2𝔪. Expressed using cardinal numbers, this says that if 𝔪k<𝔫k for all k in some index set K, then ∑k𝔪k<∏k𝔫k. Zermelo, of course, phrases this without mentioning cardinal numbers.

Zermelo spills a fair amount of ink on the question of “definiteness”. He initially claims that ab and MN are definite questions, as we’ve seen. When defining 𝔇T, he notes that for any object a, the set Ta={XT: aX} exists by Separation (because aX is definite). But the question whether Ta=T is also definite. So using Separation again, 𝔇T={aA: Ta=T}, where A is any element of T. A similar discussion accompanies his definition of “immediately equivalent” showing that it is definite whether a given subset of MN defines a bijection.

Nonetheless, a certain nimbus of indefiniteness surrounds Zermelo’s “definite”. Twenty-one years later, Zermelo published a paper, “On the concept of definiteness in axiomatics”. By this time, people had suggested replacing “definite” with “definable in first-order logic”. Zermelo did not accept this, and his proposal had no influence on ZFC.

[1] Despite this clear statement from Zermelo, Moore argues that “his axiomatization was primarily motivated by a desire to secure his demonstration of the Well-Ordering Theorem and, in particular, to save his Axiom of Choice” (Moore (p.159)). He notes that Zermelo composed the two 1908 papers—the axiomatization, and the second well-ordering proof—together, and that “there are numerous internal links connecting the two papers” (Moore). Zermelo’s biographer takes an intermediate view: “Above all, however, one has to take into consideration how deeply Zermelo’s axiomatic work was entwined with Hilbert’s programme of axiomatization and the influence of the programme’s ‘philosophical turn’ which was triggered by the publication of the paradoxes in 1903” (Ebbinghaus (p.81)).

Prev TOC Next

8 Comments

Filed under History, Set Theory

From Kepler to Ptolemy 17

Prev TOC Next

The Mysterium cosmographicum

The Mysterium cosmographicum (Cosmographical Mystery) boasts one of most celebrated illustrations in the history of science: the planetary spheres nested with the five Platonic solids. This picture graces nearly every history of astronomy. Who am I to break with tradition, here it is:

This depicts Kepler’s explanation of why there were six planets, plus the reason for their relative distances. The architecture of the universe, so to speak. (Historians call this the polyhedral hypothesis.)

Shades of the tightly nested spheres of the Planetary Hypotheses! Kepler and his contemporaries were not aware of this work, but they had read Peurbach’s Theoricae Novae Planetarum (see post 15).

In the preface to the Mysterium cosmographicum, Kepler said:

And there were three things above all for which I sought the causes as to why it was this way and not another—the number, the dimensions, and the motions of the orbs.

You can’t argue with the sheer elegance of Kepler’s explanation—but it proved a dead-end. At most, it speaks to Kepler’s drive to find patterns in the data, a drive that led to his third law. On the other hand, the noted physicist Steven Weinberg has defended Kepler’s scheme in its historical context:

No one today would take seriously a scheme like Kepler’s, even if it had worked better. This is not because we have gotten over the old Platonic fascination with short lists of mathematically possible objects, like regular polyhedrons. There are other such short lists that continue to intrigue physicists… What makes Kepler’s scheme so foreign to us today is not his attempt to find some fundamental physical significance for the regular polyhedrons, but that he did this in the context of planetary orbits, which are just historical accidents. Whatever the fundamental laws of nature may be, we can be pretty sure now that they do not refer to the radii of planetary orbits… In [Kepler’s] time no one knew (and Kepler did not believe) that the stars were suns with their own systems of planets, rather than just lights on a sphere somewhere outside the sphere of Saturn. The solar system was generally thought to be pretty much the whole universe, and to have been created at the beginning of time. It was perfectly natural then to suppose that the detailed structure of the solar system is as fundamental as anything else in nature.

Weinberg (p.163–164)

From the beginning Kepler committed to Copernicanism. Only heliocentrism sets the relative Sun-planet distances without extra assumptions (recall scaling, post 2). Kepler noted that heliocentrism explains several mysterious features of the Ptolemaic approach (see post 6). He wrote, “Copernicus alone gives an explanation to those things that provoke astonishment among other astronomers, thus destroying the source of astonishment, which lies in the ignorance of the causes.”

Even in this first work, we find Kepler wrestling with celestial physics. I said in post 1 that Kepler’s physics, though wrong nearly point-for-point, nonetheless guided him past many pitfalls. How could this happen?

I’ll start with a one-sentence summary of what Kepler got right, physically speaking:

Forces emanating from the Sun guide or drive the planets in their orbits.

Let’s call this the motto. For the benefits listed in post 6, it matters not whether you use the mean Sun or the true Sun. Copernicus used the mean Sun throughout De revolutionibus. But our motto has a corollary. The mean Sun is a point in space, without physical meaning, and of no signficance for any planet except Earth. Hence:

Refer everything to the true Sun, and not the mean Sun. Nor should the Earth receive any special treatment.

Since the mean Sun’s very definition involves the Earth, the second sentence implies the first.

Gingerich1 proposed calling this corollary Kepler’s zeroth law. It looms large in the Astronomia nova, but already in the Mysterium cosmographicum, Kepler insisted on measuring distances from the true Sun.

The motto came from two observations. In the Copernican system, the farther a planet is from the Sun, the more slowly it moves. But also, a planet moves more slowly at aphelion than at perihelion. We know now that Kepler’s third law governs the “between planets” ratios, his second law the “aphelion vs. perihelion” ratios. And we know how Newtonian physics explains this: the conservation of angular momentum accounts for the second law, gravity’s inverse square law for the third.

To Kepler, these two regularities suggested that the “virtue” (virtus) moving the planets resides in the Sun. Perhaps the Sun’s virtue diminishes with distance as it spreads out? The aphelion-perihelion ratio pointed at a physical cause for Ptolemy’s equant.

Kepler took a stab or two at finding the rule for the “between planets” ratios. Not until much later, in the Epitome astronomiae Copernicanae, would Kepler find the right relation.

[1] In “Kepler’s Place in Astronomy”, in Gingerich.

Prev TOC Next

Leave a comment

Filed under Uncategorized

First-Order Categorical Logic 13

Prev TOC Next

MW: It’s been a minute! Well, almost 60,000 minutes.

We left off with a question: does a natural transformation from a syntactic hyperdoctrine to a semantic hyperdoctrine automatically “respect quantifiers”? We saw that this amounts to a Beck–Chevalley condition. We wondered if we had to add that condition to our definition of a model, or if it came for free.

Maybe you’ve had the experience of putting aside a crossword, half-completed, and coming back to it in a hour. Hey, of course the answer to 60 Across is “Turing Test”!

The same thing happened to me here. And the answer is: no free lunch.

Let me recap, just to get my brain back up to speed, and fix notation. B is a syntactic hyperdoctrine, C is a semantic hyperdoctrine, and F:BC is a natural transformation.

For example: say B is the theory LO of linear orders, and C is an actual linear order. B(2) contains predicates like [x<y], C(2) contains binary relations on the domain of the linear order, and F2([x<y]) is the less-than relation on this domain.

Here’s the key diagram:

Ignore the dashed arrows for just a second. Say f is an injection in FinSet, like f:{x}→{x,y}. Then B(f) and C(f) are the “throw in extra variables” morphisms we’ve talked about ad nauseum. This diagram is a commuting square in the category BoolAlg.

The dashed arrows are morphisms in a larger category we dubbed PoolAlg. This is a full subcategory of Poset, and BoolAlg is a so-called wide subcategory of it. Its objects are all boolean algebras, and its morphisms are all order-preserving maps, i.e., all the morphisms those objects have as denizens of Poset.

Within PoolAlg, the dashed arrows are left adjoints of the right arrows. Now does the diagram commute? In PoolAlg, that is. Beck–Chevalley says yes. We saw last time that we want this to be true. For example, the top left arrow takes [y<x] to [∃y(y<x)]. We want this predicate to map to the corresponding relation on an actual linear order. That’s what we mean by “respecting quantifiers”.

JB: This sounds right. Thanks for getting us started again!

As you might expect, I have a couple of nitpicks. While I feel sure there’s no free lunch here, I don’t think you have proved it. Maybe for some reason the Beck–Chevalley condition always holds in this situation! I feel sure it doesn’t, but I think that can only be shown with a counterexample: a defective model that doesn’t obey Beck–Chevalley.

It’s probably easy enough to find one. However, I don’t feel motivated to do it. We have bigger fish to fry. I’m happy to assume our models obey this Beck–Chevalley condition.

(Do we also need to assume they obey some sort of Frobenius condition?)

And here’s another even smaller nitpick: I don’t see the need for this category PoolAlg. I believe whenever you’re tempted to use it we can use our friend Poset. For example, the dashed arrows are left adjoints in Poset, and the square containing those dashed arrows commutes in Poset. Saying PoolAlg—restricting the set of objects in that way—isn’t giving us any leverage. It doesn’t seriously hurt, but I prefer to think about as few things as is necessary.

Maybe it’s time to finally try to state and prove a version of Gödel’s completeness theorem. Do you remember our best attempt at stating it so far? I think I can, just barely… though it’s somewhat shrouded by the mists of time.

MW: That’s right, to prove the “no free lunch” result we need a counter-example. That’s what came to me when I started thinking about this stuff again. A way to construct a whole slew of counter-examples.

And I think it’s worth going through, because it relates to Henkin’s proof of Gödel’s completeness theorem. The cat-logic proof will have to surmount the same obstacles. So here goes.

Say B is a theory, aka syntactic hyperdoctrine. Say C is a semantic hyperdoctrine, and F:BC is one of those natural transformations that does respect quantifiers. Suppose the domain of C is V: C(n) is a set of n-ary relations on V. Now let W be a subset of V. Let D(n) be all the relations you get by restricting the relations in C(n) to W. And for any predicate φ∈B(n), let Gn(φ) be the restriction of Fn(φ) to W. Let be the image of B(n) under Gn. I claim that D is a hyperdoctrine, and G is a natural transformation from B to D. And very often, G will be disrespectful of quantifiers.

Using my LO example, let C have V=ℤ as its domain, and let W=ℕ. The predicate [y<x] gets sent to a less-than relation on ℤ by F2. This restricts to a less-than relation on ℕ. So G2([y<x]) is that relation.

Now let’s apply the left adjoints of the injection {x}→{x,y}. Up in B, we get the predicate [∃y(y<x)]. F1 sends this to the always-true relation in ℤ, which of course restricts to the always-true relation in ℕ. What about the left adjoints in C and D? The relation supy(y<x) is always true in ℤ, but is x≠0 in ℕ. So “go left, then down via G1’’ gets you to always-true in D, while “go down via G2, then left” gets you to x≠0.

Another example: let V=ℚ, W=ℤ, and for our predicate, use [x<z<y]. The left adjoint from {x,y}→{x,y,z} is [∃z(x<z<y)]. Taking one path brings us to the relation x<y in ℤ, while the other path leads to “follows, but not immediately”.

It’s clear now that one path leads to a “model” where ∃x means, “does there exist an x in a larger domain?”. The Henkin proof features a whole sequence of such enlargements. Thinking about that suggested this class of counter-examples to me.

I believe that restricting a semantic hyperdoctrine this way results in a hyperdoctrine. I haven’t checked all the details. But it’s a piece of cake when C(n) equals P(Vn) for all n. In that case, D(n) is just P(Wn)! This is what you wanted for semantic hyperdoctrines in the first place, as I recall.

The other thing we need to check: that G is a natural transformation. The key
diagram:

Here, ↾m and ↾n are the restriction maps. It’s pretty obvious that the bottom square commutes. Since the top square does too, an easy diagram chase shows that “outer frame” square does.

Anyway, you wanted a refresher on the statement of the Gödel completeness theorem in our framework. Here goes. A syntactic hyperdoctrine B is consistent if B(0) is not the one-element boolean algebra {⊤=⊥}. The theorem states that every consistent syntactic hyperdoctrine has a model, which is a natural transformation to a semantic hyperdoctrine that respects quantifiers.

(And maybe also equality? Haven’t thought about that yet.)

We want to adapt the Henkin proof to this framework, is that right? I think I see, in a vague way, how to do that. But I don’t see how we’d be leveraging the framework—how the proof would be anything but a straighforward translation, not really using any of the neat category ideas.

JB: Okay, great. Thanks for all that.

Let’s try to clean things up a wee bit before we dive in. When you say “syntactic hyperdoctrine” I’d prefer to just say “hyperdoctrine”.

First, it seems plausible that any hyperdoctrine is consistent iff has a model. Second, part of the whole goal of working categorically is to break down the walls between syntax and semantics, to treat them both as entities of the same kind (in this case hyperdoctrines).

So, we should aim for a version of Gödel’s completeness theorem saying that a hyperdoctrine B is consistent iff it has a morphism F to a hyperdoctrine C of some particular “semantic” sort. You said B should be “syntactic”. That’s great as far as it goes, but here’s a place where we can generalize and do something new—so let’s boldly assume B is an arbitrary hyperdoctrine.

MW: Wait a minute! What do you mean by “morphism” here?

JB: A morphism of hyperdoctrines. Do you object? (That was a pun.)

MW: Who could object to a three layer cake?

In the bottom layer are boolean algebras, regarded as poset categories. The middle filling consists of categories whose objects are boolean algebras; a hyperdoctrine is a functor from FinSet to one of them. And now you propose to top it off with a category whose objects are hyperdoctrines. Copacetic!

Anyway, we never defined a morphism of hyperdoctrines. Do you mean a natural transformation respecting quantifiers?

JB: Well, perhaps not exactly that. We certainly want F to be a natural transformation respecting quantifiers, but we probably want it to be more than that, and we haven’t yet figured out exactly what. I expect that maybe F should be a natural transformation whose components obey the Beck—Chevalley and Frobenius conditions. That should make it respect quantifiers and equality. But we really need to think about this harder before we can be sure! So I said “morphism”, to leave the issue open.

We will probably resolve this issue as part of proving Gödel’s completeness theorem. It’s a time-honored trick in math: you make up the definitions while you’re proving a theorem, so your definitions make the theorem true.

Okay, where was I? I wanted to think about Gödel’s completeness theorem this way: it says a hyperdoctrine B is consistent iff it has some morphism F to a hyperdoctrine C of some “semantic” sort. And by “semantic” let’s mean the most traditional thing: we’ll demand that C be a hyperdoctrine where we pick a set V and let C(n) be the power set of Vn. That is semantic par excellence.

So how are we going to do this? Whatever trick people ordinarily use to prove Gödel’s completeness theorem—and you already mentioned the key word: “Henkin”—let’s try to adapt it to a general hyperdoctrine B. Maybe we can sort of pretend that B is defined “syntactically”, but try to use only its structure as a hyperdoctrine.

MW: Okay, this is the sort of thing I was hoping for!

I’m intrigued. The Henkin proof leans heavily on syntax. It begins by adding a bunch of new constants and axioms. How can we do this, while forswearing syntax?

JB: A question for the next installment….

Prev TOC Next

Leave a comment

Filed under Categories, Conversations, Logic

Set Theory Jottings 10. Axiomatic Set Theory

Prev TOC Next

“An Axiom, you know, is a thing that you accept without contradiction. For instance, if I were to say ‘Here we are!’ that would be accepted without any contradiction, and it’s a nice sort of remark to begin a conversation with. So it would be an Axiom. Or again, supposing I were to say, ‘Here we are not!’, that would be—”

“—a fib!” cried Bruno.

“that would be accepted, if people were civil”, continued the Professor; “so it would be another Axiom.”

“It might be an Axledum”, Bruno said: “but it wouldn’t be true!

—Lewis Carroll, Sylvie and Bruno Concluded

To get to the “good stuff” in math, you almost always need some set theory. Zermelo-Fraenkel set theory (ZF), plus the axiom of choice (AC; ZF+AC=ZFC) has become the standard first-order axiom system for set theory.

Before diving into the details, some generalities on axiom systems. Nowadays we’re pretty chill about them; you can take any collection you like (hopefully consistent) for a theory, and then you can start writing your thesis. Not, perhaps, an interesting thesis, but at any rate Bruno won’t complain that your axioms aren’t true!

For the Greeks, the axioms and postulates were true, in some sense. Idealized, sure, but descriptive of reality. This tie began to fray with the discovery of non-Euclidean geometries. Algebraic axiom systems, like those for groups and for fields, appear by the end of the 19th century.

For roughly two thousand years after Euclid, most math developed without axioms. Take calculus as an example. You have the rules of calculus, but you don’t see anything like the Euclidean treatment of geometry. This remained true even as people subjected its foundations to stricter and stricter scrutiny. Mathematical intuition reigned supreme.

Hilbert’s Grundlagen der Geometrie (Foundations of Geometry, 1899) pushed towards a more formalist attitude. A celebrated quote of his, from years earlier, sums it up nicely:

One must be able to say at all times, instead of points, lines, and planes: tables, chairs, and beer mugs.

At times Cantor seemed to endorse this perspective:

Mathematics is entirely free in its development, and its concepts are only bound by the necessity of being consistent, and being related to the concepts introduced previously by means of precise definitions.

Grundlagen einer allgemeinen Mannigfaltigkeitslehre (Foundations of a general theory of sets)

But he held strong opinions on what’s true in mathematics:

I entertain no doubts as to the truths of the tranfinites, which I recognized with God’s help and which, in their diversity, I have studied for more than twenty years; every year, and almost every day brings me further in this science.

—Letter from Cantor to Jeiler, quoted in Dauben (p.147).

On the other hand, he referred to the “Cholera-Bacillus of infinitesimals”, and called them “nothing but paper numbers!” (Dauben, p.131). The Continuum Hypothesis was for him a question of fact.

Two other themes run through this period: mathematics as a mental activity, and as logic.

Recall that Boole titled his famous treatise An Investigation of the Laws of Thought: on Which are Founded the Mathematical Theories of Logic and Probabilities. Cantor’s definition of “set” in his last major work reads

By a set we are to understand any collection into a whole M of definite and separate objects m of our intuition or our thought.

Here is the first sentence of Dedekind’s Was sind und sollen die Zahlen?: “In what follows I understand by thing every object of our thought.” His proof of the existence of an infinite set relies on this ontology:

Theorem: There exist infinite systems.

Proof: My own realm of thoughts, i.e., the totality S of all things which can be objects of my thought, is infinite. For if s signifies an element of S, then the thought s′, that s can be an element of my thought, is itself an element of S

[Dedekind then appeals to his definition of infinite as having a bijection with a proper subset.]

Frege severely criticized this injection of psychology into mathematics. Cantor’s “proof” of the Well-Ordering Theorem suffers from it, as it consists of successively choosing elements of the set to be well-ordered. If we take this literally, then the choices must take place at an increasing sequence of times t1<t2<…. This limits us to ordinals that are “realizable in ℝ’’, and thus to countable ordinals (see post 4). Yet Cantor claimed that every set can be well-ordered, in particular ℝ.

This is why Zermelo was at pains to say in his second proof of the Well-Ordering Theorem, “…the ‘general principle of choice’ can be reduced to the following axiom, whose purely objective character is immediately evident.” (My emphasis.)

Both Frege and Russell held that the truths of mathematics are logical facts. Thus we find debates on whether Zermelo’s axiom of choice is logically valid. Not surprising, historically. Aristotle’s logic dealt with propositions. From “proposition” we obtain “propositional function”, that is, a proposition with a free variable, like “x is mortal”. It becomes a proposition if we assign a value to the variable (“Socrates is mortal”), or quantify over it (“All men are mortal”). The class of all things satisfying a propositional function went by the name, “extension of a concept”.

Zooming out from these specifics, logic and mathematics both lay claim to necessary truth. This is elaborated in Kantian philosophy. Kant classified mathematical facts as synthetic a priori: necessary truths that go beyond analytic truth, which are true by definition. Poincaré classified the Axiom of Choice as a synthetic a priori judgment, just like the principle of induction.

The rise of formal logic and axiomatic set theory resulted in a sharply drawn boundary between logic and set theory. We have the axioms and rules of inference of first-order logic; then we have the axioms of ZFC or similar systems, which are particular first-order theories. Things weren’t so clear at the dawn of the 20th century.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

From Kepler to Ptolemy 16

Prev TOC Next

Kepler

Kepler wrote five major astronomical works. Chronologically:

the Mysterium cosmographicum (1596)

the Astronomia nova (1609)

the Epitome astronomiae Copernicanae (1618–1622)

the Harmonice mundi (1619)

and the Tabulae Rudolphinae (1627).

The Mysterium cosmographicum (Cosmographical Mystery) expresses Kepler’s youthful enthusiasm and sounds the leading notes to themes that would persist throughout his career. Kepler’s elliptical orbits and the area speed law make their debut in the Astronomia nova (New Astronomy). (Although at this point Kepler regarded the area law as just an approximation to the inverse speed law.) The Epitome astronomiae Copernicanae (Epitome of Copernican Astronomy) completes and refines his theory. The Harmonice mundi (Harmonies of the World) contains the statement of Kepler’s 3rd law, its main scientific claim to fame. The Tabulae Rudolphinae (Rudolphine Tables) ultimately led to the widespread acceptance of Keplerian astronomy.

He also wrote several lesser astronomical works, and treatises on optics, on computing volumes, on the philosophy of science, a pamplet on snowflakes… The critical edition of his collected works runs to 22 volumes. I will focus just on the Mysterium cosmographicum and the Astronomia nova.

Prev TOC Next

Leave a comment

Filed under Astronomy, History

Set Theory Jottings 9. Cantor Normal Form

Prev TOC Next

Suppose β>1, and let ζ>0 be arbitrary. Then ζ has a unique representation in so-called Cantor normal form:

ζ α1 χ1+···+βαk χk
α1>···>αk, 1≤χi<β for all i
(1)

Stillwell gives an illustration, close to a proof, in §2.6 (pp.46–47)1, for the most important special case: β=ω and ζ<ε0. (The Cantor normal form for ε0 is just ωε0, not helpful.)

Here’s a more formal proof of the full theorem. First generalize division to all ordinals. If β>1, then any ζ has a unique representation in the form

ζ= βχ+ρ,   ρ<β

Proof: since β>1, β(ζ+1)≥ζ+1, and so there is a least χ′ such that βχ′>ζ. Furthermore χ′ cannot be a limit ordinal: if βξ≤ζ for all ξ<λ, then βλ≤ζ by continuity of multiplication. Set χ=χ′−1, so

βχ≤ζ<β(χ+1)

Now write (by equation (7) of post 8)

ζ=βχ+(−βχ+ζ)=βχ+ρ

setting ρ=−βχ+ζ. We cannot have ρ≥β, otherwise

ζ=βχ+ρ≥βχ+β=β(χ+1)>ζ

Uniqueness needs to be done just right, since ordinal addition has cancellation on the left but not on the right. Suppose

βχ11=βχ22,    ρ12

If χ12 then χ21+γ for some γ>0. So

βχ11=β(χ1+γ)+ρ2=βχ1+βγ+ρ2

Cancelling on the left,

ρ1=βγ+ρ2≥βγ≥β

contradicting the definition of ρ1. So χ12=χ (say), and we can cancel βχ on the left to get ρ12.

The proof of Cantor normal form starts out in a similar fashion. First prove by induction that βα≥α for all α. Since βζ+1>ζ, it follows that there is a least α with βα>ζ. Furthermore α cannot be a limit ordinal by continuity of exponentiation (in the exponent). Set α1=α−1. So

βα1≤ζ<βα1+1

Now we divide ζ by βα1:

ζ=βα1χ11,    ρ1α1

We cannot have χ1≥β, otherwise

ζ=βα1χ11≥βα1β=βα1+1

Repeat with ρ1:

ρ1α2χ22,   χ2<β,   ρ2α2

We must have α21 because βα2≤ρ1α1. Continuing we get a descending sequence of αi’s which must terminate in finitely many steps, because ordinals. This establishes (1).

The proof of uniqueness provides a bonus: a criterion for which of two normal forms is larger. It’s what you’d expect from decimal notation. Suppose

ζ α1 χ1+···+βαk χk
ζ′ = βα1 χ1′+···+ βαm χm

are two unequal normal forms. Because cancellation on the left holds for addition, we can remove any equal terms on the left, and assume either α11′ or α11′ and χ11′ (for the reverse inequalities, just flip things around). Then ζ<ζ′. I will call this the first difference criterion for the ordering of Cantor normal forms.

The proof resembles a decimal computation. To show that 999<1000 (for example), we add 1 and do the carries. Here, we begin by combining the last two terms of ζ, using the facts that χk<β and αk−1k:

βαk-1 χk-1αk χk αk-1 χk-1αk+1
≤βαk-1 χk-1αk-1
αk-1k-1+1)

Next we combine with the previous term, using the fact that χk−1+1≤β and αk−2k−1:

βαk-2 χk-2αk-1k-1+1) ≤βαk-2 χk-2αk-1+1
≤βαk-2k-2+1)

We keep going, until eventually we have

ζ<βα11+1)

If α11′ and χ11′, then we have βα11+1)≤βα1 χ1′. Thus ζ is less than the first term of ζ′. If α11′ then we use the fact that χ1+1≤β, concluding that βα11+1)≤βα1+1≤βα1, and hence ζ is again less than the first term of ζ′. A fortiori, ζ<ζ′.

Uniqueness follows. These ordering criteria are needed for the Goodstein and Hydra theorems.

[1] However, Stillwell messes up at the end of the example. In the last bullet he says, “This means that α is a term in the following sequence with limit ωω2·7+ω·4+11+ω= ωω2·7+ω·4+12.” This equation is incorrect; the last “+ω’’ should be “·ω’’. The sequence on the next line should have “·1’’, “·2’’, etc., instead of “+1’’, “+2’’, etc. So he has many more steps to go.

Another point. When he says that α “falls between” two terms in a sequence, he’s not entitled to assume it falls strictly between. He should say instead that there are two consecutive terms with α greater than or equal to the first and less than the second.

Prev TOC Next

Leave a comment

Filed under Set Theory

Set Theory Jottings 8. Ordinal Arithmetic

Prev TOC Next

Usually one defines the ordinal operations via transfinite induction:

Continue reading

2 Comments

Filed under Set Theory