Category Archives: Math

Set Theory Jottings 22. Absoluteness

Prev TOC Next

Let’s look again at the notion of definability, rewritten slightly: for any set A, xA is definable over A if there is a first-order formula φ(y,ū) and elements āA such that

x={zAA(z,ā)}

where φA is φ relativized to A, i.e., all quantifiers in φ range only over A.

Very clearly the right-hand side depends on A. In some cases, we can write a formula for x that is independent of A. For example:

x={y}   ↔   yx ∧ ∀z(zxz=y)

Absoluteness is at the heart of Gödel’s proof that V=L holds in L. Failures of absoluteness present the main technical obstacles to showing that L satisfies ZF.

Three Non-absolute Notions

Let’s look at three non-absolute notions:

x=𝒫(y)
z is uncountable
x=𝒫un(y)={zy:z is uncountable}

Now think about relativizing these three notions to L and to the Lα’s. Suppose x and y are both present in some Lα, and Lαx=𝒫(y). As we ascend to higher Lβ’s, the assertion “x=𝒫(y)’’ can become false1, because new subsets of y can appear in Lβ. The reverse switch from false to true can’t happen: once we have a “witness” z to x≠𝒫(y) (i.e., zy but zx, or zy but zx), it won’t go away. (Note that the meaning of x and y can’t change, because the Lα’s are all transitive: by the time x shows up, all its elements have shown up. Ditto for y.)

Here’s a slightly different way to look at it. Consider the sequence xα=𝒫Lα(y). The sequence xα is monotonically increasing (indeed, 𝒫Lα(y)=𝒫(y)∩Lα). So the assertion x=xα can flip from true to false as α increases2. It can’t flip from false to true, because that would mean that x had some elements (subsets of y) that xα was missing—but in that case x wouldn’t have been an element of Lα in the first place.

It’s a similar story for uncountability. We have:

Lαz is uncountable ↔ Lα⊧¬∃f(f:z↣ω)

where f:z↣ω is shorthand for saying that f is a injection from z into ω. If a witness f to the countability of z appears in some higher Lβ, then z “becomes countable”, and remains countable after that.

Next we look at x=𝒫un(y). Let xα=𝒫unLα(y). New uncountable subsets of y can appear at any time. But also, a subset of y that is uncountable in Lα can become countable later on. The upshot: 𝒫unLα(y) can both gain and lose elements as α increases, and the truth-value of “x=𝒫un(y)’’ can switch back and forth over and over again.

We will look at the limiting behavior (i.e., 𝒫L(y) and 𝒫unL(y)) later on.

Absolute Formulas

How about absolute formulas? We say that a formula φ(ū) is absolute if for all ā in K

K⊧φ(ā) ↔ V⊧φ(ā)
(K a transitive class, āK)

in other words, the truth-value of K⊧φ(ā) doesn’t depend on K, provided only that K is a transitive class. (To be precise, when we say K⊧φ(ā), we mean (K,∈)⊧φ(ā). Also, āK means that all the ai belong to K. Note that without āK, K⊧φ(ā) makes no sense.)

Δ0 formulas

A Δ0 formula is one where all quantifiers are bounded, i.e., of the form (∀xy) or (∃xy).

Here’s the intuition behind Δ0 formulas. In post 14, we defined the transitive closure of x to the elements of x, plus the elements of the elements of x, etc. Here we want the augmented transitive closure, where we add x as an element. Any transitive class containing x as an element contains its augmented transitive closure. In fact, it’s easy to see that the augmented transitive closure of x is the smallest transitive class containing x as an element, and that the augmented transitive closure is a set. OK: if φ(ā) is Δ0, then to find out if K satisfies φ(ā), we only need to root around in the augmented transitive closure of the ai’s. We never need to search through all of K. It looks like Δ0 formulas should always be absolute.

One proves this by a routine induction on complexity. If φ() and ψ() are absolute, it’s immediate that ¬φ(), φ()∧ψ(), and φ()∨ψ() are absolute. As for (∃ūv)φ(ū,), one direction is easy. Suppose for ā,bK we have K⊧(∃ūb)φ(ū,ā). Then

    K⊧(∃ūb)φ(ū,ā)
⇒(∃K) Kb∧φ(,ā)
⇒(∃V) Vb∧φ(,ā)
V⊧(∃ūb)φ(ū,ā)

In the other direction, again with ā,bK (as demanded by the definition of absoluteness)

    V⊧(∃ūb)φ(ū,ā)
⇒(∃V) Vb∧φ(,ā)
⇒(∃K) Kb∧φ(,ā)
K⊧(∃ūb)φ(ū,ā)

We know that K in the third line because bK and K is transitive. The inductive assumption takes us from V⊧φ(,ā) to K⊧φ(,ā).

Some examples of Δ0 formulas:

Example 1: “xy’’ is Δ0, since it can be written (∀zx)zy.

Example 2: “x is an ordinal” is Δ0. In post 17 we wrote out the clauses for this; for example, one was “(∀u,vx)(u<vv<uu=v)’’. (Recall that < is the same as ∈ for ordinals.) This is obviously Δ0. The transitivity of x was “(∀u,v)(uvxux)’’, which we can rewrite as “(∀vx)(∀uv)ux’’. Only the last clause isn’t Δ0, even rewritten this way:

(∀yx)(y≠∅ → (∃uy) uy=∅)

The issue is “∀yx’’. This does not count as a bounded quantifier. But Foundation makes this clause unnecessary.

Example 3: “f:xy’’ is Δ0, i.e., f is an injection of x into y. Intuition: We just need to dig inside the guts of f and its domain and range to show that f is a function and is 1–1.

(∀〈u,z〉∈f) (∀〈v,z〉∈f) u=v

says that f is injective. The vernacular “∀〈u,z〉∈f’’ expands to “(∀pf)p=〈u,z〉’’, and this presents no snags.

Example 4: “w=ω’’ is Δ0, i.e., w is the first infinite ordinal. Here’s the formula for this:

w is an ordinal ∧ (∀yw)(y=∅ ∨ (∃xw)y=x+)

where x+=x∪{x}, the successor of x.

Example 5: f:y↣ω, i.e., f establishes that y is countable. This follows immediately from the last two examples.

In none of these examples do we ever need to climb outside the transitive closure of a given set and wander around the entire class K.

In contrast, we cannot check that Kx=𝒫(y) or that x is countable in K without surveying all of K, looking for (respectively) subsets of y and injections f.

Syntactic Analysis

With this in mind, we turn to a syntactic analysis of our non-absolute examples.

As noted, zy and f:z↣ω are Δ0. So x=𝒫(y) is of the form

z δ1(x,y,z)

and “x is uncountable” is of the form

z δ2(x,z)

where δ1 and δ2 are Δ0.

Finally, x=𝒫un(y) is of the form ∀zfg δ3(x,y,z,f,g), where δ3 is Δ0. Showing this takes a bit of work. First we break down x=𝒫un(y) into three parts:

   ∀z(zx zy)
∧ ∀z(zx → ∀f ¬(f:z↣ω))
∧ ∀z((z⊆y ∧ ∀g ¬(g:z↣ω)) → zx)

We ask, how could this conjunction fail to be true? This way:

   ∃z(zx zy)
∨ ∃z(zx ∧ ∃f f:z↣ω)
∨ ∃z(zy ∧ ∀g ¬(g:z↣ω) ∧ zx)

Now we use basic logical equivalences to move the quantifiers outwards. Say we have formulas φ, ψ(u), and ξ(u), where u does not appear in φ. Then

φ∧∃uψ(u) ↔∃u(φ∧ψ(u))
φ∨∃uψ(u) ↔∃u(φ∨ψ(u))
φ∧∀uψ(u) ↔∀u(φ∧ψ(u))
φ∨∀uψ(u) ↔∀u(φ∨ψ(u))
uψ(u) ∨ ∃uξ(u) ↔∃u(ψ(u)∨ξ(u))

So we can rewrite the failure of x=𝒫un(y) as:

z f g [
    (zx zy)
∨ (zx f:z↣ω)
∨ (zy ∧ ¬(g:z↣ω) ∧zx)  ]

The stuff inside the brackets is Δ0, so negating this gives us the form we claimed.

Thinking in terms of witnesses makes this more picturesque. Suppose x≠𝒫un(y). In other words, x is accused of the crime of not being 𝒫un(y). The prosecution and defence must provide witness lists before the trial starts. The prosecution lists z and f; the defence, all the g’s. Any one of the three disjuncts is sufficient to convict; let’s imagine a trial lasting three days. The witness z is called each day. On the first day, if z testifies to being an element of x but not a subset of y, game over. But suppose z surprises Jack McCoy (the prosecutor) by being a subset of y. On the second day, f is also called, to testify to the countability of z; McCoy hopes to show that zx. Too bad for McCoy, f’s testimony falls apart. On the third day, z is recalled and is shown to be both a subset of y, and not an element of x after all! The defence tries to argue that’s ok, because z is countable. He calls up every single g to testify to being the required injection, but each g fails. The jury convicts and McCoy repairs to the bar to have a drink with his ADA.

The Lévy Hierarchy

Summarizing the previous section:

x=𝒫(y) is of the form uδ1(x,y,u)
x is uncountable is of the form uδ2(x,u)
x=𝒫un(y) is of the form uvwδ3(x,y,u,v,w)

where δ1, δ2, and δ3 are all Δ0 formulas. This syntactic analysis fits into a scheme known as the Lévy Hierarchy.

Formulas of the form ∀uδ(,u) are called Π1 formulas; replace the ∀ with an ∃, and you’ve got a Σ1 formula. (Of course, δ here stands for a Δ0 formula.) More generally, any string of ∀’s is allowed at the front of a Π1 formula, likewise any string of ∃’s at the front of a Σ1 formula.

The negation of a Π1 formula is Σ1, and vice versa. Truth “propagates upwards” for Σ1 formulas and “propagates downwards” for Π1 formulas. The intuition is clear: if K⊧∃ūδ(ā,ū) for some āK with K a transitive class, then we have witnesses—elements K such that K⊧δ(ā,). The witnesses cannot be impeached by enlarging K, because δ is Δ0. So ∃ūδ(ā,ū) holds also in any transitive class containing K. Likewise for the downward propagation with Π1 formulas.

ūδ(,ū,) is a Π2 formula; its negation is a Σ2 formula. A formula that looks like ∀ū…δ, where there are n alternating quantifier blocks, is a Πn formula; starting off with an existential block gives a Σn formula.

A formula equivalent to both a Σ1 and a Π1 formula will thus be absolute—but that word “equivalent” is the kicker. Equivalent in what sense? One answer: φ() is ΣnZF if there is a Σn formula ψ() such that ZF⊢∀(φ()↔ψ()); likewise for ΠnZF. If a formula is both ΣnZF and ΠnZF, we say it’s ΔnZF. So Δ1ZF formulas are absolute between models of ZF. (And of course, Σ1ZF formulas are absolute upwards, Π1ZF absolute downwards, but only between models of ZF.)

This tradeoff between admitting more formulas or more classes can take a variety of forms. I won’t explore the full landscape, but a few aspects should be highlighted.

First let’s look at the role of bounded quantifiers. In Σ1 formulas they must appear on the inside the scope of the unbounded ∃. (∀xy)∃z(zx) is not Σ1, for example.

If we restrict attention to ZF-models, then we can allow bounded quantifiers anywhere, and still get something equivalent to Σ1. Example: the formula (∀xy)∃zφ(x,y,z) is ZF-equivalent to ∃u(∀xy)(∃zu)φ(x,y,z), by an argument involving ranks3. So we can migrate all bounded quantifiers to the inside.

Formulas like (∀xy)∃zφ(x,y,z) are absolute upwards for all transitive classes, not just models of ZF. The idea is simple: in quantifying (∀zy)… with yK, the bounded quantifier never asks us to go outside K, for a transitive K. So if the assertion holds for K, it will hold for any transitive MK.

For “function-like” formulas, we have another trick. We encountered this notion in posts 13 and 17: φ(,y) is function-like if for any there is a unique y making φ(,y) true. That is, ∀yz(φ(,z)↔z=y).

Any formula that is absolute upwards between transitive classes KM, and is function-like over both K and M, is in fact absolute between K and M. Proof: Suppose āK, and we have both K⊧φ(ā,b) and M⊧φ(ā,c), with bK and cM. Because φ(,y) is absolute upwards from K to M, we also have M⊧φ(ā,b). But since φ(,y) is function-like over M, that means b=c. So K⊧φ(ā,c).

This result has a counterpart in the Lévy hierarchy. Suppose we have a Σ1 formula

ūφ(,y,ū)

where φ(,y,ū) is Δ0. Suppose also that K is a transitive class, and ∃ūφ(,y,ū) is function-like for K. That is, for any K, there is a unique d such that K⊧(∃ū)φ(,d,ū). Then our formula is equivalent to this Π1 formula over K:

ūz(φ(,z,ū)→y=z)

I relegate the proof to the end of this post.

Now suppose that (∃ū)φ(,y,ū) is function-like for both K and M with KM. Then it is equivalent in both classes to a Π1 formula; we might say it is Δ1 for K and M, and hence absolute between them.

The Πnn classification is not confined to set theory; in a more general context, quantifier-free formulas play the role of Δ0 formulas. Historically, proofs in logic often began by reducing formulas to prenex normal form (i.e., all quantifiers in front). This isn’t so widespread anymore. But induction on the “complexity” of formulas still pervades logic, and the Πnn classification is our deepest analysis of this complexity.

Here is the argument about function-like Σ1 formulas. Suppose one of these formulas holds for (,d). If the antecedent holds in the Π1 formula for some ū with = and z=d′, then the Σ1 formula also holds for (,d′). By the uniqueness hypothesis, d=d′ and the consequent holds for (,d) in Π1 formula. That shows that the Σ1 formula implies the Π1 formula. For the other direction, suppose the Π1 formula holds for (,d). By the existence part of function-likeness, there must be a ū and a d′ making the Σ1 formula true for (,d′). The Π1 formula tells us that d=d′, so the Σ1 formula holds for (,d). This argument can be extended by induction to show that function-like Σn formulas are Πn.}

[1] Just to be clear: by saying “x=𝒫(y) becomes false”, I mean that although Lαx=𝒫(y), Lβx≠𝒫(y). Here x and y are fixed elements of L, which both belong to Lα (and thus also to Lβ).

[2] Again, to be clear, by “flip” I mean that x=xα but xxβ for two ordinals α<β.

[3] For each x in y, let αx be least ordinal such that there is a z of rank αx making φ(x,y,z) true. Let ξ=supxyαx, and set u=Vξ. Later on I will call this sort of reasoning a waiting argument.

Prev TOC Next

Leave a comment

Filed under Set Theory

Set Theory Jottings 21. The Constructible Universe

Prev TOC Next

The constructible universe is traditionally denoted L. L is a subclass of V and is a proper class. Gödel proved three things about L:

  1. All the axioms of ZF hold in L, i.e., L is a model of ZF.
  2. V=L holds in L, i.e., L is a model of the axiom “All sets are constructible”. Cohen: “This is a small but subtle point. It says that a constructible set is constructible when the whole construction is relativized to L.”
  3. V=L→AC and V=L→GCH are both provable in ZF.

So we can’t prove not-GCH in ZF. If we could, it would have to hold in L, but GCH holds in L. Ditto for AC.

L is constructed according to the familiar transfinite scheme, using a function ℱ (discussed below):

L0 = ∅
Lα+1 = ℱ(Lα)
Lλ = ⋃α<λLα
L = ⋃α∈Ω Lα

Let A be a set. ℱ(A) is a subset of 𝒫(A); it’s the set of all sets that are definable using elements of A.

Here’s the precise definition. For any set A, xA is definable over A if there is a first-order formula φ(y,ū) and elements āA such that

zx  ↔  zA ∧ φA(z,ā)

where φA is φ relativized to A, i.e., all quantifiers in φ range only over A. (Logic Notes §5 treats relativisation.) As we said before, ℱ(A) is the set of all subsets of A that are definable over A.

So ℱ(A) is something like 𝒫(A), except we include only those sets where we can explicitly describe their criterion for membership. Cohen discusses how this notion arose from, but did not resolve, concerns about so-called impredicative definitions. (We talked about this in post 3 on the paradoxes, and in post 5 on Zermelo’s proof of the well-ordering theorem.)

Some examples. The singleton {x}, the unordered pair {x,y}, the ordered pair 〈x,y〉={{x},{x,y}}, and the power set 𝒫(x) are all definable from x or from x and y:

z∈{x} z=x
z∈{x,y} z=xz=y
z∈〈x,y z={x} ∨ z={x,y}
z∈𝒫(x) ↔ ∀u[uzux]

Each right-hand side is a first-order formula characterizing the elements of a set. As usual, imagine the vernacular expanded. For example, instead of z={x}, we have ∀t(tzt=x). The left-hand sides are abbreviations for the right-hand sides, so (for example) in the formal definition of 〈x,y〉, “z={x}’’ and “z={x,y}’’ have been expanded.

The prime examples of sets not obviously definable are choice functions. For example, it’s easy to say what we desire of a choice function c for 𝒫(ℝ), where ℝ=𝒫(ω):

(∀s⊆ℝ) (s≠∅→c(s)∈s)

But this doesn’t characterize c. (We’ve already seen how to express formally “c is a function with domain 𝒫(ℝ)∖{∅}’’, and “c(s)∈s’’.)

Gödel’s L is an example of an inner model. The method of inner models proves relative consistency results: If a theory 𝒯 is consistent, then so is 𝒯+φ, where φ is a formula φ in ℒ(𝒯). To apply this method, you have find a formula α(x) in ℒ(𝒯), and show two things:

for all ψ∈𝒯, 𝒯⊢ψα
and also 𝒯⊢φα

where ψα is ψ relativized to α.

You can approach this method syntactically or semantically. First, semantics: Let’s say T is a model for 𝒯. Consider the substructure selected by α(x), call it A. It’s a model of 𝒯+φ, because each formula ψ∈𝒯, when interpreted as speaking about A, is equivalent to ψα interpreted in T:

A⊧ψ if and only if T⊧ψα

Likewise for φ. We’ve found a model of 𝒯+φ sitting inside a model of 𝒯.

Syntactically, say we had a proof of a contradiction in 𝒯+φ. Go through and relativize everything with α. Now we have a proof of a contradiction in 𝒯: all the relativized axioms of 𝒯+φ can be proved in 𝒯, and it turns out that relativization preserves the logical axioms and rules of inference. (Picky point: we need ∃xα(x) to hold too.)

Gödel’s treatment emphasized the syntactic aspect, Cohen’s the semantic.

Of course the hard part is proving the relativizations:

for all ψ∈ZF, ZF⊢ψL
and also ZF⊢(V=L)L

So Con(ZF) → Con(ZF+V=L). People write ZFL for ZF+V=L, “All sets are constructible.” Gödel also showed that ZFL⊢AC and ZFL⊢GCH, giving relative consistency for these too.

Here’s a trivial example of the method of inner models. Let Group be the first-order theory of groups, and let abelian be the axiom x·y=y·x. Any model of Group (i.e., any group) has a model of Group+abelian sitting inside of it, namely its center. The formula

ζ(x) ↔∀y[x·y=y·x]

selects the center of the group. (Picky point: we can’t let the ∀y be implicit, since we need ζ(x) to define a unary relation.) Within Group, we can prove that the center of a group is an abelian group. It’s not totally trivial that the center of a group is even a group, i.e., that it’s closed under the group operation. Anyway, this argument shows the relative consistency result

Con(Group)→Con(Group+abelian)

admittedly a trivial result, but it illustrates the method.

Before Gödel’s L, the most prominent example of this method was von Neumann’s class of well-founded sets: V=⋃α∈ΩVα. This shows that if ZF minus Foundation is consistent, then ZF is too. The demonstration amounts to a much easier “dry run” for Gödel’s results.

Finally, let’s note an important feature of the inclusion LV. Is it a proper inclusion, i.e., are there any non-constructible sets? Gödel thought so. So do most set-theorists who believe the question has meaning. For a formalist, the only question that has meaning is, what can you prove? Well, if we did have ZF⊢VL, that would mean that ZFL was inconsistent. By Gödel’s relative consistency result, that can happen only if ZF itself is inconsistent!

Cohen showed that ZF+VL is also consistent (if ZF is), so just like AC and GCH, whether V=L cannot be settled by the axioms of ZF. For a formalist, that’s the end of the story. For a platonist—some one who believes that the universe of set theory “really exists”—the question still has meaning. (Your platonism has to be at least moderately strong: you could believe that a multiverse of sets “really exists”, with V=L true in some universes and not in others.)

For what it’s worth, the consensus among set theorists of the platonist persuasion seems to be that AC is true, GCH is false, and V=L is also false.

Prev TOC Next

Leave a comment

Filed under Set Theory

Set Theory Jottings 19. GCH implies AC.

Prev TOC Next

Sierpiński’s Theorem: GCH implies AC

I had a look at the version of the proof in Cohen (§IV.12). Sierpiński was a clever fellow, and he came up with a few tricks that would be hard to motivate.

Here I will try to imagine how Sierpiński could have devised his proof. Cohen does offer one bit of intuition:

The GCH is a rather strong assertion about the existence of various maps since if we are ever given that ABP(A) then there must be a 1–1 map either from B onto A or from B onto P(A). Essentially this means that there are so many maps available that we can well-order every set.

Let A be the set we wish to well-order. Let’s write AB to mean there is an injection from A into B. GCH tells us that for any U,

AUP(A) implies UA or U≡𝒫(A)

If U is well-ordered, then UA and U≡𝒫(A) both imply that A can be well-ordered, the latter because A is naturally imbedded in 𝒫(A). But this is too simple an approach: AU already makes A well-ordered for a well-ordered U, so if we could show the antecedent we’d be done—we wouldn’t need GCH to finish the job.

Let’s not assume U is well-ordered, but instead suppose it contains a well-ordered set. Say we could show that

AW+A≤𝒫(A)

for a well-ordered set W, where ‘+’ stands for disjoint union. (That is, W×{0}∪A×{1}, or some similar trick to insure disjointness.) Then we’d have

W+A≡𝒫(A) or W+AA

Now, if W+A≡𝒫(A), then we ought to have W≡𝒫(A), just because A is “smaller” than 𝒫(A) (in some sense)—W should just absorb A, if W is “big enough”. Also, if W is “big enough” then that should exclude the other arm of the choice, where W+AA. And if W≡𝒫(A), then 𝒫(A) and so also A can be well-ordered, as we have seen.

At this point Hartog’s theorem shows up at the door. This gives us a well-ordered set W with

W≤𝒫4(A) and WA

So we have

AW+A ≤𝒫4(A)+A ?≡? 𝒫4(A)

(where ?≡? means that the equivalence needs to be proven). WA excludes W+AA, good. Let’s postpone the issue of the ‘?’. Deal first with the problem that the bounds are not tight enough for GCH to apply. We fix that by looking at:

𝒫3(A)≤W+𝒫3(A)≤𝒫4(A)+𝒫3(A) ?≡? 𝒫4(A)

So if W+𝒫3(A)≡𝒫4(A), we ought to have W≡𝒫4(A) and hence a well-ordering of A. What about the other case, W+𝒫3(A)≡𝒫3(A)? Ah, then we have

𝒫2(A)≤W+𝒫2(A)≤W+𝒫3(A)≡𝒫3(A)

and so we can repeat the argument: either W+𝒫2(A)≡𝒫3(A), which ought to make 𝒫3(A) well-ordered and hence also A well-ordered; or W+𝒫2(A)≡𝒫2(A), in which case we repeat the argument yet again. Eventually we work our way down to

AW+AW+𝒫(A)≡𝒫(A)

and W+AA is excluded since WA, and we are done.

All this relies on the intuition that if W+M≡𝒫(M), then we should have W≡𝒫(M): we used this with M=𝒫n(A) for n=0,…,3. Well, we can prove something a little weaker.

Lemma: If W+M≡𝒫(M)×𝒫(M), then W≥𝒫(M).

Proof: Suppose h:W+M→𝒫(M)×𝒫(M) is a bijection. Restrict h to M and compose with the projection to the second factor: π2⚬(hM):M→𝒫(M). Cantor’s diagonal argument shows that this map cannot be onto. (The fact that π2⚬(hM) might not be 1–1 doesn’t affect the argument.) So for some s0∈𝒫(M), we know that h(x) never takes the form (−,s0) for xM. In other words, the image of hW must include all of 𝒫(M)×{s0}. Therefore 𝒫(M)×{s0} can be mapped 1–1 to a subset of W. qed.

The missing pieces of the proof now all take the form of absorption equations. We know that 𝒫(M)×𝒫(M)≡𝒫(M+M)—as an equation for cardinals, 2𝔪2𝔪=22𝔪. If we had 2𝔪=𝔪, that would take care of that problem. The ?≡? above also takes the form 2𝔪+𝔪 ?=? 2𝔪, for 𝔪 the cardinality of 𝒫3(A).

The general absorption laws for addition depend on AC. But we do have these suggestive equations even without AC:

𝔞+ω+1 = 𝔞+ω, 2𝔪+1 = 2·2𝔪

and so if 𝔞+ω=𝔪 and 2𝔪=𝔟, then 2𝔟=𝔟. So let’s say we set B=𝒫(A+ω). Then we have 2B≡𝒫(A+ω+1)≡B (where 2B, of course, is the disjoint union of B with itself). Also BB+1≤2BB, so BB+1 and so 𝒫(B)≡2𝒫(B). So if we replace A with B, then all gaps in the argument are filled and we conclude that B can be well-ordered. But obviously A can be imbedded in B, so A also can be well-ordered. QED.

Ernst Specker proved a “local” version of Sierpiński’s Theorem: if 𝔪 and 2𝔪 both satisfy CH, then 2𝔪=ℵ(𝔪).

Prev TOC Next

Leave a comment

Filed under Set Theory

Set Theory Jottings 18. The Axiom of Determinacy

Prev TOC Next

Just denying the axiom of choice doesn’t buy you much. If you’re going to throw away AC, you should add some powerful incompatible axiom in its place. The Axiom of Determinacy (AD) has been studied in this light.

Here’s one formulation. Let S be ℕ, i.e., the set of all infinite strings of natural numbers. Let GS. Alice and Bob play a game where at step 2n, Alice chooses a number s2n, and at step 2n+1, Bob chooses a number s2n+1. If s0s1s2…∈G, Alice wins, otherwise Bob wins. We say elements of G are assigned to Alice, and elements not in G are assigned to Bob. We’ll call the infinite strings results (of the game). Rather than think of G as a set of results, think of it as a function G:S→{Alice,Bob}.

A strategy for Alice tells her how to play each move. Formally, it’s a function from the set of all number strings of finite even length to ℕ. Likewise, a strategy for Bob maps number strings of finite odd length to numbers. A game is determined if Alice or Bob has a winning strategy, i.e., if the player follows the strategy then that player will win. The Axiom of Determinacy says that each game is determined.

Interesting thing about the proof that AC → ¬AD: it’s much easier using the well-ordering theorem instead of Zorn’s lemma.

First note that there are c=ℵ00 strategies (lumping together both Alice and Bob strategies), likewise c results. Assuming AC, well-order the strategies {Sα:α<ωc}. Here ωc is the least ordinal with cardinality c, so the set {α:α<κ} has cardinality less than c for each κ<ωc.

We construct a game G by inducting transfinitely through all the strategies, at step κ considering Sκ. Our goal is to assign some result to Alice or Bob that prevents Sκ from being a winning strategy. Say Sκ is an Alice strategy. Since we assign only one result at each step, fewer than c results have been assigned before step κ. However, there are c possible results if Alice follows Sκ, since Bob can play his numbers however he wants. So there exists a result where Alice follows Sκ but this result has not yet been assigned to either player. Assign it to Bob; this thwarts Sκ. If Sκ is a Bob strategy, just switch everything around. QED

The cardinality argument at the heart of this proof is harder to pull off with Zorn’s lemma (though possible, of course). (The exact same argument works with bit strings instead of strings of natural numbers, but for some reason AD is generally stated using ℕ instead of 𝒫(ℕ).)

Prev TOC Next

Leave a comment

Filed under Set Theory

Set Theory Jottings 17. Ordinals Revisited

Prev TOC Next

In post 13 I sketched proof by transfinite induction, and definition by transfinite recursion. Let’s take a closer look at that. By definition, < means ∈ for ordinals—treat that also as vernacular. Sometimes one symbol or the other makes the meaning clearer. We must keep in mind though that we have not shown that < is a well-ordering on Ω, or even a simple ordering.

Let Ω(x) be the formula for “x is an ordinal”, thus:

(∀u,v)(uvxux)
∧(∀u,vx)(u<v v<u u=v)
∧(∀ux)(¬u<u)
∧(∀u,vx)(¬(u<v v<u))
∧(∀u,v,wx)(u<v<w u<w)
∧(∀yx)(y≠∅ → ∃u(uy ∧ ∀v(¬(v<u vy))))

The first line says x is transitive, the next four lines say that < simply-orders x, and the last line says that < is a well-ordering on x. (Quite a bit of vernacular, but by now you should know how to deal with “uvx’’, for example.)

The clauses are not as economical as possible: for example, Foundation implies the third and last line, and a transitive irreflexive relation is automatically asymmetric, so even without Foundation we don’t need the fourth line. For the moment, we proceed without using Foundation.

Within an ordinal, ∈ well-orders; we want to bootstrap this to all of Ω. First an easy observation. Given a formula φ(x), if any β∈α satisfy φ, then there is a least such β. We need only invoke Separation to provide us with the set of all β∈α satisfying φ, and then appeal to the last clause in the definition of an ordinal.

How about smallest elements over all of Ω? That is,

∃αφ(α)→∃α(φ(α) ∧ ∀β(¬(β<α∧φ(β))))

(By vernacular convention, α and β are ordinals. Formally one begins “∃x(Ω(x)∧φ(x))…’’.)

Suppose ∃γφ(γ). If none of γ’s predecessors satisfy φ, then we’re done. Otherwise suppose φ(α) with α∈γ. Because γ is well-ordered, there is is an element of γ—let’s still call it α—such that α is the smallest element of the set {β∈γ:φ(β)}. We need to know that there are no ordinals, period, that satisfy φ and precede α. But if β was such an ordinal, we’d have β∈α∈γ. Since γ is transitive, we’d have β∈γ, and we’ve already ruled that out.

The contrapositive is transfinite induction. Letting ψ be ¬φ,

∀α((∀β<α)ψ(β)→ψ(α)) → ∀αψ(α)

With transfinite induction, we can show that Ω is simply-ordered. Only trichotomy presents any wrinkles. We induct on the property, “α is comparable with every ordinal.” Suppose this is true for all β<α. Let γ be an arbitrary ordinal. If γ<β or γ=β for some β∈α, then γ<α, since α is transitive (and < means ∈). Suppose then that β∈γ for all β∈α. That is, α⊆γ. If α=γ, we’re done. Suppose α⊂γ. Let δ be the smallest element of γ∖α. We will now show that α=δ, proving α∈γ, i.e., α<γ.

The key point: both α and δ are subsets of γ, and within γ, ∈ is a simple ordering. Extensionality kicks in: we have to prove x∈α↔x∈δ, or turned around, x∉δ↔x∉α. Now, α is an initial segment of γ because α is transitive. Therefore δ>x for all x∈α because δ∉α. So x∉δ implies x≥δ implies x∉α. On the other hand, x∉α implies x>y for all y∈α, again because α is an initial segment. Since δ is the least element of γ∖α, it follows that x≥δ, i.e., ¬x<δ, i.e., x∉δ. We’re done.

So < is a well-ordering on Ω, and we can trust our intuition about it (mostly).

Next consider transfinite recursion. The discussion in post 13 applies, with a few remarks. As before, we have a “rule” ρ(α,f,s) that assigns a value s to an ordinal α given as input a function f:α→···. Our rule is a formula, and may even include parameters. Thus: ρ(α,f,s,).

Suppose that for the parameter values = this formula is “function-like”; then we’re in business. One proves in ZF that the following condition

∀α,ρ(α,f,s,)
∧ ∀α,f,s,t (ρ(α,f,s,)∧ρ(α,f,t,)→s=t)

implies, for all α, the existence of a unique function gα:α+1→··· “obeying the recursion”. The proof uses transfinite induction, of course; also the set-existence axioms, especially Replacement. I won’t comb out all the hair, but essentially: if we have an “obedient” gβ for all β<α, Replacement plus Union plus the uniqueness allow you to combine all these gβ’s into one fα with domain α. Then you use ρ to assign a value to α, extending fα to gα.

The conclusion, even leavened with vernacular, is more than I’m prepared to write. But you get a theorem—technically, a different theorem for each formula ρ. (So it’s a theorem schema.) It has the form ∀(AB), where A is the above condition and B asserts the existence of g.

We don’t need to prove that the condition holds; rather, the theorem says that whenever it holds (perhaps only for some values of the parameters), then the conclusion holds. The proof does not require Choice: the ordinals are already well-ordered. As before, we also have a formula G(α,,s); if the condition holds for some c, then G(α,,s) assigns a unique s to every ordinal—a “function-like” thingy with “domain” Ω.

As a bonus, we can rehabilitate Cantor’s “proof” of the Well-Ordering Theorem. Let M be a set, and let c be a choice function for M. Our rule ρ says that the s assigned to α is c(M∖{f(β):β<α}), provided that the difference set is not empty. Otherwise we assign M to α. (We choose M simply because it is not an element of M.) We define G by transfinite recursion; informally, we can write Ĝ(α) for the value assigned to α. (I have left the parameters M and implicit.) If we never have Ĝ(α)=M, that means that G defines an injection of Ω into M, and inverting this gives us a map from M onto Ω. Replacement then says that Ω is a set. The Largest Ordinal paradox however proves this is not so. (The formal statement: ¬∃S x(Ω(x)↔xS).) Therefore for some α, gα maps α+1 1–1 onto M, and we’ve well-ordered M.

I’ve made a full meal of this argument. Typically, one would condense this discussion into a couple of sentences: “Given a choice function c, use it to define (by transfinite recursion) an injective function from an initial segment of Ω into M. As this function cannot be defined for all ordinals, it must map the ordinals less than some α onto M, and so M is well-orderable.”

While this proof resembles Cantor’s demonstration, it borrows essential features from Zermelo’s. Gone are the successive choices with their psychological aspect, replaced with a choice function. And the “union of partial well-orderings” in Zermelo’s 1904 proof lies buried inside the transfinite recursion.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

Set Theory Jottings 16. Axioms of ZFC

Prev TOC Next

The Axioms of ZFC

The language of ZF (ℒ(ZF)) consists of basic first-order syntax, with a single binary predicate symbol ∈. Here is a list of the axioms, with a tag-line (an imprecise description) for each.

Extensionality:
Every set is determined by its elements.
Foundation:
All sets are built up in levels by starting with the empty set.
Pairs:
There is a set whose elements are any two given sets. (Or any one set.)
Union:
For every set, there is a set consisting of the union of its elements.
Power Set:
For every set, there is a set consisting of all its subsets.
Infinity:
There is an infinite set.
Replacement:
Given a set and a rule for replacing its elements, there is a set consisting of all these replacements.
Choice:
Given a set of pairwise disjoint nonempty sets, there is a set containing exactly one element from each of them.

Two more axioms:

Null Set:
There is a set with no elements.
Separation:
Given a set and a property, there is a set consisting of all the elements satisfying the property.

These are redundant. The version of Replacement we adopt implies Separation; Null Set follows from Pairs plus Separation. But they are historically important, and help understand ZFC.

Now for the formal, or at least more formal, statements. (I discussed “vernacular” in post 15. Recall that stands for stands for a list (t1,…,tn).)

Extensionality:

z(zxzy)→x=y

Using the defined symbol ⊆ (defined as ∀z(zxzy)), we could also write this as “xyyxx=y’’.

Foundation:

x(∃yx)(yx=∅)

Of course, ∩ and ∅ are defined terms, and (∃yx) is vernacular. Here is Foundation without using them:

xy(yx∧∀u(¬(uxuy)))

People sometimes call this Regularity. Foundation says there is an ∈-minimal element of x, that is, a yx none of whose elements belong to x. If that were false, we’d have an infinite descending chain xy1y2∋…. (Possibly it could end in a loop, i.e., ym=yn for some m<n.) But such an x is not “built up from ∅’’.

Foundation has an equivalent statement: There are no infinite descending ∈-chains. To prove this equivalence, you need Choice. The version above (due to von Neumann) avoids this issue, and is simpler to express.

Pairs:

xyz(z={x,y})

or without the defined term {x,y}:

xyzu(uz ↔(u=xu=y))

Since x=y is not excluded, this also covers singletons.

Null Set:

xyyx)

Union:

xu(u=⋃yxy)

or without the defined term ⋃

xuz(zu ↔∃y(zyyx))

In other words, the elements of the union are the elements of the elements of the original set.

Power Set:

xps(spsx)

Or in other words, p={s : sx}, just the kind of set-builder definition that was uncritically accepted before the paradoxes. The power set of x is denoted 𝒫(x). Without the defined term sx:

xps(sp↔∀y(ysyx))

Infinity:

w(∅∈w∧∀x(xwx∪{x}∈w))

We mentioned earlier the von Neumann representation for natural numbers: 0={∅}, 1={0}, 2={0,1}, etc. This axiom insures the existence of a set containing all the von Neumann natural numbers. I leave the vernacular-free expression of Infinity as an exercise for the fastidious.

Separation (sometimes called Subset) isn’t a single axiom, but an axiom schema. We have an instance of the schema for each formula φ(y,). For any set x and any choice of values  for , the instance says there is a set of all elements y of x satisfying φ(y,).

Separation:

 ∀xsy(ysyx∧φ(y,))

Separation justifies the use of the set-builder notation s={yx:φ(y,)}.

Replacement (sometimes called Substitution) is also an axiom schema, with an instance for each formula φ(y,z,). Suppose for some particular choice =, the formula φ(y,z,) defines z as a partial function φ° of y: Given y, there is at most one z making φ(y,z,) true. Then the range of φ° on any set x is a set.
Replacement:

x(∀y≤1z φ(y,z,) →∃w(w={φ°(y):yx}))

But the set-builder term is not justifiable by Separation. Eliminating it gives:

x[∀y≤1zφ(y,z,)
→∃wz(zw↔(∃yx)φ(y,z,))]

And we can eliminate ∃≤1z and ∃yx. We replace ∃≤1z with this:

uv(φ(y,u,)∧φ(y,v,)→u=v)

and (∃yx)φ(y,z,) with

y(yx ∧φ(y,z,))

This version of Replacement demands only that φ° be a partial function, not necessarily total. You can derive Separation from it. If φ(y,) is the separating property, then let ψ(y,z,) (for a particular =) define the partial function ψ°(y)=y when φ(y,) is true, and undefined when φ(y,) is false. Then applying ψ° to x gives the subset we want. Some authors use the “total” version of Replacement, and include Separation in their list of axioms.

Choice (AC) was the scene for major philosophical combat early in the 20th century, as we’ve seen. It’s easier to express than Replacement.

Choice:

x[(∀yx)y≠∅∧ (∀yx)(∀zx)(yzy∩z=∅)
→∃c(∀y∈x)#(c∩y)=1]

Eliminating some vernacular should be routine by now: y≠∅, yz=∅. As for #(cy)=1, this becomes

u(ucy∧∀v(vcyv=u))

where of course ucy is formally ucuy, likewise for v.

Another version of Choice does not require disjointness. It’s easy to express formally once you have the machinery of ordered pairs, and thus functions as sets of ordered pairs. It says: for every set x whose elements are all nonempty sets, there is a so-called choice function c such that c(y)∈y for all yx.

Unlike every other “set existence” axiom of ZFC, we can’t define c with set-builder notation, or indeed give any other explicit description of the choice function. We’ve seen how this lead to much distrust of AC. In 1938, Gödel showed if ZF is consistent, then ZFC is too. That calmed things down somewhat. I’ve cited Moore’s book before on the history of the Axiom of Choice.

Prev TOC Next

2 Comments

Filed under History, Set Theory

Set Theory Jottings 15. From Zermelo to ZFC: Formal Logic

Prev TOC Next

The Role of Formal Logic

In Zermelo’s original system, the Separation Axiom refers to a “definite property”. The Replacement Axiom refers to a “rule”. In 1922, Skolem proposed interpreting “definite” as “first-order definable”. So properties and rules are just formulas in the language of ZF, ℒ(ZF). With this clarification, ZFC assumes its modern form as a first-order theory.

ZF boasts a spartan vocabulary: just ∈ plus the basic symbols of first-order logic. Writing things out formally, with no abbreviations or short-cuts, rapidly snows us under unreadable expressions. We handle this (like everyone else) with semi-formal expressions; I like to call these “vernacular”. Copious hand-waving suggests how a masochist could write these out in the formal language ℒ(ZF).

Example: here’s how we say p=〈x,y〉={{x},{x,y}}, partially expanded:

a,b(p={a,b}∧a={x}∧b={x,y})

“∃a,b’’ is vernacular for “∃ab’’. Next we expand p={a,b} into

u(up↔(u=au=b))

and likewise for a={x} and b={x,y}.

A relation r is a set of ordered pairs, so more formally

(∀pr)∃a,b(p=(a,b))

which still has some vernacular. (∀pr)… more formally is ∀p(pr→…). Likewise, (∃xz)… in formal dress is ∃x(xz∧…).

To say f is a function with domain D, we start with the vernacular

f is a relation
∧ (∀(a,b)∈f)aD
∧ (∀aD) ∃!b((a,b)∈f)

“∀(a,b)∈f’’ expands to “(∀pf)∀a,b(p=(a,b)→…)’’. ∃!bφ(b), “exists a unique b satisfying φ’’, expands to ∃bφ(b)∧∀b,c(φ(b)∧φ(c)→b=c).

The vernacular f(x)=y becomes 〈x,y〉∈f. These should be enough to give you the flavor.

Often we have lists of variables, like x1,…xn. We write to reduce clutter; ∀ and ∃ have the obvious meanings.

When we get to the formal versions of Separation and Replacement, we’ll see how “property” and “rule” are made precise.

Coda

We’ve seen the informal use of “class” in ZF. This proved so convenient that NBG, a theory developed in succession by von Neumann, Bernays, and Gödel, gave a home to it. It turns out that NBG is a so-called conservative extension of ZFC: any formula of NBG that “talks only about sets” is provable in NBG iff it is provable in ZF.

In NBG, we still have only the symbol ∈ plus the basic logical symbols. However, certain members of the “universe” have the left-hand side of ∈ barred to them. If xy, then we say x is a set; anything that’s not a set is a proper class. So proper classes can have sets as elements, but cannot themselves be elements. The term class encompasses both sets and proper classes; in NBG, the variables range over classes.

Here’s how NBG skirts around Russell’s paradox. We can still write the formal expression R={x:xx}. This defines the class R, which is the class of all sets that are not elements of themselves. Is RR? No, because if it were, it would have to be a set that was not an element of itself. OK, if RR, doesn’t that mean that R satisfies the condition to be an element of R? No, not if R is a proper class—R contains only sets, no proper classes allowed!

Zermelo’s original system did not include Replacement and Foundation, although it did include Choice. Somewhat ahistorically, people use Z to refer to ZF minus Replacement, but including Separation. ZC is Z plus Choice.

ZF is a theory of “pure sets”. ZFA is “ZF with atoms”. An atom is an object that is not the null set but has no elements. The axioms of ZF can be modified to allow for this. Before the invention of forcing, Mostowski used ZFA to investigate theories without Choice.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

Set Theory Jottings 14. From Zermelo to ZFC: Replacement and Foundation

Prev TOC Next

Replacement and Foundation

Thirteen years after Zermelo published his axioms, Fraenkel pointed out that they weren’t quite strong enough. For example, you couldn’t prove the existence of the set ⋃n∈ω𝒫n(0).

Fraenkel suggest the Axiom of Replacement (aka Substitution): If we have a way of “associating” with every element m of a set M a set Xm, then {Xm:mM} is a set. In other words, if you go through a set, replacing each element with another set, you get a set. The intuition: {Xm:mM} is “no bigger” than M, and so ought to be a set also.

Combined with Axiom of Unions, this is the gateway to big sets. Replacement gives us {𝒫n(0):n∈ω}, and then Union gives us Fraenkel’s set.

Zermelo agreed with Fraenkel for this addition to the axioms. Skolem also pointed out the need.

Using transfinite recursion, we now define the Cumulative Hierarchy:

V0 = ∅
Vα+1 = Vα∪𝒫(Vα)
Vλ = ⋃α<λ Vα

Fraenkel’s set is Vω.

(Two variations: it turns out that “Vα∪’’ isn’t neeeded in the second line, you get the same sets with “Vα+1=𝒫(Vα)’’. Some authors modify the definition so that their Vλ is the same as our Vλ+1.)

Power Set plus Replacement plus Union shows that all the Vα are sets. The rank of a set is where it first appears in the cumulative hierarchy, or more precisely: rk(X) is the least α with XVα+1. The “+1’’ is so that α has rank α.

How do we know that every set appears somewhere in the cumulative hierarchy? That brings us to our second additional axiom: Foundation.

Skolem noticed the need to exclude “pathological” sets, such as A={A} or infinite descending chains A1A2∋…. Such sets will never appear in any of the Vα’s. The Foundation Axiom does the trick.

von Neumann gave the most elegant version of Foundation: Every nonempty set contains an element disjoint from it. That is, for any set A, there is an aA with aA=∅. This a is minimal in the ∈ ordering of the elements of M. That’s another way to phrase it: Every nonempty set A has a minimal element in its ∈-ordering.

Foundation is equivalent to every set having a rank. Writing this the classy way, V=⋃α∈ΩVα. The proof of this takes a few steps; I’ll save it for the end of this post.

For so-called “ordinary mathematics”, you usually don’t need to go any higher than rank ω2. Example: the complex Hilbert space L2(ℝ3), beloved by analysts and physicists alike. We build up to this by contructing all the natural numbers, then the integers, rational numbers, and reals, ordered pairs and ordered triples of reals, then functions from the ordered triples to ordered pairs, and finally equivalence classes of these functions. (Recall that two L2 functions are identified if they agree except for a set of measure zero; hence, the actual elements of L2(ℝ3) are equivalence classes.) We have all the natural numbers by the time we get to Vω. If x,yVα, then the ordered pair 〈x,y〉∈Vα+2. So if AVα, then any subset of A×A is in Vα+3. Integers are often defined as sets of ordered pairs of natural numbers, rational numbers as sets of ordered pairs of integers, and real numbers as sets of rational numbers (Dedekind cuts). If I counted right, that puts all the elements of ℝ in Vω+7 and any subset of ℝ in Vω+8. The Hilbert space in question should belong to Vω+15, again if I haven’t miscounted.

Although sets of high rank aren’t “needed” by most mathematicians, it would be quite strange to impose a “rank ceiling” limitation on the power set axiom. Just maybe if Vω+1 were adequate—but it isn’t.

Without Choice, we cannot use von Neumann’s definition of card(M) as the least ordinal equivalent to M. An alternative is Scott’s trick: card(M) is the set of all sets of least rank equivalent to M. In other words, S∈card(M) iff SM and for all TM, rk(T)≥rk(S). This lacks the simplicity of von Neumann’s definition, but it’s the best we’ve got. Scott’s trick can be used for other purposes, e.g., to define the order type of a simply-ordered set.

Okay, let’s look at the equivalence of Foundation with V=⋃α∈ΩVα. We’ll need some machinery: transitive closures, ∈-induction, and the suprenum of a set of ordinals.

Recall that a set A is transitive if baA implies bA, i.e., aA implies aA. Every set A is contained in a transitive set called its transitive closure, tc(A). First we set f(A)=⋃aAa. Then set tc(A)=⋃n∈ωfn(A), where f0(A)=A. The transitive closure is the smallest transitive set containing A: A⊆tc(A), and if AB with B transitive, then tc(A)⊆B.

Suppose φ is a property where the following implication holds: If every element of A satisfies φ, then so does A. Then ∈-induction says that φ holds for all sets. Foundation justifies this. Suppose on the contrary that A is a non-φ set. Then some element of A is non-φ. Consider the subset S of tc(A) consisting of all non-φ elements. This is nonempty by hypothesis, so let sS be ∈-minimal in S. All the elements of s belong to tc(A) because tc(A) is transitive, so therefore all of s’s elements must satisfy φ (otherwise s wouldn’t be ∈-minimal). But then s must also satisfy φ, contradiction.

Lemma: If S is a set of ordinals, then ⋃α∈Sα is an ordinal. This is a set by the Union Axiom. That S is simply-ordered under ∈ is easy as pie. Foundation makes the proof of the well-ordering property trivial, though it’s not hard to prove without it. For ordinals, α⊆β is equivalent to α≤β (we’ll prove this in a later post), so S is an upper bound to all its elements. As usual, we call the least uppper bound to S its suprenum, denoted sup S.

Now suppose X is a set, and we associate an ordinal αx with each xX. Replacement says that {αx:xX} is a set; we denote its suprenum by supxXαx.

Time to prove that every set has a rank. We use ∈-induction. Suppose all the elements of A have ranks. Set α=supaA(rk(a)+1). As aA belongs to Vrk(a)+1 and VβVα whenever β≤α, all the elements of A belong to Vα. Therefore A belongs to Vα+1.

The reverse implication is straightforward, given this fact about rank: ab implies rk(a)<rk(b). (Proof: transfinite induction.) It follows that an element of A of lowest rank is ∈-minimal in A.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

Set Theory Jottings 13. From Zermelo to ZFC: Everything’s a Set!

Prev TOC Next

Zermelo’s 1908 system differs from ZFC in three respects. (1) ZFC is a “pure” set theory, without atoms. (2) ZFC includes two additional axioms, Replacement and Foundation. (3) ZFC is a first-order theory. Zermelo’s axioms are not formalized in this way. In this and the next two posts I will go into details.

Everything’s a Set!

In Zermelo’s original conception, we begin with a foundation of so-called atoms (aka urelements), things like the natural numbers. We then build up the universe by forming sets of atoms, sets of sets, etc. Later people realized that with various “coding tricks”, you don’t need any atoms. We’ve already encountered the von Neumann finite ordinals: 0=∅, 1={0}, 2={0,1}, etc. So these can serve for the natural numbers.

Zermelo had no ordered pairs. Kuratowski came up with this trick: define the ordered pairx,y〉 to be the set {{x},{x,y}}. We can extract x and y from p=〈x,y〉={a,b} by noting that x is the only element of ab, and then y is the other element of ab, if ab has two elements. If ab has only one element, then of course y is that element. With Kuratowki’s ordered pairs on hand, we can deal with relations and functions without Zermelo’s awkward workarounds.

von Neumann defined ordinals by two properties. First:

A set X is transitive if baX implies bX. Equivalently, if aX implies aX.

Then:

An ordinal is a transitive set that is well-ordered under ∈.

We need transitivity, otherwise (for example) {5} would compete with {0} for the title of “1”.

I won’t go through the development of the ordinals in detail, but here are the first few steps. The precedessors of an ordinal are its elements: β<α iff β∈α. The successor of an ordinal α (denoted α+ in this post) is α∪{α}. If α is neither 0 nor a successor, it is a limit ordinal. By standard convention, λ always denotes a limit ordinal.

A finite ordinal (aka natural number) is defined as a non-limit ordinal (i.e., either a successor or 0), all of whose elements are also non-limit ordinals. If an ordinal is finite, so is its successor. The Axiom of Infinity says there is a set, say w, closed under α↦α+ and containing 0. Separation now says there is set, denoted ω, of all the finite ordinals in w. One proves that ω is the set of all finite ordinals.

Ω denotes the class of all ordinals. (Some authors use “On”.) Classes are not a formal part of ZFC, but we permit them as a “manner of speaking”. If φ(x) is a property of sets, we might talk about the class φ of all sets x satisfying φ(x), and write x∈φ to mean φ(x). For example, α∈Ω is just another way to say that α satisfies von Neumann’s two properties. We write V for the class of all sets, corresponding to the property x=x. Any set x is a class, corresponding to the property yx. A non-set class is called a proper class. After a fashion, proper classes are the descendents of Cantor’s “absolutely infinite multitudes”.

It is almost trivial to prove transfinite induction on Ω. Say φ(α) is a property of ordinals, and say we have this implication: If φ holds for all β<α, then it holds for α. Then φ holds for all ordinals. (Proof: If it doesn’t hold for α, then it doesn’t hold for some β<α. Since α is well-ordered, there is a least β<α for which it doesn’t hold. But then it holds for all γ<β, and hence for β.)

While Ω isn’t a well-ordered set, it shares enough properties to call it (informally) a well-ordered class. Namely, the ordinals are simply ordered: we will prove this in a later post. Any property φ of ordinals has a least ordinal satisfying it.

Definitions by transfinite recursion take a bit more fussing. I’ll leave the details to the textbooks, but basically: we have a “rule” determining a value sα whenever we are given all the values sβ with β<α. How are we given all these values? By a function fα with domain α, so fα(β)=sβ for all β<α. The “rule” is a predicate with two free variables, say ρ(x,y); we want ρ(fα,sα) to hold. The rule ρ should be “function-like”, i.e., for every x there is a unique y such that ρ(x,y) holds. You drop the function fα into the input hopper of ρ, and out pops a value sα. (If you’re wondering how s0 gets assigned, for α=0 the function is the empty function f0=∅.)

Using transfinite induction, one can show that for every α∈Ω there is a unique function gα with domain α+=α∪{α} that “obeys the recursion”. That is, for every β≤α, if fβ is the restriction of gα to β, then ρ(fβ,gα(β)). We say that we’ve defined gα by transfinite recursion up to and including the ordinal α.

Thus we have a function gα, with domain α+, for every ordinal α. For any β<α, gβ is the restriction of gα to β′. We can’t formally define a function with domain Ω, but we do have a “function-like” predicate γ(x,y) with “domain” Ω: γ(α,sα) holds if and only if gα(α)=sα.

Now that we have pairs, functions, and ordinals, we can populate our universe with all the usual flora and fauna. ℕ is the same as ω. The standard rigamarole then constructs integers, rationals, reals, Hilbert spaces,…

What about cardinalities? Frege and Russell both proposed “The cardinality of M is the set of all sets equivalent to M.” But a more elaborate version of Russell’s paradox kills this approach. Informally, we can talk about the class of all sets equivalent to M. We need to pick a representative from each such class.

von Neumann proposed: the first ordinal in the class. ℵ0 is just ω. ℵ1 is the first uncountable ordinal. 20 is whichever ordinal is the first one with cardinality c. The cardinality of M is thus the unique smallest ordinal μ such that M is equivalent to μ.

All this assumes that for any M, there is an ordinal equivalent to it. That follows from the well-ordering theorem, which uses the Axiom of Choice. If AC isn’t around, then you need a different idea. We’ll see one later.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

Set Theory Jottings 12. Zermelo on “definiteness”

Prev TOC Next

In the last post, I mentioned Zermelo’s 1929 paper “On the concept of definiteness in axiomatics”. By this time, people had suggested replacing “definite” with “definable in first-order logic”. Zermelo did not agree with this.

Three points, I think, explain Zermelo’s views.

First, Zermelo never really “got” formal logic. He didn’t grasp the distinction between the meta-theory and the object theory, nor that between syntax and semantics. His correspondence with Gödel shows this. Another example: in the 1929 paper, he complains that the inductive definition of a first-order formula “depends on the concept of finite number whose clarification, after all, is supposed to be one of set theory’s principal tasks.” The inductions of course belong to the meta-theory.

Second, Zermelo approached axiomatization in the spirit of Euclid rather than with the philosophy of formalism. The axioms assert mathematical truths. They are not the arbitrary rules of a game. Hilbert’s famous “tables, chairs, and beer mugs” remark expresses the need to rid the development of any reliance on visual intuition. In much the same way, Zermelo stressed the purely objective character of the Axiom of Choice.

Third, Zermelo found Skolem’s countable model of ZF unacceptable. Recall the resolution of Skolem’s paradox: the power set of ω is not the “true” power set. In a 1930 paper, Zermelo gave his final version of the axioms of set theory. In a footnote to the Separation Axiom, he writes:

Like the replacement function in [the Replacement Axiom], the propositional function 𝔣(x) can be completely arbitrary here, and all consequences of restricting it to a particular class of functions cease to apply from the present point of view. I shall consider elsewhere more thoroughly “the question of definiteness” in connection with my last contribution to this journal and with the critical “remarks” by Mr. Th. Skolem.

The implicit criticism: if you hobble the Separation Axiom by allowing only first-order definable properties, no wonder you get a countable power set!

The 1929 paper offered the following definition of “definite property”. Zermelo writes Dp to say that p is a definite property. Then (changing notation and rewording somewhat, except for the quoted parts):

  1. First, all fundamental relations are definite.”
  2. Definiteness is passed on to composite assertions as follows
    1. If Dp, then Dp).
    2. If Dp and Dq, then D(pq) and D(pq).
    3. If Df(x,y,z,…) “for all (permissible) combinations of values”, then D((∀x,y,z,…)f(x,y,z…)) and D((∃x,y,z,…)f(x,y,z…)).
    4. If DF(f) “for all definite functors f’’ then D(∀f F(f)) and D(∃f F(f)).
      Definiteness is passed on to the quantifications.”
  3. If P is the system of all definite properties, then “it has no proper subsystem P1’’ that contains all the fundamental relations and is closed under the compositions listed above.

In clause (II.4), the f ranges over properties (or “propositional functions”), so we have a second-order quantification. Furthermore, we have an implicit circularity: the scope of ∀f is restricted to definite properties, just what we’re in the midst of defining. But clause (III) is perhaps even worse: without a robust set theory, how are we to interpret the quantification over all subsystems of P? Skolem made both these points in his reply.

Zermelo’s 1930 paper “On boundary numbers and domains of sets: New investigations in the foundations of set theory”, gave (as I mentioned above) his final version of the axioms. This includes both Replacement and Foundation, but curiously not Choice—as an axiom. Zermelo writes:

We have not explicitly formulated the “axiom of choice” here because it differs in character from the other axioms […] However, we use it as a general logical principle upon which our entire investigation is based; in particular, it is on the basis of this principle that we shall assume in the following that every set is capable of being well-ordered.

Prev TOC Next

2 Comments

Filed under History, Logic, Set Theory