Author Archives: Michael Weiss

by Michael Weiss | July 8, 2026 · 2:33 pm

Set Theory Jottings 25. Mostowski Collapsing Lemma

The Collapse of the Tacoma Bridge

A relational system is a pair (A,R) where A is a class and R is a relation on A. If R has certain properties, we get a mapping (the Mostowski collapsing map) F:A→M from A to a transitive class M. F is an isomorphism, in this sense: yRx if and only if F(y)∈F(x). If yRx, we say y is a component of x, and we write x* for the class of components of x.

Here are the properties we demand of (A,R):

Extensionality: For all x,y in A, if x≠y then x*≠y*.
Properness: For all x in A, x* is a set.
Well-foundedness: R is well-founded, i.e., every nonempty subclass of A has an R-minimal element.

If we drop extensionality, we can still define the Mostowski map F, but we no longer get an isomorphism.

An important special case: R is ∈. In this situation, x* is the set of elements of x that belong to A; in other words, A∩x.

It helps to think of the elements of A as names or labels for sets. The Mostowski mapping x↦F(x) sends the name x to the named set F(x). The following fact defines F, as we will see:

F(x)={F(y):yRx} (Eq.1)

That is, you recursively apply F to each component of x, and gather the results into a set. So:

The names of the elements of F(x) are the components of x.

Note that if an element x has no components, then Eq.1 says that F(x)=∅.

Let’s look at an example, then see how to justify these claims.

Figure 1: A Mostowski Map

Fig.1 illustrates the Mostowski mapping F for a relational system (A,R). All the R-edges are explicit: although bRdRe, we do not have bRe.

Here a is a name for the empty set, since it has no components. The only component of b is a, so b*={a} and F(b)={F(a)}={∅}=1. Likewise for the rest of the figure. Here is all of F:

a↦∅

b↦{∅}=1

c↦{∅,1}=2

d↦{1}

e↦{{1}}

f↦{{1},2}
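Eq.1 can be run mechanically on a finite system. The following is a minimal sketch, not part of the figure: the component lists are an assumption read off from the values of F listed above (in particular, f is taken to have components c and d), and sets are modelled as Python frozensets.

```python
# Components x* of each node of Fig.1 (assumed from the listed values of F).
COMPONENTS = {
    "a": [],
    "b": ["a"],
    "c": ["a", "b"],
    "d": ["b"],
    "e": ["d"],
    "f": ["c", "d"],
}


def mostowski(components):
    """Compute F(x) = {F(y) : y R x} by recursion on a finite well-founded graph."""
    memo = {}

    def F(x):
        if x not in memo:
            memo[x] = frozenset(F(y) for y in components[x])
        return memo[x]

    return {x: F(x) for x in components}


F = mostowski(COMPONENTS)
EMPTY = frozenset()
ONE = frozenset({EMPTY})
TWO = frozenset({EMPTY, ONE})

# f should be sent to {{1}, 2}, and distinct names should get distinct sets.
print(F["f"] == frozenset({frozenset({ONE}), TWO}))
print(len(set(F.values())) == len(COMPONENTS))
```

The memo table plays the role of the partial functions that get melded together in the existence argument: each value is computed once, from values already known for the components.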

Another example, based on Fig.1: let R=∈. Let a be some random set, say ω. That makes ω a name for the empty set in this system. Also b={ω}, while F(b) is still {∅}. Likewise for the rest of the figure. (Exercise: what is f? What is F(f)?)

If we add a couple of nodes (a′ and c′) to Fig.1, we get the non-extensional system of Fig.2 below. Note that a′ has the same components as a. This single failure of extensionality propagates upwards: while c and c′ have different components, they still map to the same set because a′ and a both name the same set. The Mostowski map is no longer an isomorphism.

Figure 2: A Non-extensional Mostowski Map

The map is the same as in Fig.1, plus a′↦∅ and c′↦{∅,1}=2.
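The failure of injectivity can be checked the same way. This is only a sketch under an assumption: the figure is not reproduced here, so I take c′ to have components a′ and b, which fits the description above but is my reading of it.

```python
# Fig.2 as component lists; the entries for a' and c' are an assumed reading.
COMPONENTS = {
    "a": [], "a'": [],
    "b": ["a"],
    "c": ["a", "b"], "c'": ["a'", "b"],
    "d": ["b"],
    "e": ["d"],
    "f": ["c", "d"],
}


def mostowski(components):
    """Compute F(x) = {F(y) : y R x} on a finite well-founded graph."""
    memo = {}

    def F(x):
        if x not in memo:
            memo[x] = frozenset(F(y) for y in components[x])
        return memo[x]

    return {x: F(x) for x in components}


def is_extensional(components):
    """True if distinct nodes always have distinct sets of components."""
    seen = {frozenset(ys) for ys in components.values()}
    return len(seen) == len(components)


F = mostowski(COMPONENTS)
image = set(F.values())

print(is_extensional(COMPONENTS))   # a and a' share the empty set of components
print(F["c"] == F["c'"])            # different components, same image
print(all(elem in image for s in image for elem in s))  # is the image transitive?
```

Note that the image is still transitive; only the isomorphism is lost.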

Next, let’s see why Eq.1 defines a function on all of A. I don’t want to get bogged down in details, so I’ll just sketch the reasoning. We define the descendants of x∈A much like the transitive closure in post 14: for a set X⊆A let s(X)=⋃{y*:y∈X} (a set, by properness), so that s({x})=x*, and then desc(x) = s({x}) ∪ s²({x}) ∪···∪ sⁿ({x}) ∪ ···.

Say that u∈A is good if there is a function f_u with domain {u}∪desc(u) satisfying Eq.1:

f_u(x)={f_u(y):yRx} for all x∈{u}∪desc(u)

If u∈A is not good, we say it is bad. By the well-foundedness of R, if there are any bad elements in A, there is an R-minimal one, say u. So all the components of this bad u are good. Next we show that if v₁ and v₂ are components of u and f_v₁ and f_v₂ are the functions satisfying Eq.1 on their respective domains, then the functions agree on the overlap, that is, on the intersection of the domains. If not, there is an R-minimal element (say t) where they disagree. But the two functions agree on every component of t, so Eq.1 shows that they have to agree on t as well.

This means that we can “meld” all the component f_v’s together, to get a function defined on all descendants of u. But then we can use Eq.1 to define the function on u as well, satisfying the equation throughout, and u wasn’t bad after all.

Finally we define F with domain A by melding all the f_u’s together.

Eq.1 obviously implies

yRx →F(y)∈F(x)

Extensionality (not used till now) makes this an “if and only if”, and also implies that F is injective. Here’s why. Say x₁≠x₂ but F(x₁)=F(x₂). Since x₁≠x₂, one of the x’s has a component the other lacks; say y₁Rx₁ and ¬(y₁Rx₂). So F(y₁)∈F(x₁)=F(x₂). But can we have F(y₁)∈F(x₂)? Only if there is a y₂Rx₂ such that F(y₁)=F(y₂). From x₁≠x₂ with F(x₁)=F(x₂), we’ve obtained y₁Rx₁, y₂Rx₂, with y₁≠y₂ and F(y₁)=F(y₂). An R-minimality argument shows this can’t happen, and so F is injective. Now suppose we have elements x and y with F(y)∈F(x). Eq.1 says that F(y) is F(y′) for some y′Rx. But injectivity tells us that y=y′, so yRx, as required.

Up top we mentioned the key fact about the Mostowski map:

The Mostowski map F is an isomorphism from (A,R) to (M,∈), where M is a transitive class.

Naturally we define M as the image of A under F, making F surjective, and so we have our isomorphism. What about transitivity of M? This is immediate: given any F(x)∈M, its elements are, by definition of F, of the form F(y). So they also belong to M.

We next look at the special case where R is ∈. As noted above, in this case x*=A∩x. So (A,∈) is automatically proper because x* is contained in the set x. Well-foundedness also comes for free because of Foundation.

We focus on the interplay between extensionality and transitivity. Transitivity implies extensionality: if A is transitive, then x*=A∩x=x, since x is an element of A and hence a subset of A. This shows even more: if A is transitive then F is the identity map. Proof: F(x)={F(y):y∈x*}={F(y):y∈x}, so by the well-foundedness of ∈ (that is, by ∈-induction, see post 14), F(x)=x for all x∈A.

We can squeeze a little more juice out of this. Assume that B is a transitive subclass of A, with A perhaps not transitive. For any x∈B, x*=x because x⊆B⊆A. So the same argument shows that F is the identity on B.
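For hereditarily finite sets the case R=∈ is easy to try out. A minimal sketch, assuming sets are represented as nested frozensets; the helper names `von_neumann` and `collapse` are mine.

```python
def von_neumann(n):
    """The von Neumann natural number n = {0, 1, ..., n-1} as a frozenset."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s


def collapse(A):
    """Mostowski map for (A, ∈): F(x) = {F(y) : y in A ∩ x}."""
    A = frozenset(A)
    memo = {}

    def F(x):
        if x not in memo:
            memo[x] = frozenset(F(y) for y in x & A)  # x* = A ∩ x
        return memo[x]

    return {x: F(x) for x in A}


zero, one, two = von_neumann(0), von_neumann(1), von_neumann(2)

four = von_neumann(4)                  # a transitive set
F = collapse(four)
print(all(F[x] == x for x in four))    # F is the identity on a transitive set

G = collapse({zero, two})              # extensional, but 1 is missing
print(G[zero] == zero)                 # identity on the transitive part {0}
print(G[two] == one)                   # 2 = {0, 1} collapses to {0} = 1
```

In the second system the only component of 2 is 0, so the collapse closes the gap left by the missing 1.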

Extensionality does not imply transitivity. When R is ∈, then all components are elements, but not all elements need be components.

For example, here’s a system (A′,∈) with the same graph as in Fig.1:

a={5}, b={a,6}, c={a,b,5}, d={b,7}, e={d}, f={c,d,5,7}

I started with the image of F (i.e., F(a)=∅, F(b)=1, etc.) as the “main ingredients”. I never said what a,b,c etc. were in Fig.1, so I let them be their images under F. Then I added 5, 6, and 7 as “spices” to some elements of A, without adding them as elements in their own right. The result is A′. Since the spices do not belong to A′, their addition doesn’t change the graph. So b’s only component is still a, c’s only components are still a and b, etc. The Mostowski map for (A′,∈) (call it F′) “de-spices” the dish. Result: F′(A′)=A. We have an extensional non-transitive system mapped to a transitive one.
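The de-spicing can be checked by machine. In this sketch the spices 5, 6, 7 are plain Python integers standing in for sets that lie outside A′, and f is taken to contain both c and d, which is what the graph of Fig.1 (where F(f)={{1},2}) requires. Both choices are assumptions of the sketch.

```python
def collapse(A):
    """Mostowski map for (A, ∈): F(x) = {F(y) : y in A ∩ x}."""
    A = frozenset(A)
    memo = {}

    def F(x):
        if x not in memo:
            memo[x] = frozenset(F(y) for y in x & A)  # spices are filtered out here
        return memo[x]

    return {x: F(x) for x in A}


a = frozenset({5})
b = frozenset({a, 6})
c = frozenset({a, b, 5})
d = frozenset({b, 7})
e = frozenset({d})
f = frozenset({c, d, 5, 7})

F = collapse({a, b, c, d, e, f})

EMPTY = frozenset()
ONE = frozenset({EMPTY})
TWO = frozenset({EMPTY, ONE})
image = set(F.values())

print(F[b] == ONE and F[c] == TWO)
print(F[f] == frozenset({frozenset({ONE}), TWO}))
print(all(elem in image for s in image for elem in s))  # the image is transitive
```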

At the end of post 24 I mentioned a reflection principle, which I now rephrase:

If K is a countable set and Φ is a finite set of ZF closed formulas (allowing parameters from K), then there is a countable set M⊇K reflecting all the formulas in Φ.

Suppose that K contains a transitive subset K₀. Add to Φ the Extensionality Axiom, “∀x,y(x≠y→∃z(z∈x∧z∉y ∨ z∈y∧z∉x))”. Apply the reflection principle to get an extensional relational system (M,∈) reflecting all of Φ. Apply the Mostowski map F to (M,∈) to get a transitive relational system (N,∈), ∈-isomorphic to (M,∈). The ∈-isomorphism implies that N also reflects Φ. F restricted to K₀ is the identity by what we’ve said above. So K₀ is a subset of N (although K might not be). Obviously N is countable. We conclude:

If K is a countable set containing a transitive subset K₀, and Φ is a finite set of closed formulas in ℒ_K(ZF) (so allowing parameters from K), then there is a countable transitive set N⊇K₀ reflecting all the formulas in Φ. In reflecting the formulas of Φ, the parameters from K are replaced with images under an ∈-isomorphism; this ∈-isomorphism is the identity on K₀.

This may seem rather complicated, but we will need it in a later post.


Filed under Set Theory

by Michael Weiss | July 1, 2026 · 9:33 am

Set Theory Jottings 24. Reflection Principles


Mount Hood Reflected in Mirror Lake (Public Domain)

A reflection principle gives circumstances in which K⊧φ iff M⊧φ, where K⊆M. “As above, so below.”¹ The absoluteness of Δ₀ formulas could be called a reflection principle: if φ is Δ₀, then the transitivity of K and M is all we need. Usually though people reserve the term for the downward Löwenheim-Skolem theorem and its descendants.

The original downward Löwenheim-Skolem was the first really deep theorem of first-order logic. It says that if M is a structure for a countable language ℒ and K₀⊆M, K₀ countable, then there is a K with K₀⊆K⊆M, K also countable, with K “reflecting” M in this sense: for any formula φ(x̄) in ℒ and any ā in K,

K⊧φ(ā) ↔M⊧φ(ā)

We say K is an elementary substructure of M. Standard notation: K≼M.

The absoluteness of Δ₀ formulas says that for a highly restricted class of formulas, truth is reflected, with very modest conditions on the pair K⊆M. The downward LS says that for any formula φ(x̄) and any M, we can find a K that “reflects” M. In a later post I will outline how reflection is used. Briefly, it compensates for failures of absoluteness.

Let’s quickly recap the proof of the downward Löwenheim-Skolem theorem. (The Logic notes give a fuller discussion.) K is a union of an ascending chain K₀⊆K₁⊆…. We obtain K_{n+1} from K_n by adding “witnesses”: for every closed formula of the form

∃x φ(x,c̄), c̄∈K_n

if M⊧∃x φ(x,c̄), then we pick an element w∈M (the witness) for which M⊧φ(w,c̄), and add it. That is, K_{n+1} is K_n plus all these witnesses. Note that we allow names of any elements of K_n to appear in the formula ∃x φ(x,c̄). This construction is not computable, of course, and assumes the axiom of choice. The countability of K is easy to verify.

To prove that K=⋃_n K_n is an elementary substructure of M, we do the usual induction on formula complexity. This is routine crank-turning, except for one case: formulas of the form ∃x φ(x,c̄) with c̄∈K. But even that is easy, because we must have c̄∈K_n for large enough n. Suppose M⊧∃x φ(x,c̄). Let w be the witness for this, so w∈K_{n+1} and so w∈K. By inductive hypothesis, K⊧φ(w,c̄) if and only if M⊧φ(w,c̄). So K⊧∃x φ(x,c̄). The converse (if K⊧∃x φ(x,c̄) then M⊧∃x φ(x,c̄)) is trivial.
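The chain of K_n’s can be imitated in a toy setting. This is only a sketch under heavy simplifications: witnesses are added for a single formula rather than for all of them, the witness-picking functions (hypothetical stand-ins for the choices made in the proof) are handed in explicitly, and the loop stops as soon as a stage adds nothing new, which the real construction need not do at any finite stage.

```python
def skolem_hull(seed, witness_pickers):
    """Close `seed` under witness pickers; return the closure and the chain of stages.

    Each picker takes an element c and returns a witness w for some formula
    "exists x. phi(x, c)", or None when no witness exists.
    """
    chain = [set(seed)]                 # chain[n] plays the role of K_n
    while True:
        current = chain[-1]
        nxt = set(current)
        for pick in witness_pickers:
            for c in current:
                w = pick(c)
                if w is not None:
                    nxt.add(w)          # K_{n+1} = K_n plus the new witnesses
        if nxt == current:              # nothing new was added
            return current, chain
        chain.append(nxt)


def half(c):
    """Witness for "exists x. x + x = c" in the natural numbers, if any."""
    return c // 2 if c % 2 == 0 else None


K, chain = skolem_hull({48}, [half])
print(sorted(K))
print([sorted(stage) for stage in chain])
```

Starting from 48, each stage adds the half of the newest even element, and the process stops once the odd number 3 appears.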

From Löwenheim-Skolem to Reflection

Let’s try to apply the Löwenheim-Skolem theorem to the universe V. Our “structure” is (V,∈). Problem: this is not a structure in the formal sense, since the domain is a proper class and not a set. So the Löwenheim-Skolem theorem doesn’t apply.

One can surmount this by invoking Axiom SM. This says that there is a set M such that (M,∈) is a transitive model of ZF. Axiom SM implies that ZF is consistent, and so ZF cannot prove it (by Gödel’s Second Incompleteness Theorem). Assuming SM, Löwenheim-Skolem then tells us that there is a countable model of ZF.

Should we believe SM? Cohen gives this justification:

The Löwenheim-Skolem theorem allows us to pass to countable submodels of a given model. Now, the “universe” does not form a set and so we cannot, in ZF, prove the existence of a countable submodel. However, informally we can repeat the proof of the theorem. We recall that the proof merely consisted of choosing successively sets which satisfied certain properties, if such a set existed. In ZF we can do this process finitely often. There is no reason to believe that in the real world this process cannot be done countably many times and thus yield finally a countable standard model for ZF. The only reason this cannot be done in ZF is simply that there is no property A(n,x) in ZF which expresses, for each n, the property of x which we wish to consider at the n-th stage.

The undefinability of truth is the rub. The proof we sketched relies on the notion of satisfaction. We could express Cohen’s A(n,x) formally, if we had a formula true(⌜φ⌝) expressing truth in (V,∈). But as we noted in the previous post, no such formula exists.

But ZF can prove more limited reflection principles. One is the Lévy-Montague reflection principle. This says that for any formula φ(x̄) and any V_β, there is a V_α⊇V_β that “reflects the universe” with respect to φ. That is, for all c̄∈V_α,

V⊧φ(c̄) ↔V_α⊧φ(c̄)

You can adapt the proof outlined above to show this. Since V_β is not usually countable, you need to use transfinite induction. For each formula ∃x φ(x,c̄) with c̄∈V_β, if V satisfies it, then you will find a witness by ascending far enough in the cumulative hierarchy. Because the ordinals go on “forever”, this transfinite process eventually produces a V_β′ with all the witnesses we need. The remainder of the argument is pretty much the same. (See Drake (§3.6) for a proof without frantic handwaving.)

As a variant, we can replace V, V_β, and V_α with L, L_β, and L_α. The proof is the same. We will need this version in a later post.

Another version says that there is a countable model (M,∈) reflecting all ZF formulas up to a given parsing depth. This follows from the existence of a ZF formula expressing truth in V up to a given parsing depth—see the previous post. As a special case, we can reflect any finite set of ZF formulas.

Minor variation: we can demand that M contain any countable set K. And if K is a transitive set, we can demand that M is also transitive. This last bit uses the Mostowski Collapsing Lemma of the next post.

The famous Skolem paradox

Consider the theorem “𝒫(ω) is uncountable”. The proof of this uses only a finite number of ZF axioms, so it has a countable model. What gives? Answer: in any countable model, 𝒫^M(ω) possesses a 1–1 correspondence with ω, but the correspondence isn’t in the model. (Skolem’s paradox has many variations, all with similar resolutions.)

[1] A quote from the Emerald Tablet of Hermes Trismegistus, an alchemical sacred text.


Filed under Set Theory

by Michael Weiss | June 24, 2026 · 8:00 pm

Set Theory Jottings 23. Absoluteness of Constructibility


Now we turn to the absoluteness of the notion of constructibility. There is a formula Λ(x) which says that x is constructible, and which holds in L iff it holds in V. Λ(x) is not Δ₀, nor is it absolute over all transitive classes, so some subtleties come into play. (It is absolute between models of ZF.)

At the heart of constructibility lies the notion of definability, in turn depending on satisfaction. This requires coding formulas as sets, much like the Gödel coding used in the proof of Gödel’s incompleteness theorem. I will lean heavily on corner bracket notation for this: ⌜φ⌝ stands for the Gödel set for the formula φ. For example, 〈⌜φ⌝,⌜ψ⌝〉 ↦⌜φ∧ψ⌝ stands for the function taking the Gödel sets for a pair of formulas to the Gödel set for their conjunction.

I will make a clear distinction between syntax and semantics, emphasize the role of names, give some related results for context, and handwave; I hope all this makes the discussion readable.

Some notation, using my conventions. Let a be any set.

Symbol	Meaning	Example
ℒ(ZF)	Language of ZF
ℒ_V(ZF)	ℒ(ZF) augmented with names for all sets
ℒ_a(ZF)	ℒ(ZF) augmented with names for all elements of a
ℰ	codes of formulas of ℒ_V(ZF)	⌜φ(x̄,c̄)⌝
ℰ^a	codes of formulas of ℒ_a(ZF)	⌜φ(x̄,c̄)⌝ with c̄∈a
S	codes of sentences (closed formulas) of ℒ_V(ZF)	⌜φ(c̄)⌝
S(a)	codes of sentences of ℒ_a(ZF)	⌜φ(c̄)⌝ with c̄∈a
M	codes of monadic formulas of ℒ_V(ZF)	⌜φ(x,c̄)⌝
M(a)	codes of monadic formulas of ℒ_a(ZF)	⌜φ(x,c̄)⌝ with c̄∈a

ℒ(ZF) has only one predicate symbol ‘∈’; I regard ‘=’ as a basic logical symbol. For ℒ_V(ZF), we could let sets name themselves. Since ℒ(ZF) has no (built-in) constants, name and constant are synonymous here.

It is illuminating to consider three results together, all concerning the “definability of truth”. There is a pure formula true(x,y) (in ℒ(ZF)) defining truth in the structure (a,∈) for sentences in ℒ_a(ZF). For any n∈ℕ, there is a pure formula true_n(x) defining truth in (V,∈) for sentences in ℒ_V(ZF) of parsing depth at most n. But there is no pure formula true(x) defining truth in (V,∈) for sentences in ℒ_V(ZF). The formula for ℒ_a(ZF) is even Δ₁^ZF. Summarizing:

Yes (Δ₁^ZF)	a⊧φ(c̄) iff V⊧true(a,⌜φ(c̄)⌝) (c̄∈a)
Yes	V⊧φ(c̄) iff V⊧true_n(⌜φ(c̄)⌝) (depth≤n)
No	V⊧φ(c̄) iff V⊧true(⌜φ(c̄)⌝)

(I’ve written a⊧ instead of (a,∈)⊧ for brevity, likewise for V.) The last result is known as Tarski’s theorem on the undefinability of truth. The first two are formalizations of Tarski’s definition of truth.

Induction and Functions

Before plunging into details, some preliminaries.

We’ll work with a function V_a instead of a predicate true(a,−);
V_a(⌜φ(c̄)⌝)=1 iff a⊧φ(c̄). Functions often have technical advantages over predicates.

Formally defining a function f means having a formula φ(x̄,y) for the relation f(x̄)=y. If φ is Δ₁^ZF, then we say that f has a Δ₁^ZF definition (likewise for Σ₁ or Π₁ or pure or whatever). Most authors require functions to have domains that are sets, so the word functional is often used when this restriction is dropped. Recall that a formula φ(x̄,y) is function-like over a transitive class K if it defines a functional defined on all of K.

As a reminder, Σ₁ definitions are absolute upwards, Π₁ definitions are absolute downwards, and Δ₁^ZF definitions are absolute between models of ZF.

Example: the power set functional 𝒫(x) has a Π₁ definition because y=𝒫(x) iff ∀z[z∈y↔z⊆x], and z⊆x is Δ₀. The ordered pair functional 〈x,y〉 is Δ₀ because

z=〈x,y〉 iff (∃u,v∈z)[z={u,v}∧u={x}∧v={x,y}]

and z=x×y is Δ₀ because

z=x×y iff	(∀p∈x)(∀q∈y)(∃u∈z)[u=〈p,q〉]
	∧(∀u∈z)(∃p∈x)(∃q∈y)[u=〈p,q〉]

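Special case: 0-ary functions, aka distinguished elements. The most important example is ω. Recall that w is an ordinal iff w is transitive and the elements of w are linearly ordered under ∈, because Foundation then implies that w is well-ordered under ∈. Both clauses are Δ₀, and so are the extra clauses that single out the first infinite ordinal (see post 22). So ω has a Δ₀ definition.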

Definition by transfinite induction preserves Σ₁^ZF-ness, Π₁^ZF-ness, and hence Δ₁^ZF-ness. Proof: Suppose the functional F(x) is Σ₁^ZF. Define F^* by

F^*(0)	= ∅
F^*(α+1)	= F(F^*(α))
F^*(λ)	= ⋃_{α<λ} F^*(α)   (λ a limit ordinal)

Then y=F^*(α) iff there is a function f with domain α+1 such that f satisfies the inductive demands for all β≤α and y=f(α). Explicitly, but with some vernacular:

α is an ordinal and

∃f[f is a function with domain α+1 and

f(0)=∅ and

(∀β∈α) [f(β+1)=F(f(β))] and

(∀ limit λ∈α+1) [f(λ)=⋃_{β∈λ} f(β)] and

y=f(α)]

Thus if F is Σ₁^ZF, so is F^*. On the other hand, if F is Π₁^ZF, we use a formula saying that all functions f with domain α+1 and satisfying the inductive demands must have f(α)=y. Explicitly:

α is an ordinal and

∀f[(f is a function with domain α+1 and

f(0)=∅ and

(∀β∈α)[f(β+1)=F(f(β))] and

(∀ limit λ∈α+1)[f(λ)=⋃_{β∈λ} f(β)]) →

y=f(α)]

We can ring variations on this theme. Extra parameters can come along for the ride, i.e., replace F(x) with F(x,ȳ) and F^*(α) with F^*(α,ȳ). We don’t have to start with F^*(0)=∅, we can define a functional F^*(α,a) where a is the initial value (i.e., F^*(0,a)=a, F^*(α+1,a)=F(F^*(α,a)), etc.) We can include α as an additional argument to F, i.e., F^*(α+1)=F(α,F^*(α)). “Ordinary” inductions that only go up to ω preserve Σ₁^ZF/Π₁^ZF/Δ₁^ZF-ness because “being ω” is Δ₀.
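The Σ₁ form of the definition says that y=F^*(α) holds when some function f certifies it. For finite α this is easy to make concrete. A minimal sketch, assuming F is the successor operation x↦x∪{x} (my choice for illustration) and modelling f as a Python list indexed by 0,…,n; the limit clause never fires for finite n and is left out.

```python
def successor(x):
    """x ∪ {x}, used here as the one-step functional F."""
    return x | {x}


def build(n, F=successor):
    """The unique f with domain n+1 obeying f(0) = ∅ and f(k+1) = F(f(k))."""
    f = [frozenset()]
    for _ in range(n):
        f.append(F(f[-1]))
    return f


def is_certificate(f, n, y, F=successor):
    """Check that f has domain n+1, satisfies the inductive demands, and f(n) = y."""
    if len(f) != n + 1:
        return False
    if f[0] != frozenset():
        return False
    if any(f[k + 1] != F(f[k]) for k in range(n)):   # successor clause, k < n
        return False
    return f[n] == y


f = build(3)
print(is_certificate(f, 3, f[3]))          # f certifies y = F^*(3)
print(is_certificate(f, 3, frozenset()))   # but not y = ∅
print(len(f[3]))                           # F^*(3) is the von Neumann ordinal 3
```

The Π₁ form corresponds to the observation that any list passing the first three checks must coincide with `build(n)`, so every certificate gives the same value at n.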

Example: the functional y=tc(x) (the transitive closure of x) is Δ₁^ZF because z∈x is Δ₀ and tc has the inductive definition

tc(0,x)	= x
tc(n+1,x)	= tc(n,x) ∪ ⋃_{z∈x} tc(n,z)
tc(ω,x)	= ⋃_{n∈ω} tc(n,x)

and tc(x)=tc(ω,x). Likewise rank is Δ₁^ZF, and more generally definitions by so-called ∈-induction preserve Σ₁^ZF/Π₁^ZF/Δ₁^ZF-ness.
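For hereditarily finite sets the stages tc(n,x) stabilise after finitely many steps, so the recursion can be run directly. A sketch, assuming pure sets represented as nested frozensets; stopping at the first repeated stage is a shortcut that works here but is not how tc(ω,x) is defined.

```python
def tc_stage(n, x):
    """tc(n, x), following the recursion: tc(0,x) = x, tc(n+1,x) = tc(n,x) ∪ ⋃_{z∈x} tc(n,z)."""
    if n == 0:
        return frozenset(x)
    result = set(tc_stage(n - 1, x))
    for z in x:
        result |= tc_stage(n - 1, z)
    return frozenset(result)


def tc(x):
    """Transitive closure of a hereditarily finite set: the first stage that repeats."""
    n, stage = 0, tc_stage(0, x)
    while True:
        nxt = tc_stage(n + 1, x)
        if nxt == stage:
            return stage
        n, stage = n + 1, nxt


e0 = frozenset()
e1 = frozenset({e0})
e2 = frozenset({e1})
x = frozenset({e2})            # x = {{{∅}}}

closure = tc(x)
print(closure == frozenset({e0, e1, e2}))
print(all(z <= closure for z in closure))   # every element is a subset: transitive
```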

For industrial-strength use, one can develop a whole calculus to determine places in the complexity hierarchy. But we need very little of this.

I’ve included the ZF superscript throughout; this sweeps away any concerns about the placement of bounded quantifiers.

Formalizing Syntax in ZF

To formalize syntax in ZF, we code the base layer in a somewhat arbitrary but straightforward manner. Then we throw (ordinary) induction at it.

For the base layer, we need codes for all the individuals, i.e., the variables and the names of elements of V. We could use 〈0,i〉 to code v_i and 〈1,a〉 to code the name for a. But we don’t need to worry about those details, we’ll just write ⌜v_i⌝ and ⌜a⌝.

Next, formulas. One can obviously code a formula as a finite sequence of codes of symbols and of individuals. But I think it’s cleaner to use a parse tree. We have the “tree-building” functionals

(⌜x⌝,⌜y⌝) ↦ ⌜x∈y⌝

= 〈⌜x⌝, ⌜y⌝, 0〉

⌜φ⌝ ↦ ⌜¬φ⌝

= 〈⌜φ⌝, 1〉

(⌜φ⌝,⌜ψ⌝) ↦ ⌜φ∧ψ⌝

= 〈⌜φ⌝, ⌜ψ⌝, 2〉

(⌜φ⌝,⌜v_i⌝) ↦ ⌜(∃v_i)φ⌝

= 〈⌜φ⌝, ⌜v_i⌝, 3〉

using ordered pairs and triples, with the last slot serving as “tag” to identify the type of each node (negation, conjunction, etc.) That makes it pretty obvious that these functionals are Δ₀.
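In a programming language the same coding is just a tagged tuple. A minimal sketch, assuming Python tuples in place of set-theoretic ordered pairs and triples; the constructors `var` and `name` are hypothetical stand-ins for the codes ⌜v_i⌝ and ⌜a⌝.

```python
ATOM, NOT, AND, EXISTS = 0, 1, 2, 3     # the tags in the last slot


def var(i):
    return ("var", i)                   # stands in for the code of v_i


def name(a):
    return ("name", a)                  # stands in for the code of the name of a


def member(x, y):
    return (x, y, ATOM)                 # code of "x ∈ y"


def neg(phi):
    return (phi, NOT)


def conj(phi, psi):
    return (phi, psi, AND)


def exists(v, phi):
    return (phi, v, EXISTS)


def depth(code):
    """Parsing depth of a formula code: atomic formulas have depth 0."""
    tag = code[-1]
    if tag == ATOM:
        return 0
    if tag == AND:
        return 1 + max(depth(code[0]), depth(code[1]))
    if tag in (NOT, EXISTS):
        return 1 + depth(code[0])
    raise ValueError("not a formula code")


c = name(frozenset())
phi = exists(var(0), neg(member(var(0), c)))    # (∃v_0) ¬(v_0 ∈ c)
print(depth(phi))
```

Reading the tag off the last slot is all that is needed to take a node apart, which is the informal reason the tree-building functionals are so simple.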

The functional a↦ℰ^a takes a set a and finds all codes of formulas where the names all refer to elements of a. Like practically everything in the realm of syntax, we induct on the depth of parse trees. It’s very similar to the definition of the transitive closure. We start with the base layer of atomic formulas, denoted ℰ₀^a. We want ℰ^a to satisfy the inductive condition

y∈ℰ^a iff

y∈ℰ₀^a or

y=⌜¬φ⌝ with ⌜φ⌝∈ℰ^a or

y=⌜φ∧ψ⌝ with ⌜φ⌝,⌜ψ⌝∈ℰ^a or

y=⌜(∃v_i)φ(v_i)⌝ with ⌜φ(v_i)⌝∈ℰ^a

To fit this into the F, F^* paradigm, we let F take a set x of codes of formulas, and throw in one application of any of the tree-building functionals. So x⊆F(x), and if ⌜φ⌝∈x then ⌜¬φ⌝∈F(x), etc. Next, F^*(0)=ℰ₀^a, and F^*(n+1) is defined inductively as F(F^*(n)). It follows from the generalities on induction that ℰ^a is Δ₁^ZF.

The class ℰ of codes of all formulas in ℒ_V(ZF) is a proper class. It has a Σ₁^ZF definition: x∈ℰ iff ∃a[x∈ℰ^a].

It’s the same story for other aspects of syntax. For example, the substitution functional (⌜φ(x)⌝,⌜c⌝) ↦⌜φ(c)⌝ has a Δ₁^ZF definition.

Formalizing Truth in ZF

We turn our attention to the two truth predicates we can formalize in ZF (a⊧φ(c̄), and V⊧φ(c̄) for depth(φ)≤n) and the one we can’t (V⊧φ(c̄)).

For atomic sentences, the formalization is a breeze:

true₀(x) iff

(x=⌜c=d⌝∧c=d)

∨(x=⌜c∈d⌝∧c∈d)

Let’s start with the inductive definition we’d like for
V⊧φ(c̄):

true(x) iff

x is atomic and true₀(x)

∨ x=⌜¬φ⌝ ∧ ¬true(⌜φ⌝)

∨ x=⌜φ∧ψ⌝ ∧ true(⌜φ⌝) ∧ true(⌜ψ⌝)

∨ x=⌜∃xφ(x)⌝ ∧ ∃d true(⌜φ(d)⌝)

Because of the circularity, this doesn’t actually define a formula in ℒ(ZF); rather, it expresses the property we’d want the formula to have. Tarski’s theorem tells us that no such formula exists.

First modification:

true_{n+1}(x) abbreviates

x is atomic ∧ true₀(x)

∨x=⌜¬φ⌝∧ ¬true_n(⌜φ⌝)

∨x=⌜φ∧ψ⌝∧ true_n(⌜φ⌝)∧true_n(⌜ψ⌝)

∨x=⌜∃xφ(x)⌝∧ ∃d true_n(⌜φ(d)⌝)

Imagine the right hand side expanded out repeatedly until we have a formula in ℒ(ZF). Put another way, the induction is outside ZF, although the longer and longer formulas belong to ZF.

Not just ever longer formulas: true_n is Σ_n^ZF, because of the ∃d embedded in it. (If we’d made ∀ fundamental and ∃ an abbreviation, then true_n would be Π_n^ZF.)

As noted, V⊧φ iff V⊧true_n(⌜φ⌝), provided φ has depth ≤n. For the second modification, we define a single formula true(x,y) such that a⊧φ(c̄) iff V⊧true(a,⌜φ(c̄)⌝). This time there is no restriction on the depth of φ(c̄), but we do demand that c̄∈a.

We handle the circularity just as we did for ℰ^a. Recall that S(a) is the set of codes of sentences of ℒ_a(ZF). Let S_n(a) be the codes for sentences of depth ≤n. Let T_n(a) be the set of all true sentences of S_n(a), i.e., all that are satisfied by (a,∈). We have an inductive definition of T_n(a).
T₀(a) presents no issues: x∈T₀(a) iff x is atomic and true₀(x).

x∈T_{n+1}(a) iff x∈S_{n+1}(a) and [

x is atomic and x∈T₀(a)

∨ x=⌜¬φ⌝∧ ⌜φ⌝∉T_n(a)

∨ x=⌜φ∧ψ⌝∧ ⌜φ⌝∈T_n(a) ∧ ⌜ψ⌝∈T_n(a)

∨ x=⌜∃xφ(x)⌝∧ (∃d∈a)(⌜φ(d)⌝∈T_n(a))]

It’s no sweat to turn this into a function F such that T_{n+1}(a)=F(T_n(a)) for all n∈ω. Moreover, F has a Δ₀ definition, because all the tree-building functionals are Δ₀. Note that the crucial existential quantifier, “(∃d∈a)”, is now bounded. So we have a Δ₁^ZF definition of the set T(a) of true sentences in S(a).
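For a finite set a, the truth definition for (a,∈) is a short recursive evaluator. This is a sketch under assumptions: formula codes are tagged tuples as in the parse-tree coding, variables are handled with an environment rather than by substituting names (which comes to the same thing), and all helper names are mine.

```python
ATOM, NOT, AND, EXISTS = 0, 1, 2, 3


def value(term, env):
    """Denotation of an individual: look up a variable, or unwrap a name."""
    kind, payload = term
    return env[term] if kind == "var" else payload


def holds(a, code, env=None):
    """Does (a, ∈) satisfy the formula coded by `code` under `env`?"""
    env = env or {}
    tag = code[-1]
    if tag == ATOM:
        x, y, _ = code
        return value(x, env) in value(y, env)
    if tag == NOT:
        return not holds(a, code[0], env)
    if tag == AND:
        return holds(a, code[0], env) and holds(a, code[1], env)
    if tag == EXISTS:
        phi, v, _ = code
        # The quantifier is bounded: d ranges over a only.
        return any(holds(a, phi, {**env, v: d}) for d in a)
    raise ValueError("not a formula code")


def von_neumann(n):
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s


two, three, four = von_neumann(2), von_neumann(3), von_neumann(4)
v0, v1 = ("var", 0), ("var", 1)

has_empty = ((((v1, v0, ATOM), v1, EXISTS), NOT), v0, EXISTS)   # ∃v0 ¬∃v1 (v1 ∈ v0)
has_two = ((("name", two), v0, ATOM), v0, EXISTS)               # ∃v0 (2 ∈ v0)

print(holds(three, has_empty))
print(holds(three, has_two), holds(four, has_two))
```

The second sentence is false in 3 but true in 4, a small reminder that truth in (a,∈) depends on a even when all the parameters belong to a.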

I’ve emphasized the parallels between ℰ^a and T(a). Now let’s highlight the differences. We defined the proper class ℰ via the equivalence x∈ℰ≡∃a(x∈ℰ^a). Why doesn’t this work for T⊆S, the proper class of true sentences about V? First hint of the problem: V⊧φ(c̄) and ∃a(a⊧φ(c̄)) are not equivalent, even when c̄∈a. We search for a culprit; the quantifier pleads guilty. Syntax doesn’t care about the scope of a quantifier ∃x—it’s just a node in the parse tree. But for semantics, the scope is central to the meaning. Put another way, when syntax examines the formula φ(x̄,c̄), it “sees” only the names explicitly present. Semantics considers all possible names when turning ∃xφ(x) into φ(d).

Absoluteness of L

Consider this list of relations.

(1) a⊧φ(c̄), with c̄∈a.
(2) y={x∈a:a⊧φ(x,c̄)}.
(3) y∈ℱ(a).
(4) y=ℱ(a).
(5) y∈L_α and y=L_α, where α is an ordinal.
(6) y∈L, that is, Λ(y)

These are all Δ₁^ZF except the last one. But y∈L is absolute between V and L. Proof:

(1) We’ve just seen that this is Δ₁^ZF, or rather, its equivalent true(a,⌜φ(c̄)⌝) is.
(2) y={x∈a:a⊧φ(x,c̄)} iff (∀z∈y)(z∈a ∧ a⊧φ(z,c̄)) and (∀z∈a)[(a⊧φ(z,c̄))→z∈y]. So this is Δ₁^ZF.
(3) y∈ℱ(a) iff there is a monadic formula φ(x,c̄) in ℒ_a(ZF) such that y={x∈a:a⊧φ(x,c̄)}. Recall that M(a) is the set of codes of such monadic formulas. So y∈ℱ(a) iff

(∃p∈M(a))[y={x∈a:true(a,p)}]

The new feature: instead of a true bounded quantifier, we have something of the form (∃u∈f(z))ψ(y,z,u) where f(z) and ψ are both Δ₁^ZF. But that’s equivalent to ∃v[v=f(z)∧(∃u∈v)ψ(y,z,u)], which is Σ₁^ZF. It’s also equivalent to ∀v[v=f(z)→(∃u∈v)ψ(y,z,u)], which is Π₁^ZF.
(4) y=ℱ(a) iff (∀z∈y)[z∈ℱ(a)]∧(∀z∈ℱ(a))[z∈y]. We can handle the “kind-of” bounded quantifier, (∀z∈ℱ(a)), much the same as we handled (∃p∈M(a)) in the previous item.
(5) Transfinite induction preserves Δ₁^ZF-ness.
(6) y∈L iff ∃α(y∈L_α). So y∈L is Σ₁^ZF. So Λ is upwards absolute from L to V. In the reverse direction, suppose V⊧Λ(s) for some s. Then for some ordinal α, V⊧s∈L_α. But L contains all ordinals, and being an ordinal is absolute, and L_α is absolute, so L⊧s∈L_α and hence Λ is downwards absolute from V to L.

This final item (6) is the goal of the whole argument. But as a matter of curiosity you might ask, is Λ downwards absolute between models of ZF? How about upwards and downwards absoluteness for transitive classes in general?

(1)–(6) are not Σ₁, though they are Σ₁^ZF. We mentioned in post 22 that if you slap a bounded quantifier in front of a Σ₁ formula, the result is upwards absolute between transitive classes. Using this, one can show that (1)–(6) are upwards absolute between transitive classes.

We needed one feature of L, besides being a model of ZF, to establish downwards absoluteness from V: the fact that L contains all ordinals. Are there any standard models of ZF not containing all ordinals? The “yes” answer is one form of axiom SM. Using this, plus forcing, one can show Λ is not downwards absolute between models of ZF.

It’s much easier to show that Λ is not downwards absolute between transitive classes. Consider L_{α+1} for some α. Suppose s has rank α, i.e., s∈L_{α+1}∖L_α. (For example, let s=α or s=L_α.) So s is constructible, but not constructible “in L_{α+1}”. For if L_{α+1}⊧Λ(s), then L_{α+1}⊧∃β(s∈L_β), i.e., s∈L_β for some β≤α. But we assumed s had rank α.


Filed under Set Theory

by Michael Weiss | March 11, 2026 · 12:36 pm

Set Theory Jottings 22. Absoluteness


Let’s look again at the notion of definability, rewritten slightly: for any set A, x⊆A is definable over A if there is a first-order formula φ(y,ū) and elements ā∈A such that

x={z∈A:φ^A(z,ā)}

where φ^A is φ relativized to A, i.e., all quantifiers in φ range only over A.

Very clearly the right-hand side depends on A. In some cases, we can write a formula for x that is independent of A. For example:

x={y} ↔ y∈x ∧ ∀z(z∈x→z=y)

Absoluteness is at the heart of Gödel’s proof that V=L holds in L. Failures of absoluteness present the main technical obstacles to showing that L satisfies ZF.

Three Non-absolute Notions

Let’s look at three non-absolute notions:

	x=𝒫(y)
	z is uncountable
	x=𝒫_un(y)={z⊆y:z is uncountable}

Now think about relativizing these three notions to L and to the L_α’s. Suppose x and y are both present in some L_α, and L_α⊧ x=𝒫(y). As we ascend to higher L_β’s, the assertion “x=𝒫(y)” can become false¹, because new subsets of y can appear in L_β. The reverse switch from false to true can’t happen: once we have a “witness” z to x≠𝒫(y) (i.e., z⊆y but z∉x, or z⊈y but z∈x), it won’t go away. (Note that the meaning of x and y can’t change, because the L_α’s are all transitive: by the time x shows up, all its elements have shown up. Ditto for y.)

Here’s a slightly different way to look at it. Consider the sequence x_α=𝒫^L_α(y). The sequence x_α is monotonically increasing (indeed, 𝒫^L_α(y)=𝒫(y)∩L_α). So the assertion x=x_α can flip from true to false as α increases². It can’t flip from false to true, because that would mean that x had some elements (subsets of y) that x_α was missing—but in that case x wouldn’t have been an element of L_α in the first place.
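The identity 𝒫^K(y)=𝒫(y)∩K can be seen at work in a finite toy case. A sketch under a stated assumption: the transitive set K below is the von Neumann ordinal 4, not one of the L_α’s, so it only illustrates the mechanism by which “x=𝒫(y)” holds in a small transitive set and fails in a larger one.

```python
from itertools import combinations


def von_neumann(n):
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s


def powerset(y):
    """The full power set of a finite set y."""
    elems = list(y)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)
    )


def powerset_in(K, y):
    """The power set of y as computed inside a transitive set K: P(y) ∩ K."""
    return powerset(y) & frozenset(K)


one, two, three, four = (von_neumann(n) for n in (1, 2, 3, 4))
K = four                                   # K = {0, 1, 2, 3}, transitive

print(powerset_in(K, two) == three)        # inside K, 3 is the power set of 2
print(powerset(two) == three)              # outside K it is not
print(frozenset({one}) in powerset(two))   # the witness {1} is a subset of 2
print(frozenset({one}) in K)               # ... that K is missing
```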

It’s a similar story for uncountability. We have:

L_α⊧ z is uncountable ↔ L_α⊧¬∃f(f:z↣ω)

where f:z↣ω is shorthand for saying that f is an injection from z into ω. If a witness f to the countability of z appears in some higher L_β, then z “becomes countable”, and remains countable after that.

Next we look at x=𝒫_un(y). Let x_α=𝒫_un^L_α(y). New uncountable subsets of y can appear at any time. But also, a subset of y that is uncountable in L_α can become countable later on. The upshot: 𝒫_un^L_α(y) can both gain and lose elements as α increases, and the truth-value of “x=𝒫_un(y)” can switch back and forth over and over again.

We will look at the limiting behavior (i.e., 𝒫^L(y) and 𝒫_un^L(y)) later on.

Absolute Formulas

How about absolute formulas? We say that a formula φ(ū) is absolute if for all ā in K

	K⊧φ(ā) ↔ V⊧φ(ā)
	(K a transitive class, ā∈K)

in other words, the truth-value of K⊧φ(ā) doesn’t depend on K, provided only that K is a transitive class. (To be precise, when we say K⊧φ(ā), we mean (K,∈)⊧φ(ā). Also, ā∈K means that all the a_i belong to K. Note that without ā∈K, K⊧φ(ā) makes no sense.)

Δ₀ formulas

A Δ₀ formula is one where all quantifiers are bounded, i.e., of the form (∀x∈y) or (∃x∈y).

Here’s the intuition behind Δ₀ formulas. In post 14, we defined the transitive closure of x to be the elements of x, plus the elements of the elements of x, etc. Here we want the augmented transitive closure, where we add x as an element. Any transitive class containing x as an element contains its augmented transitive closure. In fact, it’s easy to see that the augmented transitive closure of x is the smallest transitive class containing x as an element, and that the augmented transitive closure is a set. OK: if φ(ā) is Δ₀, then to find out if K satisfies φ(ā), we only need to root around in the augmented transitive closure of the a_i’s. We never need to search through all of K. It looks like Δ₀ formulas should always be absolute.

One proves this by a routine induction on complexity. If φ(x̄) and ψ(x̄) are absolute, it’s immediate that ¬φ(x̄), φ(x̄)∧ψ(x̄), and φ(x̄)∨ψ(x̄) are absolute. As for (∃ū∈v)φ(ū,x̄), one direction is easy. Suppose for ā,b∈K we have K⊧(∃ū∈b)φ(ū,ā). Then

	K⊧(∃ū∈b)φ(ū,ā)
	⇒(∃c̄∈K) K⊧c̄∈b∧φ(c̄,ā)
	⇒(∃c̄∈V) V⊧c̄∈b∧φ(c̄,ā)
	⇒V⊧(∃ū∈b)φ(ū,ā)

In the other direction, again with ā,b∈K (as demanded by the definition of absoluteness)

	V⊧(∃ū∈b)φ(ū,ā)
	⇒(∃c̄∈V) V⊧c̄∈b∧φ(c̄,ā)
	⇒(∃c̄∈K) K⊧c̄∈b∧φ(c̄,ā)
	⇒K⊧(∃ū∈b)φ(ū,ā)

We know that c̄∈K in the third line because c̄∈b∈K and K is transitive. The inductive assumption takes us from V⊧φ(c̄,ā) to K⊧φ(c̄,ā).

Some examples of Δ₀ formulas:

Example 1: “x⊆y” is Δ₀, since it can be written (∀z∈x)z∈y.

Example 2: “x is an ordinal” is Δ₀. In post 17 we wrote out the clauses for this; for example, one was “(∀u,v∈x)(u<v∨v<u∨u=v)”. (Recall that < is the same as ∈ for ordinals.) This is obviously Δ₀. The transitivity of x was “(∀u,v)(u∈v∈x→u∈x)”, which we can rewrite as “(∀v∈x)(∀u∈v)u∈x”. Only the last clause isn’t Δ₀, even rewritten this way:

(∀y⊆x)(y≠∅ → (∃u∈y) u∩y=∅)

The issue is “∀y⊆x’’. This does not count as a bounded quantifier. But Foundation makes this clause unnecessary.
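As a sanity check on Example 2, here is a sketch of the two Δ₀ clauses as a Python test. It assumes hereditarily finite sets modelled as nested frozensets (my choice, not anything from post 17), where well-foundedness comes for free. Every loop runs over elements of x or elements of elements of x, mirroring the bounded quantifiers.

```python
def is_ordinal(x: frozenset) -> bool:
    """Check the two Delta_0 clauses for 'x is an ordinal' on a nested frozenset."""
    # Transitivity: (forall v in x)(forall u in v) u in x
    transitive = all(u in x for v in x for u in v)
    # Trichotomy: (forall u, v in x)(u in v or v in u or u = v)
    trichotomous = all(u in v or v in u or u == v for u in x for v in x)
    return transitive and trichotomous


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})
singleton_one = frozenset({one})                       # {1}: not transitive
not_linear = frozenset({zero, one, singleton_one})     # transitive, but 0 and {1} are incomparable
```

With these definitions, is_ordinal accepts 0, 1, 2, and 3, and rejects both {1} and {0, 1, {1}}.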

Example 3: “f:x↣ y’’ is Δ₀, i.e., f is an injection of x into y. Intuition: We just need to dig inside the guts of f and its domain and range to show that f is a function and is 1–1.

(∀〈u,z〉∈f) (∀〈v,z〉∈f) u=v

says that f is injective. The vernacular “∀〈u,z〉∈f’’ expands to “(∀p∈f)p=〈u,z〉’’, and this presents no snags.

Example 4: “w=ω’’ is Δ₀, i.e., w is the first infinite ordinal. Here’s the formula for this:

w is an ordinal ∧ w≠∅ ∧ (∀x∈w)(x⁺∈w) ∧ (∀y∈w)(y=∅ ∨ (∃x∈w)y=x⁺)

where x⁺=x∪{x}, the successor of x.

Example 5: f:y↣ω, i.e., f establishes that y is countable. This follows immediately from the last two examples.

In none of these examples do we ever need to climb outside the transitive closure of a given set and wander around the entire class K.

In contrast, we cannot check that K⊧x=𝒫(y) or that x is countable in K without surveying all of K, looking for (respectively) subsets of y and injections f.

Syntactic Analysis

With this in mind, we turn to a syntactic analysis of our non-absolute examples.

As noted, z⊆y and f:z↣ω are Δ₀. So x=𝒫(y) is of the form

∀z δ₁(x,y,z)

and “x is uncountable” is of the form

∀z δ₂(x,z)

where δ₁ and δ₂ are Δ₀.

Finally, x=𝒫_un(y) is of the form ∀z∀f∃g δ₃(x,y,z,f,g), where δ₃ is Δ₀. Showing this takes a bit of work. First we break down x=𝒫_un(y) into three parts:

	∀z(z∈x → z⊆y)
	∧ ∀z(z∈x → ∀f ¬(f:z↣ω))
	∧ ∀z((z⊆y ∧ ∀g ¬(g:z↣ω)) → z∈x)

We ask, how could this conjunction fail to be true? This way:

	∃z(z∈x ∧ z⊈y)
	∨ ∃z(z∈x ∧ ∃f f:z↣ω)
	∨ ∃z(z⊆y ∧ ∀g ¬(g:z↣ω) ∧ z∉x)

Now we use basic logical equivalences to move the quantifiers outwards. Say we have formulas φ, ψ(u), and ξ(u), where u does not appear in φ. Then

φ∧∃uψ(u)	↔∃u(φ∧ψ(u))
φ∨∃uψ(u)	↔∃u(φ∨ψ(u))
φ∧∀uψ(u)	↔∀u(φ∧ψ(u))
φ∨∀uψ(u)	↔∀u(φ∨ψ(u))
∃uψ(u) ∨ ∃uξ(u)	↔∃u(ψ(u)∨ξ(u))
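These equivalences are easy to spot-check by brute force over a small finite domain. The sketch below is only a finite check, not a proof, and the function name is made up for illustration. It treats φ as a truth value, and ψ and ξ as tuples of truth values indexed by the domain.

```python
from itertools import product


def equivalences_hold(domain_size: int) -> bool:
    """Check the five quantifier laws for every phi, psi, xi on a domain of this size."""
    bools = (False, True)
    for phi in bools:
        for psi in product(bools, repeat=domain_size):
            for xi in product(bools, repeat=domain_size):
                laws = [
                    (phi and any(psi)) == any(phi and p for p in psi),
                    (phi or any(psi)) == any(phi or p for p in psi),
                    (phi and all(psi)) == all(phi and p for p in psi),
                    (phi or all(psi)) == all(phi or p for p in psi),
                    (any(psi) or any(xi)) == any(p or x for p, x in zip(psi, xi)),
                ]
                if not all(laws):
                    return False
    return True
```

The check passes for domains of size 1, 2, and 3. It fails for the empty domain, where the second and third laws break down; first-order logic conventionally assumes a nonempty domain, so this does no harm.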

So we can rewrite the failure of x=𝒫_un(y) as:

	∃z ∃f ∀g [
	(z∈x ∧ z⊈y)
	∨ (z∈x ∧ f:z↣ω)
	∨ (z⊆y ∧ ¬(g:z↣ω) ∧z∉x) ]

The stuff inside the brackets is Δ₀, so negating this gives us the form we claimed.

Thinking in terms of witnesses makes this more picturesque. Suppose x≠𝒫_un(y). In other words, x is accused of the crime of not being 𝒫_un(y). The prosecution and defence must provide witness lists before the trial starts. The prosecution lists z and f; the defence, all the g’s. Any one of the three disjuncts is sufficient to convict; let’s imagine a trial lasting three days. The witness z is called each day. On the first day, if z testifies to being an element of x but not a subset of y, game over. But suppose z surprises Jack McCoy (the prosecutor) by being a subset of y. On the second day, f is also called, to testify to the countability of z; McCoy hopes to show that z∈x. Too bad for McCoy, f’s testimony falls apart. On the third day, z is recalled and is shown to be both a subset of y, and not an element of x after all! The defence tries to argue that’s ok, because z is countable. He calls up every single g to testify to being the required injection, but each g fails. The jury convicts and McCoy repairs to the bar to have a drink with his ADA.

The Lévy Hierarchy

Summarizing the previous section:

x=𝒫(y)	is of the form	∀uδ₁(x,y,u)
x is uncountable	is of the form	∀uδ₂(x,u)
x=𝒫_un(y)	is of the form	∀u∀v∃wδ₃(x,y,u,v,w)

where δ₁, δ₂, and δ₃ are all Δ₀ formulas. This syntactic analysis fits into a scheme known as the Lévy Hierarchy.

Formulas of the form ∀uδ(x̄,u) are called Π₁ formulas; replace the ∀ with an ∃, and you’ve got a Σ₁ formula. (Of course, δ here stands for a Δ₀ formula.) More generally, any string of ∀’s is allowed at the front of a Π₁ formula, likewise any string of ∃’s at the front of a Σ₁ formula.

The negation of a Π₁ formula is Σ₁, and vice versa. Truth “propagates upwards” for Σ₁ formulas and “propagates downwards” for Π₁ formulas. The intuition is clear: if K⊧∃ūδ(ā,ū) for some ā∈K with K a transitive class, then we have witnesses: elements c̄∈K such that K⊧δ(ā,c̄). The witnesses cannot be impeached by enlarging K, because δ is Δ₀. So ∃ūδ(ā,ū) holds also in any transitive class containing K. Likewise for the downward propagation with Π₁ formulas.

∀ū∃v̄δ(x̄,ū,v̄) is a Π₂ formula; its negation is a Σ₂ formula. A formula that looks like ∀ū∃v̄∀w̄…δ, where there are n alternating quantifier blocks, is a Π_n formula; starting off with an existential block gives a Σ_n formula.

A formula equivalent to both a Σ₁ and a Π₁ formula will thus be absolute—but that word “equivalent” is the kicker. Equivalent in what sense? One answer: φ(x̄) is Σ_n^ZF if there is a Σ_n formula ψ(x̄) such that ZF⊢∀x̄(φ(x̄)↔ψ(x̄)); likewise for Π_n^ZF. If a formula is both Σ_n^ZF and Π_n^ZF, we say it’s Δ_n^ZF. So Δ₁^ZF formulas are absolute between models of ZF. (And of course, Σ₁^ZF formulas are absolute upwards, Π₁^ZF absolute downwards, but only between models of ZF.)

This tradeoff between admitting more formulas and admitting more classes can take a variety of forms. I won’t explore the full landscape, but a few aspects should be highlighted.

First let’s look at the role of bounded quantifiers. In Σ₁ formulas they must appear inside the scope of the unbounded ∃x̄. (∀x∈y)∃z(z∈x) is not Σ₁, for example.

If we restrict attention to ZF-models, then we can allow bounded quantifiers anywhere, and still get something equivalent to Σ₁. Example: the formula (∀x∈y)∃zφ(x,y,z) is ZF-equivalent to ∃u(∀x∈y)(∃z∈u)φ(x,y,z), by an argument involving ranks³. So we can migrate all bounded quantifiers to the inside.

Formulas like (∀x∈y)∃zφ(x,y,z) are absolute upwards for all transitive classes, not just models of ZF. The idea is simple: in quantifying (∀x∈y)… with y∈K, the bounded quantifier never asks us to go outside K, for a transitive K. So if the assertion holds for K, it will hold for any transitive M⊇K.

For “function-like” formulas, we have another trick. We encountered this notion in posts 13 and 17: φ(x̄,y) is function-like if for any x̄ there is a unique y making φ(x̄,y) true. That is, ∀x̄∃y∀z(φ(x̄,z)↔z=y).

Any formula that is absolute upwards between transitive classes K⊆M, and is function-like over both K and M, is in fact absolute between K and M. Proof: Suppose ā∈K, and we have both K⊧φ(ā,b) and M⊧φ(ā,c), with b∈K and c∈M. Because φ(x̄,y) is absolute upwards from K to M, we also have M⊧φ(ā,b). But since φ(x̄,y) is function-like over M, that means b=c. So K⊧φ(ā,c).

This result has a counterpart in the Lévy hierarchy. Suppose we have a Σ₁ formula

∃ūφ(x̄,y,ū)

where φ(x̄,y,ū) is Δ₀. Suppose also that K is a transitive class, and ∃ūφ(x̄,y,ū) is function-like for K. That is, for any c̄∈K, there is a unique d such that K⊧(∃ū)φ(c̄,d,ū). Then our formula is equivalent to this Π₁ formula over K:

∀ū∀z(φ(x̄,z,ū)→y=z)

I relegate the proof to the end of this post.

Now suppose that (∃ū)φ(x̄,y,ū) is function-like for both K and M with K⊆M. Then it is equivalent in both classes to a Π₁ formula; we might say it is Δ₁ for K and M, and hence absolute between them.

The Π_n/Σ_n classification is not confined to set theory; in a more general context, quantifier-free formulas play the role of Δ₀ formulas. Historically, proofs in logic often began by reducing formulas to prenex normal form (i.e., all quantifiers in front). This isn’t so widespread anymore. But induction on the “complexity” of formulas still pervades logic, and the Π_n/Σ_n classification is our deepest analysis of this complexity.

Here is the argument about function-like Σ₁ formulas. Suppose one of these formulas holds for (c̄,d). If the antecedent holds in the Π₁ formula for some ū with x̄=c̄ and z=d′, then the Σ₁ formula also holds for (c̄,d′). By the uniqueness hypothesis, d=d′ and the consequent holds for (c̄,d) in the Π₁ formula. That shows that the Σ₁ formula implies the Π₁ formula. For the other direction, suppose the Π₁ formula holds for (c̄,d). By the existence part of function-likeness, there must be a ū and a d′ making the Σ₁ formula true for (c̄,d′). The Π₁ formula tells us that d=d′, so the Σ₁ formula holds for (c̄,d). This argument can be extended by induction to show that function-like Σ_n formulas are Π_n.

[1] Just to be clear: by saying “x=𝒫(y) becomes false”, I mean that although L_α⊧x=𝒫(y), L_β⊧x≠𝒫(y). Here x and y are fixed elements of L, which both belong to L_α (and thus also to L_β).

[2] Again, to be clear, by “flip” I mean that x=x_α but x≠x_β for two ordinals α<β.

[3] For each x in y, let α_x be the least ordinal such that there is a z of rank α_x making φ(x,y,z) true. Let ξ=sup_{x∈y}(α_x+1), and set u=V_ξ, so that u contains a witness z for each x. Later on I will call this sort of reasoning a waiting argument.

Filed under Set Theory

by Michael Weiss | March 5, 2026 · 2:37 pm

Set Theory Jottings 21. The Constructible Universe

The constructible universe is traditionally denoted L. L is a subclass of V and is a proper class. Gödel proved three things about L:

All the axioms of ZF hold in L, i.e., L is a model of ZF.
V=L holds in L, i.e., L is a model of the axiom “All sets are constructible”. Cohen: “This is a small but subtle point. It says that a constructible set is constructible when the whole construction is relativized to L.”
V=L→AC and V=L→GCH are both provable in ZF.

So we can’t prove not-GCH in ZF. If we could, it would have to hold in L, but GCH holds in L. Ditto for AC.

L is constructed according to the familiar transfinite scheme, using a function ℱ (discussed below):

L₀	= ∅
L_{α+1}	= ℱ(L_α)
L_λ	= ⋃_{α<λ} L_α
L	= ⋃_{α∈Ω} L_α

Let A be a set. ℱ(A) is a subset of 𝒫(A); it’s the set of all sets that are definable using elements of A.

Here’s the precise definition. For any set A, x⊆A is definable over A if there is a first-order formula φ(y,ū) and elements ā∈A such that

z∈x ↔ z∈A ∧ φ^A(z,ā)

where φ^A is φ relativized to A, i.e., all quantifiers in φ range only over A. (Logic Notes §5 treats relativisation.) As we said before, ℱ(A) is the set of all subsets of A that are definable over A.
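The first few stages can be computed by hand, or by machine. For a finite A, every subset {a₁,…,a_k} is definable over A by the formula z=a₁∨…∨z=a_k with the a_i as parameters, so ℱ(A)=𝒫(A) and the finite stages L_n coincide with V_n. The Python sketch below leans on that shortcut, which fails at infinite stages. The frozenset modelling and the function names are my own illustration.

```python
from itertools import combinations


def powerset(a: frozenset) -> frozenset:
    """All subsets of a finite set a, each as a frozenset."""
    items = list(a)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )


def finite_stages(n: int) -> list:
    """Return [L_0, ..., L_n], using F(A) = P(A), which is valid only for finite A."""
    stages = [frozenset()]
    for _ in range(n):
        stages.append(powerset(stages[-1]))
    return stages


stages = finite_stages(4)
sizes = [len(stage) for stage in stages]
```

The sizes come out as 0, 1, 2, 4, 16. The next stage would have 65536 elements, so the growth is quick.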

So ℱ(A) is something like 𝒫(A), except we include only those sets where we can explicitly describe their criterion for membership. Cohen discusses how this notion arose from, but did not resolve, concerns about so-called impredicative definitions. (We talked about this in post 3 on the paradoxes, and in post 5 on Zermelo’s proof of the well-ordering theorem.)

Some examples. The singleton {x}, the unordered pair {x,y}, the ordered pair 〈x,y〉={{x},{x,y}}, and the power set 𝒫(x) are all definable from x or from x and y:

z∈{x}	↔ z=x
z∈{x,y}	↔ z=x ∨ z=y
z∈〈x,y〉	↔ z={x} ∨ z={x,y}
z∈𝒫(x)	↔ ∀u[u∈z → u∈x]

Each right-hand side is a first-order formula characterizing the elements of a set. As usual, imagine the vernacular expanded. For example, instead of z={x}, we have ∀t(t∈z↔t=x). The left-hand sides are abbreviations for the right-hand sides, so (for example) in the formal definition of 〈x,y〉, “z={x}’’ and “z={x,y}’’ have been expanded.

The prime examples of sets not obviously definable are choice functions. For example, it’s easy to say what we desire of a choice function c for 𝒫(ℝ), where ℝ=𝒫(ω):

(∀s⊆ℝ) (s≠∅→c(s)∈s)

But this doesn’t characterize c. (We’ve already seen how to express formally “c is a function with domain 𝒫(ℝ)∖{∅}’’, and “c(s)∈s’’.)

Gödel’s L is an example of an inner model. The method of inner models proves relative consistency results: If a theory 𝒯 is consistent, then so is 𝒯+φ, where φ is a formula in ℒ(𝒯). To apply this method, you have to find a formula α(x) in ℒ(𝒯), and show two things:

for all ψ∈𝒯,	𝒯⊢ψ^α
and also	𝒯⊢φ^α

where ψ^α is ψ relativized to α.

You can approach this method syntactically or semantically. First, semantics: Let’s say T is a model for 𝒯. Consider the substructure selected by α(x), call it A. It’s a model of 𝒯+φ, because each formula ψ∈𝒯, when interpreted as speaking about A, is equivalent to ψ^α interpreted in T:

A⊧ψ if and only if T⊧ψ^α

Likewise for φ. We’ve found a model of 𝒯+φ sitting inside a model of 𝒯.

Syntactically, say we had a proof of a contradiction in 𝒯+φ. Go through and relativize everything with α. Now we have a proof of a contradiction in 𝒯: all the relativized axioms of 𝒯+φ can be proved in 𝒯, and it turns out that relativization preserves the logical axioms and rules of inference. (Picky point: we need ∃xα(x) to hold too.)

Gödel’s treatment emphasized the syntactic aspect, Cohen’s the semantic.

Of course the hard part is proving the relativizations:

for all ψ∈ZF,	ZF⊢ψ^L
and also	ZF⊢(V=L)^L

So Con(ZF) → Con(ZF+V=L). People write ZFL for ZF+V=L, “All sets are constructible.” Gödel also showed that ZFL⊢AC and ZFL⊢GCH, giving relative consistency for these too.

Here’s a trivial example of the method of inner models. Let Group be the first-order theory of groups, and let abelian be the axiom x·y=y·x. Any model of Group (i.e., any group) has a model of Group+abelian sitting inside of it, namely its center. The formula

ζ(x) ↔∀y[x·y=y·x]

selects the center of the group. (Picky point: we can’t let the ∀y be implicit, since we need ζ(x) to define a unary relation.) Within Group, we can prove that the center of a group is an abelian group. It’s not totally trivial that the center of a group is even a group, i.e., that it’s closed under the group operation. Anyway, this argument shows the relative consistency result

Con(Group)→Con(Group+abelian)

admittedly a trivial result, but it illustrates the method.
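To see the formula ζ(x) at work, here is a small Python sketch that carves the center out of one concrete group. I chose the symmetry group of a square, represented as permutations of the vertices 0..3; the group, the generators, and the helper names are all my own illustration.

```python
def compose(p, q):
    """Composition of permutations as tuples: (p after q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate(generators, op):
    """Close a finite set of group elements under the operation."""
    group = set(generators)
    frontier = list(generators)
    while frontier:
        g = frontier.pop()
        for h in list(group):
            for result in (op(g, h), op(h, g)):
                if result not in group:
                    group.add(result)
                    frontier.append(result)
    return group


def center(group, op):
    """Elements satisfying zeta(x): x commutes with every y in the group."""
    return {x for x in group if all(op(x, y) == op(y, x) for y in group)}


def is_abelian_group(subset, op, identity):
    """Check identity, closure, inverses, and commutativity on a finite subset."""
    closed = all(op(x, y) in subset for x in subset for y in subset)
    inverses = all(any(op(x, y) == identity for y in subset) for x in subset)
    commutes = all(op(x, y) == op(y, x) for x in subset for y in subset)
    return identity in subset and closed and inverses and commutes


identity = (0, 1, 2, 3)
rotation = (1, 2, 3, 0)     # quarter turn
reflection = (0, 3, 2, 1)   # flip fixing vertices 0 and 2
square_group = generate([rotation, reflection], compose)
square_center = center(square_group, compose)
```

The group has 8 elements and its center has 2: the identity and the half turn (2, 3, 0, 1). The check is_abelian_group confirms that this center is a model of Group+abelian sitting inside a non-abelian model of Group.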

Before Gödel’s L, the most prominent example of this method was von Neumann’s class of well-founded sets: V=⋃_{α∈Ω} V_α. This shows that if ZF minus Foundation is consistent, then ZF is too. The demonstration amounts to a much easier “dry run” for Gödel’s results.

Finally, let’s note an important feature of the inclusion L⊆V. Is it a proper inclusion, i.e., are there any non-constructible sets? Gödel thought so. So do most set-theorists who believe the question has meaning. For a formalist, the only question that has meaning is, what can you prove? Well, if we did have ZF⊢V≠L, that would mean that ZFL was inconsistent. By Gödel’s relative consistency result, that can happen only if ZF itself is inconsistent!

Cohen showed that ZF+V≠L is also consistent (if ZF is), so just like AC and GCH, the question of whether V=L holds cannot be settled by the axioms of ZF. For a formalist, that’s the end of the story. For a platonist (someone who believes that the universe of set theory “really exists”) the question still has meaning. (Your platonism has to be at least moderately strong: you could believe that a multiverse of sets “really exists”, with V=L true in some universes and not in others.)

For what it’s worth, the consensus among set theorists of the platonist persuasion seems to be that AC is true, GCH is false, and V=L is also false.

Filed under Set Theory

by Michael Weiss | January 20, 2026 · 5:19 pm

From Kepler to Ptolemy 23

The Astronomia nova: “One Sustained Argument”

In his classic The Sleepwalkers, Arthur Koestler said this about the Astronomia nova:

Kepler was incapable of exposing his ideas methodically, text-book fashion; he had to describe them in the order they came to him, including all the errors, detours, and the traps into which he had fallen. The New Astronomy is written in an unacademic, bubbling baroque style, personal, intimate, and often exasperating. But it is a unique revelation of the ways in which the creative mind works.

Scholarship has decisively rebutted Koestler’s description. Gingerich first suggested this, after examining some of Kepler’s unpublished manuscripts:

Most commentators have assumed, because of Kepler’s sequential and at times autobiographical style, that Kepler has spared no detail in the chronicle of his researches. Examination of the manuscript material … shows, on the contrary, that the book evolved through several stages and represents a much more coherent plan of organization than a mere serial recital of his investigations would allow.

Stephenson’s Kepler’s Physical Astronomy stated this more forcefully:

This profoundly original work has been portrayed as a straightforward account of converging approximations, and it has been portrayed as an account of gropings in the dark. Because of the book’s almost confessional style, recounting failures and false trails along with successes, it has in most cases been accepted as a straightforward record of Kepler’s work. It is none of these things. The book was written and (I shall argue) rewritten carefully, to persuade a very select audience of trained astronomers that all the planetary theory they knew was wrong, and that Kepler’s new theory was right. The whole of the Astronomia nova is one sustained argument, and I shall make what I believe is the first attempt to trace that argument in detail.

Donahue says this in the introduction to his translation:

That is, although Kepler often seems to have been chronicling his researches, the Astronomia nova is actually a carefully constructed argument that skillfully interweaves elements of history and (it should be added) of fiction. Taken as history, it is often demonstrably false, but Kepler never intended it as history. His introduction to the “Summaries of the Individual Chapters” makes his intentions abundantly clear. Caveat lector!

Finally, Voelkel dug deeper into how and why Kepler composed the book as he did. Legal obstacles intervened. And Kepler’s correspondence with David Fabricius showed him what aspects of the “new astronomy” would prove most difficult for his fellow astronomers to swallow.

Even so, the Astronomia nova roughly follows the path that Kepler took. In broad strokes:

True Sun: Use this instead of the mean sun (aka center of Earth’s orbit).
Imitation of the Ancients: Aka the vicarious hypothesis. A model for an eccentric circular orbit for Mars, with equant. Good for longitudes, but not distances or latitudes.
Earth’s Orbit: An eccentric circle with an equant. Also, bisection of eccentricity: the center of the orbit is midway between the sun and the equant.
Speed Laws: The inverse distance law, giving way to the area law.
The Ellipse: First realization: the orbit is an oval. Then many false starts, crowned finally with success.

Voelkel reveals the main deviations between this outline and the actual history. (1) Kepler started investigating Earth’s orbit before his work on the vicarious hypothesis. (2) In general, the phases were more entangled. (3) Far from “including all the errors, detours, and the traps” (as Koestler said), Kepler exercised selection with a purpose. As he put it himself: “What success came of that labor [i.e., his investigations], it would be boring and pointless to recount. I shall describe only so much of that labor of four years as will pertain to our methodical enquiry.”

Filed under Astronomy, History

by Michael Weiss | December 27, 2025 · 2:48 pm

Set Theory Jottings 20. Consistency of GCH and AC: Overview

In 1938 Gödel published “The Consistency of the Axiom of Choice and of the Generalized Continuum-Hypothesis”. This paper introduces the constructible universe, a so-called inner model of ZFC. This is a class L that satisfies the ZFC axioms, plus GCH, provided that V satisfies the ZF axioms. So if ZF is consistent, then so is ZF+AC+GCH.

The 1938 paper did not employ classes in the formal sense, as the context was ZF set theory. A couple of years later Gödel elaborated his proof in a monograph, introducing the NBG axioms and treating classes formally. We’ll stick with the ZF version.

Cohen’s book gives a more readable account of all this. I will quote from it from time to time.

The proof rests on five foundation stones:

First-order logic: Concepts such as sentence, satisfaction, model, and definability all play key roles.
Constructible Sets: Gödel’s universe of constructible sets is the centerpiece of the whole argument; it revolves around the notion of definability. The next post introduces the constructible universe.
Absoluteness: Certain concepts of set theory are absolute, in that they do not depend on the surrounding model. Absoluteness is covered in a later post.
Reflection Principles: At two pivotal points, Gödel made use of a version of the Löwenheim-Skolem Theorem. The theorem says that we can always find a “small” submodel that “reflects” certain aspects of a larger model.
Mostowski Collapsing Lemma: At a key point in the argument, Gödel needed to remove “superfluous” sets from a model; he applied the Mostowski collapsing lemma. (Cohen calls it “the trivial result… concerning ∈-isomorphisms”.)

The following sections give the argument in broad strokes. For the logic background, I’ll refer to my notes Basics of First-order Logic (henceforth Logic Notes).

Filed under Uncategorized

by Michael Weiss | November 24, 2025 · 1:20 pm

Set Theory Jottings 19. GCH implies AC.

Sierpiński’s Theorem: GCH implies AC

I had a look at the version of the proof in Cohen (§IV.12). Sierpiński was a clever fellow, and he came up with a few tricks that would be hard to motivate.

Here I will try to imagine how Sierpiński could have devised his proof. Cohen does offer one bit of intuition:

The GCH is a rather strong assertion about the existence of various maps since if we are ever given that A≤B≤P(A) then there must be a 1–1 map either from B onto A or from B onto P(A). Essentially this means that there are so many maps available that we can well-order every set.

Let A be the set we wish to well-order. Let’s write A≤B to mean there is an injection from A into B. GCH tells us that for any U,

A≤U≤𝒫(A) implies U≡A or U≡𝒫(A)

If U is well-ordered, then U≡A and U≡𝒫(A) both imply that A can be well-ordered, the latter because A is naturally imbedded in 𝒫(A). But this is too simple an approach: A≤U already makes A well-ordered for a well-ordered U, so if we could show the antecedent we’d be done—we wouldn’t need GCH to finish the job.

Let’s not assume U is well-ordered, but instead suppose it contains a well-ordered set. Say we could show that

A≤W+A≤𝒫(A)

for a well-ordered set W, where ‘+’ stands for disjoint union. (That is, W×{0}∪A×{1}, or some similar trick to ensure disjointness.) Then we’d have

W+A≡𝒫(A) or W+A≡A

Now, if W+A≡𝒫(A), then we ought to have W≡𝒫(A), just because A is “smaller” than 𝒫(A) (in some sense)—W should just absorb A, if W is “big enough”. Also, if W is “big enough” then that should exclude the other arm of the choice, where W+A≡A. And if W≡𝒫(A), then 𝒫(A) and so also A can be well-ordered, as we have seen.

At this point Hartogs’ theorem shows up at the door. This gives us a well-ordered set W with

W≤𝒫⁴(A) and W≰A

So we have

A≤W+A ≤𝒫⁴(A)+A ?≡? 𝒫⁴(A)

(where ?≡? means that the equivalence needs to be proven). W≰A excludes W+A≡A, good. Let’s postpone the issue of the ‘?’. Deal first with the problem that the bounds are not tight enough for GCH to apply. We fix that by looking at:

𝒫³(A)≤W+𝒫³(A)≤𝒫⁴(A)+𝒫³(A) ?≡? 𝒫⁴(A)

So if W+𝒫³(A)≡𝒫⁴(A), we ought to have W≡𝒫⁴(A) and hence a well-ordering of A. What about the other case, W+𝒫³(A)≡𝒫³(A)? Ah, then we have

𝒫²(A)≤W+𝒫²(A)≤W+𝒫³(A)≡𝒫³(A)

and so we can repeat the argument: either W+𝒫²(A)≡𝒫³(A), which ought to make 𝒫³(A) well-ordered and hence also A well-ordered; or W+𝒫²(A)≡𝒫²(A), in which case we repeat the argument yet again. Eventually we work our way down to

A≤W+A≤W+𝒫(A)≡𝒫(A)

and W+A≡A is excluded since W≰ A, and we are done.

All this relies on the intuition that if W+M≡𝒫(M), then we should have W≡𝒫(M): we used this with M=𝒫ⁿ(A) for n=0,…,3. Well, we can prove something a little weaker.

Lemma: If W+M≡𝒫(M)×𝒫(M), then W≥𝒫(M).

Proof: Suppose h:W+M→𝒫(M)×𝒫(M) is a bijection. Restrict h to M and compose with the projection to the second factor: π₂⚬(h↾M):M→𝒫(M). Cantor’s diagonal argument shows that this map cannot be onto. (The fact that π₂⚬(h↾M) might not be 1–1 doesn’t affect the argument.) So for some s₀∈𝒫(M), we know that h(x) never takes the form (−,s₀) for x∈M. In other words, the image of h↾W must include all of 𝒫(M)×{s₀}. Therefore 𝒫(M)×{s₀} can be mapped 1–1 to a subset of W. qed.
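The diagonal step is easy to check exhaustively on a small set. Here is a Python sketch, assuming a finite M and a map g:M→𝒫(M) given as a dictionary; g stands in for π₂⚬(h↾M), and the function names are made up for illustration. Nothing in the check requires g to be 1–1.

```python
from itertools import combinations, product


def all_subsets(m):
    """List every subset of the finite set m as a frozenset."""
    items = sorted(m)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def diagonal_set(m, g):
    """Cantor's diagonal set s0 = {x in M : x not in g(x)}."""
    return frozenset(x for x in m if x not in g[x])


def diagonal_always_missed(m):
    """Check, for every map g: M -> P(M), that s0 is not a value of g."""
    items = sorted(m)
    subsets = all_subsets(m)
    for values in product(subsets, repeat=len(items)):
        g = dict(zip(items, values))
        if diagonal_set(m, g) in g.values():
            return False
    return True


example_m = {0, 1, 2}
example_g = {0: frozenset({0}), 1: frozenset(), 2: frozenset({0, 1})}
s0 = diagonal_set(example_m, example_g)
```

For the example map, s0 is {1, 2}, which is not among the three values of g. The exhaustive check runs through all 512 maps from a three-element set to its power set.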

The missing pieces of the proof now all take the form of absorption equations. We know that 𝒫(M)×𝒫(M)≡𝒫(M+M); as an equation for cardinals, 2^𝔪·2^𝔪=2^{2𝔪}. If we had 2𝔪=𝔪, that would take care of that problem. The ?≡? above also takes the form 2^𝔪 + 𝔪 ?=? 2^𝔪, for 𝔪 the cardinality of 𝒫³(A).

The general absorption laws for addition depend on AC. But we do have these suggestive equations even without AC:

𝔞+ω+1 = 𝔞+ω, 2^{𝔪+1} = 2·2^𝔪

and so if 𝔞+ω=𝔪 and 2^𝔪=𝔟, then 2𝔟=𝔟. So let’s say we set B=𝒫(A+ω). Then we have 2B≡𝒫(A+ω+1)≡B (where 2B, of course, is the disjoint union of B with itself). Also B≤B+1≤2B≡B, so B≡B+1 and so 𝒫(B)≡2𝒫(B). So if we replace A with B, then all gaps in the argument are filled and we conclude that B can be well-ordered. But obviously A can be imbedded in B, so A also can be well-ordered. QED.
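For reference, here is the chain of equalities behind 2𝔟=𝔟 spelled out in LaTeX, with 𝔪=𝔞+ω and 𝔟=2^𝔪 as in the text. None of the steps uses AC.

```latex
\begin{align*}
2\mathfrak{b} &= 2\cdot 2^{\mathfrak{m}} = 2^{\mathfrak{m}+1} \\
              &= 2^{\mathfrak{a}+\omega+1} = 2^{\mathfrak{a}+\omega}
               = 2^{\mathfrak{m}} = \mathfrak{b}.
\end{align*}
```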

Ernst Specker proved a “local” version of Sierpiński’s Theorem: if 𝔪 and 2^𝔪 both satisfy CH, then 2^𝔪=ℵ(𝔪).

Filed under Set Theory

by Michael Weiss | November 20, 2025 · 4:26 pm

Set Theory Jottings 18. The Axiom of Determinacy

Just denying the axiom of choice doesn’t buy you much. If you’re going to throw away AC, you should add some powerful incompatible axiom in its place. The Axiom of Determinacy (AD) has been studied in this light.

Here’s one formulation. Let S be ℕ^ℕ, i.e., the set of all infinite strings of natural numbers. Let G⊆S. Alice and Bob play a game where at step 2n, Alice chooses a number s_{2n}, and at step 2n+1, Bob chooses a number s_{2n+1}. If s₀s₁s₂…∈G, Alice wins, otherwise Bob wins. We say elements of G are assigned to Alice, and elements not in G are assigned to Bob. We’ll call the infinite strings results (of the game). Rather than think of G as a set of results, think of it as a function G:S→{Alice,Bob}.

A strategy for Alice tells her how to play each move. Formally, it’s a function from the set of all number strings of finite even length to ℕ. Likewise, a strategy for Bob maps number strings of finite odd length to numbers. A game is determined if Alice or Bob has a winning strategy, i.e., if the player follows the strategy then that player will win. The Axiom of Determinacy says that each game is determined.
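Here is a sketch of how two strategies interact, with a strategy modelled as a Python function from the finite history so far to the next number. The two sample strategies and the function name are invented for illustration. The code only produces a finite prefix of the result; who wins depends on the whole infinite string, so it can’t be read off from any prefix in general.

```python
from typing import Callable, List, Tuple

Strategy = Callable[[Tuple[int, ...]], int]


def play(alice: Strategy, bob: Strategy, steps: int) -> List[int]:
    """Return the first `steps` moves when Alice and Bob follow their strategies."""
    moves: List[int] = []
    for step in range(steps):
        # Alice moves at even steps (even-length history), Bob at odd steps.
        mover = alice if step % 2 == 0 else bob
        moves.append(mover(tuple(moves)))
    return moves


def alice_counts(history: Tuple[int, ...]) -> int:
    """Sample Alice strategy: play the length of the history so far."""
    return len(history)


def bob_adds_one(history: Tuple[int, ...]) -> int:
    """Sample Bob strategy: play one more than Alice's last move."""
    return history[-1] + 1


prefix = play(alice_counts, bob_adds_one, 6)
```

With these two strategies the first six moves are 0, 1, 2, 3, 4, 5.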

Interesting thing about the proof that AC → ¬AD: it’s much easier using the well-ordering theorem instead of Zorn’s lemma.

First note that there are c=ℵ₀^ℵ₀ strategies (lumping together both Alice and Bob strategies), likewise c results. Assuming AC, well-order the strategies {S_α:α<ω_c}. Here ω_c is the least ordinal with cardinality c, so the set {α:α<κ} has cardinality less than c for each κ<ω_c.

We construct a game G by inducting transfinitely through all the strategies, at step κ considering S_κ. Our goal is to assign some result to Alice or Bob that prevents S_κ from being a winning strategy. Say S_κ is an Alice strategy. Since we assign only one result at each step, fewer than c results have been assigned before step κ. However, there are c possible results if Alice follows S_κ, since Bob can play his numbers however he wants. So there exists a result where Alice follows S_κ but this result has not yet been assigned to either player. Assign it to Bob; this thwarts S_κ. If S_κ is a Bob strategy, just switch everything around. QED

The cardinality argument at the heart of this proof is harder to pull off with Zorn’s lemma (though possible, of course). (The exact same argument works with bit strings instead of strings of natural numbers, but for some reason AD is generally stated using ℕ^ℕ instead of 𝒫(ℕ).)

Filed under Set Theory

by Michael Weiss | November 18, 2025 · 8:42 am

From Kepler to Ptolemy 22

Libration Force

The Libration Force

Kepler coined the term “libration” for the oscillation of a planet’s distance from the Sun, approaching and receding.

He analyzed the libration for the eccentric-equant model, and found it unexpectedly complicated. Stephenson (p.78):

Many absurdities were involved in supposing that a planet could move, … non-uniformly, about the vacant center of the eccentric, with no guide except the apparent magnitude of the solar disk. Such complicated hypotheses, although designed to yield a perfectly simple eccentric circular path, were not physically credible…

Notice the remarkable thing that Kepler was doing here. He was analyzing motion on an eccentric circle, a model that had been in general use for nearly two millennia, apparently the simplest possible model with any empirical accuracy. He took apart this beautifully simple model and showed that as a physical process (and in the absence of solid spheres) it was really quite complicated, so complicated as to raise doubt about whether it could be real. He had performed so radical a reassessment by interpreting astronomy, for the first time, as a physical science.

Eventually Kepler achieved the elliptical orbit. Seeking a physical explanation, he hit on a magnetic force to produce the libration:

What if all the bodies of the planets are enormous round magnets? Of the earth (one of the planets, for Copernicus) there is no doubt. William Gilbert has proved it.

But to describe this power more plainly, the planet’s globe has two poles, of which one seeks out the sun, and the other flees the sun. So let us imagine an axis of this sort, using a magnetic strip, and let its point seek the sun. But despite its sun-seeking magnetic nature, let it remain ever parallel to itself in the translational motion of the globe…

—Astronomia nova, Chapter 57.

The figure at the top of this post (taken from the Epitome of Copernican Astronomy) shows how it works. (The figure in the Astronomia nova has extra clutter.) Kepler explains:

[When] the strip is at A and E, there is no reason why the planet should approach or recede, since it holds its ends at equal distance from the sun, and would undoubtedly turn its point towards the sun if it were allowed to do so by the force that holds its axis straight and parallel. When the planet moves [counterclockwise] away from A, the point approaches the sun perceptibly, and the tail end recedes. Therefore, the globe begins perceptibly to navigate towards the sun. After E, the tail end perceptibly approaches and the head end recedes from the sun. Therefore, by a natural aversion, the whole globe perceptibly flees the sun…

—Astronomia nova, Chapter 57. [I have changed the letters from C and F to A and E to match the diagram from the Epitome.]

Implicit: the magnetic force weakens with distance, so when the head is closer to the Sun than the tail, the net force is attractive. And vice versa.

Kepler argued that this scheme gave the force a sinusoidal dependence on the longitude, and showed that this agreed with the libration for an elliptical orbit. Some aspects of this demonstration needed special pleading. Stephenson details the strong and the weak points of the reasoning (pp.110–117).

But: “The theory had one glaring flaw, however. The magnetic axis of the planet had to maintain a constant direction, perpendicular to the apsidal line.” (Stephenson, p.117.) The Earth’s rotational axis doesn’t come close to meeting this requirement. So why should we believe it holds for Mars? Kepler acknowledged the problem:

I will be satisfied if this magnetic example demonstrates the general possibility of the proposed mechanism. Concerning the details, however, I have my doubts. For when the earth is in question, it is certain that its axis, whose constant and parallel direction brings about the year’s seasons at the cardinal points, is not well suited to bringing about this reciprocation… And if this axis is unsuitable, it seems there is none suitable in the earth’s entire body, since there is no part of it that rests in one position while the whole body of the globe revolves in a ceaseless daily whirl about that axis.

As one possible out, Kepler appealed to a planetary mind.

Besides the radial libration, planets have a libration in latitude. This enmeshed the theory in further difficulties. Ever inventive, Kepler devised ad hockery around all these rough spots. But we have a contrast: we can trace a direct path from the whirlpool force to the area law. This cannot be said for Kepler’s libration theory. Kepler’s whirlpool speculations came years before the area law. The libration force came after the elliptical orbit.

There is a reason for this. You can justify the whirlpool force (more or less) using the conservation of angular momentum. Kepler’s libration force has no counterpart in Newtonian physics.

Filed under Astronomy, History
