Set Theory Jottings 7. The (Cantor-Dedekind-Schröder)-Bernstein Theorem

The trichotomy of cardinals says that for any 𝔪 and 𝔫, exactly one of these holds: 𝔪<𝔫, 𝔪=𝔫, or 𝔪>𝔫. It’s equivalent to the conjunction of these two propositions, for any two cardinals 𝔪 and 𝔫:

Antisymmetry: If 𝔪≤𝔫 and 𝔫≤𝔪, then 𝔪=𝔫.
Comparability: Either 𝔪≤𝔫 or 𝔫≤𝔪.

Cantor showed the trichotomy of ordinals via transfinite induction. I outlined the proof at the end of post 5. If we assume that every set can be well-ordered, then trichotomy for cardinals follows immediately. For Cantor, this well-ordering assumption was a “law of thought”.

We can eliminate the notion of “cardinal number” from the statement of trichotomy. Write X≤Y if there is an injection from X to Y, and X≡Y if there is a bijection. Antisymmetry says that if X≤Y and Y≤X, then X≡Y. Comparability says that either X≤Y or Y≤X.

In 1915, Hartogs showed that trichotomy implies the well-ordering theorem and hence the axiom of choice. But antisymmetry is provable without using Choice. Once upon a time, people called this the Schröder-Bernstein theorem: in 1898 proofs appeared by Schröder and Bernstein (independently). Alas, Schröder’s proof suffered a fatal flaw. A proof by Dedekind turned up in his posthumous papers; he had also sent it to Cantor, who apparently didn’t notice it. Zermelo rediscovered Dedekind’s proof. So: Schröder-Bernstein? Cantor-Bernstein? Dedekind-Bernstein? Cantor-Dedekind-Bernstein? Just Bernstein? Take your pick. I will call it the Equivalence Theorem.

Schröder’s “proof” is elegant, at least. Say f:X→Y and g:Y→X are the injections. Let X₀=X and Y₀=Y, and inductively define Y_{n+1}=f(X_n) and X_{n+1}=g(Y_n). Then

X₀ ⊇ X₁ ⊇ X₂…
X₀≡ X₂≡ X₄…
X₁≡ X₃≡ X₅…

So ⋂_n X_{2n} = ⋂_n X_{2n+1}. Schröder now asserts that the cardinality of ⋂_n X_{2n} is the same as the common cardinality of all the X_{2n}, and likewise for ⋂_n X_{2n+1}. So X₀≡X₁≡Y₀. But his assertion is false: we can have ⋂_n X_{2n} = ⋂_n X_{2n+1} = ∅, even with X and Y infinite. For example, let X=Y=ℕ, and f(n)=g(n)=n+1.
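The counterexample can be checked mechanically. The following is only a sketch: it relies on the observation that, with f(n)=g(n)=n+1, every X_n and Y_n is a set of the form {m : m ≥ a}, so it tracks just the least element a. The function names are made up for this illustration.

```python
def succ(n: int) -> int:
    """The injection f(n) = g(n) = n + 1 on the natural numbers."""
    return n + 1


def least_element_of_X(n: int) -> int:
    """Least element of X_n, where X_0 = Y_0 = N,
    Y_{k+1} = f(X_k) and X_{k+1} = g(Y_k).

    Since f = g = succ, each X_k and Y_k has the form {m : m >= a},
    so the least element a describes the whole set.
    """
    least_x, least_y = 0, 0  # X_0 = Y_0 = N
    for _ in range(n):
        least_x, least_y = succ(least_y), succ(least_x)
    return least_x


def in_X(m: int, n: int) -> bool:
    """Is m an element of X_n?"""
    return m >= least_element_of_X(n)


# Each m drops out of X_{2(m+1)}, so no m survives in the intersection
# of the X_{2n}, even though every X_{2n} is infinite.
for m in range(5):
    print(m, in_X(m, 2 * (m + 1)))
```

Every line printed ends in False: each natural number is eventually excluded, so the intersection is empty.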

Dedekind’s proof uses an equivalent formulation. Since Y is equivalent to a subset of X, we might as well assume that Y actually is this subset. Then the injection f:X→Y is a map from X to itself. We have X ⊇ Y ⊇ f(X). We need to show X≡Y.

Let P=X∖Y and Q=Y∖f(X). Then

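X = P ⊔ Q ⊔ f(P) ⊔ f(Q) ⊔ f(f(P)) ⊔ f(f(Q)) ⊔ … ⊔ R
Y = Q ⊔ f(P) ⊔ f(Q) ⊔ f(f(P)) ⊔ f(f(Q)) ⊔ … ⊔ R

where R = ⋂_n fⁿ(X) collects the elements (if any) that lie in every iterated image of X.

Define h by h(x)=f(x) for x∈P ⊔ f(P) ⊔ f(f(P))…, and h(x)=x for all other x, that is, for x∈Q ⊔ f(Q) ⊔ f(f(Q))… and for x∈R. Verify that h is a bijection between X and Y. QED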
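Here is a small sketch of Dedekind’s h on a concrete instance. The instance is my own choice, not part of the proof: X=ℕ, Y=ℕ∖{0}, and f(n)=n+2, so P={0} and Q={1}. The code decides which piece x lies in by walking backwards along f until it leaves f(X); that walk always stops in this example, which is an assumption the code depends on.

```python
from typing import Optional


def f(x: int) -> int:
    """Assumed injection from X = N into Y = N \\ {0}."""
    return x + 2


def f_inv(x: int) -> Optional[int]:
    """Preimage of x under f, or None if x is not in f(X)."""
    return x - 2 if x >= 2 else None


def in_Y(x: int) -> bool:
    """Membership test for the subset Y = N \\ {0} of X."""
    return x >= 1


def h(x: int) -> int:
    """Dedekind's map: f on P, f(P), f(f(P)), ...; identity elsewhere."""
    root = x
    # Walk backwards until we reach an element of X \ f(X) = P ⊔ Q.
    while (prev := f_inv(root)) is not None:
        root = prev
    # root lies in P exactly when it is outside Y.
    return x if in_Y(root) else f(x)


print([h(x) for x in range(8)])  # [2, 1, 4, 3, 6, 5, 8, 7]
```

On the window 0..7 the values are distinct and 0 is never hit, as expected for a bijection from ℕ onto ℕ∖{0}.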

An elegant proof, due to Julius König, often appears in textbooks. It’s longer than Dedekind’s proof; I will drag it out even more by offering some motivation.

In the figure above, sets X and Y are indicated with the injections f:X→Y (blue) and g:Y→X (red). We want to construct a 1–1 pairing between all of X and Y out of these two partial pairings. Each edge, red or blue, is a possible pairing. So a node like b has two possible partners: 1 or 3. However, a node like a has only one possible partner, namely 2. Likewise, 1’s only possible partner is b.

That means we have to start by pairing off nodes without preimages: nodes x∈X for which g⁻¹(x) doesn’t exist, and nodes y∈Y for which f⁻¹(y) doesn’t exist. Let’s call this the first round of pairings.

Once we’ve paired off a2 and 1b, it then turns out that c and 3 each have only one possible choice (second round). And so on. In the figure, this is enough to complete the bijection. (Thickened edges indicate the pairings actually made.)

If you think about it, this procedure amounts to following the f and g edges backwards until you can’t go any further. Call this the backwards chain or just the chain. Suppose it starts with x=x₁∈X and terminates in Y, say x₁,y₁,x₂,y₂,…,x_n,y_n. (So g⁻¹(x₁)=y₁, f⁻¹(y₁)=x₂, etc., and f⁻¹(y_n) doesn’t exist.) In round 1, we must pair y_n with x_n=g(y_n). In round 2, we must pair y_{n−1} with x_{n−1}=g(y_{n−1}). Eventually we must pair y₁ with x₁=g(y₁). In other words, we pair x₁ with y₁=g⁻¹(x₁). We are forced to use g⁻¹ for each pair x_i, y_i.

On the other hand, if the chain starts and ends in X (say, x=x₀,y₁,x₁,…,y_n,x_n, where g⁻¹(x_n) doesn’t exist), then we must pair x_n with y_n=f(x_n), and then must pair x_{n−1} with y_{n−1}=f(x_{n−1}), eventually pairing x₁ with y₁=f(x₁) and x₀ with f(x₀).

If the chain never terminates, then we have a choice. We can pair x with f(x) or with g⁻¹(x). To make a definite rule, let’s always use g⁻¹(x).

Summarizing, we have this definition of the bijection h:

h(x) = f(x) if the chain terminates in X
h(x) = g⁻¹(x) if the chain doesn’t terminate
h(x) = g⁻¹(x) if the chain terminates in Y

We know that g⁻¹(x) exists in the last two cases, for if it didn’t, then the chain would terminate in X.
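To see the definition in action, here is a sketch on the maps from Schröder’s failed example: X=Y=ℕ and f(n)=g(n)=n+1. That choice, and the helper name chain_ends_in, are mine. Every chain terminates in this instance, so the code never meets the non-terminating case; a chain that ran forever could not be detected by simply following it.

```python
from typing import Optional


def f(x: int) -> int:
    """Assumed injection X -> Y."""
    return x + 1


def g(y: int) -> int:
    """Assumed injection Y -> X."""
    return y + 1


def f_inv(y: int) -> Optional[int]:
    """Preimage of y under f, or None if there is none."""
    return y - 1 if y >= 1 else None


def g_inv(x: int) -> Optional[int]:
    """Preimage of x under g, or None if there is none."""
    return x - 1 if x >= 1 else None


def chain_ends_in(node: int, side: str) -> str:
    """Follow the edges backwards from `node` (which lives in `side`,
    either "X" or "Y") and report the side where the chain stops."""
    while True:
        prev = g_inv(node) if side == "X" else f_inv(node)
        if prev is None:
            return side
        node, side = prev, ("Y" if side == "X" else "X")


def h(x: int) -> int:
    """f if the chain for x terminates in X, otherwise g⁻¹."""
    if chain_ends_in(x, "X") == "X":
        return f(x)
    return g_inv(x)  # exists, since the chain did not stop at x itself


print([h(x) for x in range(6)])  # [1, 0, 3, 2, 5, 4]
```

In this instance h swaps 0 with 1, 2 with 3, and so on, which is visibly a bijection of ℕ with itself.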

So we’ve partitioned X into three parts, X = X₁ ⊔ X₂ ⊔ X₃:

X₁={x∈X : the chain terminates in X}
X₂={x∈X : the chain doesn’t terminate}
X₃={x∈X : the chain terminates in Y}

and f is used on X₁ while g⁻¹ is used on X₂ and X₃. Now we have to show that h is a bijection.

First note that we get the chain for f(x) by prepending f(x) to the chain for x. The termination behavior is unaffected by this prepending, so f(X_i) is contained in Y_i (i=1,2,3) where

Y₁={y∈Y : the chain terminates in X}
Y₂={y∈Y : the chain doesn’t terminate}
Y₃={y∈Y : the chain terminates in Y}

Likewise, we get the chain for g(y) by prepending g(y) to the chain for y. So g(Y_i) is contained in X_i (i=1,2,3). So both f and g “stay in their lanes”: a blue f-edge cannot cross from X_i to Y_j with i≠j, and likewise for the red g-edges. So the same is true for f⁻¹ and g⁻¹, wherever they are defined.
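The “lanes” claim is easy to spot-check numerically. The sketch below uses an instance of my own choosing, X=Y=ℕ with f(n)=2n+1 and g(n)=2n+2, in which every backwards chain strictly decreases and so terminates; the classes X₂ and Y₂ are empty here. It is a sanity check on a finite window, not a proof.

```python
from typing import Optional


def f(x: int) -> int:
    """Assumed injection X -> Y: onto the odd numbers."""
    return 2 * x + 1


def g(y: int) -> int:
    """Assumed injection Y -> X: onto the even numbers >= 2."""
    return 2 * y + 2


def f_inv(y: int) -> Optional[int]:
    """Preimage of y under f, or None if y is not in f(X)."""
    return (y - 1) // 2 if y % 2 == 1 else None


def g_inv(x: int) -> Optional[int]:
    """Preimage of x under g, or None if x is not in g(Y)."""
    return (x - 2) // 2 if x >= 2 and x % 2 == 0 else None


def chain_ends_in(node: int, side: str) -> str:
    """Side ("X" or "Y") on which the backwards chain from `node` stops."""
    while True:
        prev = g_inv(node) if side == "X" else f_inv(node)
        if prev is None:
            return side
        node, side = prev, ("Y" if side == "X" else "X")


# f sends x to a node of Y whose chain stops on the same side as x's chain,
# and g does the same for y.
f_stays = all(chain_ends_in(f(x), "Y") == chain_ends_in(x, "X") for x in range(200))
g_stays = all(chain_ends_in(g(y), "X") == chain_ends_in(y, "Y") for y in range(200))
print(f_stays, g_stays)  # True True
```

Both checks succeed because the chain for f(x) is just f(x) followed by the chain for x, and similarly for g(y).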

Now, f is injective, and g⁻¹ is injective on its domain (i.e., on g(Y)), and “the streams can’t cross”: if x∈X₁ and x′∈X₂⊔X₃, then we can’t have h(x)=h(x′) because h(x)=f(x) is in Y₁ and h(x′)=g⁻¹(x′) is in Y₂⊔Y₃.

Note that f maps X₁ onto Y₁, because if f⁻¹(y) doesn’t exist, then y belongs to Y₃. So for any y∈Y₁ the preimage f⁻¹(y) exists, and it lies in X₁ because f⁻¹ stays in its lane. Also g⁻¹ maps X₂⊔X₃ onto Y₂⊔Y₃: If y∈Y₂⊔Y₃, then g(y) belongs to X₂⊔X₃ and g⁻¹ maps it to y. QED

There is a version of the Equivalence Theorem for linearly ordered sets, due to Lindenbaum. Note that an order-preserving map f:X→Y of ordered sets is automatically injective: if x≠x′, then either x<x′ or x′<x, so either f(x)<f(x′) or f(x′)<f(x) and so f(x)≠f(x′).

We can have order-preserving injections f:X→Y and g:Y→X without any order-isomorphism between X and Y. Simple example: X is the closed interval [0,1] and Y is the open interval (0,1) (in either ℚ or ℝ). But we have the following:

Theorem (Lindenbaum): If f:X→Y is an order-preserving map onto an initial segment of Y, and g:Y→X is an order-preserving map onto a final segment of X, then there is an order-isomorphism between X and Y.

Proof: We will show that the h constructed above works. We already know it’s a bijection, so we just have to show that it’s order-preserving.

Since f and g are order-preserving, we are left with this to prove: if x∈X₁ and x′∈X₂⊔ X₃, then x<x′ and f(x)<g⁻¹(x′).

Observe that all elements of f(X) precede all elements of the complement Y∖f(X), since f maps onto an initial segment of Y. Likewise all elements of g(Y) follow all elements of X∖g(Y).

Now, if x belongs to X∖g(Y), then the chain for x terminates in X, so x∈X₁. For any x′∈X₂⊔X₃ the preimage g⁻¹(x′) must exist, since its chain doesn’t terminate in X. So x′∈g(Y) and we have x<x′ in this special case.

The figure above illustrates how to extend this to the general case of any x∈X₁. It shows x=a∈X₁ and x′=c∈X₃. The terminal nodes in the chain, b and 4, are circled. Following the chain two steps gives us b∈X∖g(Y) and d∈X₃. By the previous paragraph, b<d. But f and g are order-preserving, so g(f(b))<g(f(d)). That is, a<c.

What if we had c∈X₂ instead of X₃? The argument is not affected. What if, traveling along the two chains in sync, the chain for x′ terminates before the chain for x? That would happen, for example, if 2 in the figure was a terminal node, without a preimage f⁻¹(2). But then 1<2 because 1 belongs to f(X) and 2 belongs to Y∖f(X). So g(1)=a precedes g(2)=c.

Clearly this reasoning holds for any x∈X₁ and x′∈X₂⊔X₃.

It remains to show that h(x)=f(x) precedes h(x′)=g⁻¹(x′). Once again we can follow the two chains in sync until we hit a terminal node for one of them. Because f and g are order-preserving, we are reduced to proving two things:

If x∈X∖g(Y) and x′∈X₂⊔X₃, then f(x)<g⁻¹(x′).
If y∈Y₁ and y′∈Y∖f(X), then y<y′.

In both cases, we can “ascend the ladder”. That is, each bullet says that an f-edge has its Y-vertex above the Y-vertex of a g⁻¹-edge. In the figure, the f-edges slope upwards and are blue, while the g⁻¹-edges slope downward and are red. Applying f⚬g to the Y-vertex and g⚬f to the X-vertex moves us up “one rung”. So the general case, where x is any element of X₁ and x′ is any element of X₂⊔X₃, follows from these two bullets.

So let’s prove the first bullet. The argument (a proof by contradiction) is a bit tedious. This figure will make it easier to follow:

Suppose g⁻¹(x′)<f(x). Since f maps X onto an initial segment of Y, this means that g⁻¹(x′) also belongs to f(X). Say g⁻¹(x′)=f(x″). Since f is order-preserving, and f(x″)<f(x), we have x″<x. Now, x′∈X₂⊔X₃ and x″ belongs to the x′-chain, so x″∈X₂⊔X₃ and is therefore in g(Y). But g maps Y onto a final segment of X, and x″<x, so x must also belong to g(Y). This contradicts the assumption that x∈X∖g(Y).

The second bullet is a breeze. As noted already, all elements of f(X) precede all elements of Y∖f(X). We’ve also seen that all elements of Y₁ belong to f(X). QED

Rosenstein (pp.22–23) provides another proof, patterned after Dedekind’s proof of the Equivalence Theorem.

In recursive function theory, the Myhill Isomorphism Theorem is analogous to the Equivalence Theorem.
