Bruce Smith joins the conversation, returning to a previous topic: the Paris-Harrington theorem. (Discussion of the Enayat paper will resume soon.)
BS: Post 8 and post 9 discussed the Paris-Harrington theorem, or PHT. I have some questions about “how that proof really works”. But my main motivation is more general—what are the various ways in which people know how to construct nonstandard models of PA, with various properties of interest? And what other kinds of statements might someday be proved unprovable, using those methods?
I’m especially interested in anything analogous to “forcing” (which Paul Cohen invented to prove CH unprovable in ZF). The PHT’s method involves finding a submodel, so it’s not directly analogous to forcing—but we have to start somewhere, and its proof certainly seems interesting and educational. So we can reasonably limit the scope of this post to “how the Paris-Harrington Theorem really works”.
MW: Sounds good. I glossed over several issues in post 9, so I’m happy to revisit it. I’m also glad you gave me an opportunity to plug my Logic and Smullyan notes. Section 11 of the logic notes discusses forcing in its simplest context, PA and recursion theory. Section 21 of the Smullyan notes deals with forcing for set theory, and covers Cohen’s classic results.
All that said, the proof of the Paris-Harrington Theorem isn’t related to forcing, as far as I can tell. (Although you never know.) It comes out of a different line of thought in logic, the method of indiscernibles.
The book by Kossak & Schmerl has a chapter on forcing methods applied to models of PA, but I haven’t read it yet.
Anyway, ask away! Maybe you should start with a short recap.
BS: OK—here is how I understand it so far. (And, thanks for the references!)
The PHT states that a certain combinatorial claim, known as the Paris-Harrington Principle (PHP), is not provable in PA (if PA is consistent, which we’ll hereafter assume without stating).
Roughly, its proof goes like this: if the PHP holds in a nonstandard model N of PA, it can be used (partially from “within N”) to construct a submodel M of N, where M is also a model of PA, but one in which the PHP doesn’t hold! But if the PHP were provable in PA, it would hold in all models of PA, including all nonstandard models (of which we know there is at least one, since PA is consistent and has independent sentences), and hence in both N and M. QED!
(That argument does not rule out that the PHP is refutable in PA, thus holding in no models of PA — but that can be disproven in other ways, since ZF can prove the PHP holds in the standard model of PA.)
The proof of PHT involves using the PHP (from “inside N”?) to find a set B of “indiscernibles” in N, and then (I think from “outside N”) defining the subset M of N as their “downward closure”. Then from “outside N”, it proves M is a model of PA, and that the PHP doesn’t hold in M.
MW: So far so good. Just one thing: finding the set B of indiscernibles uses a mix of “inside N” and “outside N” reasoning, typical of the overspill principle. But we’ll get to that. Carry on!
BS: Ok—that is exactly the sort of subtlety that I hope to understand better!
To continue—part of how that works (proving that M is a model of PA) is that, in N, there is a “finite” c, which we (outside N) understand is actually larger than every standard number. In particular that means we can have a “finite” set of formulas—finite according to N, that is—which is really infinite. Somehow you can use the PHP (which applies to finite sets), from “inside N”, to tell you something about that infinite set of formulas.
MW: That’s right. The thing that it tells you is the so-called diagonal indiscernibility with respect to that set of formulas. We can get into the details later. By the way, the set of formulas does not include all the PA axioms—the argument is more subtle than that.
BS: In general, it seems this theorem carries out important reasoning both inside and outside N—untangling this is part of the question of how it works. For example, I think there is no way N can agree with us that “my subset M is a model of PA”, since that would mean it had a proof (that is, had an element which it could verify was the Gödel number of a proof) that PA has a model, and therefore is consistent. In fact, I am guessing the sets B and M can’t even be defined from inside N.
MW: Right! So far as N is concerned, N is an initial segment of every model of PA. Just like ℕ “really” is. (The scare-quotes because of all the semi-philosophical issues churning around that adverb.)
I wouldn’t emphasize the proof aspect, though. N can “believe” things—in other words, it can satisfy things—without necessarily having an “N-proof” of them. In much the same way, ℕ satisfies Con(PA), even though there is no proof of Con(PA) inside PA.
BS: Let me unpack that, and you can tell me if I have it right. Con(PA) means “there is no PA-proof of 0=1”, so “ℕ satisfies Con(PA)” means “there is no ℕ-element which encodes (as computed in ℕ) a PA-proof of 0=1”. This is stronger than “ℕ satisfies PA”, which just says “each axiom of PA holds in ℕ”.
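(For reference, the two statements being compared can be written out as below. This is only a sketch: Prf_PA(p, x) is an assumed name for the usual arithmetized proof predicate, "p codes a PA-proof of the sentence with Gödel number x", and the corner quotes denote Gödel numbers. Neither notation is used in the post itself. The corner symbols need the amssymb package.)

```latex
% Con(PA): no element codes a PA-proof of 0=1.
\[
  \mathrm{Con}(\mathrm{PA}) \;:\equiv\;
  \neg\,\exists p\;\mathrm{Prf}_{\mathrm{PA}}\bigl(p,\ulcorner 0=1 \urcorner\bigr)
\]
% What it means for the standard model to satisfy Con(PA).
\[
  \mathbb{N} \models \mathrm{Con}(\mathrm{PA})
  \iff
  \text{there is no } n \in \mathbb{N} \text{ with }
  \mathbb{N} \models \mathrm{Prf}_{\mathrm{PA}}\bigl(n,\ulcorner 0=1 \urcorner\bigr)
\]
% What it means for the standard model to satisfy PA.
\[
  \mathbb{N} \models \mathrm{PA}
  \iff
  \mathbb{N} \models \varphi \text{ for every axiom } \varphi \text{ of PA}
\]
```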
MW: Exactly right. You’re also right about B: that can’t be defined inside N. To state it a little more precisely, there is no formula β(x) in L(PA) (the language of Peano Arithmetic) such that b∈B if and only if N⊧β(b). If there were such a formula, then we’d have a formula for the downward closure: μ(x) ≡ ∃y(β(y) ∧ x≤y).
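(Why would a formula for the downward closure be a problem? One standard way to see it is the "no definable proper cut" argument, sketched below. It is not spelled out in the post, and it relies on two facts that the proof only establishes later: M is a model of PA, so it is closed under successor, and M is not all of N, since the PHP holds in N but not in M.)

```latex
% Assume, for contradiction, that \mu(x) defines M inside N.
\begin{align*}
  & N \models \mu(0)
    && \text{($M$ is a nonempty initial segment of $N$)} \\
  & N \models \forall x\,\bigl(\mu(x) \rightarrow \mu(x+1)\bigr)
    && \text{($M$ is closed under successor)} \\
  & N \models \forall x\,\mu(x)
    && \text{(induction axiom for $\mu$, which holds in $N \models \mathrm{PA}$)} \\
  & \text{hence } M = N,
    && \text{contradicting } N \models \mathrm{PHP},\; M \not\models \mathrm{PHP}.
\end{align*}
```

So no such μ(x) exists, and therefore no β(x) defining B exists either.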
BS: Thanks—that makes things much clearer. I generally read “N ⊧ φ” as “N believes φ”—I think I have seen this referred to as both “N believes φ” and “N thinks φ” in papers, as well as “N satisfies φ” and perhaps “N models φ”—but then I sometimes get confused and think “N believes φ” is saying “N has a proof of φ”. But (as we just discussed) that is a much stronger statement. “N ⊢ φ”, i.e. “N proves φ”, means “N has an element which it believes is the Gödel number of a proof of φ” (using some theory that is clear from the context). Maybe it would reduce my confusion to stick to reading “⊧” as “satisfies”.
MW: “Satisfies” is the usual textbook term. People use “believes” and “thinks” to make everything more anthropomorphic. That helps them—helps me—think about the math. I also like to talk about things that are “invisible when you’re wearing N glasses”, like the set B. The N people can see the individual elements of B, but not the set B as a whole. Of course, this is all just blog-speak for the stuffier (but perhaps clearer) language of the textbooks.
(My advisor once suggested that the entire mathematical lexicon exists only as an aid for our weak minds. Super-mathematicians would just have a single list, Definition 1, Definition 2, …, Definition 304726, …, and likewise for all mathematical theorems.)
I wouldn’t write “N ⊢ φ”. N is a model. Models satisfy statements, theories prove them. So we can write “PA ⊢ φ”. Now, if you want to say, “In the model N, there is a proof, perhaps of nonstandard length, of φ from the axioms of PA”, you could write “N ⊧ (PA ⊢ φ)”. Here “(PA ⊢ φ)” is what I call ‘vernacular’: not actually an expression in the formal language, but intended as a shorthand for a formal expression. Or I might put PA ⊢ φ in quotes, to indicate it’s vernacular.
BS: Ok, I will try to follow those rules—that does clarify things.
By the way, I just noticed that you wrote N⊧β(b) even though b might be nonstandard. Does that mean we are talking about N satisfying a nonstandard formula?
MW: No—in that formula, b is a name, so β(b) is a standard formula even if the model element named by b is nonstandard.
BS: What are “names”?
MW: New constants. Suppose we have a structure A for a language L. So L has relation symbols, maybe also function symbols and constants. A has corresponding relations, functions, and elements. We don’t necessarily have a constant for every element of A. A standard trick in model theory is to expand L by adding a new constant for every element of A. Many people (including me) use L_A to denote the expanded language, and call the new constants names. (I say a little bit more about them in this Topics post.)
BS: Thanks—these details do help!
Back to the PHT—my biggest question was about how we “really” prove M is a model of PA, since the stated method in Post 9 sounded like it was trying to do more than ought to be needed—it seemed to want M to be “similar to N” in some sense.
MW: Good place to start. Are you familiar with the concepts of elementary equivalence and elementary submodel?
Let me give a précis anyway. Suppose you have a first-order language L and two structures A and B for this language. A and B are elementarily equivalent if they satisfy exactly the same first-order sentences: A ⊧ φ iff B ⊧ φ for any closed formula φ of L. If A is a substructure of B, then A is an elementary substructure of B if, for any first-order formula $\varphi(\vec{x})$ in L and any $\vec{a}$ in A, $A \models \varphi(\vec{a})$ iff $B \models \varphi(\vec{a})$. (If A and B are both models of a theory T in the language L, then we say ‘elementary submodel’ instead.) By the way, I use the notation $\vec{a}$ as shorthand for $a_1,\ldots,a_n$. The $a_i$’s are names.
‘Elementary substructure’ is stronger than ‘elementary equivalence’, since it allows for names of elements of A in the formulas.
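(A textbook-style example, not taken from the post, shows that the difference is real. It uses the language with only the symbol <, rather than L(PA). Here A and B are structures in the sense of the definition above; B is not the set of indiscernibles.)

```latex
% Two structures for the language \{<\}; A is a substructure of B.
\[
  A = (\mathbb{N}\setminus\{0\},\, <), \qquad B = (\mathbb{N},\, <)
\]
% The map n \mapsto n-1 is an isomorphism from A onto B, so the two
% structures satisfy the same sentences (they are elementarily equivalent).
\[
  A \cong B \quad\Longrightarrow\quad A \equiv B
\]
% With a name for the element 1, one formula already tells them apart.
\[
  \varphi(x) \;:\equiv\; \neg\,\exists y\,(y < x), \qquad
  A \models \varphi(1), \qquad B \not\models \varphi(1)
  \quad (\text{since } 0 < 1 \text{ in } B)
\]
```

So A is elementarily equivalent to B, but A is not an elementary substructure of B.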
BS: So do I understand correctly that ‘elementary’ basically means ‘every formula is absolute’ (when comparing the two structures involved)?
MW: Yes, that’s right. Absolute between the two structures. For elementary equivalence, this absoluteness holds for all closed formulas of L. For elementary substructure, it holds for all closed formulas of the expanded language L_A.
BS: It feels like this post is all “preliminary” so far—though it is certainly a necessary and helpful discussion (for me anyway).
MW: Well, I think we’ve gotten a little ways in. We have the setup: N is a nonstandard model of PA, B is a subset of “diagonal indiscernibles” of N (whatever those are), M is the downward closure of B. We know that N satisfies the PHP, by hypothesis. We know the goal of the proof: to show that M is a model of PA, and that M does not satisfy the PHP.
I brought up elementary equivalence for the following reason. If we could show that M was elementarily equivalent to N, then it would follow immediately that M was a model of PA. That’s how I thought it went for a moment, the first time I read the proof. But that can’t be right, because PHP holds in N and not in M!
I think you were getting at this when you wrote, “it seemed to want M to be ‘similar to N’ in some sense”. That is an aspect of the proof: the authors demonstrate a “transfer principle” that transfers certain statements back and forth between M and N. This transfer principle is then used to show that M is a model of PA. But the principle is more subtle than elementary equivalence, or elementary substructure.
BS: Ok—I am eager to see where all this goes, in the next post!