## Thursday, September 23, 2010

### The variant record update problem

Now, for something completely different...

I've been helping out with the design and implementation of a language called C0 that is being used in a new course at Carnegie Mellon, 15-122 Principles of Imperative Computation. I'm giving my own point of view here, but I'd say that the goals of this language were four-fold:
• Programs should be memory safe (a pointer either contains NULL or the address of valid, allocated memory, and you can't read off the end of an array).
• The language should have no unspecified behavior. This is related to the point above ("oh, you did something undefined here so it's not my fault this pointer got decremented by 17 afterwards") but is still a distinct point - OCaml can be memory safe without specifying the order of evaluation of function arguments, for instance.
• Programs (at least the family of programs required to do homework tasks for 15-122) should look and act like C programs. (Featherweight Java programs act, but do not look, like Java programs.) And not "more or less" the same as C - GCC should basically be a viable way of compiling C0 programs if they don't rely on behavior that is specified in C0 but unspecified in C.
• The language should be amenable to garbage collection.
The language design process is primarily the work of Rob Arnold (who just presented his master's thesis yesterday) and his advisor, Frank Pfenning (also my advisor), who is teaching 15-122.

After Rob Arnold's defense, he, William Lovas, and I got into a discussion about adding unions in a safe way to the C0 language. In the process we had to re-derive the fundamental problem with having remotely C-like unions in memory-safe programming languages. I am 90% certain that this is what the literature occasionally alludes to as "the variant record update problem." This is me writing that down so I won't have to derive it for myself yet again.

## A piece of "reasonable" C, using unions

It is often clearest if we think of all data in C as simply a bundle of bits; unions in C seem mostly like a way of providing different "views" for reading and writing into a bundle of bits. This is pretty well-illustrated by the following example, which we can then use to explain a very old problem with trying to tame this kind of union and treat it as a sum type or tagged union in a memory-safe way.
   1. typedef struct pair {
   2.   unsigned char fst;
   3.   char snd;
   4. } pair;
   5. typedef union {
   6.   unsigned short inj1;
   7.   pair inj2;
   8. } foo;
   9. int f(foo* foo) {
  10.   foo->inj1 = 1536;
  11.   return 7;
  12. }
  13. int main() {
  14.   foo foo;
  15.   foo.inj1 = 1535;
  16.   // foo = Inj1(1535)
  17.
  18.   foo.inj2.fst = 9;
  19.   foo.inj2.snd = 24;
  20.   // foo = Inj2(9, 24)
  21.
  22.   foo.inj2.fst = f(&foo);
  23.   // WTF? Is foo Inj1 or Inj2?
  24.
  25.   return 1;
  26. }
So, in C, we could pretty reasonably predict what should happen here given three pieces of information:
1. The size of a char and a short. (A char is 1 byte and a short is 2 bytes, in this case.)
2. The way C lays out structs in memory. (This is actually reasonably well-defined.)
3. The "endian-ness" of the machine we're working on. (I just worked it out from observerd results, myself: I can never remember which is which.)
In particular, this is what the 16 bits of stack storage allocated to foo looks like at various points in this program:
                       01234567 89ABCDEF
                       |  fst  |  snd  |
  At line 16 (in main) 11111111 10100000
  At line 20 (in main) 10010000 00011000
  At line 11 (in f)    00000000 01100000
  At line 23 (in main) 11100000 01100000
The assignment in f() happened before the crazy assignment was complete, writing a valid 16-bit short into the 2 bytes available to it. Then, the crazy assignment completed after f() returned, causing the first byte of that 2-byte short to get overwritten with the integer value 7 that was returned from f.

## A non-starter

The basic idea that everyone has when trying to turn an unsafe language with untagged unions into a safe language with tagged unions is "all we need to do is add tags and check them at the correct time." This example shows that there is a more fundamental problem, because there are a lot of different arguments for what the code above should even do in a safe programming language. The C semantics don't work: if foo.inj1 and foo.inj2.snd were pointers, then at line 23 foo could not safely be tagged as inj1 (part of its pointer got overwritten by the number 7) or as inj2 (because the second component is a pointer that got partially overwritten by the pointer written in f()).

We came up with four alternatives for what this could mean, and none of us could agree on any one option; this is, of course, a sure sign that none of them make much sense.
1. Check, throw an error ("the safe option"). When we make an assignment into a union tagged with inj2, we're asserting that when the right-hand side finishes evaluating to a value, that tag will still be inj2. We have to check, after evaluating the right-hand-side to a value, that this property was maintained - and it's a runtime error if that isn't the case, kind of like running off the end of an array. Technically, this is a bit difficult, but it's certainly possible.
2. Re-initialize memory. C0 has a notion of default values; whenever we allocate heap space for a new integer, that integer is 0. We'd actually want, after the assignment on line 18, to initialize the value of foo.inj2.snd to a default value (again, if it's a pointer we'd want to be sure it's NULL rather than some arbitrary string of bits that was there before). So, when we're ready to finally write a value, we check its tag - if the tag is wrong, we re-initialize it with the right tag and then perform the assignment; this is what happened on line 18. On line 22, it means that, because f() set the tag of foo to inj1 and we're making an assignment into inj2, we re-initialize the union as an inj2 and then perform the assignment. At line 23, foo has the tag inj2, foo.inj2.fst == 7, and foo.inj2.snd == 0.
3. Duplicate the entire union's memory block. Rather than after-the-fact tag checking, another option is to observe that we are writing into a union/struct, copy the entire union/struct into some temporary storage after evaluating the left-hand side, make the assignment into the temporary storage, and then write the whole of the temporary storage back to its original location. In this case, at line 23 foo has the tag inj2, foo.inj2.fst == 7, and foo.inj2.snd == 24, the value it was set to before the assignment.
4. Secretly allocate memory. If we secretly say that a union is a pointer to some allocated memory, then when we change the tag we have to allocate some new memory. In this case, when f() changes the tag of foo, some new memory is allocated, tagged inj1, and the integer value 1536 is written there. foo is now a tag (inj1) and a pointer to the newly allocated memory, but the old memory (which foo pointed to before the assignment) is still in existence, and so we can safely write the value 7 to the memory location we came up with when we evaluated the left-hand side of line 22. When we get to line 23, then, the memory where we just wrote the 7 has been orphaned, foo has a tag inj1, and contains the short integer value 1536. Note that this sort of trickery only makes any sense if we have a garbage-collected programming language; happily, C0 is one. It's also the only one of these four options that preserves the metaphor that a LHS evaluates to a memory location, a RHS evaluates to a value, and then the assignment stores the RHS value in the LHS memory location without further ado.
I'd be quite interested if anybody had information about real imperative programming languages that made any one of the four choices above, or some other choices.

## What's actually going on here?

The real problem here is that C's notation for unions is deeply broken from a "mathematical perspective" if we want to think of unions as sum types. A sum type is not something you "project the inj1 field from" like a struct, it's something that you inject into the inj1th branch of. So a much better notation than foo.inj1 = 4 is foo = inj1(4). We would think of this use of inj1 as somewhat equivalent to a constructor in ML, which leads us to a better understanding of the problem - when injecting into the inj2 branch, it's clear from the constructor notation that we need to assign all the constituent values at the same time: foo = inj2(1,2). If this were the only way we could write to inj2, it would remove the difference between options 2 and 3 above - we would have to be clear in advance what we expected foo.inj2.snd to be; that's what it means to write into a branch.

Something like this solution is what was adopted in Ada, if we can believe the following caveat in the Ada Wikibook:
When "mutating" a record, you must assign all components of the variant structure which you are mutating at once, replacing the record with a complete variant structure.
This really seems like it would be the only remotely correct way of dealing with the problem.

## Conclusion

Should unions be added to C0, and how? Well, I don't actually have too much of a dog in that fight, and even if I did it's not my decision. Because we don't have "struct literal" values, we're unable to write something like foo.inj2 = { fst = 1, snd = 2 }, so the mostly-reasonable Ada solution is unavailable. A first step might be to disallow structs inside of unions, which means that every union can only ever contain one thing. This prevents the awkward question of "what is this other part of the union after the assignment" from ever arising as an issue - which is really the distinction between options 2 and 3 above.

The restriction of union members to "small values only" (the C0 way of saying "no structs") is limiting from the perspective of C code that one might imagine wanting to write (and is therefore problematic for the goal that "good C0 programs look like good C programs"), but it seems preferable to any of the four options above. In particular, if whenever you wanted to have a struct in a union you instead had a struct pointer, that would basically amount to implementing Option 4 in a non-secretive way.

## Monday, September 20, 2010

### Natural deduction and sequent calculus - united in a polarized linear framework

In the last post I talked a little bit about what it means to give atomic propositions in a logical framework polarity. The main point I wanted to get around to was that if we have a proposition in the signature of the form P1+ ⊃ P2+ ⊃ P3+, then all the information contained in that rule is captured by this synthetic inference rule:
  Γ, P1+, P2+, P3+ ⊢ Q
  --------------------
  Γ, P1+, P2+ ⊢ Q
On the other hand, if I have a proposition in the signature of the form P1- ⊃ P2- ⊃ P3-, then all the information contained in that rule is captured by this synthetic inference rule:
  Γ ⊢ P1-   Γ ⊢ P2-
  -----------------
  Γ ⊢ P3-
By further mixing-and-matching the polarities of atomic propositions, we can get a whole host of synthetic inference rules.

### First, a clarification

Before I get started with this post, I want to point out that in the last post I said something imprecise: that the strongest possible form of adequacy is to say that the synthetic inference rules induced by your signature have an exact correspondence with the "on-paper" inference rules. I then gave an example, saying that
  ∨L : (hyp A → conc C)
       → (hyp B → conc C)
       → (hyp (A ∨ B) → conc C)
corresponds to this expected synthetic inference rule
  Γ, hyp (A ∨ B), hyp A ⊢ conc C   Γ, hyp (A ∨ B), hyp B ⊢ conc C
  --------------------------------------------------------------- ∨L
  Γ, hyp (A ∨ B) ⊢ conc C
only if (hyp A) has positive polarity. But this isn't quite the right story, because from the definition of ∨L we know nothing about the contents of Γ, and our notion of adequacy with respect to the sequent calculus requires that our judgments take the form Γ ⊢ conc C where Γ contains only atomic propositions hyp A for some propositions A. So, really, this strongest possible notion of adequacy needs one additional puzzle piece, a generalization of what the Twelfers would call regular worlds. We can assume the conclusion of the derived rule is in the image of some "on paper" judgment, but we also need to verify that the premises will always end up similarly in the image of some "on paper" judgment.

Now, with that out of the way...

## Representing natural deduction in LLF

The post today starts off working with a subset of the Linear Logical Framework (LLF) designed (in large part) by Cervesato. We can reason about specifications in this framework using Jason Reed's HLF/Twelf implementation. Essentially, we will be using this framework the same way we used it last week: defining a signature and looking at the synthetic inference rules that arise from that signature.

It has recently become common to see presentations of the canonical forms of a simply-typed lambda calculus given in spine form; doing so removes a major source of unpleasantness in the proof of global soundness/hereditary substitution. You do have to pay the piper, though: it is more difficult to handle the proof of global completeness/η-expansion in a spine form calculus. However, a Twelf proof that Frank Pfenning worked out in February '09, which uses something like a third-order premise, can successfully handle the proof of global completeness for a spine-form presentation of canonical forms. One version of this Twelf proof, which can be found here, is presented as a bunch of Twelf code without much commentary, and the current story arose in part due to my attempts to provide some commentary and intuition for this Twelf proof of eta expansion and generalize it to substructural logics.

In the process, I ended up investigating another way of presenting logic that is somewhere between a natural deduction and spine form presentation. A bit of Twelf code describing this presentation can be found here, but the key point is that, when we need to verify an atomic proposition Q, we pick some hypothesis A from the context and show that, from a use of A we can prove that Q can be used. This is basically what happens in spine form, but here the "spiney" things associate the opposite of the way they do in spine form presentations. It looks something like this:
  end : use A A.
  atm : hyp A -> use A (a Q) -> verif (a Q).
  ⊃I  : (hyp A -> verif B) -> verif (A ⊃ B).
  ⊃E  : use A (B1 ⊃ B2) -> verif B1 -> use A B2.
One thing that we've known for a long time is that the atomic proposition (use A B) is an instance of a pattern - it can be thought of as a Twelf representation of the linear function use A -o use B, in which case the rule end above is just the identity function λx.x.

But, since we're putting ourselves conceptually in the territory of LLF, not the territory of LF/Twelf, we can represent the thing that is morally a linear function as an actual linear function. The main trick is appropriately using unrestricted implication -> versus linear implication -o in such a way that it controls the branch of the subproof that the linear assumption use B winds up in. The resulting signature looks much like a standard natural deduction presentation with a very non-standard transition between uses and verifications (the usual rule is just use (a Q) -> verif (a Q)) and a sprinkling of lollipops where we'd normally see arrows.
  atm : hyp A -> (use A -o use (a Q)) -> verif (a Q).
  ⊃I  : (hyp A1 -> verif A2) -> verif (A1 ⊃ A2).
  ⊃E  : use (A1 ⊃ A2) -o verif A1 -> use A2.
  ∧I  : verif A1 -> verif A2 -> verif (A1 ∧ A2).
  ∧E1 : use (A1 ∧ A2) -o use A1.
  ∧E2 : use (A1 ∧ A2) -o use A2.
We are interested, for the purpose of this presentation, in hypothetical judgments of the form (Γ; · ⊢ hyp A), (Γ; · ⊢ verif A), and (Γ; use A ⊢ use B), where Γ contains unrestricted facts of the form hyp A. In light of this, we can show the synthetic inference rules that result from the above signature.
  ------------------- hyp
  Γ, hyp A; · ⊢ hyp A

  ---------------- use
  Γ; use A ⊢ use A

  Γ; · ⊢ hyp A   Γ; use A ⊢ use (a Q)
  ----------------------------------- atm
  Γ; · ⊢ verif (a Q)

  Γ, hyp A1; · ⊢ verif A2
  ----------------------- ⊃I
  Γ; · ⊢ verif (A1 ⊃ A2)

  Γ; use B ⊢ use (A1 ⊃ A2)   Γ; · ⊢ verif A1
  ------------------------------------------ ⊃E
  Γ; use B ⊢ use A2

  Γ; · ⊢ verif A1   Γ; · ⊢ verif A2
  --------------------------------- ∧I
  Γ; · ⊢ verif (A1 ∧ A2)

  Γ; use B ⊢ use (A1 ∧ A2)
  ------------------------ ∧E1
  Γ; use B ⊢ use A1

  Γ; use B ⊢ use (A1 ∧ A2)
  ------------------------ ∧E2
  Γ; use B ⊢ use A2
There are basically no surprises - the complete HLF/Twelf specification, including global soundness and completeness proofs, is essentially just a repeat of the most closely related Twelf specification, where the elimination rules in the signature look somewhat less unusual.

## Switching up the polarity

Now, let us imagine that we add atoms with positive polarity to LLF in a way analogous to their addition in the previous post. Obviously (hyp A) is naturally positive, so we can make that change, but that is not the interesting point. Consider the derived rules if we *also* make (use A) positive and restrict our attention to sequents of the form (Γ; · ⊢ verif A) and (Γ; use A ⊢ use (a Q)).
  ------------------------ use
  Γ; use (a Q) ⊢ use (a Q)

  Γ, hyp A; use A ⊢ use (a Q)
  --------------------------- atm
  Γ, hyp A; · ⊢ verif (a Q)

  Γ, hyp A1; · ⊢ verif A2
  ----------------------- ⊃I
  Γ; · ⊢ verif (A1 ⊃ A2)

  Γ; use A2 ⊢ use (a Q)   Γ; · ⊢ verif A1
  --------------------------------------- ⊃E
  Γ; use (A1 ⊃ A2) ⊢ use (a Q)

  Γ; · ⊢ verif A1   Γ; · ⊢ verif A2
  --------------------------------- ∧I
  Γ; · ⊢ verif (A1 ∧ A2)

  Γ; use A1 ⊢ use (a Q)
  ---------------------------- ∧E1
  Γ; use (A1 ∧ A2) ⊢ use (a Q)

  Γ; use A2 ⊢ use (a Q)
  ---------------------------- ∧E2
  Γ; use (A1 ∧ A2) ⊢ use (a Q)
While our previous formalization was an interesting sort-of modified version of natural deduction, the result when we change the polarity of the atomic proposition (use A) is precisely a focused sequent calculus.

## A few related points

There are a lot of other neat explorations to consider in the near vicinity of this topic. Among them are:
• Generalizing to a polarized, weakly-focused logic. I did most of this once, but there were some issues with my original proof.
• What's the precise natural deduction system that the system with weak focusing + inversion on negative propositions corresponds to? Does it correspond exactly to the style of natural deduction used by Howe in "Proof Search in Lax Logic"?
• Generalizing to a fully-focused logic (how close can we get, for instance, to the natural deduction system of Brock-Nannestad and Schürmann ("Focused Natural Deduction," LPAR 2010)?)
• Can this generalize to a focused linear sequent calculus, or do I only get to use linearity once per specification? Could the machinery of HLF be used to work around that?
• The global soundness and completeness results that I established in HLF have the least amount of slickness possible - I basically just repeat the proof that I gave for the Twelf formalization without linearity. How close can we get to the intuition that the eta expansion theorem says use A -> verif A for all A?

## Friday, September 17, 2010

### Focusing and synthetic rules

So, I was trying to explain to some people at a whiteboard something that I thought was more generally obvious than I guess it is. So, post! This post assumes you have seen lots of sequent calculi and have maybe heard of focusing before, but I'll review the focusing basics first. And here's the main idea: focusing lets you treat propositions as rules. This is not an especially new idea if you are "A Twelf Person," but the details are still a bit peculiar.

Let's start with a little baby logical framework. Here are the types:
  A ::= A → A | P⁺ | P⁻
Those P⁺ and P⁻ are the atomic propositions, and there can be as many of them as we want.

## Focusing, real quick

There are three judgments that we need to be worried about. Γ ⊢ [ A ] is the right focus judgment, Γ[ A ] ⊢ Q is the left focus judgment, and Γ ⊢ A is the out-of-focus judgment.

Okay. So focusing (any sequent calculus presentation of logic, really) encourages you to read rules from the bottom to the top, and that's how the informal descriptions will work. The first set of rules deal with right-focus, where you have to prove A right now. If you are focused on a positive atomic proposition, it has to be available right now as one of the things in the context. Otherwise (if you are focused on a negative atomic proposition or A → B), just try to prove it regular-style.
  P⁺ ∈ Γ
  ----------
  Γ ⊢ [ P⁺ ]

  Γ ⊢ P⁻
  ----------
  Γ ⊢ [ P⁻ ]

  Γ ⊢ A → B
  -------------
  Γ ⊢ [ A → B ]
The second set of rules deal with left-focus. One peculiar bit: we write left focus as Γ[ A ] ⊢ Q, and by Q we mean either a positive atomic proposition P⁺ or a negative atomic proposition P⁻. If we're in left focus on a positive atom, then we stop focusing and just add P⁺ to the set of antecedents Γ, but if we're in left focus on a negative atomic proposition P⁻, then we must be trying to prove P⁻ on the right right now in order for the proof to succeed. Then, finally, if we're left focused on A → B, then we have to prove A in right focus and B in left focus.
  Γ, P⁺ ⊢ Q
  -----------
  Γ[ P⁺ ] ⊢ Q

  ------------
  Γ[ P⁻ ] ⊢ P⁻

  Γ ⊢ [ A ]   Γ[ B ] ⊢ Q
  ----------------------
  Γ[ A → B ] ⊢ Q
Finally, we need rules that deal with out-of-focus sequents. If we have an out-of-focus sequent and we're trying to prove P⁺, then we can go ahead and finish if P⁺ is already in the context. There is no rule for directly proving P⁻, but if we have a positive or negative atomic proposition that we're trying to prove, we can left-focus and work from there. And if we're trying to prove A → B, we can assume A and keep on trying to prove B.
  P⁺ ∈ Γ
  -------
  Γ ⊢ P⁺

  A ∈ Γ   A is not a positive atomic proposition   Γ[ A ] ⊢ Q
  -----------------------------------------------------------
  Γ ⊢ Q

  Γ, A ⊢ B
  ---------
  Γ ⊢ A → B
There are a lot of different similar presentations of focusing, most of which amount to the same thing, and most of which take some shortcuts. This one is no different, but the point is that this system is "good enough" that it lets us talk about the two big points.

The first big point about focusing is that it's complete - any sequent calculus or natural deduction proof system for intuitionistic logic will prove exactly the same things as the focused sequent calculus. Of course, the "any other sequent calculus" you picked probably won't have a notion of positive and negative atomic propositions. That's the second big point: atomic propositions can be assigned as either positive or negative, but a given atomic proposition has to always be assigned the same positive-or-negativeness (that positive-or-negativeness is called polarity, btw). And on a similar note, you can change an atomic proposition's polarity if you change it everywhere. This may radically change the structure of a proof, but the same things will definitely be provable. Both of these things, incidentally, were noticed by Andreoli.

## Synthetic inference rules

An idea that was also noticed by Andreoli but that was really developed by Kaustuv Chaudhuri is the idea that, when talking about a focused system, we should really think about proofs as being made up of synthetic inference rules, which are an artifact of focusing. The particular case of unfocused sequents where the conclusion is an atomic proposition, Γ ⊢ Q, is a special case that we can call neutral sequents. The only way we can prove a neutral sequent is to pull something out of the context and either finish (if the thing in the context is the positive atomic proposition we want to prove) or go into left focus. For instance, say that it is the case that P⁻ → Q⁻ → R⁻ ∈ Γ. Then the following derivation consists only of choices that we had to make if we left-focus on that proposition.
                     ...
                    Γ⊢Q⁻
    ...            -------
   Γ⊢P⁻            Γ⊢[Q⁻]   Γ[R⁻]⊢R⁻
  -------          ------------------
  Γ⊢[P⁻]           Γ[Q⁻→R⁻]⊢R⁻
  ----------------------------
  Γ[P⁻→Q⁻→R⁻]⊢R⁻                    P⁻ → Q⁻ → R⁻ ∈ Γ
  --------------------------------------------------
  Γ⊢R⁻
This is a proof that has two leaves which are neutral sequents and a conclusion which is a neutral sequent, and where all the choices (including the choice of what the conclusion was) were totally forced by the rules of focusing. Therefore, we can cut out all the middle steps (which are totally determined anyway) and say that we have this synthetic inference rule:
  P⁻ → Q⁻ → R⁻ ∈ Γ   Γ ⊢ Q⁻   Γ ⊢ P⁻
  -----------------------------------
  Γ ⊢ R⁻
This synthetic inference rule is more compact and somewhat clearer than the rule with all the intermediate focusing steps. As a side note, proof search with the inverse method is often much faster, too, if we think about these synthetic inference rules instead of the regular rules: that's part of the topic of Kaustuv Chaudhuri's and Sean McLaughlin's Ph.D. theses. Chaudhuri calls these things "derived rules" in his Ph.D. thesis, but I believe he is also the originator of the terms "synthetic connective" and "synthetic inference rule."

Let's do a few more examples. First, let's look at a synthetic inference rule for a proposition that has positive atomic propositions in its premises:
                   Q⁺∈Γ
    ...           -------
   Γ⊢P⁻           Γ⊢[Q⁺]   Γ[R⁻]⊢R⁻
  -------         ------------------
  Γ⊢[P⁻]          Γ[Q⁺→R⁻]⊢R⁻
  ---------------------------
  Γ[P⁻→Q⁺→R⁻]⊢R⁻                    P⁻ → Q⁺ → R⁻ ∈ Γ
  --------------------------------------------------
  Γ⊢R⁻
By convention, when one of the premises is of the form Q⁺ ∈ Γ, we go ahead and write the premise Q⁺ into the context everywhere, so the synthetic inference rule for this proposition is:
  P⁻ → Q⁺ → R⁻ ∈ Γ   Γ, Q⁺ ⊢ P⁻
  ------------------------------
  Γ, Q⁺ ⊢ R⁻
If the conclusion ("head") of the proposition is a positive atom instead of a negative one, then we end up with an arbitrary conclusion.
    ...        ...
   Γ⊢P⁻      Γ,Q⁺⊢S
  -------    -------
  Γ⊢[P⁻]     Γ[Q⁺]⊢S
  ------------------
  Γ[P⁻→Q⁺]⊢S              P⁻ → Q⁺ ∈ Γ
  ------------------------------------
  Γ⊢S
The synthetic inference rule looks like this, where S is required to be an atomic proposition, but it can be either positive or negative:
  P⁻ → Q⁺ ∈ Γ   Γ ⊢ P⁻   Γ, Q⁺ ⊢ S
  ---------------------------------
  Γ ⊢ S
If we have a higher-order premise (that is, an arrow nested to the left of an arrow - (P⁻ → Q⁺) → R⁻ is one such proposition), then we gain new assumptions in some of the branches of the proof. Note that the basic "shape" of this rule would not be affected if we gave P⁻ or Q⁺ the opposite polarity - synthetic inference rules are a little less sensitive to the polarity of atoms within higher-order premises.
     ...
   Γ,P⁻⊢Q⁺
  ----------
  Γ⊢P⁻→Q⁺
  ----------
  Γ⊢[P⁻→Q⁺]   Γ[R⁻]⊢R⁻
  ---------------------
  Γ[(P⁻→Q⁺)→R⁻]⊢R⁻              (P⁻ → Q⁺) → R⁻ ∈ Γ
  -------------------------------------------------
  Γ⊢R⁻
The synthetic inference rule, one more time, looks like this:
  (P⁻ → Q⁺) → R⁻ ∈ Γ   Γ, P⁻ ⊢ Q⁺
  --------------------------------
  Γ ⊢ R⁻

## Application to logical frameworks

One annoyance in all of these derived rules is that each of them had a premise like (P⁻ → Q⁺) → R⁻ ∈ Γ. However, in a logical framework, we usually define a number of propositions in some "signature" Σ, and consider these propositions to be always true. Therefore, given any finite signature, we can "compile" that signature into a finite set of synthetic inference rules, add those to our logic, and throw away the signature - we don't need it anymore, as the synthetic inference rules contain precisely the logical information that was contained in the signature. Hence the motto, which admittedly may need some work: focusing lets you treat propositions as rules.

This is a strategy that hasn't been explored too much in logics where atomic propositions have mixed polarity - Jason Reed and Frank Pfenning's constructive resource semantics papers are the only real line of work that I'm familiar with, though Vivek's comment reminds me that I learned about the idea by way of Jason from Vivek and Dale Miller's paper "A framework for proof systems," section 2.3 in particular. (They in turn got it from something Girard wrote in French, I believe. Really gotta learn French one of these days.) The big idea here is that this is expressing the strongest possible form of adequacy - the synthetic inference rules that your signature gives rise to have an exact correspondence to the original, "on-paper" inference rules.

If this is our basic notion of adequacy, then I claim that everyone who has ever formalized the sequent calculus in Twelf has actually wanted positive atomic propositions. Quick, what's the synthetic connective corresponding to this pseudo-Twelf declaration of ∨L in the sequent calculus?
  ∨L : (hyp A → conc C)
       → (hyp B → conc C)
       → (hyp (A ∨ B) → conc C)
If you thought this:
  Γ, hyp (A ∨ B), hyp A ⊢ conc C   Γ, hyp (A ∨ B), hyp B ⊢ conc C
  --------------------------------------------------------------- ∨L
  Γ, hyp (A ∨ B) ⊢ conc C
then what you wrote down corresponds to what we like to write in "on-paper" presentations of the intuitionistic sequent calculus, but it is not the correct answer. Twelf has only negative atomic propositions, so the correct answer is this:
  Γ ⊢ hyp (A ∨ B)   Γ, hyp A ⊢ conc C   Γ, hyp B ⊢ conc C
  ------------------------------------------------------- ∨L
  Γ ⊢ conc C
This is still adequate in the sense that complete on-paper sequent calculus proofs are in one-to-one correspondence with the complete LF proofs: the reason that is true is that, when I am trying to prove Γ ⊢ hyp (A ∨ B), by a global invariant of the sequent calculus I can only succeed by left-focusing on some hyp (A ∨ B) ∈ Γ and then immediately succeeding. However, the partial proofs that focusing and synthetic connectives give rise to are not in one-to-one correspondence.

In order to get the rule that we desire, of course, we need to think of hyp A as a positive atomic proposition (and conc C as negative). If we do that, then the first proposed synthetic inference rule is dead-on correct.

## Poll

Hey, I'm kind of new at logicblogging and don't really know who is following me. This was really background for a post I want to write in the future. Background-wise, how was this post?

[Update Nov 11, 2010] Vivek's comments reminded me of the forgotten source for the "three levels of adequacy," Vivek and Dale's "A framework for proof systems," which is probably a more canonical source than Kaustuv's thesis for using these ideas for representation. Also, the tech report mentioned in Vivek's comment replays the whole story in intuitionistic logic and is very close to the development in this blog post.

## Sunday, September 12, 2010

### Background to constructive provability logic

So after some off-line discussions, while I continue to feel like the natural-deduction based judgmental methodology really maybe isn't the best way to do the work of experimentally developing logics (you really want the sequent calculus version for that - but that's another post), it's an excellent storytelling device. The idea is that it should help you explain a logic that is initially non-intuitive to an audience that you're trying not to scare off.

I'm not even certain how fully I believe that, since I tend to like using sequent calculus presentations for more-or-less everything that isn't a programming language. However, this tech report, by forcing me to work through a lot of things very carefully, has clarified my views on a lot of things - in particular, I think I have a "modern formulation" of the judgmental methodology that neatly generalizes to things like substructural logics without going out of its way, while simultaneously doing a good job of putting on paper all the tricks I learned from Dan about formalizing logics in Agda.

My wildest hope is that this story can not only motivate what Bernardo and I are doing specifically for Random J. PLPerson, but also convey the intuition behind a number of things I've learned from Noam Zeilberger and Dan Licata about the philosophy behind both higher-order focusing and behind doing logic in Agda, and without actually requiring any background with focusing or knowledge of Agda to get there. I'd be happy to hear from you whether you think I've succeeded: any criticism would be greatly appreciated. (Draft Technical Report: Constructive Provability Logic)

## Thursday, September 9, 2010

### Local soundness and completeness are PL-like properties

Currently, I'm trying to write up a tech report that clearly and fully presents a peculiar modal logic that I discovered with the help of another graduate student last spring. In this tech report, I'm trying to closely follow what I think is correct to call the judgmental methodology, a philosophy for designing logics that was first outlined by Martin-Löf in the 1983 Siena lectures and was refined by Pfenning and Davies in the 1999 paper "A Judgmental Reconstruction of Modal Logic."

Something that I think I knew but that I hadn't ever fully internalized is the degree to which the developments and checks recommended by the judgmental methodology are much closer to what you want when defining a programming language (where type safety - progress and preservation - are primary concerns) than what you want when defining a logic (where consistency, which is a sort of normalization property, is a big concern).

A really, really high level picture of the methodology followed by Pfenning and Davies is this:
• Define a substitution principle
• Show that everything that looks like it should be a reducible expression (redex) is actually, you know, reducible (local soundness)
• Show that everything that has an interesting type can be η-expanded (local completeness)
• Check that the substitution principle which you claimed to be true and possibly used in the previous two steps actually holds
Now, as it turns out, proving the property of local completeness gets you most of the way there to the property called global completeness, but local soundness is woefully inadequate to get what you might call global soundness, the property that given a proof with redexes you can reduce redexes over and over until you get a proof with no redexes anymore. What I think local soundness ensures is that you'll always be able to keep reducing things that look like redexes unless there aren't any more things that look like redexes. But this is just a progress property, that everything that "looks like a redex" actually is one. The combined local soundness proofs additionally can be seen as the cases of a variant form of the preservation theorem called subject reduction - this means that the local soundness proofs demonstrate both (a sort of) progress property and (a sort of) preservation property.

Is it really the case that the only "more interesting" part of type safety theorems is the fact that programming languages have values - places where, in practice, you want to stop with all of this reducing business - and the safety proof has to show that these uses of values are somehow consistent with the desired progress theorem? And has the "progress-ey" part of local soundness been considered elsewhere? Certainly the preservation-ey part is well-known enough to have its own name, "subject reduction." It seems like the kind of discussion that would be natural to find in the hybrid child of Proofs and Types and TAPL, but I don't believe such a book exists.