Showing posts with label focusing. Show all posts
Showing posts with label focusing. Show all posts

Tuesday, April 3, 2012

Differently-higher-order-focusing

I was toying with an idea last weekend, but it passed the "this is taking too much of your time, go back to your thesis" threshold, so I'm going to jot down what I figured out here, link to some Agda code, and maybe return to it in the future.

I'm going to use intuitionistic multiplicative, additive linear logic (without the exponential \({!}A^-\)) in this note. Doing it this way is the easiest way to explain everything, though it does mean I'm assuming some familiarity with intuitionistic linear logic. I'm also not going to use shifts; the Agda formalizations do use shifts, but the difference isn't important.

In polarized linear logic without shifts, there's one big syntactic class \(A\) with all the propositions from linear logic, but they are sorted into positive and negative propositions based on their topmost connectives. Atomic propositions can be given arbitrary polarity \(p^+\) or \(p^-\) as long as that polarity is consistent.

  • \(A^+ ::= p^+ \mid {\bf 0} \mid A \oplus B \mid {\bf 1} \mid A \otimes B\)
  • \(A^- ::= p^- \mid \top \mid A ~\&~ B \mid A \multimap B\)

Quick review of higher-order focusing

Higher-order focusing as originally defined by Noam takes as the definition of connectives as given by a pattern judgment. Pattern judgments for positive types track the construction of terms; a reasonable way of reading \(\Delta \Vdash A^+\) is that it says "given all the stuff in \(\Delta\), I can construct a proof of \(A^+\)."

\[ \infer {p^+ \Vdash p^+} {} \qquad \infer {A^- \Vdash A^-} {} \]\[ \infer {\Delta \Vdash A \oplus B} {\Delta \Vdash A} \qquad \infer {\Delta \Vdash A \oplus B} {\Delta \Vdash B} \]\[ \infer {\cdot \Vdash {\bf 1}} {} \qquad \infer {\Delta_A, \Delta_B \Vdash A \otimes B} {\Delta_A \Vdash A & \Delta_B \Vdash B} \]

Similarly, the negative pattern judgment tracks how negative types are used. A reasonable reading of the pattern judgment \(\Delta \Vdash A^- > C\) is that it says "given all the stuff in \(\Delta\), I can use a proof of \(A^-\) to prove \(C\)."

\[ \infer {\cdot \Vdash p^- > p^-} {} \qquad \infer {\cdot \Vdash A^+ > A^+} {} \]\[ \infer {\Delta \Vdash A ~\&~ B > C} {\Delta \Vdash A > C} \qquad \infer {\Delta \Vdash A ~\&~ B > C} {\Delta \Vdash B > C} \qquad \infer {\Delta, \Delta' \Vdash A \multimap B > C } {\Delta \Vdash A & \Delta' \Vdash B > C} \]

One of the nicest things about higher-order focusing, in my opinion, is that it removes the question of what order the invertible left rules get applied in altogether: when we come to a point where invertible rules must be applied, the invertible rules happen all at once by logical reflection over the pattern judgment.1

\[ \infer {\Delta, [A^+] \vdash C} {\forall(\Delta' \Vdash A^+) \longrightarrow \Delta, \Delta' \vdash C} \qquad \infer {\Delta \vdash [A^-]} {\forall(\Delta' \Vdash A^- > C) \longrightarrow \Delta, \Delta' \vdash C} \]

In particular, this means that a derivation of \(\Delta, [p^+ \oplus (A^- \otimes q^+) \oplus {\bf 1}] \vdash C\) has three immediate premises corresponding to the three one of which is a derivation of \(\Delta, p^+ \vdash C\), one of which is a derivation of \(\Delta, A^-, q^+ \vdash C\), and one of which is a derivation of \(\Delta \vdash C\).

This "do all the inversion all at once" idea is a strengthening of the idea that appears in Andreoli's original presentation of focusing, in the definition of CLF, in Kaustuv Chaudhuri's focused linear logic, and in my work on structural focalization. All of these aforementioned presentations have a separate, ordered inversion context that forces you to decompose positive propositions into contexts and negative propositions in one arbitrary-but-fixed order, which kind of lets you pretend they all happen at once, since you can't make any choices at all until you're done. This isn't the only way to look at it, though. In Miller and Liang's LJF, in Frank Pfenning's course notes on Linear Logic, and in probably a bunch of other places, the inversion steps can happen in any order, though for a fully focused system you have to do get through all the possible inversion steps before you focus again. In order to get the strength of a Andreoli/Chaudhuri/me-focused systems in an LJF-like setting, one must prove a Church-Rosser-esque property that all ways of doing inversion are the same.

The reason I think the use of higher-order formulations is particularly nice is that it blows away the distinction between systems that fix the inversion order (like Structural Focalization and its many predecessores does) and systems that allow nondeterminism in the inversion order (like LJF and its kin). There's no question about the presence or absence of inessential nondeterminism when all the nondeterminism seriously happens in one rule that has as many premises as necessary to get the job done.

One of the less-awesome things about higher-order focusing is that it requires this extra substitution judgment, a complication that I won't go into here but that gets increasingly annoying and fiddly as we go to more and more interesting substructural logics, which are of particular interest to me. (This is not to say that it doesn't work, I wrote a note on higher-order focusing for ordered logic way back two years ago when Request For Logic was a series of Unicodey-notes instead of a blog.) To try to get the best of both worlds, Taus and Carsten's Focused Natural Deduction is the only system I know of that tries to use the pattern judgments just for negative introduction/positive elimination while using the familiar negative eliminations and positive introduction structures.

I want to play the same game Taus and Carsten play, but without using the pattern judgments at all. I think I have a better idea, but it ended up being more work than I have time for to actually prove that it's better (or to conclude that I'm wrong and it's not).

Alternate formulation

If \(P(\Delta)\) is a judgment on linear contexts \(\Delta\), then \(I^+_{A^+}(\Delta. P(\Delta))\) - which we can mathematically \(\eta\)-contract and write \(I^+_{A^+}(P)\) where convenient - is also a judgment, defined inductively on the structure of the positive proposition \(A^+\) as follows:

  • \(I^+_{p^+}(\Delta. ~ P(\Delta)) = P(p^+) \)
  • \(I^+_{A^-}(\Delta. ~ P(\Delta)) = P(A^-) \)
  • \(I^+_{\bf 0}(\Delta. ~ P(\Delta)) = \top \)
  • \(I^+_{A \oplus B}(\Delta. ~ P(\Delta)) = I^+_{A}(P) \times I^+_{B}(P) \)
  • \(I^+_{\bf 1}(\Delta. P(\Delta)) = P(\cdot)\)
  • \(I^+_{A \otimes B}(\Delta. ~ P(\Delta)) = I^+_{A}(\Delta_A. ~ I^+_{B}(\Delta_B. ~ P(\Delta_A, \Delta_B)) \)
This definition has a certain obvious correspondance to the pattern judgment from "Noam-style" higher order focusing, but it's very direct: given a property \(P\) that involves a linear context, \(I_{A^+}(P)\) proves that property on every context which can prove \(A^+\).

Given this judgment, we can use it pretty straightforwardly to define the inversion judgment. \[ \infer {\Delta, [A^+] \vdash C} {I^+_{A^+}(\Delta_{A}. ~~\Delta, \Delta_{A} \vdash C)} \] The result is exactly same as the formulation that used the higher-order pattern judgment, but this formulation is, in my mind at least, rather more direct while just as appropriately higher-order.

The story for negative types is the same. If \(P(\Delta,C)\) is a judgment on linear contexts \(\Delta\) and conclusions (conclusions are positive propositions \(A^+\) and negative atomic propositions \(p^-\)), then \(I^+_{A^-}(P)\) is defined inductively on the structure of types as follows:

  • \(I^-_{p^-}(\Delta,C. ~ P(\Delta,C)) = P(\cdot,p^-)\)
  • \(I^-_{A^+}(\Delta,C. ~ P(\Delta,C)) = P(\cdot,A^+)\)
  • \(I^-_{A ~\&~ B}(\Delta,C. ~ P(\Delta,C)) = I^-_{A}(P) \times I^-_{B}(P)\)
  • \(I^-_{A \multimap B}(\Delta. ~ P(\Delta,C)) = I^+_{A}(\Delta_A.~I^-_{B}(\Delta,C. ~ P((\Delta_A,\Delta), C)\)
The right-inversion rule follows the same pattern as the left-inversion rule: \[ \infer {\Delta \vdash [ A^- ]} {I^-_{A^-}(\Delta',C. ~~ \Delta, \Delta' \vdash C)}\]

Where's the Beef?

I think this crazy new formulation of focusing is somewhat more elegant than the pattern-judgment-using formulation Taus and Carsten gave, even though it is otherwise rule-for-rule the same. However, that's not an argument that will convince anyone besides myself; there needs to be some advantage to using this formulation for it to have value. In particular, this formulation should make it easier and/or more beautiful to prove cut, identity expansion, and the completeness of focusing - it should at least have similar grace when formalized in Agda to the Agda proof of structural focalization. It turns out that the identity expansion theorem can be done with this new formulation in Agda, and it's beautiful. (When I showed it to Dan Licata he started muttering something something Yoneda Lemma, so it's at least likely the structure of identity expansion has some interesting and common categorial structure.) It's certainly obvious that there's some interesting deep structure at work.

Furthermore, it seems like, given this formulation, it should be possible to get the cut admissibility theorem working as an obviously turned-around version of identity expansion, which is the kind of interesting structure I've had my eye out for ever since I observed it in the process interpretation of linear logic. I haven't made that work yet, though, so I'm going to stick with inversion contexts for now and maybe return to poking at this post-thesis, if nobody else has figured it out in the meantime - or, indeed, if nobody else has already figured it out in the past.

Here are some Agda fragments for dealing with this idea rigid logic (a non-associative, non-unit-law-having version of ordered logic that is useless except as a way of thinking about substructural logics in Agda, as far as I can tell), and some other Agda fragments for persistent logic. This whole blog post should serve as a disclaimer that these are all quite half-baked.


1An aside on "higher-order"

This is the part of higher-order focusing that gives it the name "higher-order," in the sense that the derivation constructors take functions as arguments. I can't quite figure out at this moment how this usage fits into Chris's discussion of what should and should not be called higher order.

Noam called the paper introducing pattern judgments to the PL world "Focusing and higher-order abstract syntax" (POPL 2008). Noam now sees that naming as unfortunate, and I agree; it blurs the definition of "higher-order abstract syntax" (HOAS) in a different direction that it gets blurred elsewhere in the literature. In my idiolect, which of course I can't force on everyone else, I reserve HOAS for the use of substitution functions to encode term structure - in other words, I use it as a rough synonym of what I and others also call "abstract binding trees." (I'm not sure who coined "abstract binding tree," I associate it with Bob Harper.)

The whole point of higher-order focusing is that this function - the one from the pattern judgment \(\Delta' \Vdash A\) to the provability judgment \(\Delta, \Delta' \vdash C\) - is a function that case analyzes its arguments (to figure out which pattern judgment was given as an argument to the function). Putting our set-theorist hats on, it's a map, a potentially infinite list of (pattern derivation, provability derivation) tuples such that every pattern derivation is associated with exactly one provability derivation. Then we take off our set-theorist hats 'cause we dislike those hats; it's a real (pure, terminating, effect-free) function, like in the sense of functional programming. Or just programming. I personally try to use the phrase logical reflection to emphasize this pattern-matching higher-order-ness; Noam now prefers abstract higher-order syntax, which is fine too.

Monday, March 5, 2012

What does focusing tell us about language design?

This post is largely based on some conversations I've had about Polarized Logic/Focusing/Call-By-Push-Value recently, especially with Neel Krishnaswami and Wes Filardo, though it was originally prompted by Manuel Simoni some time ago.

I think that one of the key observations of focusing/CBPV is that programs are dealing with two different things - data and computation - and that we tend to get the most tripped up when we confuse the two.

  • Data is classified by data types (a.k.a. positive types). Data is defined by how it is constructed, and the way you use data is by pattern-matching against it.
  • Computation is classified by computation types (a.k.a. negative types). Computations are defined their eliminations - that is, by how they respond to signals/messages/pokes/arguments.

There are two things I want to talk about, and they're both recursive types: call-by-push-value has positive recursive types (which have the feel of inductive types and/or algebras and/or what we're used to as datatypes in functional languages) and negative recursive types (which have the feel of recursive, lazy records and/or "codata" whatever that is and/or coalgebras and/or what William Cook calls objects). Both positive and negative recursive types are treated by Paul Blain Levy in his thesis (section 5.3.2) and in the Call-By-Push Value book (section 4.2.2).

In particular, I want to claim that Call-By-Push-Value and focusing suggest two fundamental features that should be, and generally aren't (at least simultaneously) in modern programming languages:

  1. Support for structured data with rich case analysis facilities (up to and beyond what are called views)
  2. Support for recursive records and negative recursive types.

Minor addition to "core Levy"

I'll be presenting with an imaginary extension to Bauer's Levy language in this post.1 I've mucked around Levy before, of course, and it would probably help to review that previous post before reading this one. I want to make one addition to Levy before we begin making big, interesting ones. The derived form that I want to add - e1 orelse e2 - is computation code with type F bool if e1 and e1 have type F bool as well. This is definable as syntactic sugar, where x is selected to not be free in e2:

   e1 to x in 
   if x then return true 
   else e2
One other aside - do remember that, at the cost of potential confusion, I modified Bauer's original Levy implementation to make force a prefix operator that binds more tightly than application - force g z y can be written with parentheses as (((force g) z) y).

Positive types, positive recursive types

The functional languages have the construction of positive types just about right: they're made up of sums and products. I used a special LLF-ish construction for datatypes in my exploration of Levy#, but the more traditional way of introducing datatypes is to say that they are recursive types μt.T(t). The recursive types we're used to thinking about are naturally interpreted as positive types, because they are data and are inspected by pattern matching.

There's a tendency in programming language design to shove positive recursive types together with labeled sum types to get a familiar datatype mechanism.2 I will go along with this tendency and merge labeled sums and recursive types, writing them as μt.[L1: T1(t), L2: T2(t), ..., Ln: Tn(t)].

Here are datatype definitions for trees of ints and lists of ints:

   type+ tree = 
   μtree. 
   [ Leaf: int, 
     Node: tree * tree ]

   type+ list = 
   μlist. 
   [ Nil, 
     Cons: int * list ]
Note from the definition of lists that we also allow types to have no arguments: it's possible to treat the definition of Nil as syntactic sugar for Nil: unit. The associated value is Nil, which is syntactic sugar for Nil ().

There are infinitely many trees and lists as we've defined them. In fact, it's just a convention of how we have to write programs that we think of positive types as being finite sums. Even though we can't write it down as a datatype declaration, it's perfectly reasonable to think of a built-in type of infinite precision integers as being defined as follows:

   type+ int = 
   μint. 
   [ 0, 
     1,
     ~1,
     2,
     ~2,
     3,
     ... ]
The same goes for built-in strings and the like - think of built-in types being positive enumerations that were provided specially by the language since the language didn't give us the ability to write down the full declaration containing all possible constants.

Powerful views

A point that has not, in my humble opinion, been made simply enough is that the perspective of focusing says that we should think about integrating much more powerful case analysis mechanisms into our programming languages. I learned about this point from Licata, Zeilberger, and Harper's 2008 Focusing on Binding and Computation, but their setting had enough other interesting stuff going on that it probably obscured this simple point.

Standard ML (and, to the best of my knowledge, all other functional languages circa-1997) only provides a limited form of case analysis - arbitrary finite views into the outermost structure of the datatype:

   case x of
    | Leaf 9 => ...
    | Leaf y => ...
    | Node (Node (Leaf 5, y), z) => ...
    | Node (y, z) => ...
This limitation comes with a nice tradeoff, in that we can pretty effectively estimate how much work compiled code needs to do to handle a pattern match. However, the structure of focusing suggests that any mathematical function from a value's structure to computation is fair game. One well-studied modernization of pattern matching is views, which allows us to group computations in other ways. One use case would be allowing us to take an integer variable x and say that it is even or odd:
   case x of
    | 0 => return true
    | x is even => return false
    | ~1 => return false
    | x is odd => return true
The focusing-aware view of pattern matching suggests that what a pattern match is actually doing is defining a case individually for each value structure - if we could write a countably-infinite-sized program, then we could expand the view-based program above to the no-longer-view-based, but infinitely long, program below:
   case x of
    | 0 => return true
    | 1 => return true
    | ~1 => return false
    | 2 => return false
    | ~2 => return false
    | 3 => return true
    | ...
So: the focusing perspective says that any effect-free (effects include nontermination!) mathematical function we can write from value structure to computations is fair game for case analysis; views are just one known way of doing this in an interesting way. In principle, however, we can consider much richer case-analysis machinery to be fair game. For instance, there is a mathematical function \(f\) from integer values int to computation code of type F bool with the variable coll free:
  • If the integer is 1, the result is return true
  • If the integer is less than 1, the result is return false
  • Otherwise, if the integer is divisible by 2 and the result of dividing the integer by 2 is i, then the result is force coll i
  • Otherwise, let j be 3 times the integer, plus one. Then the result is force coll j
Given this mathematical function, we have the core case analysis at the heart of the Collatz function. If we expand out this case analysis into an infinitely-long mapping as before, it would look like this:
   rec coll: int -> F bool is
     fn x: int =>
       case x of
        | 0 => return false
        | 1 => return true
        | ~1 => return false
        | 2 => force coll 1
        | ~2 => return false
        | 3 => force coll 10
        | ~3 => return false
        | 4 => force coll 2
        | ~4 => return false
        | 5 => force coll 16
        | ...
The important question here is: what is the right way to think about how we write down total functions from value structures? Are views about right, or do we need something different, clearer/more/less powerful? I don't have a sense for the answers, but I'm pretty confident that's the right question.

Views are only over the structure of values

One critical caveat: I used the phrase value structure in the previous discussion repeatedly on purpose. Because computations can always contain effects, we cannot poke at the computations at all when we think about views - the value structure is just the part of the value that we can inspect without traversing a thunk. Therefore, if we have this definition...

   type+ somefn = 
   μsomefn. 
   [ BoolFn: U (bool -> F bool), 
     IntFn: U (int -> F int) ]
...then a case analysis on a value of type somefn can have at most two branches - one for the BoolFn case and one for the IntFn case. We can't pattern-match into computations, so even though it would be reasonable to say that there are many, many values of type somefn, there are only two value structures that can be inspected by case analysis.

Negative types, negative recursive types

In call-by-push-value, the computation types (a.k.a. the negative types of polarized logic/focusing) are defined by the way we use them. Computation code of computation type does stuff in response to being used - a function does stuff in response to getting an argument, and so the most well-understood negative type is implication.

Conjunction (the product or pair type) is a bit slippery, says the CBPV/focusing methodology. We've already used conjunction as a positive type (t1 * t2), but we can also think of conjunction as being a negative type. This is because it's equally valid to think of a pair being positively defined by its construction (give me two values, that's a pair) and negatively as a suspended computation that awaits you asking it for its first or second component.

CPBV/focusing makes us choose: we can have either type of conjunction, or we can have both of them, but they are different. It will be convenient if we keep the normal tuples as positive types, and associate negative types with records - that is, with named products. Records are defined by how you project from them, and just as we tie labeled sums in with positive recursive types, we'll tie labeled products in with negative recursive types.

Negative recursive types are naturally codata in the same way that positive recursive types are naturally data: the canonical example is the infinite stream.

   type- stream = 
   μstream. 
   { head: F int, 
     tail: stream }
It's natural to define particular streams with fixedpoints:
   val Ones = thunk rec this: stream is
   { head = return 1, 
     tail = force this }

   val CountUp = thunk rec this: int -> stream is
   fn x: int =>
   { head = return x, 
     tail = 
       force plus x 1 to y in
       force this y }
Say I wanted to get the fourth element of an infinite stream str of type U stream. The ML-ish way of projecting from records would write this as #tail (#head (#head (#head (force str)))), but I will pointedly not use that notation in favor of a different record notation: (force str).head.head.head.tail. It nests better, and works better in the case that record elements are functions. (Because functions are negative types, like records, it is very natural to have functions be the arguments to records.)

Negative recursive types as Cook's objects

Here's a bit of a case study to conclude. The entities that William Cook names "Objects" in his paper On Understanding Data Abstraction, Revisited are recognizably negative recursive types in the sense above.3 Cook's examples can be coded in Standard ML (see here), but the language gets in the way of this encoding in a number of places.4 To see what this would look like in a language with better support, I'll encode the examples from Cook's paper in my imaginary CBPV implementation. The recursive negative type looks much like a stream, and encodes the interface for these set objects.

   type- iset = 
   μiset. 
   { isEmpty: F bool, 
     contains: int -> F bool, 
     insert: int -> iset, 
     union: U iset -> iset }
One difficulty: the Insert and Union operations that Cook uses involve mutual recursion. I don't have mutual recursion, so rather than pretending to have it (I've abused my imaginary language extension enough already) I'll code it up using records. The record type insertunion I define isn't actually recursive, it's just a record - this is analogous to using ML's datatype mechanism (which, as we've discussed, makes positive recursive types) to define a simple enumeration.
   type- insertunion = 
   μinsertunion.
   { Insert: U iset * int -> iset,
     Union: U iset * U iset -> iset }

   val InsertUnion = thunk
     rec x: insertunion is
     { Insert = fn (s, n) =>
       (force s).contains n to b in
       if b then force s else 
       rec self: iset is
       { isEmpty = return false,
         contains = fn i => (force equal i n) orelse (force s).contains i,
         insert = fn i => (force x).Insert (self, i),
         union = fn s' => (force x).Union (self, s') },

       Union = fn (s1, s2) => 
       rec self: iset is
       { isEmpty = (force s1).isEmpty orelse (force s2).isEmpty,
         contains = fn i => (force s1).contains i orelse (force s2).contains i,
         insert = fn i => (force x).Insert (self, i),
         union = fn s' => (force x).Union (self, s') } }

   val Insert: U (U iset * int -> iset) = thunk ((force InsertUnion).Insert)

   val Union: U (U iset * U iset -> iset) = thunk ((force InsertUnion).Union)
We've defined union of sets and insertion into a set, but we haven't actually defined any sets yet! Once we give the implementation of an empty set, however, we can manipulate these sets with a Java-esque method invocation style:
   val Empty = thunk rec self: iset is
   { isEmpty = return true,
     contains = fn x => return false,
     insert = fn i => force Insert (self, i),
     union = fn s' => force s' }

   val JustOne = thunk (force Empty).insert 1
   (force Empty).insert(3).union(JustOne).insert(5).contains(4)
As Cook noted, it's also very natural to talk about infinite sets in this style, such as the set of all even numbers and the set of all integers:
   val Evens = thunk rec self: iset is 
   { isEmpty = return false,
     contains = fn i => 
       force mod i 2 to x in
       (case x of 
         | 0 => return true
         | _ => return false),
     insert = fn i => force Insert (self, i),
     union = fn s' => force Union (self, s') }

   val Full = thunk rec self: iset is
   { isEmpty = return false,
     contains = fn i => return true,
     insert = fn i => force self,
     union = fn s' => force self }


1 If anyone is interested in helping me implement a toy language along these lines, I'm all ears. Also, the disclaimer that since I don't have an implementation there are surely bugs bugs bugs everywhere is strongly in effect.

2 As a practical matter, this makes it easier for the language to know where to put implicit roll/unroll annotations, so that programmers doesn't have to write these explicitly, which would be a pain.

3 Cook does give the disclaimer that this definition is "recognized as valid by experts in the field, although there certainly are other valid models." I carefully call these things "Cook objects" rather than "objects" - my claim is that negative recursive types correspond to what William Cook names objects, not that they correspond to what you call objects. Note, furthermore, that I could be wrong about what William Cook calls objects! I only have his paper to go on and I have been known to be a few cookies short of a haystack.

4 There are two issues with the encoding in SML. The most basic is that we have to encode the recursive types using SML's recursive type mechanism, which is biased towards the positive encoding of recursive types as finite sums. The other issue is that the way SML ties together (mutual) recursion with closures gets in the way - the standard way of writing a "thunk" as a function from unit is all over the place, and it wan't possible to define EmptySet so that it was mutually recursive with the definition of Insert and Union as a result. So, I'm certainly not arguing that SML facilities programming in (what Cook calls) an object-oriented style in a strong sense - it's unnecessarily painful to code in that style - but the minimal semantic building blocks of what Cook presented and named Object are present and accounted for. [Update:] As gasche discusses in the comments, Ocaml's records, and the way it deals with laziness, make it quite a bit better at encoding the example, but it's still not ideal.

Thursday, January 12, 2012

Structural focalization updated

I've uploaded to both ArXiV and my webpage a significantly revised draft of the paper Structural focalization, which I've spoken about here before. Feedback is welcome!

One of the points I make about the structural focalization technique is that, because it is all so nicely structurally inductive, it can be formalized in Twelf. As part of a separate project, I've now also repeated the whole structural focalization development in Agda! The code is available from GitHub. While a structural focalization proof has some more moving parts than a simple cut-and-identity proof, it also has one significant advantage over every Agda proof of cut admissibility that I'm aware of: it requires no extra structural metrics beyond normal structural induction! (My favorite structural metric is the totally nameless representation, but there are other ways of threading that needle, including, presumably, these "sized types" that everyone seems to talk about.)

In regular, natural-deduction substitution, you can get away without structural metrics by proving the statement that if \(\Gamma \vdash A\) and \(\Gamma, A, \Gamma' \vdash C\) then \(\Gamma, \Gamma' \vdash C\); the extra "slack" given by \(\Gamma'\) means that you operate by structural induction on the second given derivation without ever needing to apply weakening or exchange. Most cut-elimination proofs are structured in such a way that you have to apply left commutative and right commutative cuts on both of the given derivations, making this process tricky at the best; I've never gotten it to work at all, but you might be able to do something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A, \Gamma'' \longrightarrow C\) then \(\Gamma, \Gamma', \Gamma'' \longrightarrow C\)." If someone can make this work let me know!

A focused sequent calculus, on the other hand, has three separate phases of substitution. The first phase is principal substitution, where the type gets smaller and you can do whatever you want to the derivations, including weakening them. The second phase is rightist substitution, which acts much like natural-deduction substitution, and where you can similarly get away with adding "slack" to the second derivation. The third phase is leftist substitution, and you can get by in this phase by adding "slack" to the first derivation: the leftist cases read something like "if \(\Gamma, \Gamma' \longrightarrow A\) and \(\Gamma, A \longrightarrow C\) then \(\Gamma, \Gamma' \longrightarrow C\)."

In Structural focalization, I note that the structural focalization technique could be seen as a really freaking complicated way of proving the cut and identity for an unfocused sequent calculus. But in Agda, there's a reason you might actually want to do things the "long way" - not only do you have something better when you finish (a focalization result), but you get cut and identity without needing an annoying structural metric.

Saturday, December 17, 2011

Notes on classical sequent calculi (1/2)

These are some notes I made to try to help me understand Noam's focused presentation of classical logic in Polarity and the Logic of Delimited Continuations. I hope these notes coud be useful to others.

Sequent presentations of classical logic

I write, inevitably, from the perspective of an intuitionistic proof theorist, so let's start there. In intuitionistic logics, sequents have the form \(\Gamma \vdash A~\mathit{true}\), where the hypothetical context \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\). The whole sequent \(\Gamma \vdash A~\mathit{true}\) is read as "assuming the truth of all the things in \(\Gamma\), we know \(A~\mathit{true}\)." Let's look at a couple of ways of presenting sequent calculi for classical logic.

Two-sided judgmental classical sequent calculi

One way of presenting a classical sequent calculus is to give a two-sided sequent, \(\Gamma \vdash \Delta\). As before, \(\Gamma\) has the form \(A_1~\mathit{true}, \ldots, A_n~\mathit{true}\), but \(\Delta\) has the form \(B_1~\mathit{false}, \ldots, A_m~\mathit{false}\), and the whole sequent is read as "taken together, if all the things judged true in \(\Gamma\) are true and all the things judged false in \(\Delta\) are false, then there is a contradiction."

The language of propositions is \(A ::= P \mid \neg A \mid A \wedge B \mid A \vee B\) - uninterpreted atomic propositions \(P\), negation, conjunction ("and"), and disjunction ("or"). I'm leaving out truth \(\top\) and falsehood \(\bot\) because they're boring. The rules for these two-sided classical sequent calculi look like this:

\[ \infer {\Gamma, ~ P~\mathit{true} \vdash \Delta, ~ P~\mathit{false}} {} \] \[ \infer {\Gamma \vdash \Delta, ~ \neg A~\mathit{false}} {\Gamma, ~ A~\mathit{true} \vdash \Delta} \qquad \infer {\Gamma, ~ \neg A~\mathit{true} \vdash \Delta} {\Gamma \vdash \Delta, ~ A~\mathit{false}} \] \[ \infer {\Gamma \vdash \Delta, ~ A \wedge B~\mathit{false}} {\Gamma \vdash \Delta, ~ A~\mathit{false} &\Gamma \vdash \Delta, ~ B~\mathit{false}} \qquad \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \Delta} \] \[ \infer {\Gamma \vdash \Delta, ~ A \vee B~\mathit{false}} {\Gamma \vdash \Delta, ~ A~\mathit{false}, ~ B~\mathit{false}} \qquad \infer {\Gamma, ~ A \vee B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true} \vdash \Delta &\Gamma, ~ B~\mathit{true} \vdash \Delta} \]

Two asides. First, in presentations that do not emphasize the fact that \(A_i~\mathit{false}\) and \(B_j~\mathit{true}\) are judgments and not propositions, there is another reading of the two-sided sequent \[A_1,\ldots,A_n \vdash B_1,\ldots,B_m\] This interpretation is that the truth of all of the \(A_i\) implies the truth of one of the \(B_j\) - this reading suggests a reading of any intuitionistic sequent proof as a classical sequent proof with one conclusion. You should convince yourself that this interpretation is equivalent to the interpretation above (hint: it's just a mode of use of De Morgan's laws).

Second aside: your rules may differ. I'm using a style of presentation where every connective is broken down by a unique connective and, from the perspective of bottom-up proof search, it's never a mistake to apply any rule, because the conclusion implies all of the premises (a property called invertibility). The "true" (or left) rule for conjunction (that is, "and" or \(\wedge\)) and the "false" (or right) rule for disjunction (that is, "or" or \(\vee\)) both have a different, non-invertible presentation. In the case of conjunction, it's this pair of rules: \[ \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ A~\mathit{true} \vdash \Delta} \qquad \infer {\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta} {\Gamma, ~ B~\mathit{true} \vdash \Delta} \] You could "make a mistake" applying these rules in bottom-up proof search: just because there is a proof of \(\Gamma, ~ A \wedge B~\mathit{true} \vdash \Delta\) does not mean that there is a proof of \(\Gamma, ~ A~\mathit{true} \vdash \Delta\).

One-sided judgmental sequent sequent calculi

Of course, hypotheses are just hypotheses, there's no a priori reason why we need to separate the true ones and the false ones into separate contexts. Let's use a unified context and call it \(\Psi\). \[\Psi ::= \cdot \mid \Psi, A~\mathit{true} \mid \Psi, A~\mathit{false}\] Then, we can have the sequent form \(\Psi \vdash \#\), which we read as "all the assumptions in \(\Psi\) together imply a contradiction" - we pronounce \(\#\) as "contradiction." We'll need rewrite all of our rules:

\[ \infer {\Psi, ~ P~\mathit{true}, ~ P~\mathit{false} \vdash \#} {} \] \[ \infer {\Psi, ~ \neg A~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ \neg A~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{false} \vdash \#} \] \[ \infer {\Psi, ~ A \wedge B~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{false} \vdash \# &\Psi, ~ B~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A \wedge B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#} \] \[ \infer {\Psi, ~ A \vee B~\mathit{false} \vdash \#} {\Psi, ~ A~\mathit{false}, ~ B~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A \vee B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \# &\Psi, ~ B~\mathit{true} \vdash \#} \]

Hopefully you'll agree that this is "obviously the same" as the first presentation.

One-sided, truth-oriented sequent calculi

But wait! The "false" rule for conjunction looks just like the "true" rule for disjunction, and the "true" rule for conjunction looks just like the "false" rules for disjunction. Can we simplify these rules?

The usual answer is that you can, indeed, do with fewer rules and without a false judgment at all. However, we need two twists to deal with the rules that involved both the true and false judgments. First, we need to let every atomic proposition come in two flavors, the "regular" flavor \(P\) and the "negated" flavor \(\overline{P}\). Then, the rule dealing with atomic propositions looks like this: \[ \infer {\Gamma, ~ P~\mathit{true}, ~ \overline{P}~\mathit{true} \vdash \#} {} \] Second, instead of negation being a proposition \(\neg A\), we define a negation function, which I will write as \((A)^\bot\) to distinguish it from the propositional negation \(\neg A\). W. The negation function is defined as follows: \[ \begin{align} {(P)^\bot} = & \overline{P}\\ {(\overline{P})^\bot} = & P\\ {(A \wedge B)^\bot} = & {(A)^\bot} \vee {(B)^\bot}\\ {(A \vee B)^\bot} = & {(A)^\bot} \wedge {(B)^\bot}\\ \end{align} \] With this definition, we can eliminate the negation proposition altogether - the negation function just applies De Morgan laws all the way down to atomic propositions. We now get our sequent calculus for "half off" - there's no more official negation, and we don't need the false judgment at all anymore. We only need two more rules (for a total of three)!

\[ \infer {\Psi, ~ A \wedge B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true}, ~ B~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ A \vee B~\mathit{true} \vdash \#} {\Psi, ~ A~\mathit{true} \vdash \# &\Psi, ~ B~\mathit{true} \vdash \#} \]

It would also be possible to play this game the other way around: gather everything on the right-hand side, bias the whole thing towards the "false" judgment, and basically get the "other half" of the two-sided sequent calculi. This ability to play the game equally well either way is part of what people mean when they say that classical logic is "very symmetric."

However, given that it's all the same, why not reason about truth and not falsehood? I've never understand why classical linear logic (in particular) always seems to bias itself towards one-sided sequent calculi on the right. There are important differences in what it means to think like a classical linear logician and what it means to think like an intuitionistic linear logician, but I really think that it unnecessarily exacerbates this divide when we have to turn all of our \(\oplus\)es to \(\&\)s and \(\otimes\)es to pars in order to talk to one another.

Polarized presentations of classical logic

Now for the real purpose of this note: writing out the review of classical logic that Noam gives in "Polarity and the Logic of Delimited Continuations." This discussion is a synthesis of that presentation and a little bit of "On the unity of duality."

Two for the price of two

Fundamentally, the observation Noam is making is that the one-sided truth-oriented sequent calculus goes too far - really, there are two kinds of disjunction, and two kinds of conjunction, which is why it seemed like the original calculus seemed to have redundancies. The third system above (the one-sided, truth-oriented sequent calculus) made it look like we were getting our logic for "half-off" - but really that's because the first two judgmental presentations were defining twice as many connectives as appeared to the naked eye. (As an aside, if you study classical linear logic, you're forced into the same conclusion for different reasons.)

Jason Reed taught me that, if you have two different judgments in a logic, it's worth seeing what happens if you syntactically differentiate the things you judge to be true and the things you judge to be false. Let's go ahead and "guess the right answer" - I'm going to call the things we judge to be true positive, and that the things we judge to be false negative. There's more than one of everything! \[ \begin{align} A^- = & \neg^- A^+ \mid P^- \mid A^- \wedge^- B^- \mid A^- \vee^- B^-\\ A^+ = & \neg^+ A^- \mid P^+ \mid A^+ \wedge^+ B^+ \mid A^+ \vee^+ B^+ \end{align} \] Here are a bunch of rules: note that the fact that the two negations change the polarity of the propositions; the rules make it evident that this is the right thing to do, because we have (for example) \(\neg^+ A^- ~\mathit{true}\) but \(A^- ~\mathit{false}\):

\[ \infer {\Psi, ~ \neg^- A^+~\mathit{false} \vdash \#} {\Psi, ~ A^+~\mathit{true} \vdash \#} \qquad \infer {\Psi, ~ \neg^+ A^-~\mathit{true} \vdash \#} {\Psi, ~ A^-~\mathit{false} \vdash \#} \] \[ \infer {\Psi, ~ A^- \wedge^- B^-~\mathit{false} \vdash \#} {\Psi, ~ A^-~\mathit{false} \vdash \# &\Psi, ~ B^-~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A^+ \wedge^+ B^+~\mathit{true} \vdash \#} {\Psi, ~ A^+~\mathit{true}, ~ B^+~\mathit{true} \vdash \#} \] \[ \infer {\Psi, ~ A^- \vee^- B^-~\mathit{false} \vdash \#} {\Psi, ~ A^-~\mathit{false}, ~ B^-~\mathit{false} \vdash \#} \qquad \infer {\Psi, ~ A^+ \vee^+ B^+~\mathit{true} \vdash \#} {\Psi, ~ A^+~\mathit{true} \vdash \# &\Psi, ~ B^+~\mathit{true} \vdash \#} \]

So, are we good? Well, no, not really. The problem is that the "+" or "-" stuck to an atomic proposition isn't an annotation or modifier the way the overbar was in the one-sided, truth-oriented sequent calculus above. \(P^+\) and \(P^-\) are different atomic propositions, and it wouldn't be right to given an inference rule that had, as its conclusion, \(\Psi, ~ P^+~\mathit{true}, ~ P^-\mathit{false}\). Why? Well, for now let's go with "because I said so." The argument I have for this point isn't bulletproof, and it has to do with the role of atomic propositions as stand-ins for other propositions.

However, if you accept my argument from authority, we are left no way to prove, or even to state, anything equivalent to the classical \(P \vee \neg P\) into polarized logic, since any way we try to polarize this formula will lead to \(P\) needing to be both positive and negative. We're going to need some way, different from negation, of including positive propositions in negative ones.

These inclusions of positive propositions into negative ones (and vice versa) are called shifts - \({\downarrow}A^-\) is a positive proposition and \({\uparrow}A^+\) is a negative proposition. We could just add these two rules and call it a day... \[ \infer {\Psi, ~ P^+~\mathit{true}, ~ {\uparrow}P^+~\mathit{false} \vdash \#} {} \qquad \infer {\Psi, ~ P^-~\mathit{false}, ~ {\downarrow}P^-~\mathit{true} \vdash \#} {} \] ...but this is hardly general: the rules above should be derivable; this rule should be derivable as well: \[ \infer {\Psi, ~ P^+~\mathit{true}, ~{\uparrow}(P^+ \vee^+ Q^+)~\mathit{false} \vdash \#} {} \] All three of these derivable rules share a common property: in an on-paper proof, we would say that the contradiction is "trivial." The hypothesis \({\uparrow}P^+~\mathit{false}\) is trivial due to the fact that \(P^+\) is true by a different hypothesis, and because the truth of \(P^+\) allows us to trivially conclude that \(P^+ \vee^+ Q^+\) is true, \(P^+ \vee^+ Q^+\) is trivially contradictory as well.

This idea is encoded in two rules which capture proof-by-contradiction. One way we establish a contradiction is by showing that \(A^+\) is both false (by assumption) and trivial (by direct proof). The other way we establish a contradiction is by showing that \(A^-\) is both true (by assumption) and false (by direct proof of absurdity). These are embodied in the following two rules:

\[ \infer {\Psi \vdash \#} {{\uparrow}A^+~\mathit{false} \in \Psi &\Psi \vdash A^+~\mathit{trivial}} \qquad \infer {\Psi \vdash \#} {{\downarrow}A^-~\mathit{true} \in \Psi &\Psi \vdash A^-~\mathit{absurd}} \]

Now, of course, we need to give a bunch more rules to describe how to prove positive propositions trivial and negative propositions absurd.

\[ \infer {\Psi, ~ P^+~\mathit{true} \vdash P^+~\mathit{trivial}} {} \qquad \infer {\Psi, ~ P^-~\mathit{false} \vdash P^-~\mathit{absurd}} {} \] \[ \infer {\Psi \vdash \neg^+ A^-~\mathit{trivial}} {\Psi \vdash A^-~\mathit{absurd}} \qquad \infer {\Psi \vdash \neg^- A^+~\mathit{absurd}} {\Psi \vdash A^+~\mathit{trivial}} \] \[ \infer {\Psi \vdash A^+ \wedge^+ B^+~\mathit{trivial}} {\Psi \vdash A^+~\mathit{trivial} &\Psi \vdash B^+~\mathit{trivial}} \qquad \infer {\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}} {\Psi \vdash A^+~\mathit{trivial}} \qquad \infer {\Psi \vdash A^+ \vee^+ B^+~\mathit{trivial}} {\Psi \vdash B^+~\mathit{trivial}} \] \[ \infer {\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}} {\Psi \vdash A^-~\mathit{absurd}} \qquad \infer {\Psi \vdash A^- \wedge^- B^-~\mathit{absurd}} {\Psi \vdash B^-~\mathit{absurd}} \qquad \infer {\Psi \vdash A^- \vee^- B^-~\mathit{absurd}} {\Psi \vdash A^-~\mathit{absurd} &\Psi \vdash B^-~\mathit{absurd}} \]

Even yet, we are not done! We need to deal with the shifts, which embody another form of proof-by-contradiction: to prove that \(A^-\) holds trivially, assume it's false and derive a contradiction.

\[ \infer {\Psi \vdash {\downarrow}A^-~\mathit{trivial}} {\Psi, A^-~\mathit{false} \vdash \#} \qquad \infer {\Psi \vdash {\uparrow}A^+~\mathit{absurd}} {\Psi, A^+~\mathit{true} \vdash \#} \]

The thing that we've come up with by this process is what I've been calling a "weakly focused" version of classical logic. If we wanted to turn it into a "fully focused" presentation of classical logic, we'd only need to make one change: the first "proof by contradiction" rules, which we call "focusing" rules, would need to require that the context \(\Psi\) includes only judgments of the form \(P^+~\mathit{true}\), \({\downarrow}A^-~\mathit{true}\), \(P^-~\mathit{false}\), and \({\uparrow}A^+~\mathit{false}\). A context including only judgments of these four forms is called stable. To get full focusing, we would modify the "trivial focus" rule like this (a similar modification would be made to the "absurd focus" rule):

\[ \infer {\Psi \vdash \#} {{\uparrow}A^+~\mathit{false} \in \Psi &\Psi \vdash A^+~\mathit{trivial} &\Psi~\mathit{stable}} \]

Thinking about the sequent calculus as a bottom-up proof search procedure, if we are looking for a proof of a non-stable sequent, we can use our original, invertible rules to break down the connectives in the contexts until we have only stable sequents, at which point we can apply a focusing rule.

Until next time...

I haven't quite had time to do the thing I originally set out to do, which was to work through the notation in "Polarity and the logic of delimited continuations" better. But I will save that for another time. The motivation is the same as the one from before: it seems like we're almost certainly duplicating work. Is it possible to give the presentation of polarized classical logic from the previous section using about half as many rules?

Saturday, November 12, 2011

Another take on polarized natural deduction

This has been sitting on my office whiteboard for a few days, where it doesn't do anybody (well, except for me and my officemates) any good. It's a canonical-forms presentation of natural deduction for polarized logic that corresponds to the focused sequent calculus I presented and analyzed in the (recently-updated) Structural focalization draft (PDF warning). The polarized sequent calculus in that draft isn't new: it's a tweaked version of Liang and Miller's authoritative LJF.1 This canonical-forms presentation, however, is not something I've seen, so I'd be interested to know if it's been seen elsewhere: I know this is an area where a lot of other people have been working.

A bidirectional type system for polarized logic

There is, in my mind at least, no argument about what the propositions of polarized intuitionstic logic are; the following is straight of the aforementioned draft, but the basic idea dates back to Girard's 1991 post to the LINEAR mailing list, creatively titled "On the sex of angles".

\( {\bf Positive~propositions:} ~~ A^+, B^+, C^+ ~~ ::= ~~ p^+ \mid {\downarrow}A^- \mid \bot \mid A^+ \vee B^+ \mid \top^+ \mid A^+ \wedge^+ B^+ \)
\( {\bf Negative~propositions:} ~~ A^-, B^-, C^- ~~ ::= ~~ p^- \mid {\uparrow}A^+ \mid A^+ \supset B^- \mid \top^- \mid A^- \wedge^- B^- \)

What makes a proposition positive or negative? Good question! I won't address it here. (I address it a bit in the draft.)

Following tradition and best practices, we will structure the canonical forms presentation as a bidirectional type system. There are three judgments to worry about, as compared to the two judgments in other canonical forms presentations of logic. These judgments include contexts \(\Gamma\), which are sets of negative variables \(x\) associated with negative propositions (\(x{:}A^-\)) and positive variables \(z\) associated with atomic positive propositions (\(z{:}p^+\)).

  • \( \Gamma \vdash R \Rightarrow A^- \) - this is the familiar synthesis judgment from canonical forms presentations; it expresses that the atomic term \(R\) synthesizes \(A^-\). The word "synthesis" is used because it is possible to think of the type \(A^-\) as an output to the judgment, whereas \(\Gamma\) and \(R\) are inputs. In the other two judgments, everything will be treated as an input.

    \( R ::= x \mid R~V \mid \pi_1 R \mid \pi_2 R \)

  • \( \Gamma \vdash V \Leftarrow A^+ \) - this is the new judgment corresponding to right focus in the focused sequent calculus; we say that the value \(V\) checks against \(A^+\).

    \( V ::= z \mid N \mid {\sf inl}~V \mid {\sf inr}~V \mid \langle\rangle^+ \)

  • \( \Gamma \vdash N \Leftarrow [\Omega] A^-\) - this is a modification of the familiar checking judgment from canonical forms presentations, which usually lack the bit about \([\Omega]\), which is an ordered list of positive propositions. The reason we need \([\Omega]\) is precisely because we're dealing with positive propositions, which most canonical forms presentations lack or deal with in an unsatisfactory (in my humble opinion) manner. (I'll return to this point in the discussion at the end.)We say that thus judgment expresses that the normal term \(N\) decomposes \(\Omega\) and verifies \(A^-\).

    \( N ::= R \mid z.N \mid V \mid {\sf let}~R~{\sf in}~N \mid x.N \mid \lambda N \mid \langle\rangle^- \mid \langle N_1, N_2 \rangle^- \mid {\sf abort} \mid [ N_1, N_2 ]\)

Except for the first four rules, everything is patterned in the usual style of presentation for a natural deduction system: for each connective, we give first the introduction rules and then the elimination rules.

Hypothesis and atomic propositions

\[ \infer {\Gamma \vdash R \Leftarrow []p^-} {\Gamma \vdash R \Rightarrow p'^- & p^- = p'^-} \qquad \infer {\Gamma \vdash x \Rightarrow A^-} {x{:}A^- \in \Gamma} \qquad \infer {\Gamma \vdash z \Leftarrow p^+} {z{:}p^+ \in \Gamma} \qquad \infer {\Gamma \vdash z.N \Leftarrow [p^+, \Omega] C^-} {\Gamma, z{:}p&+ \vdash N \Leftarrow [\Omega] C^-} \]

Shifts

\[ \infer {{\uparrow}A^+~{\it stable}} {} \qquad \infer {p^-~{\it stable}} {} \] \[ \infer {\Gamma \vdash V \Leftarrow []{\uparrow}A^+} {\Gamma \vdash V \Leftarrow A^+} \qquad \infer {\Gamma \vdash {\sf let}~R~{\sf in}~N \Leftarrow []C^-} {\Gamma \vdash R \Rightarrow {\uparrow}A^+ & \Gamma \vdash N \Leftarrow [A^+]C^- & C^-~{\it stable}} \] \[ \infer {\Gamma \vdash N \Leftarrow {\downarrow}A^-} {\Gamma \vdash N \Leftarrow []A^-} \qquad \infer {\Gamma \vdash x.N \Leftarrow [{\downarrow}A^-, \Omega]C^-} {\Gamma, x{:}A^- \vdash N \Leftarrow [\Omega]C^-} \]

Connectives

\[ \infer {\Gamma \vdash \lambda N \Leftarrow []A^+ \supset B^-} {\Gamma \vdash N \Leftarrow [A^+]B^-} \qquad \infer {\Gamma \vdash R~V \Rightarrow B^-} {\Gamma \vdash R \Rightarrow A^+ \supset B^- & \Gamma \vdash V \Leftarrow A^+} \] \[ \infer {\Gamma \vdash \langle\rangle^- \Leftarrow \top^-} {} \qquad {\it (no~elim~rule~for~\top^-)} \] \[ \infer {\Gamma \vdash \langle N_1, N_2 \rangle^- \Leftarrow A^- \wedge^- B^-} {\Gamma \vdash N_1 \Leftarrow A^- & \Gamma \vdash N_2 \Leftarrow B^-} \qquad \infer {\Gamma \vdash \pi_1 R \Rightarrow A^-} {\Gamma \vdash R \Rightarrow A^- \wedge^- B^-} \qquad \infer {\Gamma \vdash \pi_2 R \Rightarrow B^-} {\Gamma \vdash R \Rightarrow A^- \wedge^- B^-} \] \[ {\it (no~intro~rule~for~\bot)} \qquad \infer {\Gamma \vdash {\sf abort} \Leftarrow [\bot, \Omega]C^-} {} \] \[ \infer {\Gamma \vdash {\sf inl}~V \Leftarrow A^+ \vee B^+} {\Gamma \vdash V \Leftarrow A^+} \qquad \infer {\Gamma \vdash {\sf inr}~V \Leftarrow A^+ \vee B^+} {\Gamma \vdash V \Leftarrow B^+} \qquad \infer {\Gamma \vdash [N_1, N_2] \Leftarrow [A^+ \vee B^+, \Omega] C^-} {\Gamma \vdash N_1 \Leftarrow [A^+, \Omega] C^- & \Gamma \vdash N_2 \Leftarrow [B^+, \Omega] C^-} \] \[ \infer {\Gamma \vdash \langle\rangle^+ \Leftarrow \top^+} {} \qquad \infer {\Gamma \vdash \langle\rangle.N \Leftarrow [\top^+, \Omega] C^-} {\Gamma \vdash N \Leftarrow [\Omega] C^-} \] \[ \infer {\Gamma \vdash \langle V_1^+, V_2^+ \rangle \Leftarrow A^+ \wedge^+ B^+} {\Gamma \vdash V_1^+ \Leftarrow A^+ & \Gamma \vdash V_2^+ \Leftarrow B^+} \qquad \infer {\Gamma \vdash N \Leftarrow [A^+ \wedge^+ B^+, \Omega] C^-} {\Gamma \vdash N \Leftarrow [A^+, B^+, \Omega] C^-} \]

Discussion

There are two possible questions I want to address about this system in the previous section.

What's with those positive "elimination" rules?

It would be possible to complain that the system above is not very "natural deduction-ey" after all - for all the positive connectives, I basically give sequent calculus left rules instead of natural deduction elimination rules. What happened to the usual "large elimination"-style elimination rules, for instance the usual disjunction-elimination rule whose proof term is a case analysis? \[ \infer {\Gamma \vdash ({\sf case}~R~{\sf of}~{\sf inl}~x \rightarrow N_1 \mid {\sf inr}~y \rightarrow N_2) \Leftarrow C} {\Gamma \vdash R \Rightarrow A \wedge B & \Gamma, x{:}A \vdash N_1 \Leftarrow C & \Gamma, y{:}B \vdash N_2 \Leftarrow C} \]

I think that the answer can be given by looking at the shifts. Essentially, every large elimination as we know and love it follows from the structure of the \({\uparrow}\) elimination rule, which all on its own looks an awful lot like a cut. You should verify for yourself that, if you let \({\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2\) be defined as syntactic sugar for \({\sf let}~R~{\sf in}~[ x.N_1, y.N_2]\), then the following rule is derivable whenever \(C^-~{\it stable}\) holds.2 \[ \infer {\Gamma \vdash {\sf case}~R~{\sf of}~{\sf inl}~x \Rightarrow N_1 \mid {\sf inr}~y \Rightarrow N_2 \Leftarrow C^-} {\Gamma \vdash R \Rightarrow {\uparrow}({\downarrow}A^- \vee {\downarrow}B^-) & \Gamma, x{:}A^- \vdash N_1 \Leftarrow C^- & \Gamma, y{:}B^- \vdash N_2 \Leftarrow C^-} \]

Pay attention to those two appearances of the downshift \({\downarrow}\) - they tell you something important about the structure of the usual elimination rules, which is that they "lose focus" while decomposing the disjunction. The usual way of thinking of normal natural deduction doesn't require, when you decompose \(A \vee B\) in an elimination, that you continue decomposing \(A\) and \(B\), which is represented here by the fact that, to match the structure of the usual elimination rule, you have to put downshifts \in explicitly. Jacob Howe, in his thesis and in his excellent paper "Proof search in lax logic," demonstrates this by making a focused sequent calculus that corresponds to the usual (constantly-interrupted) notion of decomposing positive propositions that you get if you follow your intuitions from natural deduction too closely.

By gathering all the large eliminations together in the \({\uparrow}\) elimination rule, we allow for the usual large eliminations to be defined, but also allow for the possibility that we might want to "chain" large eliminations in a well-defined way. (As an exercise, consider the structure of the elimination rule for \({\uparrow}(({\downarrow}A^- \wedge^+ {\downarrow}B^-) \vee (p^+ \wedge^+ \top^+))\).) This is why I claim that this is a natural deduction system that corresponds to the focused sequent calculus, instead of Howe's system where it's the other way around.3

Where are all the patterns?

Patterns have been associated with focused and/or canonical forms presentations of logic ever since... well, since Neel wrote the paper "Focusing on pattern matching"... or maybe since Noam wrote "Focusing and higher-order abstract syntax"... well, really at least since the CLF tech report. A lot of these, notably Noam's systems, have presented the rules of logic using pattern judgments, devices which abstractly represent the way in which values of a particular (positive) type are constructed or the way atomic terms of a particular (negative) type are eliminated.

There's this picture that isn't fully formed in my head, but that I've been thinking about for some time. On the left side of this picture, I think, you have the (pattern-free) presentation of natural deduction that I have given here at the top, and the (pattern-free) focused sequent calculus from "Structural focalization" at the bottom. Then, in the middle, you have (at the top) a natural deduction system that uses Noam's pattern judgments to introduce negative propositions and eliminate positive propositions - this is precisely (or at least very nearly) Taus Brock-Nannestad and Carsten Schürmann's system from "Focused Natural Deduction." Below it, there is a sequent calculus system that uses Noam's pattern judgments to eliminate negative propositions and introduce positive propositions. Kevin Watkins and Frank Pfenning came up with this idea and named it the "skeleton calculus" (a reference to the "spine calculus" of Cervesato and Pfenning), but it hasn't been written up that I know of. The skeleton calculus was what I was thinking about this morning when I decided to write this post. Then, on the far right, you have Noam's system, which is entirely pattern-based: patterns are used to both introduce and eliminate all connectives, so that the logic itself basically doesn't "know" about any connectives at all. This hazy picture is why, in the structural focalization draft, I mentioned that I thought Noam's system was a "natural synthesis of natural deduction and sequent calculus presentations".

But why should the picture look like the one I sketched above? Why not have a natural deduction system that uses patterns to introduce positives and eliminate negatives, or a sequent calculus that uses patterns to eliminate positives and introduce negatives? There's also the elephant in the room: CLF, which has both natural deduction and sequent calculus presentations, but which, in both instances, uses patterns only in the elimination of positive connectives. What are all these options doing here, and what are we to make of them? I don't know (yet).


1 I should add that, while the adaptation of LJF isn't particularly interesting, the proof term assignment I give is different than any others I've seen and I'm pretty happy with it; that's another case where I'd be interested to know if others have done anything similar.
2 This additional requirement of stability just reflects that it's always possible to restrict large eliminations in a canonical forms presentation of natural deduction to situations where the; this isn't always required in canonical forms presentations of natural deduction, but is an important part of making sure the sequent calculus presented in "Structural focalization" corresponds correctly to the natural deduction presentation.
3 I specifically suspect that this is a natural deduction system isomorphic to the focused sequent calculus from Structural focalization, but I don't want to make that claim until I've proved it.

Tuesday, September 27, 2011

My New Focalization Technique is Unstoppable

While it took, as they say, a bit of doing, I have a completed draft of a paper on my website that I believe provides a really elegant solution to what has been a very, very annoying problem for some time: writing down a proof called the completeness of focusing. Don't worry, the paper explains what that means: it has to, because one of the things the paper argues is that most existing statements of the completeness of focusing aren't general enough! In the process, this writeup gave me a good (enough) excuse to talk in a very general way about a lot of fundamental phenomena around the idea of polarization in logic that we've been noticing in recent years.

Anyway, the draft is here - Structural Focalization. The accompanying Twelf development is, of course, on the Twelf wiki: Focusing. I've thrown the paper and the Twelf code up on arXiv while I figure out while I figure out what to do with it; comments would be greatly appreciated!

I don't want to have a post that's just "hey here's a paper," so here's an addition, a follow-up to the rather popular "Totally Nameless Representation" post from awhile back. I still prove weakening in Agda by exactly the technique presented there, which commenter thecod called the "presheaf method." But for natural deduction systems, I almost never use the term metric approach that was the core of what I was presenting in "totally nameless representation," since it works just fine to use the approach where term M of type A (the one you're substituting for x) is in context Γ and the term N (the one with x free) is in context Γ' ++ A :: Γ - the free variable x is allowed to be in the middle of the context, in other words; you don't have to use exchange on the second term, so you're always calling induction on something Agda recognizes as structurally smaller.

This worked for natural deduction systems, but I didn't think it would work for sequent calculi, since you needed to toy around with both contexts and induct on both terms in presentation of cut for a sequent calculus. However, for a focused sequent calculus like what I present in the paper, you still can do without the totally nameless metric! If you set things up right, the rightist substitutions (where you work on decomposing the right term, the one with the variable free) allow you to extend the context only of the first term, and the leftist substitutions (where you work on decomposing the left term, the one you're substituting for the variable) allow you to work on the second term, and the two are only connected by the principal substitutions (which reduce the type, which more or less lets you get away with anything as far as the induction principle is concerned).

A code example of this, which could use some more commentary, can be found on GitHub: Polar.agda.

Saturday, May 28, 2011

Party like it's 1995

It is humbling but enlightening when you're struggling to figure out how to describe something and you realize someone already nailed the exposition two decades ago. That's how I felt when I stumbled across this invited paper by Dale Miller, "Observations about using logic as a specification language." Still a good paper!

I wonder if the late eighties and early nineties felt as exciting at the time as they look in retrospect for the logic community. I guess the answer is "yes" - I recall Bob Harper talking about one informal European workshop in that period where he was presenting LF, Girard was presenting linear logic, and maybe one of the uniform proofs people were presenting uniform proofs too? Maybe someone who's spending the summer in Pittsburgh can prod Bob into blogging about that meeting, it's a good story. (This reminds me, I need to add Bob's blog to the sidebar!).

Also: completeness of focusing via cut and identity


Speaking of uniform proofs: I've been fiddling for some time with this view of that the completeness of a focused sequent calculus can be proven more-or-less directly from the standard metatheory of the focused sequent calculus: cut admissibility and identity expansion. (The completeness of focusing, if you recall, is good for many reasons. It gives us logic programming, and it allows us to think in terms of synthetic inference rules in our logical frameworks.) Frank first observed this way back when, I cleaned up and Twelfed up his proof a long time ago and wrote up a version of the argument for ordered linear logic in a tech report recently. the main weakness of our strategy is that, in its current version, it doesn't deal with inversion on positive types, which is why I call it "weak focusing" instead of "focusing."

A recent MathOverflow question asked (in a slightly convoluted way) about the completeness of focusing with respect to a Prolog-like language. Since Prolog-like languages don't run into the problem with positive types, "our" strategy* works fine, so I used this as an excuse to finally write out the full argument in a setting that didn't have a lot of other extraneous stuff going on.

* I'd like to know if someone came up with this strategy before us!

Wednesday, December 22, 2010

Two twists on a graph

So, something that shouldn't be terribly surprising is that one can take a graph (edges and vertices) and represent it as a set of vertices. This innocent-looking cube will be our example:
   edge(a, b)   
edge(a, c)
edge(c, d)
edge(b, d)
edge(a, e)
edge(b, f)
edge(c, g)   
edge(d, h)
edge(e, f)
edge(e, g)
edge(g, h)
edge(f, h)
vertex(a)
vertex(b)
vertex(c)
vertex(d)
vertex(e)
vertex(f)
vertex(g)
vertex(h)

We'll collect this set of facts and call the collection of propositions (20 atomic propositions in all) GRAPH.

Also not terribly interesting, but maybe ever-so-slightly interesting, is that one can take this representation of a graph and do forward-chaining logic programming with it. Consider, for example, the following forward-chaining logic program made up of two rules.
  edge(Y,X) <- edge(X,Y).
reachable(Y) <- edge(X,Y), reachable(X), vertex(Y).
These are just logical propositions; the upper-case identifiers are implicitly universally quantified, the comma represents conjunction, and (following common convention for logic programming) we write implication backwards as B <- A instead of A -> B. We'll call this collection of two rules REACH.

Recall that the behavior of a forward chaining logic program is to take facts that it has and learn more facts. If we just throw the graph at this program, it will use the first rule to derive all the backwards edges (edge(b,a), edge(c,a), and so on) but nothing else. If we add an additional single fact reachable(a), then our forward chaining logic program will furiously learn that b, c, and indeed all the other vertices are reachable. We can rephrase this in the following way: for a specific choice of X and Y, there is a path from X to Y if and only if there is a derivation of this sequent:
  GRAPH, REACH ⊢ reachable(X) -> reachable(Y)
In fact, the forward chaining logic programming engine, given reachable(X) for some specific X, will derive a fact of the form reachable(Y) for every reachable Y.

Twist 1: weighted logic programming

The first interesting twist on our boring program is this: we consider those proofs of the sequent
  GRAPH, REACH ⊢ reachable(X) -> reachable(Y)
and we ask, what if some proofs are better than others? One way to introduce a metric for "what is better" is to say that every one of those atomic propositions has a cost, and every time we use that atomic proposition in a proof we have to pay that cost: we want the cheapest proof:
   edge(a, b) = 66   
edge(a, c) = 53
edge(c, d) = 12
edge(b, d) = 57
edge(a, e) = 19
edge(b, f) = 53
edge(c, g) = 92   
edge(d, h) = 6
edge(e, f) = 8
edge(e, g) = 84
edge(g, h) = 162
edge(f, h) = 4
vertex(a) = 0
vertex(b) = 0
vertex(c) = 0
vertex(d) = 0
vertex(e) = 0
vertex(f) = 0
vertex(g) = 0
vertex(h) = 0
This sort of problem is amenable to what is called weighted logic programming, a declarative way of specifying dynamic programming problems. In this case, if the edge weights are all positive, this is a declarative way of specifying Dijkstra's algorithm for single-source shortest path. Before, our forward-chaining logic programming engine took a single fact reachable(X) and computed all the nodes reachable from X. Now, our weighted logic programming engine takes a single fact reachable(X) with cost 0 and computes the minimum-cost route to every node reachable from X.

Related work

In fact we can play the same game on any semiring as long as the "score" of a proof is the semiring product of the score of the axioms (in this case, "plus" was the semiring product) and the way we combine the scores of different proofs is the semiring sum of the scores of the two proofs (in this case, "min" was the semiring sum). This sort of generalization-to-an-arbitrary semiring is something that isn't noticed as much as it probably should be. If you're interested in learning or thinking more about weighted logic programming, the first eleven pages of the paper I coauthored with Shay Cohen and Noah Smith, Products of Weighted Logic Programs, were written specifically as a general audience/programming languages audience introduction to weighted logic programming. Most of the other literature on weighted logic programming is by Jason Eisner, though the semiring generalization trick was first extensively covered by Goodman in the paper Semiring Parsing; that paper was part of what inspired Eisner's work on weighted logic programming, at least as I understand it.

Twist 2: linear logic programming

Another direction that we can "twist" this initially graph discussion is by adding linearity. Linear facts (in the sense of linear logic) are unlike "regular" facts - regular facts can be used any number of times or used not at all - they are persistent. Linear facts can only be used once in a proof - and in fact, they must be used once in a proof.

We've had the vertex propositions hanging around all this time and not doing much; one thing we can do is make these vertex propositions linear, and we'll make the reachable propositions linear as well. In this example, we'll designate linear propositions by underlining them and coloring them red; our program before turns into this:
  edge(Y,X) <- edge(X,Y).
reachable(Y) <- edge(X,Y), reachable(X), vertex(Y).
By the way, if you're familiar with linear logic, the logical meaning of this formula is captured by the following two propositions written properly in linear logic:
  ∀x. ∀y. !edge(x,y) -o edge(y,x).
∀x. ∀y. !edge(x,y) -o reachable(x) -o vertex(y) -o reachable(y).
We'll talk a little bit more about this specification (call it LREACH) shortly; for the moment let's switch to a different train of thought.

Greedy logic programming with linear logic

When we talked before about the cost of using a fact in a proof, we were actually assuming a little bit about proofs being focused; we'll talk a little bit about focusing and synthetic rules in this section (see this old blog post for background).

If we make all of our atomic propositions positive, then the synthetic rule associated with the critical second rule above ends up looking like this:
  Γ, edge(X,Y); Δ, reachable(Y) ⊢ C
---------------------------------------------
Γ, edge(X,Y); Δ, reachable(X), vertex(Y) ⊢ C
Recall that we like to read forward chaining rules from the bottom to the top. If we assume that the context only ever contains a single reachable(X) proposition, then we can think of the linear context as a resource consuming automaton that is in state X. At each step, the automaton consumes an available resource vertex(Y) with the constraint that there must be an edge from X to Y; the automaton then transitions to state Y.

One efficient way to give a logic programming semantics to these kind of forward-chaining linear logical specifications is to just implement an arbitrary run of the automaton without ever backtracking. This is a greedy strategy, and it can be used to efficiently implement some greedy algorithms. This was the point of Linear Logical Algorithms, a paper I coauthored with Frank Pfenning. One neat example from that paper was the following graph program, which computes a spanning tree for a graph rooted at the vertex a:
  edge(Y,X) <- edge(X,Y).
intree(a) <- vertex(a).
tree(X,Y), intree(Y) <- edge(X, Y), intree(X), vertex(Y).
This program works like another little resource-consuming automaton that, at each step, consumes a vertex resource to add a non-tree vertex to the tree along a newly added tree edge. This is, incidentally, the same way Prim's algorithm for minimum spanning trees progresses.

Hamiltonian paths in linear logic

The greedy strategy where we look at one possible set of choices is very different than the behavior of our briefly-sketched forward-chaining persistent and weighted logic programming engines. The latter worked by taking some initial information (reachable(X) for some X) and finding, all at once and with reasonable efficiency, every single Y for a given X such that the following sequent holds:
  GRAPH, REACH ⊢ reachable(X) -> reachable(Y)
Call the (persistent) edges EDGE, and the (linear) vertices VERT. Now we can look at the (seemingly) analogous problem stated in linear logic to see why it is in practice, so different:
  LREACH, EDGE; VERT ⊢ ⊢ reachable(X) -o reachable(Y)
Because linear logic requires that we eventually use every resource in VERT, this derivation corresponds to a run of the automaton where the automaton visited every single node and then ended at Y. If we consider just the case where X and Y are the same, complete derivations correspond to Hamiltonian tours.

By adding linearity, we've made the forward chaining language more expressive, but in doing so we created a big gulf between the "existential" behavior (here's one way the automaton might work, which we found by being greedy) and the "universal" behavior (the automaton can, potentially, follow a Hamiltonian path, which we learned by calling a SAT solver). This gap did not exist, at least for our running example, in the persistent or weighted cases.

Combining the twists?

Both linear logic programming and weighted logic programming arise start from a simple observation: the shape of proofs can capture interesting structures. The "structure" in this case was a path through a graph; in weighted logic programming the structures are often parse trees - you know, "boy" is a noun, "fast" is an adverb, "the" is a demonstrative adjective, and the two combined are a noun phrase, until you get a tree that looks something like this:
    Sentence 
/ \
NP VP
/| | \
/ | | \
D N V A
| | | |
the boy ran fast
(More on this in the "Products of Weighted Logic Programs" paper.)

Weighted logic programming enriches this observation by allowing us to measure structures and talk about the combined measure of a whole class of structures; linear logic, on the other hand, greatly improves on the things that we can measure. But what about using the observations of weighted logic programming to assign scores in the richer language of linear logic programming? What would that look like? I'll give a couple of sketches. These are questions I don't know the answers to, don't know that I'll have time to think about in the near future, but would love to talk further about with anybody who was interested.

Minimum spanning tree

I commented before that the logic programming engine described in Linear Logical Algorithms implements something a bit like Prim's algorithm to produce a graph spanning tree. In fact, the engine we described does so with provably optimal complexity - O(|E| + |V|), time proportional to the number of edges in the graph or the number of vertices in the graph (whichever is larger). What's more, we have a good cost semantics for this class of weighted logic programs; this means you don't actually have to understand how the interpreter works, you can reason about the runtime behavior of linear logical algorithms on a much more abstract level. The recorded presentation on Linear Logical Algorithms, as well as the paper, discuss this.

If we replace a queue inside the engine with an appropriately configured priority queue, then the logic programming engine actually implements Prim's algorithm for the low cost of an log(|V|) factor. But can this be directly expressed as a synthesis of weighted logic programming and linear logic programming? And can we do so while still maintaining a reasonable cost semantics? I don't know.

Traveling salesman problem

While minimum spanning tree is a greedy "existential" behavior of linear logic proof search, the weighted analogue of the Hamiltonian Path problem is just the Traveling Salesman problem - we not only want a Hamiltonian path, we want the best Hamiltonian path as measured by a straightforward extension of the notion of weight from weighted logic programming.

Not-linear logics?

Adding some linearity turned a single-source shortest path problem into a Hamiltonian path problem - when we turned vertices from persistent facts into linear resources, we essentially required each vertex to be "visited" exactly once. However, an affine logic interpretation of the same program would preserve the resource interpretation (vertices cannot be revisited) while removing the obligation - vertices can be visited once or they can be ignored altogether.

This makes the problem much more like the shortest path problem again. If edge weights are positive, then the shortest path never revisits a node, so it's for all intents and purposes exactly the shortest path problem again. If edge weights are negative, then disallowing re-visiting is arguably the only sensible way to specify the problem at all. Can this observation be generalized? Might it be possible to solve problems in (weighted) affine logic more easily than problems in weighted linear logic?