Tuesday, September 27, 2011

My New Focalization Technique is Unstoppable

While it took, as they say, a bit of doing, I have a completed draft of a paper on my website that I believe provides a really elegant solution to what has been a very, very annoying problem for some time: writing down a proof of a property called the completeness of focusing. Don't worry, the paper explains what that means: it has to, because one of the things the paper argues is that most existing statements of the completeness of focusing aren't general enough! In the process, this writeup gave me a good (enough) excuse to talk in a very general way about a lot of fundamental phenomena around the idea of polarization in logic that we've been noticing in recent years.

Anyway, the draft is here - Structural Focalization. The accompanying Twelf development is, of course, on the Twelf wiki: Focusing. I've thrown the paper and the Twelf code up on arXiv while I figure out what to do with it; comments would be greatly appreciated!

I don't want to have a post that's just "hey, here's a paper," so here's an addition, a follow-up to the rather popular "Totally Nameless Representation" post from a while back. I still prove weakening in Agda by exactly the technique presented there, which commenter thecod called the "presheaf method." But for natural deduction systems, I almost never use the term-metric approach that was the core of what I was presenting in "totally nameless representation," since it works just fine to use the approach where the term M of type A (the one you're substituting for x) is in context Γ and the term N (the one with x free) is in context Γ' ++ A :: Γ. The free variable x is allowed to be in the middle of the context, in other words; you don't have to use exchange on the second term, so you're always calling induction on something Agda recognizes as structurally smaller.
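
Here's a rough untyped Haskell analogue of the shape of that substitution (Haskell obviously can't show the real payoff, which is Agda's termination checker accepting the structural recursion, and the names are mine, not from any Agda development): the index j plays the role of the length of Γ', and the recursion only ever descends into the second term.

  -- Untyped de Bruijn terms: Var 0 is the innermost bound variable.
  data Term = Var Int | Lam Term | App Term Term

  -- shift d c t: add d to every variable of t whose index is at least c.
  shift :: Int -> Int -> Term -> Term
  shift d c (Var k)   = Var (if k >= c then k + d else k)
  shift d c (Lam t)   = Lam (shift d (c + 1) t)
  shift d c (App t u) = App (shift d c t) (shift d c u)

  -- subst m j n: substitute m (well-scoped in Γ) for the variable at index j
  -- in n (well-scoped in Γ' ++ A :: Γ, with j playing the role of length Γ').
  -- The recursion is only ever on subterms of n; m just gets weakened
  -- (shifted) at the use site.
  subst :: Term -> Int -> Term -> Term
  subst m j (Var k)
    | k == j    = shift j 0 m   -- weaken m past the j bindings of Γ'
    | k >  j    = Var (k - 1)   -- the A binding goes away
    | otherwise = Var k
  subst m j (Lam n)   = Lam (subst m (j + 1) n)
  subst m j (App n p) = App (subst m j n) (subst m j p)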

This worked for natural deduction systems, but I didn't think it would work for sequent calculi, since you need to toy around with both contexts and induct on both terms in the presentation of cut for a sequent calculus. However, for a focused sequent calculus like the one I present in the paper, you can still do without the totally nameless metric! If you set things up right, the rightist substitutions (where you work on decomposing the right term, the one with the variable free) only require extending the context of the first term, the leftist substitutions (where you work on decomposing the left term, the one you're substituting for the variable) only require working on the second term, and the two are connected only by the principal substitutions (which reduce the type, which more or less lets you get away with anything as far as the induction principle is concerned).

A code example of this, which could use some more commentary, can be found on GitHub: Polar.agda.

Thursday, September 8, 2011

Nondeterministic thoughts on nondeterminism

So if you've been hanging around here lately, you know I've been writing posts where I think I'm talking about new ideas. (I'm not always correct.) This post, on the other hand, is definitely more of a review-and-synthesis sort of post: mostly stuff I gleaned over the summer from reading up on Dijkstra's GCL, Ball et al.'s "Automatic predicate abstraction of C programs," and a number of K. Rustan M. Leino's papers, as synthesized for me by my advisors Aditya Nori and Sriram Rajamani at Microsoft Research India.

The first section represents my somewhat simplistic thoughts on other people's work on the semantics of imperative programming languages, mostly thoughts I had this summer at MSR. I hope they are merely naive and simplistic and not actively wrong, and that you (gentle reader) will have the patience to correct me where I am actively wrong. Then I have three short, mostly unrelated discussions that I needed to clear out of my head. The first reviews a neat way of understanding loop invariants, due to Leino. The second talks about the algebraic properties of nondeterministic programs. The third tries to relate nondeterminism to the work I'm doing on representing transition systems in linear logic, though it's mostly just speculative and may not make sense to anyone but me in its current half-baked form.

C programs and their abstractions

Our starting point will be, essentially, a generic while language that we'll write with C-like syntax. Here's a program:

  /* Example 1 */
  1. if(x > 0) {
  2.   x += 5;
  3. } else {
  4.   x = -x - 2;
  5. }
  6. return x;

The thing about a (fully-defined) C program is that it's deterministic - for any initial assignment to variables, there's a unique behavior. So, for the above program, if the initial value of x is, say, 4, then the program will execute line 1 (with x=4), then line 2 (with x=4), then line 6 (with x=9), and the program will then return 9. If the initial value of x is -12, then the program will execute line 1 (with x=-12), then line 4 (with x=-12), then line 6 (with x=10), and the program will then return 10. I've now implicitly told you what counts as a "behavior" - it's a stream of line numbers and states that may or may not end with a returned value. So the meaning of a program is a function from initial states to streams of line numbers and states.
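
To make that concrete, here's a tiny Haskell sketch of the denotation of Example 1, treating a state as just the value of x (the Behavior type and all the names are mine, purely for illustration; real behaviors can also be infinite streams, but a plain data type is enough for these finite examples):

  -- A state is just the value of x; a behavior is a stream of
  -- (line number, state) steps, here ending in a returned value.
  type State = Int
  data Behavior = Step Int State Behavior | Return Int
    deriving (Eq, Show)

  -- The meaning of Example 1: a function from initial states to behaviors.
  example1 :: State -> Behavior
  example1 x
    | x > 0     = trace (x + 5) 2
    | otherwise = trace (negate x - 2) 4
    where
      trace x' line = Step 1 x (Step line x (Step 6 x' (Return x')))

For instance, example1 4 evaluates to Step 1 4 (Step 2 4 (Step 6 9 (Return 9))), matching the trace described above.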

We can even think of the line numbers as a convenient fiction: we could translate the program above into this form:

   linenum = 1;
   checkpoint();
   if(x > 0) {
     linenum = 2;
     checkpoint();
     x += 5;
   } else {
     linenum = 4;
     checkpoint();
     x = -x - 2;
   }
   linenum = 6;
   checkpoint();
   return x;

Then we say that the behavior of a program is a function from initial states to the stream of intermediate states as reported by the special checkpoint() function; the "line number" part of the stream is just handled by the value associated with linenum in the memory state.

So that's the meaning (the denotation) that I choose for deterministic C programs: they're functions from initial states to streams of states (where the state includes the line number). From here on out I'll just think of any given deterministic C program as a way of specifying one such function. There are also many such functions that can't be written down appropriately with the syntax of a C program; that's not my concern here.

Nondeterminism

Instead, I want to talk about non-deterministic C programs. The meaning of a nondeterministic C program is a function from initial states to sets of streams of states,1 and the syntax we'll use to write down nondeterministic C programs is an extension of the syntax of deterministic C programs. This means, of course, that there's a trivial inclusion of deterministic C programs into nondeterministic C programs.

The easiest way to come up with nondeterministic C programs (which denote functions from initial states to sets of streams) is to turn if branches into nondeterministic branches. The standard way that the C model checking folks represent nondeterministic choice is to write if(*). Here's an example:

  /* Example 2 */
  1. if(*) {
  2.   x += 5;
  3. } else {
  4.   x = -x - 2;
  5. }
  6. return x;

The *, near as I can tell, was chosen to resemble the "wildcard" symbol, since any boolean expression we put in that if statement (such as x > 0 to recover our Example 1) results in a program that refines Example 2. (Terminology: a program refines another if, for every initial state, every stream in the meaning of the more-refined program also belongs to the meaning of the less-refined program.)
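
Continuing the illustrative Haskell sketch from above, a nondeterministic denotation maps each initial state to a set of behaviors (modeled here as a list), and refinement is just pointwise containment of those sets; this is only directly checkable when everything in sight is finite, and the names below are mine.

  -- The meaning of Example 2: each initial state maps to a set of behaviors.
  example2 :: State -> [Behavior]
  example2 x =
    [ Step 1 x (Step 2 x (Step 6 (x + 5) (Return (x + 5))))
    , Step 1 x (Step 4 x (Step 6 (negate x - 2) (Return (negate x - 2)))) ]

  -- prog1 refines prog2 if, for every initial state, every behavior of prog1
  -- is also a behavior of prog2.  (Here we can only spot-check a finite
  -- sample of initial states.)
  refines :: [State] -> (State -> [Behavior]) -> (State -> [Behavior]) -> Bool
  refines states prog1 prog2 =
    and [ b `elem` prog2 s | s <- states, b <- prog1 s ]

For instance, refines [-12, 4] (\s -> [example1 s]) example2 evaluates to True, reflecting the fact that plugging x > 0 in for the wildcard recovers Example 1.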

Assume/impossible

Nondeterministic choice allows us to enlarge the meaning of a nondeterministic program; inserting assume() statements will cut down the set of streams in the meaning of a nondeterministic program. In particular, for each initial state, we have to exclude any streams that would violate an assumption. We therefore have to be careful that we don't cut down the set of streams so far that there aren't any left: there are no deterministic programs that refine assume(false) - every initial state maps to the empty set of streams. For that matter, there also aren't any deterministic programs that refine Example 3:

  /* Example 3 */
  1. assume(x > 0);
  2. return x;

For every initial state where the value of x is positive, the meaning of Example 3 is a set containing only the stream that goes to line one, then line two, then returns the initial value of x. For every initial state where the value of x is not positive, the meaning of the program has to be the empty set: any stream would immediately start by violating an assumption.
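
In the same sketch, Example 3's denotation simply produces the empty set of behaviors whenever the assumption fails (again, illustrative names of my own):

  -- The meaning of Example 3: assume(x > 0) filters out every behavior
  -- from initial states where x is not positive.
  example3 :: State -> [Behavior]
  example3 x
    | x > 0     = [ Step 1 x (Step 2 x (Return x)) ]
    | otherwise = []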

Assumptions were used in Ball et al.'s "Automatic predicate abstraction of C programs" paper, which explains part of the theory behind the SLAM software verifier. In that work, they got rid of all of the if statements as a first step, replacing them with nondeterministic choices immediately followed by assumptions.2

  /* Example 4 */
  1. if(*) {
       assume(x > 0);
  2.   x += 5;
  3. } else {
       assume(x <= 0);
  4.   x = -x - 2;
  5. }
  6. return x;

The program in Example 4 is basically a degenerate nondeterministic program: its meaning is exactly equivalent to the deterministic Example 1. On the other hand, if we remove the statement assume(x <= 0) after line 3 in Example 4, we have a nondeterministic program that is refined by many deterministic programs. For instance, the deterministic program in Example 1 refines Example 4 without the assume(x <= 0), but so do Examples 5 and 6:

  /* Example 5 */
  1. x = x;
  4. x = -x - 2;
  6. return x;

  /* Example 6 */
  1. if(x > 100) {
  2.   x += 5;
  3. } else { 
  4.   x = -x - 2;
  5. }
  6. return x;
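
Spelling out these refinement claims in the running Haskell sketch (example4' below is my name for Example 4 with the second assume removed):

  -- Example 4 without the assume(x <= 0): the second branch is always
  -- allowed, the first branch only when x > 0.
  example4' :: State -> [Behavior]
  example4' x =
    [ Step 1 x (Step 2 x (Step 6 (x + 5) (Return (x + 5)))) | x > 0 ] ++
    [ Step 1 x (Step 4 x (Step 6 (negate x - 2) (Return (negate x - 2)))) ]

  example5, example6 :: State -> [Behavior]
  example5 x = [ Step 1 x (Step 4 x (Step 6 x' (Return x'))) ]
    where x' = negate x - 2
  example6 x
    | x > 100   = [ Step 1 x (Step 2 x (Step 6 (x + 5) (Return (x + 5)))) ]
    | otherwise = [ Step 1 x (Step 4 x (Step 6 x' (Return x'))) ]
    where x' = negate x - 2

Both refines [-7, 3, 200] example5 example4' and refines [-7, 3, 200] example6 example4' come out True, as does the corresponding check for Example 1.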

Example 4 as it was presented shows how assume() together with nondeterministic choice can encode normal if statements. We could also define a statement impossible and say that a stream just cannot ever reach an impossible statement. Impossibility can be defined in terms of assume - impossible is equivalent to assume(false). Alternatively, we can use impossibility to define assumptions - assume(e) is equivalent to if(!e) { impossible; }. So there's a bit of a chicken-and-egg issue: I'm not totally sure whether we should build in the impossible/if combination and use it to define assume() or whether we should build in assume() and use it to define impossible and if statements. It probably doesn't matter.

Assert/abort

In "Automatic predicate abstraction of C programs," assert() statements are understood to be the things that are supposed to be avoided. However, in Leino's work, they have a meaning of absolute and unbounded nondeterminism, which is the interpretation I want to use. If the expression e in an assert(e) statement evaluates to false, anything can happen - it's as if we could jump to an arbitrary point in memory and start executing code; absolutely any deterministic program that refines a nondeterministic program up to the point where the nondeterministic program fails an assertion will definitely refine that nondeterministic program.

So assert() represents unbounded nondeterminism, and in the sense of "jump to any code," not just in the sense of "replace me with any code": the program assert(false); while(true) {} is refined by every program, including ones that terminate. This interpretation is easy to connect to the SLAM interpretation, where we say "assertion failures are to be avoided," since obviously one of the things you might want to prove about your C code is that it doesn't jump to arbitrary code and start executing it.

Analogy: assert() is to abort as assume() is to impossible - we can define assert(e); as if(!e) { abort; }.

Abstracting loops

The three primitives we have discussed so far are almost enough to let us perform a fun trick that my advisors at MSR attribute to Leino. First, though, we need one more primitive, a "baby" form of assert/abort called havoc(x), which allows the value associated with the variable x to be changed in any way. In other words, a program with the statement havoc(x) is refined by the program where the statement havoc(x) is replaced by the statement x = 4, the statement x = x - 12, the statement x = y - 16, or even the statement if(z) { x = y; } else { x = w; }.

Given the havoc primitive, imagine we have a program with a loop, and no checkpoints inside the loop:

  1. /* Before the loop */
     while(e) {
       ... loop body, 
        which only assigns 
        to variables x1,...,xn ...
     }
  2. /* After the loop */

Say we know the following two things:

  • The loop will always terminate if the expression e_inv evaluates to true at line 1, and
  • From any state where e_inv and e both evaluate to true, after the loop body is run, e_inv will evaluate to true.
Then, we know the program above refines the following program:

  1. /* Before the loop */
     assert(e_inv);
     havoc(x1); ... havoc(xn);
     assume(e_inv && !e);
  2. /* After the loop */

This is a somewhat unusual way of looking at loop invariants: we can take a loop and abstract it with nondeterministic straight-line code. If we get to the beginning of the loop and our loop invariant is violated, all hell breaks loose, but if the loop invariant holds at the beginning, then when we exit the loop we know the following things: variables not assigned to by the loop haven't changed, the loop invariant holds, and the negation of the loop guard holds.

I like this particular game: it's a non-traditional way of looking at the analysis of loops by asking "is this program abstraction sound?"

Algebraic structure of non-determinism

Notice that assume(false) is the unit of non-deterministic choice: writing the nondeterministic program if(*) assume(false); else Stmt is, in all program contexts, the same as writing just Stmt. Furthermore, nondeterministic choice is commutative (if(*) Stmt1; else Stmt2; is equivalent to if(*) Stmt2; else Stmt1;) and associative (it doesn't matter how I nest nondeterministic choices if I want to make a three-way nondeterministic choice). This means that nondeterministic choice and the impossible statement (which is equivalent to assume(false)) form a monoid - it's like the wildcard * could also be interpreted as multiplication, and we could write if(*) Stmt2; else Stmt1; as Stmt1 × Stmt2 and write impossible as 1.

Furthermore, if(*) assert(false); else Stmt is a nondeterministic program that is refined by every program, since every program refines assert(false) and we can just use the "wildcard" reading of nondeterminism to replace * with true. Algebraically, this means that the abort statement (which is equivalent to assert(false)) annihilates nondeterministic choice - we could write abort as 0, and then we have 0 × Stmt = Stmt × 0 = 0.

Is abort the unit of a binary operation Stmt1 + Stmt2 in the same way that the number 0 is the unit of addition? It's not clear that it's useful for the abstraction of C programs, but I think if we go looking for a binary operation that abort is the unit of, what we'll find is perhaps best called both.

Both-and statements

The both statement is a weird statement that allows us to complete the picture of the algebraic structure of abort/assert(false), impossible/assume(false), and if(*). A deterministic program refines the program both Stmt1 and Stmt2 only if it refines both Stmt1 and Stmt2. As an example, the program below is exactly equivalent to Example 1 - the first statement forces all streams starting from initial states where x is not positive to hit line 4 and not line 2 (lest they violate the assume(x > 0)), and the second statement forces all streams starting from initial states where x is positive to hit line 2 and not line 4 (lest they violate the assume(x <= 0)).

  /* Example 7 */
     both {
  1.   if(*) {
         assume(x > 0);
  2.     x += 5;
  3.   } else {
  4.     x = -x - 2;
  5.   }
     } and {
  1.   if(*) {
  2.     x += 5;
  3.   } else {
         assume(x <= 0);
  4.     x = -x - 2;
  5.   }
     }
  6. return x;

The both statement is a binary operation whose unit is abort, forming a monoid: doing anything at all - as long as it's some specific thing - is the same as doing that specific thing. The both statement is also annihilated by impossible, because doing nothing as long as it's one specific thing is the same thing as doing nothing. (This is all very fuzzy, but it can be made formal in terms of set union and intersection operations.) That's interesting, because both statements aren't really like addition at all: we have two monoids whose units each annihilate the binary operator that they aren't the unit of. If distributivity works in both directions (I'm not sure it does...) then we have a Boolean algebra without negation (what's that called?).
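
One way to make the fuzzy set-union-and-intersection talk concrete is to continue the earlier Haskell sketch, but switch the representation of a meaning (at each initial state) from a list of behaviors to the characteristic function of a set of behaviors, so that abort's "every behavior whatsoever" becomes expressible; as before, the names are mine.

  -- A program's meaning, one initial state at a time, as the characteristic
  -- function of its set of behaviors.
  type Prog = State -> Behavior -> Bool

  impossible, abort :: Prog
  impossible _ _ = False    -- assume(false): the empty set of behaviors
  abort      _ _ = True     -- assert(false): every behavior whatsoever

  choice, both :: Prog -> Prog -> Prog
  choice p q s b = p s b || q s b   -- if(*): union of the two sets
  both   p q s b = p s b && q s b   -- both/and: intersection of the two sets

In this representation the algebra is just that of || and &&: impossible (constantly False) is the unit of choice and annihilates both, abort (constantly True) is the unit of both and annihilates choice, and - at least at the level of these sets of behaviors - union and intersection do distribute over each other in both directions.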

Two linear logic interpretations

One of the things that I like to do is to take the state of an evolving machine, encode it as a linear logic context Δ, and then say that the linear logic derivability judgment Δ ⊢ A proves something about the state. I think there are two ways of doing this for the language semantics I wrote out above. One is familiar, both are interesting.

The existential interpretation

The way I'm used to thinking about these things is that Δ ⊢ A shows that A is a possible (that is, existential) property of the system encoded into the linear logic context Δ. In that case, we want to encode nondeterministic choice as the additive conjunction A & B, as we can execute a nondeterministic choice by taking either branch. We want to encode impossible as ⊤, which prevents us from proving anything interesting about a series of nondeterministic choices that lead us to an impossible point in the program. We'll furthermore want to encode abort as 0, since once we reach an abort we can prove any existential property that we want to about the program!

The universal interpretation

Another way of understanding derivability would be to say that Δ ⊢ A shows that A is a necessary property of the system encoded into the linear logic context Δ. In that case, we would want to encode nondeterministic choice as the additive disjunction A ⊕ B, since to prove that something necessarily holds of a nondeterministically branching program, we have to show that it holds regardless of how the nondeterministic choice is resolved. This ensures that we will have to consider all possible resolutions of nondeterministic choices, but reaching an impossible state means that no execution can get to the current state, so all universal properties hold vacuously. We therefore would want to encode impossible as 0. On the other hand, reaching an abort means we know nothing about the universal properties of the program - the program can now do anything - so we encode abort as ⊤.


1 Note that a function from initial states to sets of streams is different from a set of functions from initial states to streams! Both might be valid ways of looking at the meaning of nondeterministic programs depending on how you look at it; in fact, I originally wrote this post thinking in terms of the other definition.
2 Certainly Ball et al. didn't come up with this idea - the idea is implicit in the Guarded Command Language; I'm just referring to Ball et al. because I'm sticking with their syntax.

Tuesday, September 6, 2011

Clowns to the left of me, jokers to the right, here I am, stuck in a hole abstraction

Happy day-after-Labor-Day-weekend! This also stems from a discussion I had with Aaron Turon, who, a while back with Claudio Russo, also thought about using hole abstractions for maps and folds over data (with some fascinating twists, but I'll leave Aaron and Claudio's twists for Aaron and Claudio).

One of the things Aaron noticed was that it's nice to define a tail-recursive map over trees this way, and indeed that is true. I transliterated some code Aaron wrote out into Levy#: you can see it at turon.levy. The example really starts rubbing against two of the unnecessary deficiencies of Levy# - a lack of polymorphism (the trees have to be integer trees) and a lack of mutual recursion. We really want to define the map in terms of two mutually (tail-)recursive functions moveUp and moveDown, both of which take a holey tree -o tree and a tree that belongs "in the hole," the difference being that in moveUp I've already applied the map function to the subtree in the hole, and in moveDown I still need to apply the map function to the subtree in the hole.
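
Since Levy# code is a bit exotic, here's a rough Haskell rendering of those same two mutually tail-recursive functions, using an ordinary list of context frames where the Levy# version uses a value of type tree -o tree (the Levy# original is in turon.levy; the Haskell names below are mine):

  data Tree = Leaf Int | Branch Tree Tree

  -- A context frame says where the hole sits relative to a Branch: either
  -- the right sibling is still waiting to be mapped, or the left sibling
  -- has already been mapped.
  data Frame = RightTodo Tree | LeftDone Tree

  treeMap :: (Int -> Int) -> Tree -> Tree
  treeMap f = moveDown []
    where
      -- moveDown: the tree in the hole has NOT yet been mapped.
      moveDown ctx (Leaf n)     = moveUp ctx (Leaf (f n))
      moveDown ctx (Branch l r) = moveDown (RightTodo r : ctx) l
      -- moveUp: the tree in the hole HAS already been mapped.
      moveUp []                  t = t
      moveUp (RightTodo r : ctx) t = moveDown (LeftDone t : ctx) r
      moveUp (LeftDone  l : ctx) t = moveUp ctx (Branch l t)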

This only works for monomorphic maps - if we had polymorphism, our hole abstraction zippers would allow us to write maps from 'a tree to 'a tree using a map function with type 'a -> F 'a. It would not allow us to write the maps we're really used to from 'a tree to 'b tree using a map function with type 'a -> F 'b.

In order to capture more general maps - or fold/reduce operations - the non-hole-abstraction approach is to use the generalization of the derivative that Conor McBride describes in "Clowns to the Left of Me, Jokers to the Right." We observe that a map function like the one in turon.levy sweeps left-to-right through the data structure, always segmenting the data structure into some already-processed "clowns to the left," some not-yet-processed "jokers to the right," and a path through the middle. Then, we either have a sub-tree of clowns, which means we'll now look up the tree for more jokers, or a sub-tree of jokers, which means we need to look down into the jokers to process them. The overall picture is this one:

[Figure: a tree being traversed left to right, segmented into already-processed clowns on the left, a path through the middle, and not-yet-processed jokers on the right.]

A nice thing about the hole abstraction framework is that we can represent and dynamically manipulate precisely this picture, whereas the derivative/zipper approach generally forces one to think about the derivative-like structure from the "inside out." If the jokers are a tree of ints and the clowns are a tree of bools, we can describe them like this:

   data ILeaf: int -o itree
      | IBranch: itree -o itree -o itree

   data BLeaf: bool -o btree
      | BBranch: btree -o btree -o btree

Then, the data structure that we will use as a representation of the intermediate state of computation as illustrated above is either stuck in the middle with clowns to the left or stuck in the middle with jokers to the right.

   data Clowns: btree -o cj -o cj  # Clowns to the left of me
      | Jokers: cj -o itree -o cj  # Jokers to the right

Just like our difference lists from before, there are no closed members of the type cj; we can only form values of type cj -o cj. The map implementation can be found at clownjoker.levy.

If we did have polymorphism, the clown/joker data structure for a map would look like this:

   data Clowns: 'clown tree -o ('clown, 'joker) cj -o ('clown, 'joker) cj
      | Jokers: ('clown, 'joker) cj -o 'joker tree -o ('clown, 'joker) cj

The more general clown/joker data structure for a fold would look like this:

   data Clowns: 'clowns -o ('clowns, 'joker) cj -o ('clowns, 'joker) cj
      | Jokers: ('clowns, 'joker) cj -o 'joker tree -o ('clowns, 'joker) cj
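
For comparison, here's a rough Haskell sketch of the corresponding fold, where the clowns (already-computed results, of one type) and the jokers (unprocessed subtrees, of another) show up as the two cases of the context frame (again, the names are mine):

  data Tree j = Leaf j | Branch (Tree j) (Tree j)

  -- One frame per Branch we're inside: either the right subtree is still all
  -- jokers, or the left subtree has already been reduced to a clown.
  data Frame c j = JokersRight (Tree j) | ClownsLeft c

  treeFold :: (j -> c) -> (c -> c -> c) -> Tree j -> c
  treeFold leaf branch = moveDown []
    where
      moveDown stack (Leaf j)     = moveUp stack (leaf j)
      moveDown stack (Branch l r) = moveDown (JokersRight r : stack) l
      moveUp []                     c = c
      moveUp (JokersRight r : rest) c = moveDown (ClownsLeft c : rest) r
      moveUp (ClownsLeft cl : rest) c = moveUp rest (branch cl c)

The monomorphic map is then just treeMap f = treeFold (Leaf . f) Branch, and the Frame type here is doing exactly the job of the cj datatype above.
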
This is, it must be said, all reasonably unsatisfying. We have seen that hole abstractions can completely replace the derivative data structure - we could well and truly "scrap our zippers." To use the clown/joker generalization of derivatives, though, we have to define an obvious, boilerplate-y datatype cj, which is merely mechanical to compute for any datatype of jokers. Can we do better?