Wednesday, August 24, 2011

Holey Data, Postscript: Hole Abstraction

I love the PL community on Twitter. Yesterday, David Van Horn presented this observation:

Most papers in computer science describe how their author learned what someone else already knew. -- Peter LandinTue Aug 23 13:04:05 via Buffer Favorite Retweet Reply

I'm gonna be perfectly honest: I was specifically thinking of the holey data series when I responded.

@lambda_calculus Presumably the implication is that this is bad for the advancement of science. OTOH, it's a pretty viable model for a blog.Tue Aug 23 13:33:53 via web Favorite Retweet Reply

'Cause really, I was kind of surprised that I hadn't found any previous presentations of the idea (and was therefore encouraged when Conor indicated that he at least sort of had thought of it too). Chung-chieh Shan's response was great:

@simrob @lambda_calculus I'm not sure that Landin intended to imply that it's bad!Tue Aug 23 22:29:48 via web Favorite Retweet Reply

I guess I hope he's right (or at least that I am), because in the comments to Part 3, Aaron Turon correctly observes that I missed a whopper: Minamide's 1998 POPL Paper "A Functional Representation of Data Structures with a Hole." Horray for blogs, I guess: if this has been a paper review, I would have been quite embarrassed, but as a blog comment I was delighted to find out about this related work.

A Functional Representation of Data Structures with a Hole

Minamide's paper effectively covers the same ground I covered in Part 1 of the Holey Data series: his linear representational lambdas are called hole abstractions. It's a very well written paper from the heady days when you could talk about the proposed Standard ML Basis Library and it was still common for authors to cite Wright and Felleisen when explaining that they were proving language safety by (what amounts to) proving progress and preservation lemmas.

My favorite part of reading the paper was that it simultaneously confirmed two suspicions that I picked up after a recent discussion with Dan Licata:
  1. Levy#/call-by-push-value was an unnecessarily restrictive calculus for programming with linear representational functions - ML would work just fine.
  2. By using call-by-push-value, I'd avoided certain red herrings that would have plagued the development otherwise - somewhere on almost every page I thought "yep, that explanation is less clear than it could be because they don't know about difference between value (positive) types and computation (negative) types yet."
One neat thing this paper describes that I hadn't thought about is automatically transforming non-tail-recursive programs that are naturally structurally inductive to tail-recursive programs based on hole abstractions/difference lists/linear representational functions. It seems like this optimization in particular is where many of the paper's claimed performance gains are found. It also seems like zippers-as-derivatives are pretty clearly lurking around the discussion in Section 4, which is neat.

Overall, this paper made me quite happy - it suggests there is something canonical about this idea, and proves that the idea leads to concrete performance gains. It also made me sad, of course, because it seems like the ideas therein didn't really catch on last time around. But that's part of why I didn't stop with Part 1 in the series: it's not clear, even to me, that hole abstractions/difference lists/linear representational functions are worth adding to a functional programming language if all they can do is be applied and composed. However, with the additional expressiveness that can be found in pattern matching against hole abstractions (hell, that's a way better name than "linear representational functions," I'm just going to call them hole abstractions from now on), I think there's a significantly stronger case to be made.

I've thrown a transliteration of Minamide's examples up on GitHub. The binary tree insertion example, in particular, could come from Huet's paper: it's a beautiful example of how even the "Part 1" language can implement zipper algorithms so long as you never need to move up in the tree. As for the hfun_addone function, I don't really understand it at all, I just transliterated it. In particular, it seems to not be entirely tail recursive (in particular, it seems to regular-recurse along the leftmost spine of the binary tree - if I'm not mistaken about this, I accuse line 131 of being the culprit.)

6.3 Logic Variable

(Note: I've tried to make sure this section doesn't come across as mean-spirited in any way, but I want to emphasize: this is not intended to be any sort of criticism of Yasuhiko Minamide. His paper was terrific! Go read it!)

My other favorite part of reading the paper was Section 6.3, which discusses difference lists. I plan to actually email Minamide and ask him about Section 6.3. Here's my reconstruction of events: Fictional Minamide has a great idea, implements it into a prototype ML compiler, tests it. His paper has strong theoretical and practical results, and gets accepted to POPL 1998. However, one of the reviewers says "oh, this is almost exactly difference lists in logic programming, you need to explain the connection." Fictional Minamide is not a logic programming person, has no access to a logic programming person, doesn't particularly have any reason to care about logic programming, but he does feel the need to address the concerns of a reviewer that he can't actually communicate with. Fictional Minamide manages to find, with great difficulty in the pre-Google-Scholar world, some poorly expressed explanation for what difference lists are in some random papers that are mostly about something else.1 After a couple of hours, Fictional Minamide gives up, exclaiming "okay, this makes absolutely no sense whatsoever," and writes something kinda mumbly about graph reduction that seems vaguely plausible to satisfy the reviewer's demand.

The result is a section that mostly misses the point about the connection between hole abstraction and difference lists, both of which are declarative abstractions that allow a programmer to think about working modulo an uninitialized pointer in memory (though the way Minamide and I do it, the types help you way more). This is not intended as any criticism of either Real Minamide or Fictional Minamide. Indeed, it's mostly a criticism of the conference review processes: I'm pretty sure you could find similar "Section 6.3"s in my papers as well!2 I do hope, however, that my exposition in Part 1, which was in fact motivated by difference lists and reached more-or-less the exact same setup that Minamide came up with, clarifies the record on how these two ideas are connected.

[Update Aug 25] I heard back from Real Minamide - while it was a long time ago, he did recall one of the reviewers mentioning logic variables, leading to the inclusion of that section. I win! Well, kind of. There was probably no "this makes no sense" exclamation; Minimade says that at the time his understanding at the time was in line with my comment about working modulo an uninitialized pointer. The comments about graph reduction, which were why I thought the section misses the point, were more of a side comment.

Minamide also remembered another wonderfully tantalizing tidbit: he recalls that, at POPL 1998, Phil Wadler said he'd seen a similar idea even earlier. Perhaps hole abstraction is just destined to be continuously reinvented until it finally gets included in the C++2040 standard.3

1 The work he cites is on the adding the logic programming notions of unbound variables to functional programming languages, which (while I haven't looked at them) certainly don't look they would give a good introduction to the simple-but-goofy logic programming intuitions behind difference lists.

That said, I basically have never seen a good, clear, self-contained description of what a difference list is - I consider Frank's logic programming notes to be pretty good, but I recognize that I'm not a fair judge because I was also present at the associated lecture.

2 Grep for "∇", or "nominal", especially in anything prior to my thesis proposal, if you want a head start.

3 I wonder what brackets they'll use, since as of the current standard they seem to have run out.

Monday, August 22, 2011

Holey Data, Part 3/3: The Type Of A One-Hole Context

Let's review:
  • As a prelude, I introduced the Levy# language, Bauer and Pretnar's implementation of call-by-push-value, a programming formalism that is very persnickety about the difference between value code and computation code, extended with datatypes. I have come to the position, especially after a conversation with Dan Licata, that we should really think of Levy# as an A-normal form intermediate language that is explicit about control: using it allowed me to avoid certain red herrings that would have arisen in an ML-like syntax, but you'd really want the ML-like language to actually program in.
  • In Part 1, I showed how representational linear functions could capture a logic programming idiom, difference lists. Difference lists can be implemented in ML/Haskell as functional data structures, but they don't have O(1) append and turn-into-a-regular-list operations; my Levy# difference lists, represented as values of type list -o list, can perform all these operations in constant time with local pointer manipulation. The cost is that operations on linear functions are implemented by destructive pointer mutation - I should really have an affine type system to ensure that values of type list -o list are used at most once.
  • In Part 2, I observed that the perfectly natural, λProlog/Twelf-inspired operation of matching against these linear functions greatly increased their expressive power. By adding an extra bit of information to the runtime representation of data (an answer to the the question "where is the hole, if there is a hole?"), the matching operation could be done in constant time (well, time linear in the number of cases). A simple subordination analysis meant that checking for exhaustiveness and redundancy in case analysis was possible as well.
In the case study in Part 2, I motivated another kind of pattern matching, one foreign to the λProlog/Twelf way of looking at the world but one intimately connected to the famous functional data structure known as The Zipper. In this last installment, I show how linear representational functions can completely subsume zippers.

The essence of zippers

A zipper is a functional data structure that allows you to poke around on the inside of a datatype without having to constantly descend all the way int the datatype. Illustrations of zippers generally look something like this:1

The zipper data structure consists of two parts: one of them is a "path" or "one-hole-context", which, owing to Conor's influence [1], I will call a derivative, and the other is a subterm. We maintain the invariant that as we move into the derivative ("up") or into the subterm ("down") with constant-time operations, the combination of the derivative and the subterm remains the original term that we are editing. A Levy# implementation of derivative-based zippers as presented in Huet's original paper on zippers [2] can be found in thezipper.levy.

To understand linear function-based zippers, we see that a linear term a -o b is kind of obviously a b value with one a value missing, so if we pair a value a -o b with a value of type a, we would hope to be able to use the pair as a zipper over b values. The one thing that we can't do with the linear functions we've discussed so far, however, is look "up" in the tree.

A list is a good simple example here, since the two parts of a linear function-based zipper over a list are a difference list (the beginning of the list) and a list (the end of the list). Looking "up" involves asking what the last item in the list -o list is (if there is at least one item). Since a value of type list -o list has the structure [hole: list] Cons n1 (Cons n2 ... (Cons nk-1 (Cons nk hole))...), it's reasonable to match this term against the pattern [hole: list] outside (Cons i hole) to learn that outside is [hole: list] Cons n1 (Cons n2 ... (Cons nk-1 hole)...) and that i is nk. The key is that, in the pattern, the linear variable hole appears as the immediate subterm to the constructor Cons, so that we're really only asking a question about what the closest constructor to the hole is.

This kind of interior pattern matching is quite different from existing efforts to understand "pattern fragments" of linear lambda cacluli (that I know of), but it's perfectly sensible. As I'll show in the rest of this post, it's also easy enough to modify the implementation to deal with interior pattern matching efficiently, easy enough to do exhaustiveness checking, and easy enough to actually use for programming. For the last part, I'll use an example from Michael Adams; you can also see the linear pattern matching version of Huet's original example on Github: (linzipper.levy).


The only trick to implementing zippers that allow interior pattern matching is to turn our linked list structure into a doubly linked list: the final data structure basically is a double-ended queue implemented by a doubly-linked list.

First: we make our "tail pointer," which previously pointed into the last structure where the hole was, a pointer to the beginning of the last allocated cell, the one that immediately surrounds the hole. This lets us perform the pattern match, because we can see what the immediate context of the hole is just by following the tail pointer. (It also requires that we create a different representation of linear functions that are the identity, but that's the kind of boring observation that you can get by implementing a doubly-linked list in C.)

After we have identified the immediate context of the hole, we also need a parent pointer to identify the smaller "front" of the linear function. The in-memory representation of [hole: list] Cons 9 (Cons 3 (Cons 7 hole)), for example, will look something like this:

Just like the "where is the hole" information, this parent pointer gets to be completely ignored by the runtime if it hangs around after the linear function is turned into a non-holey value.

Exhaustiveness checking

Providing non-exhaustive match warnings for case analysis on the inside of a linear function is a bit tricker than providing non-exhaustive match warnings for case analysis on the outside of a linear function, but it's only just a bit trickier. First, observe that the subordination relation I talked about in the previous installment is the transitive closure of an immediate subordination relation that declares what types of values can be immediate subterms to constructors of other types. For instance, when we declare the type of the constructor Cons to be int -o list -o list, Levy# learns that int and list are both immediately subordinate to list.

Levy# was already tracking both the immediate subordination relation and the full subordination relation; we can change the output of the "print subordination" command to reflect this reality:
   Levy. Press Ctrl-D to exit.
Levy> data Z: even | EO: even -o odd | OE: odd -o even ;;
data Z: even
| EO: even -o odd
| OE: odd -o even

Levy> $subord ;;
Subordination for current datatypes:
even <| even
even <| odd (immediate)
odd <| even (immediate)
odd <| odd
Now, the way we enumerate the possible cases for an interior pattern match against a linear value of type holeTy -o valueTy is the following:
  • Enumerate all of the types ctxTy that holeTy is immediately subordinate to. These are all the types with constructors that might immediately surround the hole (which, of course, has type holeTy).
  • Filter out all the types ctxTy above that aren't subordinate to, or the same as, the value type valueTy.
  • For every constructor of the remaining types ctxTy, list the positions where a value of type holeTy may appear as an immediate subterm.
Using the immediate subordination relation is actually unnecessary: alternatively, we could just start the filtering step with all types subordinate (or equal) to valueTy. I think the use of the immediate subordination relation is kind of neat, though.

Case Study: No, Really, Scrap Your Zippers

When I said that this idea had been kicking around in my head since shortly after WGP 2010, I meant specifically since the part of WGP 2010 when Michael Adams presented the paper "Scrap Your Zippers". One of the arguments behind the Scrap Your Zippers approach is that Huet's zippers involve a lot of boilerplate and don't work very well for heterogenerous datatypes.

Small examples tend to minimize the annoyingness of boilerplate by virtue of their smallness, but the example Adams gave about heterogeneous zippers works (almost) beautifully in our setting. First, we describe the datatype of departments, which are pairs of an employee record (the boss) and a list of employees (subordinates):
   data E: name -o int -o employee ;;

data Nil: list
| Cons: employee -o list -o list

data D: employee -o list -o dept ;;

val agamemnon = E Agamemnon 5000 ;;
val menelaus = E Menelaus 3000 ;;
val achilles = E Achilles 2000 ;;
val odysseus = E Odysseus 2000 ;;
val dept = D agamemnon
(Cons menelaus
(Cons achilles
(Cons odysseus Nil)))
Our goal is to zipper our way in to the department value dept and rename "Agamemnon" "King Agamemnon."

The reason this is only almost beautiful is that Levy# doesn't have a generic pair type, and the type of zippers over a heterogeneous datatype like dept is a pair of a linear representational function T -o dept and a value T for some type T. So, we need to come up with specific instances of this as datatypes, the way we might in Twelf:
   data PD: (dept -o dept) -o dept -o paird ;;
data PE: (employee -o dept) -o employee -o paire ;;
data PN: (name -o dept) -o name -o pairn ;;
Given these pair types, we are able to descend into the structure without any difficulty:
   # Create the zipper pair
val loc = PD ([hole: dept] hole) dept ;;

# Move down and to the left
comp loc =
let (PD path dept) be loc in
let (D boss subord) be dept in
return PE
([hole: employee] path (D hole subord))

# Move down and to the left
comp loc =
let (PE path employee) be loc in
let (E name salary) be employee in
return PN
([hole: name] path (E hole salary))
At this point, we could just do the replacement in constant time by linear application:
   comp revision = 
let (PN path name) be loc in
return path KingAgamemnon
For comparison, the Scrap Your Zippers framework implements this with the function fromZipper.

However, everything above could have been done in the previous installment. Our new kind of pattern matching reveals its power when we try to walk "up" in the data structure, so we'll do that instead. The first step takes us back to an employee -o dept zipper, and is where we give Agamemnon his kingly designation:
   comp loc = 
let (PN path name) be loc in
let ([hole: name] path (E hole salary)) be path in
return PE path (E KingAgamemnon salary)
The first step was easy: the only place a name hole can appear in an dept datatype is inside of an employee. An employee, however, can appear in a value of type dept in one of two paces: either as the boss or as one of the members of the list of subordinates. Therefore, if we want to avoid nonexhaustive match warnings, we have to give an extra case:2
   comp revision = 
let (PE path employee) be loc in
match path with
| [hole: employee] D hole subord ->
return D employee subord
| [hole: employee] path (Cons hole other_subord) ->
return dept
;; # Error?
Note that, in the first match above, the case was [hole: employee] D hole subord, not [hole: employee] path (D hole subord). As in the previous installment, I used the subordination relation to conclude that there were no values of type dept -o dept other than the identity; therefore, it's okay to allow the simpler pattern that assumes path is the identity.

The code for this example is in scrap.levy.


Proposals for radically new language features should be treated with some care; do they really add enough expressiveness to the language? I hope that this series has at least suggested the possibility that linear function types might be desirable as a core language feature in a functional language. Linear representational functions expand the class of safe, pure algorithms to capture algorithms that could previously only be done in an impure way, and they give a completely principled (and cast-free) way of scrapping your zipper boilerplate.

And perhaps the most important feature of this proposal is one I haven't touched so far, which is the added simplicity of reasoning about programs, both informally and formally. As for the "informally," if you've ever been flummoxed by the process of making sure that your heavily tail-recursive program has an even number of List.rev errors, or if you have avoided making functions tail-recursive for precisely this reason, I think linear representational functions could be a huge help. One thing I'd like to do if I have time is to work through type inference in context with linear representational functions; I imagine many things would be much clearer. In particular, I suspect there wouldn't be any need for both Cons and Snoc lists or the "fish" operator.

The formal reasoning aspect can be glimpsed by looking at the previous case study: proving that the tail-recursive functions lin_of_zip and zip_of_lin are inverses has the structure of a double-reverse theorem (proving that List.rev (List.rev xs) = xs). Maybe it's just me being dense, but I have trouble with double-reverse theorems. On the other hand, if we needed to prove that the structurally-inductive and not tail-recursive functions lin_of_zip' and zip_of_lin' (which we can now implement, of course) are inverses, that's just a straightforward induction and call to the induction hypothesis. And making dumb theorems easier to prove is, in my experience at least half the battle of using theorem provers.

Expressiveness and efficiency, reloaded

The claim that linear representational functions can completely subsume zippers is undermined somewhat by the emphasis I have put on making matching and other operations O(1). Just to be clear: I can entirely implement linear representational functions using functional data structures (in fact, by using derivatives!) if I'm not concerned with performance. There's even a potential happy medium: rather than invalidating a zipper, I think it would be possible to merely mark a zipper as "in use," so that the second time you use a zipper it gets silently copied. This means that if you think about affine usage, you get constant-time guarantees, but if you just want to use more-awesome zippers, you can program as if it was a persistent data structure. This "persistent always, efficient if you use it right" notion of persistant data structures has precedent, it's how persistent union-find works.

1 That particular image is a CC-SA licensed contribution by Heinrich Apfelmus on the Haskell Wikibook on zippers. Thanks, Heinrich!
2 If we had a polymorphic option type we could just return Some or None, of course. Meh.
[1] Conor McBride, "The Derivative of a Regular Type is its Type of One-Hole Contexts," comically unpublished.
[2] Gúrard Huet, "Function Pearl: The Zipper," JFP 1997.
[3] Michael D. Adams "Scrap Your Zippers," WGP 2010.

Monday, August 15, 2011

Holey Data, Part 2/3: Case Analysis on Linear Functions

The previous installment was titled "(Almost) Difference Lists" because there was one operation that we could (technically) do on a Prolog difference list that we can't do on the difference lists described in the previous section. If a Prolog difference list is known to be nonempty, it's possible to match against the front of the list. Noone ever does this that I can tell, because if the Prolog difference list is empty, this will mess up all the invariants of the difference list. Still, however, it's there.

We will modify our language to allow pattern matching on linear functions, which is like pattern matching on a difference list but better, because we can safely handle emptiness. The immediate application of this is that we have a list-like structure that allows constant time affix-to-the-end and remove-from-the-beginning operations: a queue! Due to the similarity with difference lists, I'll call this curious new beast a difference queue. This is all rather straightforward, except for the dicussion of coverage checking, which involves a well-understood analysis called subordination. But we'll cross that bridge when we come to it.

A pattern matching aside. Pattern matching against functions? It's certainly the case that in ML we can't pattern match against functions, but as we've already discussed towards the end of the intro to Levy#, the linear function space is not a computational function space like the one in ML, it's a representational function space like the one in LF/Twelf, a distinction that comes from Dan Licata, Bob Harper, and Noam Zeilberger [1, 2]. And we pattern match against representational functions all the time in LF/Twelf. Taking this kind of pattern matching from the logic programming world of Twelf into the functional programming world is famously tricky (leading to proposals like Beluga, Delphin, (sort of) Abella, and the aforementioned efforts of Licata et al.), but the trickiness is always from attempts to open up a (representational) function, do something computational on the inside, and then put it back together. We're not going to need to do anything like that, luckily for us.

Difference queues

We're going to implement difference queues as values of type q -o q, where q is an interesting type: because definitions are inductive, the type q actually has no inhabitants.
   data QCons: int -o q -o q ;;
An alternative would be to implement difference queues using the difference lists list -o list from the previous installment, which would work fine too.

We'll also have an option type, since de-queueing might return nothing if the queue is empty, as well as a "valof" equivalent operation to force the queue from something that may or may not be a queue. This getq option will raise a non-exhaustive match warning during coverage checking, since it can obviously raise a runtime error.
   data None: option
| Some: int -o (q -o q) -o option

val getq = thunk fun opt: option ->
let (Some _ q) be opt in return q
The operations to make a new queue and to push an item onto the end of the queue use the functionality that we've already presented:
   val new = thunk return [x:q] x ;;

val push = thunk
fun i: int ->
fun queue: (q -o q) ->
return [x:q] queue (QCons i x)
The new functionality comes from the pop function, which matches against the front of the list. An empty queue is represented by the identity function.
   val pop = thunk fun queue: (q -o q) ->
match queue with
| [x:q] x -> return None
| [x:q] QCons i (queue' x) -> return Some i queue'
Lets take a closer look at the second pattern, [x:q] QCons i (queue' x). The QCons constructor has two arguments, and because the linear variable x occurs exactly once, it has to appear in one of the two arguments. This pattern is intended to match the case where the linear variable x appears in the second argument of the constructor (read: inside of the the q part, not inside of the int part), and the pattern binds a linear function queue' that has type q -o q.

You can see how these difference queues are used in linearqueue.levy.

Another pattern matching aside. The above-mentioned patterns (and, in fact, all accepted patterns in this current extension of Levy#) actually come from the set of Linear LF terms that are in what is known as the pattern fragment (appropriately enough). The pattern fragment was first identified by Dale Miller as a way of carving out a set of unification problems on higher-order terms that could 1) always be given unitary and decidable solutions and 2) capture many of the actual unification problems that might arise in λProlog [3]. Anders Schack-Nielsen and Carsten Schürmann later generalized this to Linear LF [4], which as I've described is the language that we're essentially using to describe our data.

Coverage checking with subordination

In the previous section we saw that the two patterns [x:q] x and [x:q] QCons i (queue' x) were used to match against a value of type q -o q, and the coverage checker in Levy# accepted those two patterns as providing a complete case analysis of values of this type. But the constructor QCons has two arguments; why was the coverage checker satisfied with a case analysis that only considered the linear variable occurring in the second argument?

To understand this, consider the pattern [x:q] x and [x:q] QCons (di x) queue', where the linear variable does occur in the first argument. This pattern binds the variable di, a linear function value of type q -o int. But the only inhabitants of type int are constants, and the q must go somewhere; where can it go? It can't go anywhere! This effectively means that there are no closed values of type q -o int, so there's no need to consider what happens if the q hole appears inside of the first argument to QCons.

Because of these considerations, Levy# has to calculate what is called the subordination relation for the declared datatypes. Subordination is an analysis developed for Twelf that figures out what types of terms can appear as subterms of other types of terms. I added a new keyword to Levy# for reporting this subordination information:
   Levy> $subord ;;
Subordination for current datatypes:
int <| q
q <| q
int <| option
So int is subordinate to both q and option, and q is subordinate only to itself. Subordination is intended to be a conservative analysis, so this means that there might be values of type int -o q and int -o option and that there might be non-identity values of type q -o q, but that there are no values of type q -o option and the only value of type option -o option is [x: option] x. Levy# uses the no-overlapping-holes restriction to make subordination analysis more precise; without this restriction, a reasonable subordination analysis would likely declare q subordinate to option.1

Some more examples of subordination interacting with coverage checking can be seen in linearmatch.levy.

Subordination and identity in case analysis

We use subordination data for one other optimization. The following function is also from linearmatch.levy; it takes a value of type int -o list and discards everything until the place where the hole in the list was.
   val run = thunk rec run: (int -o list) -> F list is
fun x: (int -o list) ->
match x with
| [hole: int] Cons hole l -> return l
| [hole: int] Cons i (dx hole) -> force run dx
Because Levy# is limited to depth-1 pattern matching, a pattern match should really only say that the hole is somewhere in a subterm, not that the hole is exactly at a subterm. This would indicate that the first pattern should really be [hole: int] Cons (di hole) l, but by subordination analysis, we know that int is not subordinate to int and so therefore the only value of type int -o int is the identity function, so di = [hole:int] hole and we can beta-reduce ([hole:int] hole) hole to get just hole.

So subordination is a very helpful analysis for us; it allows us to avoid writing some patterns altogether (patterns that bind variables with types that aren't inhabited) and it lets us simplify other patterns by noticing that for certain types "the hole appears somewhere in this subterm" is exactly the same statement as "this subterm is exactly the hole."


In order to efficiently pattern match against the beginning of a list, we need to be able to rapidly tell which sub-part of a data structure the hole can be found in. This wasn't a problem for difference lists and difference queues, since subordination analysis is enough to tell us where the hole will be if it exists, but consider trees defined as follows:
   data Leaf: tree
| Node: tree -o tree -o tree
If we match against a value of type tree -o tree, we need to deal with the possibility that it is the identity function, the possibility that the hole is in the left subtree, and the possibility that the hole is in the right subtree. This means that, if we wish for matching to be a constant-time operation, we also need to be able to detect whether the hole is in the left or right subtree without doing a search of the whole tree.

This is achieved by adding an extra optional field to the in-memory representation of structures, a number that indicates where the hole is. Jason Reed correctly pointed out in a comment that for the language extension described in the previous installment, there was actually no real obstacle to having the runtime handle multiple overlapping holes and types like tree -o (tree -o tree). But due to the way we're modifying the data representation to do matching, the restriction to having at most one hole at a time is now critical: the runtime stores directions to the hole at every point in the structure.

The memory representation produced by the value code [hole: tree] Node (Node Leaf Leaf) (Node (Node hole Leaf) Leaf) might look something like this, where I represent the number indicating where the hole is by circling the indicated hole in red:

If the hole is filled, the extra data (the red circles) will still exist, but the part of the runtime that does operations on normal inductively defined datatypes can just ignore the presence of this extra data. (In a full implementation of these ideas, this would likely complicate garbage collection somewhat.)

Case Study: Holey Trees and Zippers

The binary trees discussed before, augmented with integers at the leaves, are the topic of this case study. A famous data structure for functional generic programming is Huet's zipper [5, 6], which describes inside-out paths through inductive types such as trees. The idea of a zipper is that it allows a programmer to place themselves inside a tree and move up, left-down, and right-down the tree using only constant-time operations based on pointer manipulation.

The zipper for trees looks like this:
   data Top: path
| Left: path -o tree -o path
| Right: tree -o path -o path
In this case study, we will show how to coerce linear functions tree -o tree into the zipper data structure path and vice versa.

In order to go from a linear function tree -o tree to a zipper path, we use a function that takes two arguments, an "outside" path and an "inside" tree -o tree. As we descend into the tree-with-a-hole by pattern matching against the linear function, we tack the subtrees that aren't on the path to the hole onto the outside path, so that in every recursive call the zipper gets bigger and the linear function gets smaller.
   val zip_of_lin = thunk rec enzip: path -> (tree -o tree) -> F path is 
fun outside: path ->
fun inside: (tree -o tree) ->
match inside with
| [hole: tree] hole ->
return outside
| [hole: tree] Node (left hole) right ->
force enzip (Left outside right) left
| [hole: tree] Node left (right hole) ->
force enzip (Right left outside) right
Given this function, the obvious implementation of its inverse just does the opposite, shrinking the zipper with every recursive call and tacking the removed data onto the linear function:
   val lin_of_zip = thunk rec enlin: path -> (tree -o tree) -> F (tree -o tree) is 
fun outside: path ->
fun inside: (tree -o tree) ->
match outside with
| Top ->
return inside
| Left path right ->
force enlin path ([hole: tree] Node (inside hole) right)
| Right left path ->
force enlin path ([hole: tree] Node left (inside hole))
That's the obvious implementation, where we tack things on to the outside of the linear function. Linear functions have the property, of course, that you can tack things on to the inside or the outside, which gives us the opportunity to consider another way of writing the inverse that looks more traditionally like an induction on the structure of the zipper:
   val lin_of_zip' = thunk rec enlin: path -> F (tree -o tree) is
fun path: path ->
match path with
| Top -> return [hole: tree] hole
| Left path right ->
force enlin path to lin in
return [hole: tree] lin (Node hole right)
| Right left path ->
force enlin path to lin in
return [hole: tree] lin (Node left hole)
These functions, and examples of their usage, can be found in linear-vs-zipper.levy.


This installment was written quickly after the first one; I imagine there will be a bigger gap before the third installment, so I'm going to go ahead and say a bit about where I'm going with this, using the case study as motivation.

I wrote three functions:
  • lin_of_zip turns zippers into linear functions by case analyzing the zipper and tacking stuff onto the "beginning" our outside of the linear function,
  • lin_of_zip' turns zippers into linear functions by inducting over the path and tacking stuff onto the "end" or inside of the linear function, and
  • zip_of_lin turns linear functions into zippers by case analyzing the "beginning" or outside of the linear function and tacking stuff on to the zipper.
What about zip_of_lin', which turns linear functions into zippers by case analyzing the "end" or inside of the linear function? It's easy enough to describe what this function would look like:
#  val zip_of_lin' = thunk rec enzip: (tree -o tree) -> F path is 
# fun lin: (tree -o tree) ->
# match lin with
# | [hole: tree] hole -> return Top
# | [hole: tree] lin' (Node hole right) ->
# force enzip lin' to path in
# return Left path right
# | [hole: tree] lin' (Node left hole) ->
# force enzip lin' to path in
# return Right left path ;;
Toto, we're not in the pattern fragment anymore, but if we turn the representation of linear functions into doubly linked lists (or a double-ended-queues implemented as linked lists, perhaps), I believe we can implement these functions without trouble. At that point, we basically don't need the zippers anymore: instead of declaring that the derivative of a type is the type of its one-hole contexts, we can make the obvious statement that the linear function from a type to itself is the type of one-hole contexts of that type, and we can program accordingly: no new boilerplate datatype declarations needed!


A relatively simple modification of the runtime from the previous installment, a runtime data tag telling us where the hole is, allows us to efficiently pattern match against linear representational functions. This modification makes the use of linear representational functions far more general than just a tool for efficiently implementing a logic programming idiom of difference lists. In fact, I hope the case study gives a convincing case that these holey data structures can come close to capturing many of the idioms of generic programming, though that argument won't be fully developed until the third installment, where we move beyond patterns that come from the Miller/Schack-Nielsen/Schürmann pattern fragment.

More broadly, we have given a purely logical and declarative type system that can implement algorithms that would generally be characterized as imperative algorithms, not functional algorithms. Is it fair to call the queue representation a "functional data structure"? It is quite literally a data structure that is a function, after all! If it is (and I'm not sure it is), this would seem to challenge at least my personal understanding of what "functional data structures" and functional algorithms are in the first place.

1 This is true even though there are, in fact, no closed values of type q -o option even if we don't have the no-overlapping-holes restriction (proof left as an exercise for the reader).
[1] Daniel R. Licata, Noam Zeilberger, and Robert Harper, "Focusing on Binding and Computation," LICS 2008.
[2] Daniel R. Licata and Robert Harper, "A Universe of Binding and Computation," ICFP 2009.
[3] Dale Miller, "A Logic Programming Language with Lambda-Abstraction, Function Variables, and Simple Unification," JLC 1(4), 1991.
[4] Anders Schack-Nielsen and Carsten Schürmann, "Pattern Unification for the Lambda Calculus with Linear and Affine Types," LFMTP 2010.
[5] Gúrard Huet, "Function Pearl: The Zipper," JFP 1997.
[6] Wikipedia: Zipper (data structure).

Friday, August 12, 2011

Holey Data, Part 1/3: (Almost) Difference Lists

This series has been brewing in my head since a year ago, starting at the 2010 Workshop on Generic Programming in Baltimore. It was at least a quarter-baked idea by the time I asked, on the CSTheory Stackexchange, a question that amounted to "are there functional programming difference lists that act like logic programming difference lists"? [1]

Difference lists

Difference lists, which are deep lore of the logic programming community, are essentially data structures that allow you to append to the beginning and the end of the list as a constant time operation; they can also be turned back into "regular" lists as a constant time operation. You can see [2,3,4] for details, but I'd almost encourage you not to: logic programming difference lists are a bit crazy.

Let me give you two ways of thinking about difference lists. If you're a lazy person (in the Haskell sense), you may say that difference lists really have something to do with laziness in functional programming languages. William Lovas showed me some crazy Haskell code yesterday that does something of this flavor, and there's a paper that espouses this viewpoint [5]. I'll stick with a different viewpoint: difference lists are functions from lists to lists. This is how difference lists are implemented in the Haskell Data.DList library [6], and you can find people that back-ported the idea that difference lists are "really" functions to higher-order logic programming languages [7].

The problem with the idea of difference lists being functions is that you lose one of the fundamental properties that you started with: while it depends on how you implement things, difference lists implemented as functions will usually have a O(1) append operation but a O(n) "turn me into a regular list" operation. For a discussions of why, see the answers to my question on CSTheory Stackexchange.

I've believed for some time that this unfortunate O(n)ness could be overcome, but I wanted a working proof-of concept implementation before I talked about it. It was during my recent visit to France that I realized that I really needed to base the proof-of-concept off of a language that was appropriately persnickety about the difference between what I called "value code" and what I called "computation code" in the previous post. And then I found Levy, and embraced and extended it into Levy#.1 So at this point you should read the previous post.

Representing datatypes

In the last post I talked about declaring and using datatypes in Levy#, using the example of lists of integers:
   data Nil: list
| Cons: int -o list -o list
It's necessary to say a few words about how these things are implemented. Every value in Levy# is represented in the OCaml interpreter as a heapdata ref - that is, a mutable pointer into a "box" in memory. Some boxes hold integers, other boxes hold the code and environments that represent suspended (thunked) computations. Declared constants from datatype declarations are another kind of box which holds both a tag (like Nil or Cons) and an appropriate number of values (that is, pointers) for the given datatype.

As an example, here's the way the list Cons 3 (Cons 7 Nil), stored in the variable xs, could be laid out in memory:

Representing difference lists


On the level of types, a difference list will be represented as a member of the value type (the Linear LF type) list -o list. This adequately represents what a difference list is better than the list -> list type of Haskell or Lambda Prolog/Twelf. In Haskell, lots of functions have type list -> list, including the list reversal function. Lambda Prolog/Twelf does better; the LF type list -> list really has to be substitution function - it can only take a list and plug it into designated places in another list. But λx. Nil is a perfectly good LF value of type list -> list. In other words, it's not necessarily the case that applying something to a function is equivalent to tacking that thing on to the end of the difference list - the applied argument can simply be ignored.

Linear functions must use their argument exactly once, so linear functions list -o list accurately represent the idea of difference lists, a list with exactly one missing list somewhere in it. For lists this is kind of a dumb way to put it: the missing list has to go at the end of a series of Conses or nowhere at all, so the linearity forces there to be one (as opposed to zero) occurrences of the hole. If we were talking about terms of type tree -o tree, then linearity would enforce that there was one (as opposed to zero or two or three...) occurrences of the hole. We'll come back to that later...


Recall from the previous post that Levy# starts off syntactically precluded from forming values of this type list -o list; we'll have to fix that by adding new syntax. On the level of syntax, a difference list will be represented using a Twelf-like notation where "λ x:τ.v" is written as [x:tau] v.

Example The difference list containing first 9 and then 4 will be represented as [hole: list] Cons 9 (Cons 4 hole); let's call that difference list dxs. Let dys be the some other piece of value code with type list -o list, and let xs be the example difference list Cons 3 (Cons 7 Nil) from the previous section.

Then dxs xs appends the difference list and regular list by filling in the hole with xs; the result is Cons 9 (Cons 4 (Cons 3 (Cons 7 Nil))). Similarly, [hole: list] dxs (dys hole) is a piece of value code that forms a new difference list by appending the two smaller difference lists.

In terms of the expressive power of these difference lists, that's pretty much the story: the file linear.levy has some more examples, which can be run in the "holey-blog-1" tagged point in the repository.

A Linear Logic Aside. This, perhaps, clears up the reason I used linear implication to define constructors like Cons. Say instead of Cons we had ConsBad, a Linear LF constant of type int -> list -> list. That type is equivalent in Linear LF to a term of type !int -o !list -o list. The exclamation points ("bangs") prevent the occurrence of a linear variable in either argument to ConsBad, so [hole: list] ConsBad 9 (ConsBad 4 hole) would NOT be a (piece of value code)/(Linear LF term) of type list -o list.

Memory representation

So far, I've explained how difference lists can be represented as linear functions, but what I originally promised was difference lists with O(1) append and turn-into-a-regular-list operations. To do this, I need to explain how linear functions can be represented in memory. A linear function is represented as another kind of box in memory, one with two pointers. The first pointer acts like a regular value - it points to the beginning of a data structure. The second pointer acts like a pointer into a data structure: it locates the "hole" - the location of the linear variable. This is the in-memory representation of our example [hole: list] Cons 9 (Cons 4 hole) from above, stored as the variable dxs.

If we want to apply this linear function to a value and thereby insert a list into the hole, we merely have to follow the second hole-pointer and write to it, and then return the first value-pointer as the result. Voilà, application with only local pointer manipulation! If the result of applying xs to dxs was stored as the variable ys, the state of memory would look something like this:

The catch is that the linear function box that dxs refers to no longer can be used in the same way: if we tried to write something different to the hole pointer, disaster could result: it would change the value of ys!

Therefore, the difference list, and all linear function values, are use-only-once data structures, and the current implementation invalidates used linear function values (shown by the red X in the figure above) and will raise a runtime error if they are used twice. This could be precluded with a type system that ensured that values whose type was a linear function are used, computationally, in an affine way - that is, used at most once. Work on affine type systems is interesting, it's important, I've written about it before, but I see it as an orthogonal issue. That's all I really have to say about that.

And that's how we implement linear2 representational functions with constant time composition and application, which corresponds to implementing different lists with O(1) append and turn-into-a-regular-list operations.

(Disclaimer: this section is a bit of an idealization of what's actually going on in the OCaml implementation, but it's honest enough in the sense that everything's really implemented by local pointer manipulation, and represents what I think you'd expect an complied implementation of this code to do.)

Expressiveness and efficiency

One cost of our representation strategy is that we can only use values whose type is a linear function in an affine way. Another cost is, because the implementation really thinks in terms of "the hole," we can't have two-hole structures like tree -o tree -o tree. (Edit: Jason, in the comments, points out that I may be wrong about more-than-one-hole-structures being problematic.) But, if we were interested primarily in expressiveness and were willing to sacrifice constant-time composition and application guarantees, then it would absolutely be possible to have reusable difference lists and linear types representing n-holed contexts for any n.

Case Study: The Dutch National Flag Problem

William Lovas proposed this example, and I think it's great: the Dutch National Flag problem was proposed by Edsger Dijkstra as a generalization of what happens in a Quicksort partition step. The logic/functional programming variant of this problem is as follows: you have a list of red, white, and blue tagged numbers. You want to stably sort these to put red first, white second, and blue third: in other words, if the ML function red filters out the sublist of red-tagged numbers and so on, you want to return red list @ white list @ blue list.

That's easy: the catch is that you can't use append, because you're only allowed to iterate over the initial unpartitioned list, and you're only allowed to iterate over that list one time.

The solution in Levy# extended with difference lists is to pass along three difference lists representing the partitioned beginning of the full list and a regular list representing the un-partitioned end of the full list. At each step, you pick an item off the un-partitioned list and (constant-time) append it on to the end of the appropriate difference list. When there's nothing left in the un-partitioned list, you (constant-time) append the three difference lists while (constant-time) turning the concatenated difference lists into a normal list.
val dutch' = thunk rec loop : 
(list -o list)
-> (list -o list)
-> (list -o list)
-> list
-> F list is
fun reds : (list -o list) ->
fun whites : (list -o list) ->
fun blues : (list -o list) ->
fun xs : list ->
match xs with
| Nil -> return (reds (whites (blues Nil)))
| Cons x xs ->
(match x with
| Red i ->
force loop
([hole: list] reds (Cons x hole))
| White i ->
force loop
([hole: list] whites (Cons x hole))
| Blue i ->
force loop
([hole: list] blues (Cons x hole))
This solution is in the repository in the file dutch.levy.


I think that the incorporation of linear representational functions as difference lists is beautiful. I've written plenty of Standard ML where I've done tail-recursive manipulations on lists and so either 1) had to keep track of whether I'd called List.rev the right number of times or 2) worried whether all these append operations would end up hurting efficiency. This solution gives you the append functions that you need in a way that makes it easy to think about both the order of items and the runtime cost. And it's all enabled by some very pretty types!

The real straw that broke my back and led to this line of thinking, by the way, was an issue of expressiveness, not efficiency: I was trying to reason about such code that did tail-recursive operations on lists in Agda. That's another story, but it's one that's connected to my thesis in an important way, so I will try to get around to writing it.

In the next installment of the Holey Data series, I'll show how we can pattern match against difference lists in Levy#, which turns linear representational functions from (almost) difference lists into something significantly awesomer than difference lists.

(P.S. Thanks Paul a.k.a. Goob for sneaking me into the English Department's cluster to make the illustrations for this blog post.)

1 Originally I wanted to base it off of a Noam/Dan/Bob style language based on focusing in polarized logic, but it turns out that focused logic isn't a very natural language to actually program in. I owe a debt to Andrej and Matija for going through the legwork of making CBPV into an implemented toy language that it was possible to play around with.
2 Actually, this representation strategy might work fine not just for linear functions but for regular-old LF substitution functions with zero, one, two... arguments. When I tried to work this out with William Lovas, we tentatively concluded that you might need some sort of union-find structure to make sure that all the holes were effectively pointers to "the same hole." In this note, I only motivated linear functions in terms of adequately representing difference lists; the real beauty of using difference lists starts coming into play with the next two posts in this series.
[1] Robert J. Simmons, "Difference Lists in Functional Programming", CSTheory StackExchange.
[2] Wikibook on Prolog Difference Lists.
[3] Dale Miller's implementation of (classic, logic-programmey) difference lists: "Difference Lists provide access to the tail of a list".
[4] Frank Pfenning's course notes: "Difference Lists".
[5] Sandro Etalle and Jon Mountjoy. "The Lazy Functional Side of Logic Programming".
[6] Don Stewart's Data.DList library.
[7] Yves Bekkers and Paul Tarau. "The Monads of Difference Lists".

Thursday, August 11, 2011

Embracing and extending the Levy language

I started writing the post about the thing I really wanted to write about, but then I realized that the background was already a long post on its own. So, that's this post! It's about two things:
  1. Levy, an implementation of call-by-push-value that Andrej Bauer and Matija Pretnar wrote in OCaml for the PL Zoo. I made a fork of Levy that, among other small changes, reduces the need for parentheses by making operator precedence more like it is in OCaml.
  2. A more divergent fork of Levy that adds datatypes in a familiar-but-as-yet-unmotivated way. This diverges from the goals of Levy - as a PL Zoo project, Levy aims to be a clean and uncluttered language implementation, and that's not the case for this more divergent fork. I'll call this new version of Levy... let's see... Levy#. Sure, that works.
Other things have been written about Levy at Andrej's blog and LtU.

Basic concepts

The big idea of call-by-push-value, and, in turn, Levy, is that it is a programming language that is incredibly picky about the difference between code that builds data (I'll call this value code as shorthand) and code that does stuff (I'll call this computation code). Value code tends to be really boring: 3 builds an integer value and true builds a boolean value. Levy also builds in a couple of total binary operations on integers, so 3 + x and y < 12 are both viewed by Levy as value code. Variables are always value code.

Computation code is more interesting. If-statements are computation code: if v then c1 else c2 examines the value built by v and then either does the stuff described by c1 or else does the stuff described by c2. Functions are computation code: fun x: ty -> c receives a value and then does the stuff described by c, possibly using the value it received.

The computation code return v does a very simple thing: it takes the value built by v and says "look, I made you a value." If the value code v had value type ty, then the computation code return v has computation type F ty. So yeah: I hadn't talked about types yet, but pieces of value code are given value types and pieces of computation code are given computation types.1

Which brings us to an interesting kind of value code that I didn't mention before: thunk c wraps up (and suspends) the computation code c in a value; if the computation code c had computation type ty, then the value code thunk c has value type U ty.

The typical type of a function that would be written as int -> int in ML is U (int -> F int). It has to be a value type (ML functions are values), hence the "U", and the computation code in the function wants to take a value, do some stuff, and eventually return an integer. Here's an implementation of the absolute value function:
   val abs = thunk fun n: int -> 
if n < 0
then return 0 - n
else return n
In order to use the absolute value function, we have to "un-thunk" the computation code inside of the value: the computation code force v does this. Where we'd write (abs 4) in ML, we write force abs 4 in Levy (force binds more tightly than anything, including application).

Evaluation order

A consequence of being persnickety about the difference between value code and computation code is that Levy is very explicit about sequencing. The SML/Haskell/OCaml code "foo(abs 4,abs 5)" means "call foo with the absolute values of 4 and 5." One occasionally hard-learned lesson for people who have programmed in both SML and OCaml is that, if abs contains effects, this code might do different things: SML has a specified left-to-right evaluation order, but OCaml's evaluation order is unspecified, and in practice it's right-to-left.

Such underspecification is impossible in Levy, since calling a function is a computation, and computations can't be included directly in values! Instead, if I have computation code c of type F ty, we expect it to do some stuff and then say "look, I made you a value (of type ty)." The computation code c1 to x in c2 does the stuff described by c1, and when that stuff involves saying "look, I made you a value", that value is bound to x and used to do the stuff described by c2. Therefore, when translate our SML/Haskell/OCaml code above to Levy, we have to explicitly write either
   force abs 4 to x1 in
force abs 5 to x2 in
force foo x1 x2
   force abs 5 to x2 in
force abs 4 to x1 in
force foo x1 x2
That concludes the introduction of Levy, (my slightly de-parenthesized fork of) the language implemented by Bauer and Pretnar. There are two example files here (discusses basics) and here (discusses functions).


The first new feature in Levy# is the addition of a new kind of computation code, match v with | pat1 -> c1 ... that is a generalization of both let-expressions let x be v in c and if-expressions if v then c1 else c2 from Levy. Patterns like pat1 are a refinement of value code: constants are patterns, variables are patterns (variables in patterns are binding occurrences, like in most functional languages), but addition is not a pattern, and variables can only appear once in a pattern.

The simplest new thing that we can do with this language construct is write computation code that mimics a switch statement in C:
   val f = thunk fun x: int ->
match x with
| 1 -> return 4
| 2 -> return 3
| 3 -> return 2
| 4 -> return 1
| 5 -> return 0
| y -> return (x + y)
This function is included as an example in the "datatype" fork of Levy# on Github here; let's look at what it actually does in the interpreter:
   bash-3.2$ ./levy.byte switch.levy 
... snip ...
Levy. Press Ctrl-D to exit.
Levy> force f 1 ;;
comp F int = return 4
Levy> force f 3 ;;
comp F int = return 2
Levy> force f 5 ;;
comp F int = return 0
Levy> force f 7 ;;
comp F int = return 14
That's what I expected to happen; hopefully it's what you expected to happen too. Let's declare victory and move on.

Using datatypes

The main innovation of Levy# is the introduction of an inductive datatype mechanism, which is basically a bedrock component of every functional programming language (that isn't a Lisp) ever. This should have been a relatively small change, but parser generators are the worst and I'm bad at them so I basically had to rewrite the parser to have a new phase. Sigh.

I'll start with showing how datatypes are used, because that's a little more standard than the way I declare them. It would look exactly like the uses of ML datatypes, except that I curry them in a Haskell-ey fashion. The function sumlist, which sums up a list of integers, looks like this:
   val sumlist = thunk rec sumlist : list -> F int is 
fun xs : list ->
match xs with
| Nil -> return 0
| Cons x xs -> force sumlist xs to y in return x + y
The other thing that's new in the above example is the use of Levy's fixedpoint operator, but there's nothing interesting there from the perspective of reading the code. This and other examples are in datatype.levy, by the way.

Side note: I'm lazy but still wanted a realistic execution model *and* coverage checking (redundant/non-exhaustive matches raise warnings), so I limited the current version of Levy# to depth-1 pattern matching:
   Levy> match v4 with | Cons x1 (Cons x2 xs) -> return 0 ;;
Type error: Cons x2 xs not allowed in pattern (depth-1 pattern matching only)
If I was doing this for real I'd implement real coverage checking and pattern compilation as a code transformation, which would leave the depth-1 pattern matching interpreter unchanged.

So that's the use of datatypes; one upshot is that my pieces of value code are maybe a bit more interesting than they used to be: I can make big piece value code like Cons 4 (Cons 9 (Cons 16 (Cons (-1) Nil))) to create a big piece of data. I think that's kind of interesting, at least.

Declaring datatypes

Datatypes are just a way of telling the computer that you want to work with a particular set of inductively defined structures (at least this is what strictly positive datatypes, the only ones Levy# allows, are doing). A point that I've haltingly tried to make in a previous post was that, well, if you're going to write down a particular set of inductively defined terms, a canonical-forms based logical framework like Linear LF is a perfectly good language to write that in.

So our constants look basically like the declaration of constants in a Linear LF signature, except that we make the constants uppercase, following ML conventions:
   data Nil: list
| Cons: int -o list -o list
It's a little point, but it's kind of a big deal: we can understand all the values in Levy# as canonical forms in a simply-typed linear lambda calculus with integers and some arithmetic operations - kind of like Twelf with the integer constraint domain extension.2

The reason it's a big deal is that we can now observe that the two kinds of "function types" we have in this language are way different. There's a computational arrow ty1 -> ty2 that we inhereted from Levy, and then there's a Linear LF linear implication arrow, a value arrow ty1 -o ty2 that we use in value code to construct patterns and values. I didn't invent this idea, it's actually a subset of what Licata and Harper propose in "A Universe of Binding and Computation" (ICFP 2009). Dan Licata would call the Linear LF arrow a "representational arrow," which is probably a better name than "value arrow."

Of course, the values that we actually use are, without exception, at atomic type; if we try to even write down Cons 4, which seems like it would have type list -o list, we get a type error from Levy#: constructor Cons expects 2 arg(s), given 1. So by requiring that all the Linear LF terms (i.e. value code) be canonical (beta-normal and eta-long), I've effectively excluded any values of type ty1 -o ty2 from the syntax of the language.

What would it mean to introduce value code whose type was a linear implication to Levy#? Well, that's the thing I really wanted to write about.

1 Focusing and polarization-aware readers will notice that this has something to do with focusing. In particular: positive types are value types, negative types are computation types, F is "upshift," and U is "downshift."
2 Technical note: Boolean values, from the perspective of the value language, are just two defined constants, as explained here. That's important: we have Boolean constants that are values, but the elimination form for Booleans is computation code not value code. If we had an "if" statement at the level of values, all sorts of crazy (and interesting) stuff could potentially happen. At least for the moment, I'm not considering that.