OCaml Planet

July 30, 2015

Andrej Bauer

Intermediate truth values

I have not written a blog post in a while, so I decided to write up a short observation about truth values in intuitionistic logic which sometimes seems a bit puzzling.

Let $\Omega$ be the set of truth values (in Coq this would be the setoid whose underlying type is $\mathsf{Prop}$ and whose equality is equivalence $\leftrightarrow$, while in HoTT it is the h-propositions). Call a truth value $p : \Omega$ intermediate if it is neither true nor false, i.e., $p \neq \bot$ and $p \neq \top$. Such a “third” truth value $p$ is proscribed by excluded middle.

The puzzle is to explain how the following two facts fit together:

  1. “There is no intermediate truth value” is an intuitionistic theorem.
  2. There are models of intuitionistic logic with many truth values.

Mind you, excluded middle says “every truth value is either $\bot$ or $\top$” while we are saying that there is no truth value different from $\bot$ and $\top$:
$$\lnot \exists p : \Omega \,.\, (p \neq \top) \land (p \neq \bot).$$
Coq proves this:

Theorem no_intermediate: ~ exists p : Prop, ~ (p <-> True) /\ ~ (p <-> False).
Proof.
  firstorder.
Qed.

Note that we replaced $=$ with $\leftrightarrow$ because equality on $\Omega$ is equivalence on $\mathsf{Prop}$ in Coq (this would not be necessary if we used HoTT, where h-propositions enjoy an extensionality principle). You should also try proving the theorem yourself; it is easy.

A model of intuitionistic mathematics with many truth values is a sheaf topos $\mathsf{Sh}(X)$ over a topological space $X$, so long as $X$ has more than two open sets. The global points of the sheaf of truth values $\Omega$ are the open subsets of $X$, and more generally the elements of $\Omega(U)$ are the open subsets of $U$.

So, if we take an intermediate open set $\emptyset \subset U \subset X$, should it not be an intermediate truth value? Before reading on you should stop and think for yourself.

Really, stop reading.

Let us calculate which open set (truth value) is
$$(U \neq \top) \land (U \neq \bot).$$
Because $U = \top$ is equivalent to $U$ and $U = \bot$ to $\lnot U$ our formula is equivalent to
$$\lnot U \land \lnot\lnot U.$$
Remembering that negation is topological exterior we get
$$\mathsf{Ext}(U) \cap \mathsf{Ext}(\mathsf{Ext}(U))$$
which is empty,
$$\emptyset.$$
Indeed, $U$ is not a counterexample!
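For a concrete instance, take $X = \mathbb{R}$ and $U = (0, 1)$. The exterior of a set is the interior of its complement, so
$$\mathsf{Ext}(U) = (-\infty, 0) \cup (1, \infty), \qquad \mathsf{Ext}(\mathsf{Ext}(U)) = (0, 1),$$
and these two open sets are disjoint: the conjunction really does hold nowhere.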

We have here a typical distinction between internal and external language:

  • The mathematicians inside the topos think and speak locally. They ask not “Is this statement true?” but “Where is this statement true?” If you ask them a yes/no question, they will answer by giving you an open subset of $X$. They will conclude that $U \neq \top$ holds on the exterior of $U$, that $U \neq \bot$ holds on the double exterior of $U$, and that the two are nowhere true together.
  • The mathematicians outside the topos (that’s us) understand $(U \neq \top) \land (U \neq \bot)$ differently: it is about comparing the open set $U$ to the open sets $X$ and $\emptyset$ as elements of the topology of $X$. For them “yes” and “no” are valid answers, and no other.

By the way, the mathematicians on the inside also think that “yes” and “no” are valid answers, and there is no other – that is precisely the statement “there is no intermediate truth value” – but they think of it as “holding everywhere on $X$”.

There are of course many situations where the difference between internal and external language may create confusion. For example, if $X = \mathbb{N}$ is a countable discrete space, then the object of natural numbers is the sheaf of all maps into $\mathbb{N}$, of which there are uncountably many. Thus on the outside we “see” that there are uncountably many natural numbers in $\mathsf{Sh}(X)$. Those on the inside would of course disagree. This is an amusing situation, sort of a reverse of Skolem’s paradox.

by Andrej Bauer at July 30, 2015 08:16 AM

GaGallium

Formally verifying the complexity of OCaml programs with CFML -- interlude


This small interlude post is about neither Coq nor CFML: I present a data structure from Okasaki's Purely Functional Data Structures, implemented in OCaml.

It comes as a preliminary explanation for an upcoming formalization of it in CFML, including a complexity analysis!

The structure we decided to study is Okasaki's "binary random access list". It is a functional data structure which features the usual list operations (cons and uncons, for adding and removing an element at the head of the list), but also random-access lookup and update. That is, you can add or remove an element at the head of the list, but also query or modify the \(i^{th}\) element of the structure.

These four operations run in worst-case \(O(\log n)\) steps, where \(n\) is the number of elements stored in the structure.

Let us see step by step how it is implemented in OCaml. The source code for the complete implementation can be found here.

Type definitions, implicit invariants

A binary random access list is a list of binary trees: we first define an OCaml type for trees.

  type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree

Notice that only the leaves store elements. Nodes contain an integer corresponding to the number of elements stored (in the leaves) in the tree, which makes a size function trivial to implement:

  let size = function
    | Leaf x -> 1
    | Node (w, _, _) -> w

Now, a binary random access list is a list whose elements are each either a tree or nothing. We consequently define a "tree option" type, here dubbed "digit".

  type 'a digit = Zero | One of 'a tree

The name "digit" comes from the similarity between a binary random access list and a list of bits representing an integer - adding an element at the head is similar to incrementing the integer, etc. We'll see more of that later.

Finally, we define the type for the whole structure: the binary random access list.

  type 'a rlist = 'a digit list

A valid binary random access list should satisfy some additional invariants:

  • Trees are complete trees -- a tree of depth \(d\) always has \(2^d\) leaves;

  • Any rlist contains trees of increasing depth starting at some depth \(p\): if the \(i^{th}\) cell (indexing from \(0\)) contains a tree (is of the form One t), then this tree has depth \(p+i\).

  • A complete binary random access list is a rlist with starting depth \(p\) equal to \(0\): its first tree, if present, is only a leaf.

To sum up, a binary random access list is a list, which stores in its \(i^{th}\) cell either nothing, or a complete binary tree of depth \(i\).

As an example, a binary random access list storing the sequence \(1, 2, 3, 4, 5\) can be represented as:

[figure: a binary random access list storing the sequence \(1, 2, 3, 4, 5\)]
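Concretely, assuming the type definitions above, this five-element list can be written out as a value (a sketch for illustration; the shape mirrors 5 = 101 in binary, least significant bit first):

```ocaml
type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree
type 'a digit = Zero | One of 'a tree
type 'a rlist = 'a digit list

(* The sequence 1, 2, 3, 4, 5: a leaf at depth 0, an empty slot at
   depth 1, and a complete tree of depth 2 holding four elements. *)
let example : int rlist =
  [ One (Leaf 1);
    Zero;
    One (Node (4, Node (2, Leaf 2, Leaf 3),
                  Node (2, Leaf 4, Leaf 5))) ]
```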

Binary random access lists look like integers in binary

As I mentioned before, a binary random access list is in fact quite similar to an integer represented in binary - i.e. as a list of bits.

Actually, if you erase the trees from the implementation (type 'a digit = Zero | One of 'a tree becomes type digit = Zero | One), you obtain an implementation of integers, represented as a list of bits (least significant bit at the head of the list); the cons operation being incrementing, the uncons one decrementing (lookup and update do not make much sense, though).

How do you increment an integer? You look at the least significant bit; if it's a zero, you turn it into a one; if it's a one, you turn it into zero, and recursively continue to increment, starting with the next bit.
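Erasing the trees, the increment routine just described can be sketched as stand-alone code (this snippet is illustrative and not part of Okasaki's implementation; `to_int` is added here only for checking):

```ocaml
type digit = Zero | One
type bits = digit list  (* least significant bit first *)

(* Increment: turn a leading Zero into One; turn a One into Zero
   and carry into the rest of the bits. *)
let rec inc = function
  | [] -> [One]
  | Zero :: bs -> One :: bs
  | One :: bs -> Zero :: inc bs

(* Interpret a bit list as an integer, least significant bit first. *)
let rec to_int = function
  | [] -> 0
  | Zero :: bs -> 2 * to_int bs
  | One :: bs -> 1 + 2 * to_int bs
```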

"Consing" to a binary random access list is similar, except that we have to handle a bit more information: the elements of the structure, stored in the trees.

Instead of adding \(1\), we add a tree t (more precisely, a digit One t): if the first element of the list is Zero, we turn it into One t. If it's a One t', we turn it into Zero, and recursively continue, but with a new tree: Node (size t + size t', t, t'). This corresponds to the link operation, which combines two trees of depth \(d\) into one of depth \(d+1\).

  let link t1 t2 =
    Node (size t1 + size t2, t1, t2)

The OCaml implementation follows:

  let rec cons_tree t = function
    | [] -> [One t]
    | Zero :: ts -> One t :: ts
    | One t' :: ts -> Zero :: cons_tree (link t t') ts

  let cons x ts =
    cons_tree (Leaf x) ts

The uncons operation follows the same idea: it is just like decrementing, except that you also have to invert the link operation to obtain trees of smaller depth.

  let rec uncons_tree = function
    | [] -> raise Empty
    | [One t] -> (t, [])
    | One t :: ts -> (t, Zero :: ts)
    | Zero :: ts ->
      match uncons_tree ts with
      | Node (_, t1, t2), ts' -> (t1, One t2 :: ts')
      | _ -> assert false

  let head ts =
    match uncons_tree ts with
    | (Leaf x, _) -> x
    | _ -> assert false

  let tail ts =
    let (_,ts') = uncons_tree ts in ts'

Unconsing an rlist of starting depth \(p\) always returns a tree of depth \(p\) (or fails if the list is empty). In particular, as the binary random access list manipulated by the user always starts at depth \(0\), we can assume in the implementation of head that the unconsed tree is a leaf.
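To see cons and uncons in action, here is a self-contained sketch collecting the definitions above (the nested match in uncons_tree is parenthesized so the snippet compiles on its own), building a three-element list and peeling it back off:

```ocaml
type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree
type 'a digit = Zero | One of 'a tree

exception Empty

let size = function Leaf _ -> 1 | Node (w, _, _) -> w
let link t1 t2 = Node (size t1 + size t2, t1, t2)

let rec cons_tree t = function
  | [] -> [One t]
  | Zero :: ts -> One t :: ts
  | One t' :: ts -> Zero :: cons_tree (link t t') ts

let cons x ts = cons_tree (Leaf x) ts

let rec uncons_tree = function
  | [] -> raise Empty
  | [One t] -> (t, [])
  | One t :: ts -> (t, Zero :: ts)
  | Zero :: ts ->
    (match uncons_tree ts with
     | Node (_, t1, t2), ts' -> (t1, One t2 :: ts')
     | _ -> assert false)

let head ts = match uncons_tree ts with
  | (Leaf x, _) -> x
  | _ -> assert false

let tail ts = let (_, ts') = uncons_tree ts in ts'

(* Build the sequence 1, 2, 3 by consing in reverse order. *)
let l = cons 1 (cons 2 (cons 3 []))
```

Successive heads of `l`, `tail l` and `tail (tail l)` recover 1, 2 and 3 in order.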

Random access

Random access lookup and update are quite easy to implement.

The idea is to walk through the list; faced with a Node (w, l, r) tree, we know how many elements it contains: exactly w. Knowing the index \(i\) of the element we want to visit, we can compare it to w to decide whether we should explore this tree (if \(i < w\)), or look for element \(i - w\) in the rest of the list.

  let rec lookup i = function
    | [] -> raise (Invalid_argument "lookup")
    | Zero :: ts -> lookup i ts
    | One t :: ts ->
      if i < size t
      then lookup_tree i t
      else lookup (i - size t) ts

If we have to walk through the tree, we can also do this without performing an exhaustive exploration: by comparing the index to half the size of the tree, we can decide whether we should explore the left or right subtree.

  let rec lookup_tree i = function
    | Leaf x -> if i = 0 then x else raise (Invalid_argument "lookup")
    | Node (w, t1, t2) ->
      if i < w/2
      then lookup_tree i t1
      else lookup_tree (i - w/2) t2

The update function works in a similar fashion.

  let rec update_tree i y = function
    | Leaf x -> if i = 0 then Leaf y else raise (Invalid_argument "update")
    | Node (w, t1, t2) ->
      if i < w/2
      then Node (w, update_tree i y t1, t2)
      else Node (w, t1, update_tree (i - w/2) y t2)

  let rec update i y = function
    | [] -> raise (Invalid_argument "update")
    | Zero :: ts -> Zero :: update i y ts
    | One t :: ts ->
      if i < size t
      then One (update_tree i y t) :: ts
      else One t :: update (i - size t) y ts
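The update pair can be exercised the same way; here is a self-contained sketch replacing the element at index 2 of the sequence 1, 2, 3, 4 (a list whose only tree has depth 2):

```ocaml
type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree
type 'a digit = Zero | One of 'a tree

let size = function Leaf _ -> 1 | Node (w, _, _) -> w

(* Rebuild the path from the root to the updated leaf. *)
let rec update_tree i y = function
  | Leaf _ -> if i = 0 then Leaf y else raise (Invalid_argument "update")
  | Node (w, t1, t2) ->
    if i < w / 2 then Node (w, update_tree i y t1, t2)
    else Node (w, t1, update_tree (i - w / 2) y t2)

let rec update i y = function
  | [] -> raise (Invalid_argument "update")
  | Zero :: ts -> Zero :: update i y ts
  | One t :: ts ->
    if i < size t then One (update_tree i y t) :: ts
    else One t :: update (i - size t) y ts

(* The sequence 1, 2, 3, 4; replace the 3 (index 2) with 30. *)
let l = [ Zero; One (Node (4, Node (2, Leaf 1, Leaf 2),
                              Node (2, Leaf 3, Leaf 4))) ]
let l' = update 2 30 l
```

Only the spine from the root to the modified leaf is rebuilt; the untouched left subtree is shared between `l` and `l'`.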

Next time

The final post is coming up, in which I present a formalization of this exact OCaml implementation, using CFML and the time-credit functions I presented in the first post. Stay tuned!

by Armaël Guéneau at July 30, 2015 08:00 AM

July 27, 2015

Mindy Preston

Fun With Opam: Advice to My Past Self

Most instructions on how to get started with OCaml packages now advise the user to get started with opam, which is excellent advice. Getting up and running with opam is pretty easy, but I wasn’t sure where to go from there when I wanted to modify other people’s packages and use the modifications in my environment. I wish I’d realized that the documentation for making packages has a lot of applicable advice for that use case, as well as the apparent target (making your own packages from scratch).

Most of the documentation on opam seems to focus on its use as a dependency manager, but it’s also capable of providing many virtual environments via opam switch. Often buried in documentation is the very useful opam switch --alias-of, which one can use to make new virtual environments using a standard OCaml compiler. Most of my current switches are based off the 4.02.1 compiler (just a bit short of up-to-date), so I make new switches for projects with opam switch new_project --alias-of 4.02.1.

I wish, also, that I’d started writing opam files for my own projects and installing them locally via opam pin sooner. The earlier I do this, the less likely I am to publicize a repository with missing or broken dependencies, and it reduces the friction of sharing early-stage code with other people. Combined with the switch-per-project approach, having opam files ready means you can easily have fresh switches with the minimal amount of dependencies installed; you can also install a minimal set of reverse-dependencies (if any!), to limit the amount of time you spend waiting for downstream packages to recompile when you want to test a change. I have great sorrow when I think back on the number of times I waited through a complete recompile of all zillion packages that depend on mirage-types when I really only wanted to test tcpip; a switch that doesn’t have mirage-console-unix and friends installed won’t try to recompile them, and I won’t have to spend any minutes of my only life waiting for that compilation to finish.

Having a lot of switches does lead to some confusion, since executing opam switch results in a filesystem change in the user’s .opam directory but can’t enforce the change in all open terminals. The problem is compounded by the requirement to eval $(opam config env) after running opam switch — if the user forgets to do this, opam switch will report the expected switch, but any commands that try to use the configuration will use the values for the previous switch. After having been caught out by this a few times, I caved and customized my command prompt to report on the current actual switch (along with the current git branch):

  function git-current-branch {
    git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1) /'
  }
  function opam-sw {
    # this is pretty brittle but good enough
    # have to adjust last cut if .opam is not /home/username/.opam or similar depth
    echo $CAML_LD_LIBRARY_PATH|cut -d: -f 1|cut -d'/' -f 5
  }
  export PS1="[\$(opam-sw)] \$(git-current-branch)$PS1"

It’s embarrassing to admit how many times I was confused by the apparently nonsensical output of make and opam install before I made this change, but at least that’s decreased since I started showing myself this information by default.

Something that still trips me up, though, is the setup.data file created by projects that use oasis to generate build files. setup.data contains cached information on how to build projects, including full paths to compilers and libraries, which isn’t updated if you use opam switch. When I began writing this post, my “solution” for this was compulsively removing setup.data every time there was a weird-looking problem building a project that uses oasis, but while writing I’ve been inspired to do better by making another adjustment:

  function setup-data {
    grep -E "^standard_library=" setup.data 2>/dev/null|cut -d'/' -f5
  }
  function opam-sw {
    setup=$(setup-data)
    switch=$(echo $CAML_LD_LIBRARY_PATH|cut -d: -f 1|cut -d'/' -f 5)
    if [ -n "$setup" ] && [ "$switch" != "$setup" ]
    then
      echo "(!!! setup.data $switch)"
    else
      echo $switch
    fi
  }
  export PS1="[\$(opam-sw)] \$(git-current-branch)$PS1"

Now my prompt will warn me when setup.data and my current switch conflict, as they do in the following example where I last built mirage-tcpip in the separate-arp-tcpip switch:

  [4.02.1] user@computer:~$ cd mirage-tcpip
  [(!!! setup.data 4.02.1)] (separate_arp) user@computer:~/mirage-tcpip$

This hasn’t saved me any grief yet, but I’m sure it will soon enough.

July 27, 2015 09:13 AM

July 21, 2015

Github OCaml jobs

Full Time: Sr. Software Engineer-Java Scala at The Weather Channel in Atlanta, GA

The Weather Channel has two 6-month contracting and five full-time openings!

Our Java Development team is seeking a passionate programmer who is excited about the opportunity to use Scala and functional programming techniques in a professional software development environment. Key responsibilities in this position include design, implementation, testing, and deployment of sophisticated backend software systems using Scala, Java, and Clojure. The incumbent will be part of a motivated team of developers who encourage sharing and advancing programming skills and passion for development. The ideal candidate will be proficient with functional programming using Scala, Clojure, ML, Lisp, OCaml (or a similar language), object-oriented development using Java, C++, C#, Ruby (or a similar language), unit testing, and version control.

Essential Duties and Responsibilities

  • Participate in the design and implementation of software systems
  • Participate in code reviews: reviews are held to ensure a high level of software quality and to share knowledge with team members
  • Participate in, and adhere to, professional software engineering practices using such tools and methodologies as Agile Software Development, Test Driven Design, Continuous Integration, Source Code Management, Jira, and Stash
  • Address production issues in a timely manner
  • Maintain a high level of proficiency with Computer Science/Software Engineering knowledge and contribute to the technical skills growth of other team members
  • Work well independently and as part of a team

Education, Experience, Certification Requirements

  • Bachelor's Degree in computer science or related field, or its equivalent.
  • 5 years of related experience in full cycle software development with demonstrated achievements and progressive responsibilities

Knowledge, Skills and Abilities

  • Highly proficient & demonstrated advanced technical level software design, debugging, documentation, and testing skills, requiring application of extensive technical knowledge & skills at the subject matter expert level to complete & document projects
  • Excellent Written/Verbal communication skills
  • Comprehensive knowledge of professional software development process and concepts, with the ability to lead and teach others.
  • Advanced level proficiency and breadth in required code development in area of specialization.
  • Developed and demonstrated proficiency to resolve a wide range of moderately complex business problems and opportunities where analysis of data requires evaluation of identifiable factors.
  • Comprehensive and functional network of senior-level internal/external personnel in own area of expertise.
  • Demonstrated proficiency and application of business software lifecycle development.
  • Comprehensive knowledge of Agile methodology.
  • Ability to communicate and coordinate projects in a team leadership capacity.

July 21, 2015 06:48 PM

Erik de Castro Lopo

Building the LLVM Fuzzer on Debian.

I've been using the awesome American Fuzzy Lop fuzzer since late last year but had also heard good things about the LLVM Fuzzer. Getting the code for the LLVM Fuzzer is trivial, but when I tried to use it, I ran into all sorts of road blocks.

Firstly, the LLVM Fuzzer needs to be compiled with and used with Clang (GNU GCC won't work) and it needs to be Clang >= 3.7. Now Debian does ship a clang-3.7 in the Testing and Unstable releases, but that package has a bug (#779785) which means the Debian package is missing the static libraries required by the Address Sanitizer options. Use of the Address Sanitizers (and other sanitizers) increases the effectiveness of fuzzing tremendously.

This bug meant I had to build Clang from source, which, unfortunately, is rather poorly documented (I intend to submit a patch to improve this), and I only managed it with help from the #llvm IRC channel.

Building Clang from the git mirror can be done as follows:

  mkdir LLVM
  cd LLVM/
  git clone http://llvm.org/git/llvm.git
  (cd llvm/tools/ && git clone http://llvm.org/git/clang.git)
  (cd llvm/projects/ && git clone http://llvm.org/git/compiler-rt.git)
  (cd llvm/projects/ && git clone http://llvm.org/git/libcxx.git)
  (cd llvm/projects/ && git clone http://llvm.org/git/libcxxabi.git)

  mkdir -p llvm-build
  (cd llvm-build/ && cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=$HOME/Clang/3.8 ../llvm)
  (cd llvm-build/ && make install)

If all the above works, you will now have working clang and clang++ compilers installed in $HOME/Clang/3.8/bin and you can then follow the examples in the LLVM Fuzzer documentation.

July 21, 2015 10:08 AM

July 20, 2015

Functional Jobs

OCaml server-side developer at Ahrefs Research (Full-time)

Who we are

Ahrefs Research is a San Francisco branch of Ahrefs Pte Ltd (Singapore), which runs an internet-scale bot that crawls the whole Web 24/7, storing huge volumes of information to be indexed and structured in a timely fashion. On top of that, Ahrefs is building analytical services for end-users.

Ahrefs Research develops a custom petabyte-scale distributed storage system to accommodate all that data coming in at high speed, focusing on performance, robustness and ease of use. The performance-critical low-level part is implemented in C++ on top of a distributed filesystem, while all the coordination logic and communication layer, along with the API library exposed to developers, are in OCaml.

We are a small team and strongly believe in better technology leading to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, constantly seek out to automate repetitive tasks and eliminate boilerplate, guided by DRY and following KISS. If there is any new technology that will make our life easier - no doubt, we'll give it a try. We rely heavily on opensource code (as the only viable way to build maintainable system) and contribute back, see e.g. https://github.com/ahrefs . It goes without saying that our team is all passionate and experienced OCaml programmers, ready to lend a hand or explain that intricate ocamlbuild rule.

Our motto is "first do it, then do it right, then do it better".

What we need

Ahrefs Research is looking for a backend developer with a deep understanding of operating systems and networks and a taste for simple and efficient architectural designs. Our backend is implemented mostly in OCaml and some C++, so proficiency in OCaml is very much appreciated; otherwise, a strong inclination to learn OCaml intensively in a short time will be required. Understanding of functional programming in general and/or experience with other FP languages (F#, Haskell, Scala, Scheme, etc.) will help a lot. Knowledge of C++ is a plus.

The candidate will have to deal with the following technologies on a daily basis:

  • networks & distributed systems
  • 4+ petabytes of live data
  • OCaml
  • C++
  • linux
  • git

The ideal candidate is expected to:

  • Independently deal with and investigate bugs, schedule tasks and dig code
  • Make well-reasoned technical choices and take responsibility for them
  • Understand the whole technology stack at all levels : from network and userspace code to OS internals and hardware
  • Handle full development cycle of a single component, i.e. formalize task, write code and tests, setup and support production (devops)
  • Approach problems with practical mindset and suppress perfectionism when time is a priority

These requirements stem naturally from our approach to development with fast feedback cycle, highly-focused personal areas of responsibility and strong tendency to vertical component splitting.

What you get

We provide:

  • Competitive salary
  • Modern office in San Francisco SOMA (Embarcadero)
  • Informal and thriving atmosphere
  • First-class workplace equipment (hardware, tools)
  • No dress code

Get information on how to apply for this position.

July 20, 2015 08:07 AM

July 18, 2015

Jane Street

Introducing Incremental

I'm pleased to announce the release of Incremental (well commented mli here), a powerful library for building self-adjusting computations, i.e., computations that can be updated efficiently when their inputs change.

At its simplest, you can think of a self-adjusting computation as a fancy spreadsheet. In a spreadsheet, each cell contains either simple data, or an equation that describes how the value in this cell should be derived from values in other cells. Collectively, this amounts to a graph-structured computation, and one of the critical optimizations in Excel is that when some of the cells change, Excel only recomputes the parts of the graph that depend on those changed cells.

What makes self-adjusting computation (or SAC) different from a spreadsheet is its dynamism. The structure of the computational graph in an SAC can change at runtime, in response to the changing input data.

This dynamism gives you quite a bit of flexibility, which can be used in different ways. Here are a few.

On-line combinatorial algorithms. Incremental is based on work by Umut Acar et al. on self-adjusting computation (that's where the term comes from); there, they were mostly interested in building efficient on-line versions of various combinatorial algorithms. In many cases, they could match the asymptotic complexity of custom on-line algorithms by fairly simple incrementalizations of all-at-once algorithms.

Incremental GUI construction. One simple and natural way to model a GUI application is to structure your display as a function that generates a view from some more abstract data model.

Having a function that constructs a view from scratch at every iteration is simple, but prohibitively expensive. But if you can write this function so that it produces an incrementalized computation, then you have a solution that is both simple and efficient. We've used this technique in a number of our UIs, to good effect.

This might remind you of how functional reactive programming (FRP) is used for construction of GUIs in languages like Elm. SAC and FRP have different semantics --- FRP is mostly concerned with time-like computations, and SAC is mostly about optimizing DAG-structured computations --- but they are nonetheless closely related, especially at the implementation level. You can see my post here for a description of the broader conceptual landscape that includes both FRP and SAC.

Configurable computations. An example that comes from our own work is risk calculations. Calculating measures of risk of a portfolio involves combining data from a complex collection of interdependent models. Each of these models is dependent both on live data from the markets, and on configurations determined by users.

A config change could merely tweak a coefficient, or it could change the overall structure of the computation, say by changing the list of factors used by a given model. Incremental allows you to build a computation that can update efficiently in response to both simple data changes as well as more structural config changes, in one unified framework.

A taste of Incremental

It's hard to give a compelling example of Incremental in action in just a few lines of code, because what makes Incremental really useful is how it helps you build large and complex computations. Nonetheless, small examples can give you a sense of how the library works.

To that end, let's walk through a few small examples. To begin, we need to instantiate the Incremental functor.

  open Core.Std
  module Inc = Incremental_lib.Incremental.Make ()

Each instance thus generated is its own independent computational world. The Incremental functor is generative, meaning it mints fresh types each time it's applied, which prevents values from different incremental worlds from being mixed accidentally.

An Incremental computation always starts at its variables. Modifications to variables are how updates to input data are communicated to Incremental.

Let's write down a few variables corresponding to the dimensions of a rectangular prism.

  module Var = Inc.Var

  (* dimensions of a rectangular prism *)
  let width_v = Var.create 3.
  let depth_v = Var.create 5.
  let height_v = Var.create 4.

We can use Var.watch to get the (trivial) incremental computations associated with each variable.

  let width = Var.watch width_v
  let depth = Var.watch depth_v
  let height = Var.watch height_v

The following is an incremental computation of the base area of the prism, and of the volume.

  let base_area = Inc.map2 width depth ~f:( *. )
  let volume = Inc.map2 base_area height ~f:( *. )

In order to get information out of an incremental computation, we need to explicitly mark which nodes we want data from by creating observer nodes. Because it knows which nodes are observed, the framework can track what parts of the computation are still necessary to the results.

  let base_area_obs = Inc.observe base_area
  let volume_obs = Inc.observe volume

In order to force the computation to run, we need to explicitly call Inc.stabilize. Here's some code that uses stabilize to run the computation and then gets the information from the observers.

  let () =
    let v = Inc.Observer.value_exn in
    let display s =
      printf "%20s: base area: %F; volume: %F\n"
        s (v base_area_obs) (v volume_obs)
    in
    Inc.stabilize ();
    display "1st stabilize";
    Var.set height_v 10.;
    display "after set height";
    Inc.stabilize ();
    display "2nd stabilize"

If we run this, we'll see the following output:

       1st stabilize: base area: 15.; volume: 60.
    after set height: base area: 15.; volume: 60.
       2nd stabilize: base area: 15.; volume: 150.

Note that setting the height isn't enough to change the observed values; we need a stabilization to make that happen.

That's a fairly trivial computation, and there certainly isn't much to incrementalize. Let's try something a little more complicated: a function for merging together an array of incrementals, using some commutative and associative operator like addition or max.

  let rec merge ar ~f =
    if Array.length ar <= 1 then ar.(0)
    else
      let len = Array.length ar in
      let len' = len / 2 + len % 2 in
      let ar' =
        Array.init len' ~f:(fun i ->
          if i * 2 + 1 >= len then ar.(i*2)
          else Inc.map2 ar.(i*2) ar.(i*2+1) ~f)
      in
      merge ar' ~f;;
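To see the shape of the reduction, here is a plain, non-incremental version of the same pairwise merge, written against the OCaml standard library rather than Core or Incremental (merge_plain is a name introduced here purely for illustration). Each round halves the array, so an n-element input goes through about log2(n) rounds; in the incremental version, a change to one input element only re-fires the nodes on that element's path to the root.

```ocaml
(* A plain (non-incremental) sketch of the pairwise merge above,
   using only the OCaml standard library.  [merge_plain] is a
   hypothetical name introduced for illustration. *)
let rec merge_plain (ar : float array) ~f =
  let len = Array.length ar in
  if len <= 1 then ar.(0)
  else
    let len' = (len / 2) + (len mod 2) in
    let ar' =
      Array.init len' (fun i ->
        if (i * 2) + 1 >= len then ar.(i * 2)
        else f ar.(i * 2) ar.((i * 2) + 1))
    in
    merge_plain ar' ~f

let () =
  (* Five elements are merged as ((1+2)+(3+4))+5, a tree of depth 3. *)
  Printf.printf "%F\n" (merge_plain [| 1.; 2.; 3.; 4.; 5. |] ~f:(+.))
```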

Because this is done using a binary tree as the dependency graph, the complexity of updating an element is O(log n), where n is the size of the array. We can use this, for example, to compute an average:

  let average ar =
    let sum = merge ar ~f:(+.) in
    Inc.map sum ~f:(fun s -> s /. float (Array.length ar))

This works, but we can do better performance-wise, at least if our merge operation has an inverse. In that case, maintaining the sum can in principle be done in constant time, by first removing the old value before adding in the new one. Incremental has a function for taking advantage of this structure.

  let sum ar =
    Inc.unordered_array_fold ~f:(+.) ~f_inverse:(-.) ar;;
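The constant-time update that f_inverse enables can be sketched without Incremental: given the current sum and a single changed element, the new sum is obtained by subtracting the old value and adding the new one, rather than re-folding the whole array (update_sum is a hypothetical helper written for this post, not part of Incremental's API).

```ocaml
(* Constant-time maintenance of a running sum under a single-element
   change: remove the old contribution, then add the new one.
   [update_sum] is a hypothetical helper for illustration. *)
let update_sum ~sum ~old_value ~new_value =
  new_value +. (sum -. old_value)

let () =
  let ar = [| 3.; 5.; 4. |] in
  let sum = Array.fold_left (+.) 0. ar in
  (* Suppose ar.(2) changes from 4. to 10.; no re-fold is needed: *)
  let sum' = update_sum ~sum ~old_value:4. ~new_value:10. in
  Printf.printf "%F %F\n" sum sum'
```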

Now, let's say we want to do something a little more dynamic: in particular, what if we want to compute the average of a prefix of the given array? For that, we need to use the bind function, which allows us to produce new incremental nodes within an incremental computation.

  let average_of_prefix ar length =
    Inc.bind length (fun length ->
      average (Array.init length ~f:(fun i -> ar.(i))))

The type of this function is float Inc.t array -> int Inc.t -> float Inc.t, so the length of the prefix is a fully fledged part of the incremental computation. As a result, the dependency structure of this computation changes dynamically, e.g., if the value of length is 7, then the computation only depends on length and the first 7 elements of the array.

Hopefully this gives you enough of a sense of what Incremental is about to start thinking about where it might be useful for you. Note that the overhead of incremental is not inconsiderable --- on my laptop, firing a single node takes on the order of 30ns, which is far more than, say, summing numbers together. Incremental tends to be useful when the computation that is put into a single node is large relative to that overhead, or when the computational graph is large relative to the sub-graph that needs to be recomputed. Our experience has been that there are plenty of applications in this space that can benefit from Incremental.

by Yaron Minsky at July 18, 2015 03:25 PM

GaGallium

Formally verifying the complexity of OCaml programs with CFML -- part 1


In a recent paper, Arthur Charguéraud and François Pottier present a formal proof of an OCaml implementation of Tarjan's union-find algorithm. The proof covers two aspects: first, functional correctness ("the algorithm is correct"), but also asymptotic complexity. For example, one of the results of the paper is that the link function runs in \(O(α(n))\) elementary steps, \(α\) being the inverse of the Ackermann function.

Actually, the complexity results of the paper are not presented using the "big-O" notation, commonly used in asymptotic complexity proofs: the various constants are explicit and manipulated through the proof.

In these two blog posts, I'll describe a tentative extension of CFML, the Coq library François and Arthur used for their proof. Its goal is to enable asymptotic reasoning, with big-O notation and the like, when proving the complexity of programs.

CFML: verify OCaml programs in Coq

CFML is a Coq library for proving properties of OCaml programs. Its typical usage is as follows: (1) write some OCaml code; (2) use CFML's generator to parse the OCaml code and produce a "characteristic formula", expressed as a Coq axiom, that describes the semantics of the program; (3) import this formula and write a specification in Coq for the program; (4) prove the specification using CFML tactics. The specifications are expressed using separation logic, a logical framework convenient for talking about mutable data.

Let's illustrate this by proving correctness of the incr function (a function that increments a reference).

OCaml code:

let incr r =
  r := !r + 1

After calling CFML's generator, step (2) produces a Coq source file containing the characteristic formula for incr. This formula has the same shape and structure as the OCaml program, which will be useful to read the goals produced by the CFML tactics.

Parameter incr_cf :
  tag tag_top_fun Label_default
    Body incr r =>
    (LetApp _x4 := ml_get r in
               App ml_set r (_x4 + 1);)

The user doesn't need to read the produced file nor the characteristic formula, though. We'll just need to Require it at the beginning of our proof.

Step (3) is writing a specification: a pair of a precondition, that must hold for the function to run, and a postcondition, describing the returned value and the state of the memory after executing the function. In our case, before running incr, r must be a reference containing some integer value i; after running incr, r must contain i+1.

Lemma incr_spec :
  Spec incr r |R>> forall (i: int),
    R (r ~~> i) (# r ~~> (i+1)).

Some hairy notations are involved, but basically, (r ~~> i) is the precondition, and (# r ~~> (i+1)) the postcondition. The ~~> notation denotes a "points-to" relation: here, r is a pointer to a memory cell containing the value i. The # means that incr returns unit.
Let us prove this specification interactively. To prove a goal of the form Spec foo ..., we first use the xcf tactic to introduce the characteristic formula of foo.

Lemma incr_spec :
  Spec incr r |R>> forall (i: int),
    R (r ~~> i) (# r ~~> (i+1)).
Proof.
  xcf. intros.

The resulting goal is of the form formula pre post, formula being a part of the characteristic formula, pre and post the precondition and postcondition for the corresponding piece of code.

r : loc
i : int
============================
 (LetApp _x4 := ml_get r in
   App ml_set r (_x4 + 1);) (r ~~> i \* \[]) (# r ~~> (i + 1))

We advance the proof step by step, traversing the shape of the program, by applying tactics that match the head constructor of formula: here it is a LetApp, so we use the xapps tactic. If the goal started with an If, we would use xif, etc. We need another xapp after that, for the App constructor. Finally, what remains to be proven is a "heap implication", stating that the final memory matches the postcondition.

============================
 # r ~~> (i + 1) \* \[] ===> # r ~~> (i + 1) \* Hexists H', H'

The hsimpl tactic does the job.

Lemma incr_spec :
  Spec incr r |R>> forall (i: int),
    R (r ~~> i) (# r ~~> (i+1)).
Proof.
  xcf. intros.
  xapps. xapp. hsimpl.
Qed.

CFML + time credits: prove algorithmic complexity

We rely on (and admit) the following property of the OCaml compiler: if one ignores the cost of garbage collection, counting the number of function calls and for/while loop iterations performed by the source program is an accurate measure, up to a constant factor, of the number of machine instructions executed by the compiled program.

We thus consider a new kind of heap resource (something that can be put in pre/postconditions): time credits. The program is now required to consume a time credit at each step of computation.

If we produce the characteristic formula for incr with the "-credits" option, its specification and proof becomes:

Lemma incr_spec :
  Spec incr r |R>> forall (i: int),
    R (\$ 1 \* r ~~> i) (# r ~~> (i+1)).
Proof.
  xcf. intros.
  xpay.
  xapps. xapp. hsimpl.
Qed.

What changed? We now have to give one credit (\$ 1) in the precondition of incr, as it performs one step of computation. The credit does not appear in the postcondition: it is consumed by the function. In the proof, we use the xpay tactic to justify that we are indeed able to pay for the computation step.

The complexity of a function is then represented as the number of credits contained in its precondition: that is, how much you have to pay for it to run. Interestingly, credits can also be stored in the heap for later consumption: this enables amortization, a standard reasoning technique in the literature (see e.g. Introduction to Algorithms by Cormen et al.).

Asymptotic reasoning on time credits

The proof of Union-Find states and tracks explicitly how many credits are needed at every step of the proof. For example, the precondition of find requires "\(α(N) + 2\) credits". We would like to be able to write instead "\(O(α(N))\) credits"; first to match usual informal algorithmic reasoning, but also to enable asymptotic reasoning, that is, proofs of properties true "for a big enough \(N\)".

Reasoning with big-Os is also more modular: the exact number of credits an auxiliary function needs may change without impacting proofs using it, as long as the asymptotic bound stays the same.

The prototype extension of CFML I developed tries to address this. Its features are as follows:

  • Given an explicit cost function (a "number of credits"), allow the user to prove an asymptotic "big-O" bound for it. The cost function may itself be defined in terms of other functions that are only bounded asymptotically by big-Os.
  • Allow various manipulations on big-O bounds: composition, parameter transformation, etc.

For the moment this does not include proofs of Swiss-army-knife theorems like the master theorem; but such theorems can now at least be stated, using the new asymptotic notions.

This may seem to be only a matter of writing down the right definitions and lemmas. That is partly true, but the "big-O" notation is often used quite informally, so formalizing it requires extra care in some places.

For example, the "\(O\)" notation implicitly requires some notion of "going towards infinity". When manipulating cost functions with multiple parameters (e.g., "\(f(m,n)\) is \(O(g(m,n))\)"), it is not obvious which notion of "going to infinity" to choose, and the candidates are not all equivalent: the right one depends on later use.

In our example, we could require both \(m\) and \(n\) to go to infinity. But maybe we will want to fix \(m\) afterwards, and still have \(f(m,n)\) be a \(O(g(m, n))\); in that case we need another notion of "going to infinity": \(m \times n + n\) is indeed a \(O(m \times n)\) when both \(m\) and \(n\) grow to infinity, but this is clearly not the case when fixing \(m = 0\)...
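To spell the example out as a worked check: when both parameters grow, the constant \(k = 2\) is a witness, but no constant works once \(m\) is pinned to \(0\):

\[
m n + n \;\le\; 2\, m n \quad \text{for } m \ge 1,\ n \ge 0,
\qquad\text{whereas}\qquad
f(0, n) = n \not\le k \cdot g(0, n) = 0 \quad \text{for any } k \text{ and } n \ge 1.
\]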

New notions introduced in CFML + asymptotic credits

To formalize these notions, and be able to talk more clearly about this little dilemma, we reuse the notion of filter from the literature.

The generic notion of filter describes a way of "going to infinity" in a set (\(\mathbb{Z}, \mathbb{Z}^2, \ldots\)). On \(\mathbb{Z}\) there is one obvious filter that works in every situation, so the benefits of the method are not apparent there. However, as illustrated above, on \(\mathbb{Z}^2\) (i.e. for functions with two parameters) it is not clear which notion of "going to infinity" should be used, and the choice depends on later usage of the specification. Filters let us describe this situation formally: one can define various filters on \(\mathbb{Z}^2\), and state clearly which one is used in a given \(O()\).

The following are the new predicates introduced in CFML: filter, and more. They are all needed to understand what is going on during the proofs; however, in simple scripts, most of them need not be manipulated directly by the user.

Definition filter A := (A -> Prop) -> Prop.
Class Filter {A} (ultimately: filter A).

A filter is a set of sets (a set of elements of type A having type A -> Prop) that represents a "set of neighborhoods"; in our case, neighborhoods of infinity. The Filter class bundles additional properties that must be satisfied, such as stability under intersection. This definition lets us write ultimately P, when ultimately is a filter, meaning that P eventually holds "when going towards infinity".
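For intuition, a filter in this sense is typically required to satisfy properties along the following lines (the exact axioms bundled by the Filter class may differ slightly):

\[
\begin{aligned}
&\textit{(nonempty)} && \mathrm{ultimately}\ (\lambda x.\ \top) \\
&\textit{(covariant)} && (\forall x,\ P\,x \Rightarrow Q\,x) \Rightarrow \mathrm{ultimately}\ P \Rightarrow \mathrm{ultimately}\ Q \\
&\textit{(stable under intersection)} && \mathrm{ultimately}\ P \Rightarrow \mathrm{ultimately}\ Q \Rightarrow \mathrm{ultimately}\ (\lambda x.\ P\,x \wedge Q\,x)
\end{aligned}
\]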

Definition dominated (ultimately: filter A) (f g: A -> B).

The textbook definition of "\(f\) is \(O(g)\)", using filters: more or less, exists k, ultimately (fun x => |f x| ≤ k |g x|).
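On \(\mathbb{Z}\) with its standard "towards \(+\infty\)" filter, where ultimately P means \(\exists x_0,\ \forall x \ge x_0,\ P\,x\), this unfolds to the familiar textbook definition:

\[
f \in O(g) \;\iff\; \exists k,\ \exists x_0,\ \forall x \ge x_0,\ |f(x)| \le k\,|g(x)|.
\]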

Definition monotonic (leA: A -> A -> Prop) (leB: B -> B -> Prop) (f: A -> B).
Definition monotonic_after leA leB (f: A -> B) (a0: A).

Monotonicity of f wrt. the relations leA on the domain and leB on the codomain. monotonic_after means "monotonic after a0"; usually combined with a filter to have a notion of asymptotic monotonicity.

Definition idominated (ultimately: filter A) leA (f g: A -> B).

A variant of dominated that we'll use in practice: it allows more lemmas to be automatic (without side conditions to prove). idominated _ _ f g basically means that either f and g are both \(O(1)\), or dominated _ f g holds (\(f\) is \(O(g)\)) and g is asymptotically monotonic.

Definition SpecO (ultimately: filter A) leA (g: A -> Z)
                   (spec: (A -> Z) -> Prop) :=
  exists (f: A -> Z),
    (forall x, 0 <= f x) /\
    monotonic _ _ f      /\
    idominated _ _ f g   /\
    spec f.

A wrapper useful to state specifications using big-Os: it basically means "there exists some f function, which is a \(O(g)\), and spec f is true" (plus some necessary additional facts). Usually, spec is of the form Spec .. |R>> ..., i.e. a standard CFML specification.

As an appetizer for the second part, I'll leave you with two specifications that use big-Os. The first is for our incr function; the second is for a mktree recursive function that builds a complete tree of depth \(n\) in \(O(2^n)\) time.

Lemma incr_spec :
  SpecO1 (fun F =>
    Spec incr r |R>> forall (i: int),
      R (\$ F tt \* r ~~> i) (# r ~~> (i+1))).
Lemma mktree_spec :
  SpecO (fun n => 2 ^ n) (fun F =>
    Spec mktree (depth: int) (x: a) |R>>
    0 <= depth ->
    R (\$ F depth) (fun (t: tree a) => \[])).

by Armaël Guéneau at July 18, 2015 08:00 AM