OCaml Planet

July 24, 2014

OCaml Labs compiler hacking

Seventh OCaml compiler hacking session (at Citrix)

For the seventh Cambridge OCaml compiler-hacking session we'll be meeting at the Citrix office in the Cambridge Science Park on 6.30pm Friday 1st August. Thanks to Citrix for supporting and hosting the session!

We'll kick off with a demo from Frédéric Bour of modular implicits, an OCaml extension that adds support for overloading.

If you're planning to come along, it'd be helpful if you could indicate interest via Doodle and sign up to the mailing list to receive updates:

Where: Citrix Systems Research & Development Ltd.
Building 101
Cambridge Science Park
Milton Road
Cambridge, CB4 0FY
United Kingdom

When: 6.30pm, Friday 1st August

Who: anyone interested in improving OCaml. Knowledge of OCaml programming will obviously be helpful, but prior experience of working on OCaml internals isn't necessary.

What: fixing bugs, implementing new features, learning about OCaml internals

Wiki: https://github.com/ocamllabs/compiler-hacking/wiki

We're defining "compiler" pretty broadly, to include anything that's part of the standard distribution, which means at least the standard library, runtime, tools (ocamldep, ocamllex, ocamlyacc, etc.), ocamlbuild, the documentation, and the compiler itself. We'll have suggestions for mini-projects for various levels of experience (see also some things we've worked on in previous sessions), but feel free to come along and work on whatever you fancy.

We'll also be ordering pizza, so if you want to be counted for food you should aim to arrive by 6.45pm.

by OCaml Labs (cl-ocamllabs@lists.cam.ac.uk) at July 24, 2014 09:00 AM

July 23, 2014

Richard Jones

libguestfs now works on 64 bit ARM


Pictured above is my 64 bit ARM server. It’s under NDA so I cannot tell you who supplied it or even show you a proper photo.

However it runs Fedora 21 & Rawhide:

Linux arm64.home.annexia.org 3.16.0-0.rc6.git1.1.efirtcfix1.fc22.aarch64 #1 SMP Wed Jul 23 12:15:58 BST 2014 aarch64 aarch64 aarch64 GNU/Linux

libvirt and libguestfs run fine, with full KVM acceleration, although right now you have to use qemu from git as the Rawhide version of qemu is not new enough.

Also OCaml 4.02.0 beta works (after we found and fixed a few bugs in the arm64 native code generator last week).

by rich at July 23, 2014 08:49 PM

July 22, 2014

Open Mirage

Mirage v2.0: a recap of the new features

The first release of Mirage back in December 2013 introduced the prototype of the unikernel concept, which realised the promise of a safe, flexible mechanism to build highly optimized software stacks purpose-built for deployment in the public cloud (more background on this). Since then, we've been hard at work using and extending Mirage for real projects and the community has been steadily growing.

We're thrilled to announce the release of Mirage OS v2.0 today! Over the past few weeks the team has been hard at work blogging about all the new features in this latest release, coordinated by the tireless Amir Chaudhry:

All the libraries required for these new features are regularly released into the OPAM package manager, so just follow the installation instructions to give them a spin. A release this size probably introduces minor hiccups that may cause build failures, so we very much encourage bug reports on our issue tracker or questions to our mailing lists. Don't be shy: no question is too basic, and we'd love to hear of any weird and wacky uses you put this new release to! And finally, the lifeblood of Mirage is about sharing and publishing libraries that add new functionality to the framework, so do get involved and open-source your own efforts.

Breaking news: Richard Mortier and I will be speaking at OSCON this week on Thursday morning about the new features in F150 in the Cloud Track. Come along if you are in rainy Portland at the moment!

by Anil Madhavapeddy (anil@recoil.org) at July 22, 2014 11:00 AM

Building an ARMy of Xen unikernels

Mirage has just gained the ability to compile unikernels for the Xen/arm32 platform, allowing Mirage guests to run under the Xen hypervisor on ARM devices such as the Cubieboard 2 and CubieTruck.


The ARMv7 architecture introduced the (optional) Virtualization Extensions, providing hardware support for running virtual machines on ARM devices, and Xen's ARM Hypervisor uses this to support hardware accelerated ARM guests.

Mini-OS is a tiny OS kernel designed specifically for running under Xen. It provides code to initialise the CPU, display messages on the console, allocate memory (malloc), and not much else. It is used as the low-level core of Mirage's Xen implementation.

Mirage v1 was built on an old version of Mini-OS which didn't support ARM. For Mirage v2, we have added ARM support to the current Mini-OS (completing Karim Allah Ahmed's initial ARM port) and made Mirage depend on it as an external library. This means that Mirage will automatically gain support for other architectures that get added later. We are currently working with the Xen developers to get our Mini-OS fork upstreamed.

In a similar way, we have replaced Mirage v1's bundled maths library with a dependency on the external OpenLibm, which we also extended with ARM support (this was just a case of fixing the build system; the code is from FreeBSD's libm, which already supported ARM).

Mirage v1 also bundled dietlibc to provide its standard C library. A nice side-effect of this work came when we were trying to separate out the dietlibc headers from the old Mini-OS headers in Mirage. These had rather grown together over time and the work was proving difficult, until we discovered that we no longer needed a libc at all, as almost everything that used it had been replaced with pure OCaml versions! The only exception was the printf code for formatting floating point numbers, which OCaml uses in its printf implementation. We replaced that by taking the small fmt_fp function from musl libc.

Here's the final diffstat of the changes to mirage-platform adding ARM support:

778 files changed, 1949 insertions(+), 59689 deletions(-)

Trying it out

You'll need an ARM device with the Virtualization Extensions. I've been testing using the Cubieboard 2 (and CubieTruck):


The first step is to install Xen. Running Xen on the Cubieboard2 documents the manual installation process, but you can now also use mirage/xen-arm-builder to build an SDcard image automatically. Copy the image to the SDcard, connect the network cable and power, and the board will boot Xen.

Once booted you can ssh to Dom0, the privileged Linux domain used to manage the system, install Mirage, and build your unikernel just as on x86. Currently, you need to select the Git versions of some components. The following commands will install the necessary versions if you're using the xen-arm-builder image:

$ opam init
$ opam install mirage-xen-minios
$ opam pin mirage https://github.com/talex5/mirage.git#link_c_stubs
$ opam pin mirage-xen https://github.com/mirage/mirage-platform
$ opam pin tcpip https://github.com/talex5/mirage-tcpip.git#checksum
$ opam install mirage

Technical details

One of the pleasures of unikernels is that you can comprehend the whole system with relatively little effort, and those wishing to understand, debug or contribute to the ARM support may find the following technical sections interesting. However, you don't need to know the details of the ARM port to use it, as Mirage abstracts away the details of the underlying platform.

The boot process

An ARM Mirage unikernel uses the Linux zImage format, though it is not actually compressed. Xen will allocate some RAM for the image and load the kernel at the offset 0x8000 (32 KB).

Execution begins in arm32.S, with the r2 register pointing to a Flattened Device Tree (FDT) describing details of the virtual system. This assembler code performs a few basic boot tasks:

  1. Configuring the MMU, which maps virtual addresses to physical addresses (see next section).
  2. Turning on caching and branch prediction.
  3. Setting up the exception vector table (this says how to handle interrupts and deal with various faults, such as reading from an invalid address).
  4. Setting up the stack pointer and calling the C function arch_init.

arch_init makes some calls to the hypervisor to set up support for the console and interrupt controller, and then calls start_kernel.

start_kernel (in libxencaml) sets up a few more features (events, malloc, time-keeping and grant tables), then calls caml_startup.

caml_startup (in libocaml) initialises the garbage collector and calls caml_program, which is your application's main.ml.

The address space

With the Virtualization Extensions, there are two stages to converting a virtual memory address (used by application code) to a physical address in RAM. The first stage is under the control of the guest VM, mapping the virtual address to what the guest believes is the physical address (this address is referred to as the Intermediate Physical Address or IPA). The second stage, under the control of Xen, maps the IPA to the real physical address. The tables holding these mappings are called translation tables.

Mirage's memory needs are simple: most of the RAM should be used for the garbage-collected OCaml heap, with a few pages used for interacting with Xen (these don't go on the OCaml heap because they must be page aligned and must not move around).

Xen does not commit to using a fixed address as the IPA of the RAM, but the C code needs to run from a known location. To solve this problem the assembler code in arm32.S detects where it is running from and sets up a virtual-to-physical mapping that will make it appear at the expected location, by adding a fixed offset to each virtual address. For example, on Xen/unstable, we configure the beginning of the virtual address space to look like this (on Xen 4.4, the physical addresses would start at 80000000 instead):

Virtual addressPhysical address (IPA)Purpose
40000040000000Stack (16 KB)
40400040004000Translation tables (16 KB)
40800040008000Kernel image

The physical address is always at a fixed offset from the virtual address and the addresses wrap around, so virtual address c0400000 maps back to physical address 0 (in this example).

The stack, which grows downwards, is placed at the start of RAM so that a stack overflow will trigger a fault rather than overwriting other data.

The 16 KB translation table is an array of 4-byte entries each mapping 1 MB of the virtual address space, so the 16 KB table is able to map the entire 32-bit address space (4 GB). Each entry can either give the physical section address directly (which is what we do) or point to a second-level table mapping individual 4 KB pages. By using only the top-level table we reduce possible delays due to TLB misses.

After the kernel code comes the data (constants and global variables), then the bss section (data that is initially zero, and therefore doesn't need to be stored in the kernel image), and finally the rest of the RAM, which is handed over to the malloc system.


The current version seems to be working well on Xen 4.4 (stable) and the 4.5 development version, but has only been lightly tested. If you have any problems or questions, or get it working on other devices, please let us know!

by Thomas Leonard (talex5@gmail.com) at July 22, 2014 10:00 AM

July 19, 2014

Shayne Fletcher

Merge sort

Merge sort

Here's another divide and conquer algorithm. This one is for sorting a sequence.

Conceptually, a merge sort works like this (see http://en.wikipedia.org/wiki/Merge_sort):

  • Divide the unsorted list into n sub-lists, each containing one element (a list of one element is trivially sorted);
  • Repeatedly merge sublists to produce new sorted sub-lists until there is only one sub-list remaining : this will be the sorted list.

In the following OCaml, we are drawing on inspiration from the the Haskell standard prelude for the part of the algorithm concerned with dividing the unsorted list : functions take, drop and split.

(**{b Merge sort}, an {i O(n log n)} comparison based sorting
algorithm *)
module type MERGESORT = sig

(**[take n] applied to a list [xs], returns the prefix of [xs] of
length [n] or [xs] itself if [n > List.length xs] e.g. [take 2
[1; 2; 3]] {i = } [[1; 2]]*)
val take : int -> 'a list -> 'a list

(**[drop n] applied to a list [xs], returns the suffix of [xs]
after the first [n] elements or, [[]] if [n > List.length xs]
e.g. [drop 2 [1; 2; 3]] {i = } [[3]]*)
val drop : int -> 'a list -> 'a list

(**[split n xs] is equivalent to [take n xs, drop n xs] e.g.
[split 2 [1; 2; 3]] {i = } [([1; 2], [3])]*)
val split : int -> 'a list -> 'a list * 'a list

(**[merge] given two {b sorted} sequences [xs] and [ys] returns a
single sorted sequence of the elements of [xs] and [ys]
e.g. [merge [1; 3] [2; 4]] {i = } [[1; 2; 3; 4]]*)
val merge : 'a list -> 'a list -> 'a list

(**[merge_sort] works by splitting a sequence into two parts,
recursively sorting the two parts and merging the results into
a single sorted sequence e.g. [merge_sort [1; 2; -1; 0; 3]]
{i = } [[-1; 0; 1; 2; 3]]*)
val merge_sort : 'a list -> 'a list

module Merge_sort : MERGESORT = struct

let rec take k l =
match (k, l) with
| n, _ when n <= 0 -> []
| _, [] -> []
| n, (x :: xs) -> x :: take (n - 1) xs

let rec drop k l =
match (k, l) with
| n, xs when n <= 0 -> xs
| _, [] -> []
| n, (_ :: xs) -> drop (n - 1) xs

let rec split k l = take k l, drop k l

let rec merge l m =
match (l, m) with
| [], ys -> ys
| xs, [] -> xs
| ((x :: xs) as t), ((y :: ys) as s) ->
if x <= y then x :: (merge xs s) else y :: (merge t ys)

let rec merge_sort l =
let i = (List.length l) / 2 in
if i = 0 then l
let u, v = split i l in
let xs, ys = merge_sort u, merge_sort v in
merge xs ys


We can test our little program in the top-level like this:

# let l = Merge_sort.merge_sort [-1; 2; 0; 4; 3; 5];;
val l : int list = [-1; 0; 2; 3; 4; 5]

Here are the functions in C++ phrased as range algorithms.

//cl /EHsc /Femerge.exe /I c:/project/boost_1_55_0 merge.cpp

#include <boost/next_prior.hpp>
#include <boost/range.hpp>
#include <boost/range/algorithm/copy.hpp>

#include <vector>
#include <cstdlib>

namespace algo

template <class S, class D>
D take (std::size_t n, S src, D dst)
typedef boost::range_iterator<S>::type it_t;
it_t curr = boost::begin (src), last = boost::end (src);
if (n <= 0) return dst;
if (boost::empty (src)) return dst;
take (n-1, S (boost::next (curr), last), *dst++ = *curr);
return dst;


template <class S, class D>
D drop (std::size_t n, S src, D dst)
typedef boost::range_iterator<S>::type it_t;
it_t curr = boost::begin (src), last = boost::end (src);
if (n <= 0) return boost::range::copy (src, dst);
if (boost::empty (src))return dst;
drop (n-1, S (boost::next (curr), last), dst);
return dst;


template <class S, class D1, class D2>
void split (int n, S src, D1 dst1, D2 dst2)
{ take (n, src, dst1); drop (n, src, dst2); }


template <class S1, class S2, class D>
D merge (S1 src1, S2 src2, D dst)
typedef boost::range_iterator<S1>::type it1_t;
typedef boost::range_iterator<S2>::type it2_t;
typedef std::iterator_traits<it1_t>::value_type value1_t;
typedef std::iterator_traits<it2_t>::value_type value2_t;
if (boost::empty (src1)) return boost::copy (src2, dst);
if (boost::empty (src2)) return boost::copy (src1, dst);
it1_t curr1 = boost::begin (src1), last1 = boost::end (src1);
it2_t curr2 = boost::begin (src2), last2 = boost::end (src2);
value1_t x = *curr1;
value2_t y = *curr2;
if (x <= y)
merge (S1 (boost::next (curr1), last1), src2, *dst++ = x);
merge (src1, S2 (boost::next (curr2), last2), *dst++ = y);
return dst;


template <class S, class D>
D merge_sort (S src, D dst)
typedef boost::range_iterator<S>::type it_t;
typedef std::iterator_traits<it_t>::value_type value_t;
std::size_t i = boost::size (src)/2;
if (i == 0) return boost::range::copy (src, dst);
std::vector<value_t> u, v;
split (i, src, std::back_inserter (u), std::back_inserter (v));
std::vector<value_t> xs, ys;
merge_sort (u, std::back_inserter (xs));
merge_sort (v, std::back_inserter (ys));
merge (xs, ys, dst);
return dst;

}//namespace algo

The following usage example provides a lightweight test.

#include <boost/assign/list_of.hpp>

#include <utility>
#include <iterator>
#include <iostream>

int main ()
using std::pair;
using std::make_pair;
using std::ostream_iterator;
using boost::assign::list_of;

int ord[] = {1, 2, 3, 4};

auto src=make_pair(ord, ord + 4);
auto dst=ostream_iterator<int>(std::cout, ", ");

std::cout << "\ntake ():\n";

algo::take (0u, src, dst); std::cout << std::endl;
algo::take (1u, src, dst); std::cout << std::endl;
algo::take (2u, src, dst); std::cout << std::endl;
algo::take (3u, src, dst); std::cout << std::endl;
algo::take (4u, src, dst); std::cout << std::endl;
algo::take (5u, src, dst); std::cout << std::endl;

std::cout << "\ndrop ():\n";

algo::drop (5u, src, dst); std::cout << std::endl;
algo::drop (4u, src, dst); std::cout << std::endl;
algo::drop (3u, src, dst); std::cout << std::endl;
algo::drop (2u, src, dst); std::cout << std::endl;
algo::drop (1u, src, dst); std::cout << std::endl;
algo::drop (0u, src, dst); std::cout << std::endl;

std::cout << "\nsplit ():\n\n";

algo::split (0u, src, dst, dst); std::cout << std::endl;
algo::split (1u, src, dst, dst); std::cout << std::endl;
algo::split (2u, src, dst, dst); std::cout << std::endl;
algo::split (3u, src, dst, dst); std::cout << std::endl;
algo::split (4u, src, dst, dst); std::cout << std::endl;
algo::split (5u, src, dst, dst); std::cout << std::endl;

std::cout << "\nmerge ():\n\n";

std::vector <int> l=list_of(-1)(2), r=list_of(0)(1);
algo::merge (l, r, dst); std::cout << std::endl;

std::cout << "\nmerge_sort ():\n\n";

int unord[] = {-1, 2, 0, 4, 3, 5};
algo::merge_sort (make_pair (unord, unord + 6), dst);

return 0;
The above program produces the following output.

take ():

1, 2,
1, 2, 3,
1, 2, 3, 4,
1, 2, 3, 4,

drop ():

3, 4,
2, 3, 4,
1, 2, 3, 4,

split ():

1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,

merge ():

-1, 0, 1, 2,

merge_sort ():

-1, 0, 2, 3, 4, 5,

by Shayne Fletcher (noreply@blogger.com) at July 19, 2014 05:05 PM

Andrej Bauer

Reductions in computability theory from a constructive point of view

Here are the slides from my Logic Coloquium 2014 talk in Vienna. This is joint work with Kazuto Yoshimura from Japan Advanced Institute for Science and Technology.

Abstract: In constructive mathematics we often consider implications between non-constructive reasoning principles. For instance, it is well known that the Limited principle of omniscience implies that equality of real numbers is decidable. Most such reductions proceed by reducing an instance of the consequent to an instance of the antecedent. We may therefore define a notion of instance reducibility, which turns out to have a very rich structure. Even better, under Kleene’s function realizability interpretation instance reducibility corresponds to Weihrauch reducibility, while Kleene’s number realizability relates it to truth-table reducibility. We may also ask about a constructive treatment of other reducibilities in computability theory. I shall discuss how one can tackle Turing reducibility constructively via Kleene’s number realizability.

Slides with talk notes:  lc2014-slides-notes.pdf

by Andrej Bauer at July 19, 2014 12:50 PM

July 18, 2014

Jane Street

Simple top-down development in OCaml

Often when writing a new module, I want to write the interface first and save the implementation for later. This lets me use the module as a black box, extending the interface as needed to support the rest of the program. When everything else is finished, I can fill in the implementation, knowing the full interface I need to support. Of course sometimes the implementation needs to push back on the interface – this pattern isn't an absolute – but it's certainly a useful starting point. The trick is getting the program to compile at intermediate stages when the implementation hasn't been filled in.

The longhand way I've used previously is update the .mli file as I go with the interface, and fill in stubs for each definition in the .ml file. For example, let's say I'm writing a stack module:

(* stack.mli *)
type 'a t
val empty : 'a t
val push : 'a t -> 'a -> 'a t

(* stack.ml *)
type 'a t
let empty = failwith "unimplemented"
let push = failwith "unimplemented"

If I want to add a new value, I can just add a new stub:

(* stack.mli *)
type 'a t
val empty : 'a t
val push : 'a t -> 'a -> 'a t
val pop : 'a t -> ('a t * 'a) option

(* stack.ml *)
type 'a t
let empty = failwith "unimplemented"
let push = failwith "unimplemented"
let pop = failwith "unimplemented"

This works: I haven't had to commit to a representation for stacks or an implementation for the operations, but my whole program can use the stack module and will still compile. Of course, nothing will run until I get rid of those exceptions. On the other hand, this is still more work than I'd like. I have to fill in stubs for every definition; in many cases, the stubs themselves are longer to write than their types in the .mli file. As my interface changes, I have to do as much work in the .ml file as I do in the .mli to keep the two in sync. It turns out I can do better. By rearranging the files, I can just update the interface as I go:

(* stack_intf.ml *)
module type S = sig
  type 'a t
  val empty : 'a t
  val push : 'a t -> 'a -> 'a t
  val pop : 'a t -> ('a t * 'a) option

(* stack.mli *)
include Stack_intf.S

(* stack.ml *)
include (val (failwith "unimplemented") : Stack_intf.S)

Now I can just add to stack_intf.ml as I go. Both stack.mli and stack.ml will take on the right module types automatically, without the need to add explicit stubs for type or value definitions. Once I'm done filling in the interface, I can paste the module type into stack.mli, remove stack_intf.ml, and start filling in stack.ml with real definitions. But the initial phase is much easier for only having to add exactly what I need.

by Carl Eastlund at July 18, 2014 09:49 PM


Coq is hiring a specialized engineer for 2 years

As part of the ADT Coq-API, we are now offering a 2-year position for a specialized engineer.
For more details, see http://www.pps.univ-paris-diderot.fr/~letouzey/coq-api-position.html

by letouzey at July 18, 2014 09:41 AM

July 16, 2014


OCamlPro Highlights: May-June 2014

Here is a short report on some of our public activities in May and June 2014.

Towards OPAM 1.2

After a lot of discussions and work on OPAM itself, we are now getting to a clear workflow for OCaml developpers and packagers: the preliminary document for OPAM 1.2 is available here. The idea is that you can now easily create and test the metadata locally, before having to get your package included in any repo: there is less administrative burden and it's much quicker to start, fix it, test it and get it right.

Things getting pretty stable, we are closing the last few bugs and should be releasing 1.2~beta very shortly.

OCaml Hacking Session

We participated in the first OCaml hacking session in Paris organized by Thomas Braibant and supervized by Gabriel Scherer, who had kindly prepared in advance a selection of tasks. In particular, he came with a list of open bugs in Mantis that makes for good first descents in the compiler's code.

It was the first event of this kind for the OCaml Users in Paris (OUPS) meetup group. It was a success since everybody enjoyed it and some work has actually been achieved. We'll have to wait for the next one to confirm that !

On our front, Fabrice started working (with others) on a good, consensual Emacs profile; Pierre worked on building cross-compilers using Makefile templates; Benjamin wanted to evaluate the feasability of handling ppx extension nodes correctly inside Emacs, and it turns out that elisp tools exist for the task! You can see a first experiment running in the following screen capture, or even try the code (just open it in emacs, do a M-x eval-buffer on it and then a M-x tuareg-mode-with-ppx on an OCaml file). But beware, it's not yet very mainstream and can make your Emacs crash.

Emacs screenshot

Alt-Ergo Development

During the last two months, we participated in the supervision of an intern, Albin Coquereau --- a graduted student at University Paris-Sud --- in the VALS team who worked on a conservative extension of the SMT2 standard input language with prenex polymorphism a la ML and overloading. First results are promising. In the future, we plan to replace Alt-Ergo's input language with our extension of SMT2 in order to get advantage from SMT2's features and polymorphism's expressiveness.

Recenlty, we have also published an online Javascript-based version of Alt-Ergo (based on private release 0.99).

OCaml Adventures in Scilab Land

We are currently working on the proper integration of our Scilab tools in the Scilab world, respecting its ways and conventions. For this, we built a Scilab module respecting the standard ATOMS interface. This module can embed an OCaml program inside the run-time environment of Scilab, so that OCaml functions can be called as external primitives. (Dyn)linking Scilab's various components, LLVM's and the OCaml run-time together was not that easy.

Symmetrically, we built an OCaml library to manipulate and build Scilab values from OCaml, so that our tools can introspect the dynamic envrionment of Scilab's interprete. We also worked with the Scilab team to defined an AST interchange mechanism.

We plan to use this module as an entry point for our JIT oriented type system, as well as to integrate Scilint, our style checker, so that a Scilab user can check their functions without leaving the Scilab toplevel.

Experiment with Bytes and Backward Compatibility

As announced by a long discussion in the caml-list, OCaml 4.02 introduces the first step to eliminate a long known OCaml problem: String Mutability. The main difficulty being that resolving that problem necessarilly breaks backward compatibility.

To achieve this first step, OCaml 4.02 comes with a new bytes type and a corresponding Bytes module, similar to OCaml 4.01 String module, but using the bytes type. The type of a few functions from the String module changed to use the bytes type (like String.set, String.blit... ). By default the string and bytes types are equal, hence ensuring backward compatibility for this release, but a new argument "-safe-string" to the compiler can be used to remove this equality, and will probably become the default in some future release.

# let s = "foo";;
val s : string = "foo"
# s.[0] <- 'a';;
Characters 0-1:
  s.[0] <- 'a';;
Error: This expression has type string but an expression was expected of type bytes

Notice that even when using -safe-string you shouldn't rely on strings being immutable. For instance even if you compile that file with -safe-string, the assertion in the function g does not necessarilly hold:

If the following file a.ml is compiled with -safe-string

let s = "foo"
let f () = s
let g () = assert(s = "foo")

and the following file b.ml is compiled without -safe-string`

let s = A.f () in
  s.[0] <- 'a';
  A.g ()

In b.ml the equality holds, so modifying the string is possible, and the assertion from A.g will fail.

So you should consider that for now -safe-string is only a compiler-enforced good practice. But this may (and should) change in future versions. The ocamlc man page says:

   Enforce the separation between types string and bytes, thereby
   making strings read-only. This will become the default in a
   future version of OCaml.

In other words if you want your current code to be forward-compatible, your code should start using bytes instead of string as soon as possible.

Maintaining Compatibility between 4.01 and 4.02

In our experiments, we found a convenient solution to start using the bytes type while still providing compatibility with 4.01: we use a small StringCompat module that we open at the beginning of all our files making use of bytes. Depending on the version of OCaml used to build the program, we provide two different implementations of stringCompat.ml.

  • Before 4.02, our stringCompat.ml file provides a bytes type and a Bytes module, including the String module plus an often used Bytes.to_string equivalent:

type bytes = string
module Bytes = struct
  include String
  let to_string t = t
  • After 4.02, our stringCompat.ml file is much simpler:

type t = bytes
type bytes = t
module Bytes = Bytes

You might actually even wonder why it is not empty ? In fact, it is also a good practice to compile with a warning for unused open, and an empty stringCompat.ml would always trigger a warning in 4.02 for not being useful. Instead, this simple implementation is always seen as useful, as any use of bytes or Bytes will use the (virtual) indirection through StringCompat.

We plan to upload this module as a string-compat package in OPAM, so that everybody can use this trick. If you have a better solution, we'll be pleased to discuss it via the pull on opam-repository.

Testing whether your project correctly builds with "-safe-string"

When your code have been adapted to use the bytes whenever you need to modify a string, you can test if you didn't miss a case using OCaml 4.02 without changing your build system. To do that, you can just set the environment variable OCAMLPARAM to "safe-string=1,_". Notice that "OCAMLPARAM" should only be used for testing purpose, never set it in your build system or tools, this would prevent testing new compiler options on your package (and you will receive complaints when the core developers can't desactivate the "-w A -warn-error" generated by your build system) !

If your project passes this test and you don't use "-warn-error", your package should continue to build without modification in the near and the not-so-near future (unless you link with the compiler internals of course).

by Fabrice Le Fessant (fabricel@ocamlpro.com) at July 16, 2014 12:00 AM