OCaml Planet

July 28, 2014

Github OCaml jobs

Full Time: Software Developer (Functional Programming) at Jane Street in New York, NY; London, UK; Hong Kong

Software Developer (Functional Programming)

Jane Street is looking to hire great software developers with an interest in functional programming. OCaml, a statically typed functional programming with similarities to Haskell, Scheme, Erlang, F# and SML, is our language of choice. We've got the largest team of OCaml developers in any industrial setting, and probably the world's largest OCaml codebase. We use OCaml for running our entire business, supporting everything from research to systems administration to trading systems. If you're interested in seeing how functional programming plays out in the real world, there's no better place.

The atmosphere is informal and intellectual. There is a focus on education, and people learn about software and trading, both through formal classes and on the job. The work is challenging, and you get to see the practical impact of your efforts in quick and dramatic terms. Jane Street is also small enough that people have the freedom to get involved in many different areas of the business. Compensation is highly competitive, and there's a lot of room for growth.

You can learn more about Jane Street and our technology from our main site, janestreet.com. You can also look at a a talk given at CMU about why Jane Street uses functional programming (http://ocaml.janestreet.com/?q=node/61), and our programming blog (http://ocaml.janestreet.com).

We also have extensive benefits, including:

  • 90% book reimbursement for work-related books
  • 90% tuition reimbursement for continuing education
  • Excellent, zero-premium medical and dental insurance
  • Free lunch delivered daily from a selection of restaurants
  • Catered breakfasts and fresh brewed Peet's coffee
  • An on-site, private gym in New York with towel service
  • Kitchens fully stocked with a variety of snack choices
  • Full company 401(k) match up to 6% of salary, vests immediately
  • Three weeks of paid vacation for new hires in the US
  • 16 weeks fully paid maternity/paternity leave for primary caregivers, plus additional unpaid leave

More information at http://janestreet.com/culture/benefits/

July 28, 2014 12:43 PM

July 27, 2014

OCamlCore Forge News

Release of OCaml-safepass 1.3

This version features a number of patches submitted by Jonathan Curran, bringing OCaml-safepass up-to-date with the latest Bcrypt sources and enabling GCC optimisations for compiling the C portions.

by Dario Teixeira at July 27, 2014 09:32 AM

July 26, 2014

Shayne Fletcher

Cartesian product (redux)

Cartesian product (redux)

In this blog post we looked at programs to compute Cartesian Products. One algorithm (given here in OCaml) if you recall is

let prod l r =
let g acc a =
let f acc x =
(a, x) :: acc
in List.fold_left f acc r
in List.fold_left g [] l |> List.rev

In that earlier post I translated the above program into C++. This time I'm doing the same straightforward translation but using the Boost.MPL library to compute such products at compile time. It looks like this:


#include <boost/mpl/pair.hpp>
#include <boost/mpl/vector.hpp>
#include <boost/mpl/push_front.hpp>
#include <boost/mpl/fold.hpp>
#include <boost/mpl/reverse.hpp>
#include <boost/mpl/placeholders.hpp>

using namespace boost::mpl::placeholders;

template <class L, class R>
struct product
{
struct g {
template <class AccT, class A>
struct apply {
typedef typename boost::mpl::fold <
R, AccT
, boost::mpl::push_front<_1, boost::mpl::pair<A, _2> >
>::type type;
};
};

typedef typename boost::mpl::reverse <
typename boost::mpl::fold <
L, boost::mpl::vector<>, g>::type>::type type;
};
The translation proceeds almost mechanically! Does it work though? You betcha! Here's a test we can convince ourselves with:

#include <boost/mpl/equal.hpp>
#include <boost/mpl/for_each.hpp>
#include <boost/mpl/int.hpp>

#include <iostream>

using namespace boost::mpl;

typedef vector<int_<1>, int_<2> > A;
typedef vector<int_<3>, int_<4>, int_<5> > B;
typedef product <A, B>::type result;

BOOST_MPL_ASSERT((
equal<
result
, vector<
pair<int_<1>, int_<3> >
, pair<int_<1>, int_<4> >
, pair<int_<1>, int_<5> >
, pair<int_<2>, int_<3> >
, pair<int_<2>, int_<4> >
, pair<int_<2>, int_<5> >
>
> ));

struct print_class_name {
template <typename T>
void operator()( T t ) const {
std::cout << typeid(t).name() << " ";
}
};

int main ()
{
boost::mpl::for_each<result>(print_class_name());

return 0;
}
The takeaway is, learning yourself some functional programming is a great way to improve your template meta-programming fu! (That of course should come as no surprise... )

by Shayne Fletcher (noreply@blogger.com) at July 26, 2014 02:50 PM

July 25, 2014

Mindy Preston

Doing Nothing in Mirage

It’s Northern Hemisphere summer right now, and in Wisconsin we’re having one of the loveliest ones I can remember. Today the temperature is hovering right at pleasant, there are high clouds blowing across the sky, the breeze is soothing, and birds are singing all over the place. It is not, in short, programming weather. It is sit-outside, read-a-novel, do-nothing weather.

We don’t often let our programs slack off, even when we let ourselves take a peaceful day. I got to wondering (staring off into space, watching the shadows cast by sun-dappled leaves) what the most trivial, do-nothing Mirage project would look like, and how it could be constructed with a minimum of activity and a maximum of understanding.

[] dothraki@iBook:~$ mkdir trivial [] dothraki@iBook:~$ cd trivial/ [] dothraki@iBook:~/trivial$ ls -alh total 16K drwxrwxr-x 2 dothraki dothraki 4.0K Jul 23 13:17 . drwxr-xr-x 161 dothraki dothraki 12K Jul 23 13:17 .. [] dothraki@iBook:~/trivial$ mirage configure --xen [ERROR] No configuration file config.ml found. You'll need to create one to let Mirage know what to do.

Okay, we’ll have to do at least one thing to make this work. Mirage uses config.ml to programmatically generate a Makefile and main.ml when you invoke mirage --configure. main.ml uses instructions from config.ml to satisfy module types representing driver requirements for your application, then begins running the threads you requested that it run. That all sounds an awful lot like work; maybe we can get away with not asking for anything.

[] dothraki@iBook:~/trivial$ touch config.ml [] dothraki@iBook:~/trivial$ mirage configure --xen Mirage Using scanned config file: config.ml Mirage Processing: /home/dothraki/trivial/config.ml Mirage => rm -rf /home/dothraki/trivial/_build/config.* Mirage => cd /home/dothraki/trivial && ocamlbuild -use-ocamlfind -tags annot,bin_annot -pkg mirage config.cmxs empty Using configuration: /home/dothraki/trivial/config.ml empty 0 jobs [] empty => ocamlfind printconf path empty Generating: main.ml empty Now run 'make depend' to install the package dependencies for this unikernel. [] dothraki@iBook:~/trivial$ ls _build config.ml empty.xl log main.ml Makefile

That seems like a great start! Maybe we can trivially achieve our dream of doing nothing.

[] dothraki@iBook:~/trivial$ make depend opam install mirage-xen --verbose [NOTE] Package mirage-xen is already installed (current version is 1.1.1).

Resting on our laurels. Excellent. (In keeping with the lazy theme of this post, I’ll elide the make depend step from future examples, but if you’re playing along at home you may discover that you need to run it when you introduce new complexity in pursuit of perfect non-action.)

[] dothraki@iBook:~/trivial$ make ocamlbuild -classic-display -use-ocamlfind -pkgs lwt.syntax,mirage-types.lwt -tags "syntax(camlp4o),annot,bin_annot,strict_sequence,principal" -cflag -g -lflags -g,-linkpkg,-dontlink,unix main.native.o ocamlfind ocamldep -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -modules main.ml > main.ml.depends ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml + ocamlfind ocamlc -c -g -annot -bin-annot -principal -strict-sequence -package mirage-types.lwt -package lwt.syntax -syntax camlp4o -o main.cmo main.ml File "main.ml", line 8, characters 2-13: Error: Unbound module OS Command exited with code 2. make: *** [main.native.o] Error 10 [] dothraki@iBook:~/trivial$

Oh, bother.

Let’s have a look at the main.ml generated by mirage configure --xen with our empty config.ml.

``` ( Generated by Mirage (Wed, 23 Jul 2014 18:21:24 GMT). )

open Lwt

let _ = Printexc.record_backtrace true

let () = OS.Main.run (join []) ```

This looks like the right general idea – OS.Main.run to invoke a thread, join [] (from Lwt) to operate on an empty list of work to do. We have no work to do. It’s a nice day.

Unfortunately, we can’t accomplish our goal of doing nothing if we can’t do anything (in other words, we can’t run our zero threads if we have no idea how to run anything), so we’ll have to do a little more work first.

If we can figure out how to get OS.Main.run included, we’ll be off to the races. Let’s have a look at the Makefile, which is also programmatically generated by mirage configure and shows which libraries the invocation of make will link against.

[] dothraki@iBook:~/trivial$ grep ^LIBS Makefile LIBS = -pkgs lwt.syntax,mirage-types.lwt

Mirage applications will always get mirage-types.lwt and lwt.syntax, and might get more libraries included if they’re requested in config.ml or required by a driver requested there. Unfortunately, neither of these gets us OS.

Some snooping on utop with strace reveals that an implementation for OS lives in the mirage-xen and mirage-unix libraries. If we add mirage-xen to LIBS manually in the Makefile (or mirage-unix, if we generated the Makefile with mirage configure --unix) and then make, we do get something runnable. But that’s not good enough! We don’t want to manually edit files that a computer could generate for us! That sounds like work, and that’s not what today is all about.

One possible solution is to do is write a few lines into our config.ml to request that an appropriate library be included in LIBS. We could ask for mirage-unix directly, but that wouldn’t work when we want to do nothing in Xen, and mirage-xen has the same problem when we want to do nothing in Unix. Every time we wanted to do nothing on a different platform, we’d have to change our config.ml! No way.

Instead, let’s request a driver that imports the OS interface. Numerous drivers are available, but the simplest one is CONSOLE, which provides basic text output on a screen. Let’s pick that one.

In order to use CONSOLE, we’ll have to add some code to our config.ml, disturbing its state of complete relaxation. Minimally, we need a call to register, which is defined in mirage.ml:

ocaml val register: string -> job impl list -> unit (** [register name jobs] registers the application named by [name] which will executes the given [jobs]. *)

It appears we’ll have to define at least one job impl, where we’ll do the hard work of defining doing nothing, then register it with some name. Say, out_to_lunch. Mirage expects us to define the job in another file, then tell config.ml where to find it with foreign (source):

ocaml val foreign: string -> ?libraries:string list -> ?packages:string list -> 'a typ -> 'a impl (** [foreign name libs packs constr typ] states that the module named by [name] has the module type [typ]. If [libs] is set, add the given set of ocamlfind libraries to the ones loaded by default. If [packages] is set, add the given set of OPAM packages to the ones loaded by default. *)

So we’ll have to make another file, say maybe loaf.ml, with our top-secret and highly-valuable instructions for slacking off. (Before we get too high and mighty about all the instructions our project needs just to not do anything, remember that sometimes humans need help with this too.) loaf.ml needs to have at least one module specified that can take a CONSOLE module argument; traditionally unikernels call this module Main, but we can call it anything. Say, for example, Relax.

Modules in OCaml can’t be completely empty, and we’ll need a start function for the program to link, so we’ll just get all that hard work out of the way and define it right now. Since Mirage will want to pass information on how to use the specific instance of a console to the program, start will have to accept an argument of type console_impl. Luckily, there’s no requirement that we actually do anything of consequence in there, so we can finally define what it means to do nothing – Lwt.return (). A full explanation of Lwt is outside the scope of our laziness today, but there is great documentation both through the Mirage website and at the Ocsigen site for Lwt.

```ocaml module Relax (C: V1_LWT.CONSOLE) = struct let start console =

  Lwt.return ()

end ```

We define Relax as a module parameterized by a V1_LWT.CONSOLE module (which we have no plan to use, but config.ml won’t be able to make a sensible program for us unless we’re ready to accept it). start takes a console argument representing a particular implementation of a console. If we had any plans to output anything, we’d do it by calling a function with console as an argument, but we’re not going to do that. We’re just going to relax today.

Now that we know how to relax, we can point config.ml at this code with foreign, so that we’ll eventually be able to register it. The code we want to run is in the file loaf.ml, in a module called Relax, so our first argument to foreign will be "Loaf.Relax". (This works so simply because config.ml and loaf.ml are in the same directory – if you structure things differently, you’ll have to work harder, so I recommend against it.)

Our second argument to foreign, according to the type signature, should be a 'a typ, and if we do so, the function will return a 'a impl. In order to call register, we need at least one job impl to put in the list, so we need to somehow make 'a be job in our call to foreign. Mirage provides a job impl called, simply, job.

We can’t just pass the results of foreign "Loaf.Relax" job off to register, though, because if we do that, mirage configure won’t know that we wanted a console, and we’ll still have no libraries that know about OS.Main.run in our Makefile. What we really need is a (console -> job) typ, so we can get a (console -> job) impl, and then pass it a console impl to get a job impl out.

Mirage provides two combinators for making this work – @-> for composing a 'a typ and 'b typ into a ('a -> 'b) typ, and a combinator $ for applying a ('a -> 'b) impl to a 'a impl and getting a 'b impl back. We can use foreign "Loaf.Relax" (console @-> job) to get a (console -> job) typ, let-bind this value to a variable (say, relax), and use this along with a Mirage-provided console impl as an argument to register.

```ocaml open Mirage

let () =
let relax = foreign “Loaf.Relax” (console @–> job) in
register “out_to_lunch” [ relax $ default_console ] ```

A mirage configure --xen and make get us a small, very lazy unikernel named mir-out_to_lunch.xen and an automatically generated configuration file for Xen named out_to_lunch.xl. (Both names are taken from the string we pass as the first argument to register.) If we start it up this unikernel with sudo xm create -c out_to_lunch.xl, we get the following (somewhat disappointing) output:

[] dothraki@iBook:~/trivial$ sudo xm create -c out_to_lunch.xl [sudo] password for dothraki: Using config file "./out_to_lunch.xl". Started domain out_to_lunch (id=2) Domain has already finished Could not start console

Our unikernel finishes so quickly that Xen can’t attach a console to it. If we start it paused and then unpause it after our console attaches, we can watch it boot:

[] dothraki@iBook:~/trivial$ sudo xm create -c out_to_lunch.xl -p Using config file "./out_to_lunch.xl". Started domain out_to_lunch (id=3)

and, in another window, sudo xm unpause out_to_lunch, to let out_to_lunch start booting. We then see the output sent to the console by Mirage as it’s booting, before control is handed over to our user program – in this case, return (), which executes immediately, returns from the main program, and informs Xen that the virtual machine is shutting down.

Started domain out_to_lunch (id=3) kernel.c: Mirage OS! kernel.c: start_info: 0x11c1000(VA) kernel.c: nr_pages: 0x10000 ... x86_mm.c: Demand map pfns at 10001000-2010001000. Initialising timer interface main returned 0 [] dothraki@iBook:~/trivial$

Doing Nothing with Different Drivers

We’ve succesfully done nothing with this simple config.ml:

```ocaml open Mirage

let () = let relax = foreign “Loaf.Relax” (console @–> job) in register “out_to_lunch” [ relax $ default_console ] ```

What if we want to do nothing over the network? Or do nothing with a filesystem? Or do nothing with all of these things?

Including support for a driver requires us to have two things: a typ representation to include in the arguments to foreign, and an impl to include in the list of arguments to register. The available typs and impls are defined in mirage.mli; some typs have more than one corresponding impl, or have parameterized constructors for impls.

For example, there are two ways to get a network impl, the impl representing raw device-level access to a network card:

  • val tap0: network impl
  • val netif: string -> network impl

tap0 just returns the first available network interface; netif takes a string argument and attempts to find a matching network interface, then makes that interface available in the returned value. The same network typ applies for both:

``` open Mirage

let () = let relax = foreign “Loaf.Relax” (network @–> network @–> job) in register “out_to_lunch” [ relax $ tap0 $ netif “1” ] ```

This config.ml will build against a loaf.ml that has a Relax module parameterized by two V1_LWT.NETWORK modules, and a start function that expects two network impl arguments:

``` module Relax (IGNORED: V1_LWT.NETWORK) (ALSO_IGNORED: V1_LWT.NETWORK) = struct let start default string_parameterized =

Lwt.return ()

end ```

(To actually run this in Xen, we need to make a couple alterations to the autogenerated out_to_lunch.xl so that two network interfaces are actually provided, but this can be done once and saved off somewhere so as not to be overwritten by subsequent rebuilds.)

We can do nothing with multiple kinds of drivers, too:

``` open Mirage

let () =
let relax = foreign “Loaf.Relax” (network @–> console @–> job) in
register “out_to_lunch” [ relax $ tap0 $ default_console ]
```

If we modify loaf.ml have a Relax module parameterized by a V1_LWT.NETWORK and V1_LWT.CONSOLE, and provide a start with network impl and console impl parameters, we’re off to the races.

For The Ambitious

The start function in Loaf.Relax can do more than just immediately return, of course. Programmers with ambition, gumption, and Tasks To Accomplish can define programs that Get Things Done, and launch them from start. Programs that plan to use, say, a network interface, can call functions provided by V1_LWT.NETWORK on the network impl provided to start to generate and receive network traffic; console programs can write to their provided console impl. The “Hello Mirage World” tutorial gives great examples and instructions for making things happen – perfect for a rainy day.

Today’s more of a return () type of day, though.

July 25, 2014 06:30 PM

July 24, 2014

OCaml Labs compiler hacking

Seventh OCaml compiler hacking session (at Citrix)

For the seventh Cambridge OCaml compiler-hacking session we'll be meeting at the Citrix office in the Cambridge Science Park on 6.30pm Friday 1st August. Thanks to Citrix for supporting and hosting the session!

We'll kick off with a demo from Frédéric Bour of modular implicits, an OCaml extension that adds support for overloading.

If you're planning to come along, it'd be helpful if you could indicate interest via Doodle and sign up to the mailing list to receive updates:

Where: Citrix Systems Research & Development Ltd.
Building 101
Cambridge Science Park
Milton Road
Cambridge, CB4 0FY
United Kingdom

When: 6.30pm, Friday 1st August

Who: anyone interested in improving OCaml. Knowledge of OCaml programming will obviously be helpful, but prior experience of working on OCaml internals isn't necessary.

What: fixing bugs, implementing new features, learning about OCaml internals

Wiki: https://github.com/ocamllabs/compiler-hacking/wiki

We're defining "compiler" pretty broadly, to include anything that's part of the standard distribution, which means at least the standard library, runtime, tools (ocamldep, ocamllex, ocamlyacc, etc.), ocamlbuild, the documentation, and the compiler itself. We'll have suggestions for mini-projects for various levels of experience (see also some things we've worked on in previous sessions), but feel free to come along and work on whatever you fancy.

We'll also be ordering pizza, so if you want to be counted for food you should aim to arrive by 6.45pm.

by OCaml Labs (cl-ocamllabs@lists.cam.ac.uk) at July 24, 2014 09:00 AM

July 23, 2014

Richard Jones

libguestfs now works on 64 bit ARM

arm

Pictured above is my 64 bit ARM server. It’s under NDA so I cannot tell you who supplied it or even show you a proper photo.

However it runs Fedora 21 & Rawhide:

Linux arm64.home.annexia.org 3.16.0-0.rc6.git1.1.efirtcfix1.fc22.aarch64 #1 SMP Wed Jul 23 12:15:58 BST 2014 aarch64 aarch64 aarch64 GNU/Linux

libvirt and libguestfs run fine, with full KVM acceleration, although right now you have to use qemu from git as the Rawhide version of qemu is not new enough.

Also OCaml 4.02.0 beta works (after we found and fixed a few bugs in the arm64 native code generator last week).


by rich at July 23, 2014 08:49 PM

July 22, 2014

Open Mirage

Mirage v2.0: a recap of the new features

The first release of Mirage back in December 2013 introduced the prototype of the unikernel concept, which realised the promise of a safe, flexible mechanism to build highly optimized software stacks purpose-built for deployment in the public cloud (more background on this). Since then, we've been hard at work using and extending Mirage for real projects and the community has been steadily growing.

We're thrilled to announce the release of Mirage OS v2.0 today! Over the past few weeks the team has been hard at work blogging about all the new features in this latest release, coordinated by the tireless Amir Chaudhry:

All the libraries required for these new features are regularly released into the OPAM package manager, so just follow the installation instructions to give them a spin. A release this size probably introduces minor hiccups that may cause build failures, so we very much encourage bug reports on our issue tracker or questions to our mailing lists. Don't be shy: no question is too basic, and we'd love to hear of any weird and wacky uses you put this new release to! And finally, the lifeblood of Mirage is about sharing and publishing libraries that add new functionality to the framework, so do get involved and open-source your own efforts.

Breaking news: Richard Mortier and I will be speaking at OSCON this week on Thursday morning about the new features in F150 in the Cloud Track. Come along if you are in rainy Portland at the moment!

by Anil Madhavapeddy (anil@recoil.org) at July 22, 2014 11:00 AM

Building an ARMy of Xen unikernels

Mirage has just gained the ability to compile unikernels for the Xen/arm32 platform, allowing Mirage guests to run under the Xen hypervisor on ARM devices such as the Cubieboard 2 and CubieTruck.

Introduction

The ARMv7 architecture introduced the (optional) Virtualization Extensions, providing hardware support for running virtual machines on ARM devices, and Xen's ARM Hypervisor uses this to support hardware accelerated ARM guests.

Mini-OS is a tiny OS kernel designed specifically for running under Xen. It provides code to initialise the CPU, display messages on the console, allocate memory (malloc), and not much else. It is used as the low-level core of Mirage's Xen implementation.

Mirage v1 was built on an old version of Mini-OS which didn't support ARM. For Mirage v2, we have added ARM support to the current Mini-OS (completing Karim Allah Ahmed's initial ARM port) and made Mirage depend on it as an external library. This means that Mirage will automatically gain support for other architectures that get added later. We are currently working with the Xen developers to get our Mini-OS fork upstreamed.

In a similar way, we have replaced Mirage v1's bundled maths library with a dependency on the external OpenLibm, which we also extended with ARM support (this was just a case of fixing the build system; the code is from FreeBSD's libm, which already supported ARM).

Mirage v1 also bundled dietlibc to provide its standard C library. A nice side-effect of this work came when we were trying to separate out the dietlibc headers from the old Mini-OS headers in Mirage. These had rather grown together over time and the work was proving difficult, until we discovered that we no longer needed a libc at all, as almost everything that used it had been replaced with pure OCaml versions! The only exception was the printf code for formatting floating point numbers, which OCaml uses in its printf implementation. We replaced that by taking the small fmt_fp function from musl libc.

Here's the final diffstat of the changes to mirage-platform adding ARM support:

778 files changed, 1949 insertions(+), 59689 deletions(-)

Trying it out

You'll need an ARM device with the Virtualization Extensions. I've been testing using the Cubieboard 2 (and CubieTruck):

Cubieboard2

The first step is to install Xen. Running Xen on the Cubieboard2 documents the manual installation process, but you can now also use mirage/xen-arm-builder to build an SDcard image automatically. Copy the image to the SDcard, connect the network cable and power, and the board will boot Xen.

Once booted you can ssh to Dom0, the privileged Linux domain used to manage the system, install Mirage, and build your unikernel just as on x86. Currently, you need to select the Git versions of some components. The following commands will install the necessary versions if you're using the xen-arm-builder image:

$ opam init
$ opam install mirage-xen-minios
$ opam pin mirage https://github.com/talex5/mirage.git#link_c_stubs
$ opam pin mirage-xen https://github.com/mirage/mirage-platform
$ opam pin tcpip https://github.com/talex5/mirage-tcpip.git#checksum
$ opam install mirage

Technical details

One of the pleasures of unikernels is that you can comprehend the whole system with relatively little effort, and those wishing to understand, debug or contribute to the ARM support may find the following technical sections interesting. However, you don't need to know the details of the ARM port to use it, as Mirage abstracts away the details of the underlying platform.

The boot process

An ARM Mirage unikernel uses the Linux zImage format, though it is not actually compressed. Xen will allocate some RAM for the image and load the kernel at the offset 0x8000 (32 KB).

Execution begins in arm32.S, with the r2 register pointing to a Flattened Device Tree (FDT) describing details of the virtual system. This assembler code performs a few basic boot tasks:

  1. Configuring the MMU, which maps virtual addresses to physical addresses (see next section).
  2. Turning on caching and branch prediction.
  3. Setting up the exception vector table (this says how to handle interrupts and deal with various faults, such as reading from an invalid address).
  4. Setting up the stack pointer and calling the C function arch_init.

arch_init makes some calls to the hypervisor to set up support for the console and interrupt controller, and then calls start_kernel.

start_kernel (in libxencaml) sets up a few more features (events, malloc, time-keeping and grant tables), then calls caml_startup.

caml_startup (in libocaml) initialises the garbage collector and calls caml_program, which is your application's main.ml.

The address space

With the Virtualization Extensions, there are two stages to converting a virtual memory address (used by application code) to a physical address in RAM. The first stage is under the control of the guest VM, mapping the virtual address to what the guest believes is the physical address (this address is referred to as the Intermediate Physical Address or IPA). The second stage, under the control of Xen, maps the IPA to the real physical address. The tables holding these mappings are called translation tables.

Mirage's memory needs are simple: most of the RAM should be used for the garbage-collected OCaml heap, with a few pages used for interacting with Xen (these don't go on the OCaml heap because they must be page aligned and must not move around).

Xen does not commit to using a fixed address as the IPA of the RAM, but the C code needs to run from a known location. To solve this problem the assembler code in arm32.S detects where it is running from and sets up a virtual-to-physical mapping that will make it appear at the expected location, by adding a fixed offset to each virtual address. For example, on Xen/unstable, we configure the beginning of the virtual address space to look like this (on Xen 4.4, the physical addresses would start at 80000000 instead):

Virtual addressPhysical address (IPA)Purpose
40000040000000Stack (16 KB)
40400040004000Translation tables (16 KB)
40800040008000Kernel image

The physical address is always at a fixed offset from the virtual address and the addresses wrap around, so virtual address c0400000 maps back to physical address 0 (in this example).

The stack, which grows downwards, is placed at the start of RAM so that a stack overflow will trigger a fault rather than overwriting other data.

The 16 KB translation table is an array of 4-byte entries each mapping 1 MB of the virtual address space, so the 16 KB table is able to map the entire 32-bit address space (4 GB). Each entry can either give the physical section address directly (which is what we do) or point to a second-level table mapping individual 4 KB pages. By using only the top-level table we reduce possible delays due to TLB misses.

After the kernel code comes the data (constants and global variables), then the bss section (data that is initially zero, and therefore doesn't need to be stored in the kernel image), and finally the rest of the RAM, which is handed over to the malloc system.

Contact

The current version seems to be working well on Xen 4.4 (stable) and the 4.5 development version, but has only been lightly tested. If you have any problems or questions, or get it working on other devices, please let us know!

by Thomas Leonard (talex5@gmail.com) at July 22, 2014 10:00 AM

July 19, 2014

Shayne Fletcher

Merge sort

Merge sort

Here's another divide and conquer algorithm. This one is for sorting a sequence.

Conceptually, a merge sort works like this (see http://en.wikipedia.org/wiki/Merge_sort):

  • Divide the unsorted list into n sub-lists, each containing one element (a list of one element is trivially sorted);
  • Repeatedly merge sublists to produce new sorted sub-lists until there is only one sub-list remaining : this will be the sorted list.

In the following OCaml, we are drawing on inspiration from the the Haskell standard prelude for the part of the algorithm concerned with dividing the unsorted list : functions take, drop and split.


(**{b Merge sort}, an {i O(n log n)} comparison based sorting
algorithm *)
module type MERGESORT = sig

(**[take n] applied to a list [xs], returns the prefix of [xs] of
length [n] or [xs] itself if [n > List.length xs] e.g. [take 2
[1; 2; 3]] {i = } [[1; 2]]*)
val take : int -> 'a list -> 'a list

(**[drop n] applied to a list [xs], returns the suffix of [xs]
after the first [n] elements or, [[]] if [n > List.length xs]
e.g. [drop 2 [1; 2; 3]] {i = } [[3]]*)
val drop : int -> 'a list -> 'a list

(**[split n xs] is equivalent to [take n xs, drop n xs] e.g.
[split 2 [1; 2; 3]] {i = } [([1; 2], [3])]*)
val split : int -> 'a list -> 'a list * 'a list

(**[merge] given two {b sorted} sequences [xs] and [ys] returns a
single sorted sequence of the elements of [xs] and [ys]
e.g. [merge [1; 3] [2; 4]] {i = } [[1; 2; 3; 4]]*)
val merge : 'a list -> 'a list -> 'a list

(**[merge_sort] works by splitting a sequence into two parts,
recursively sorting the two parts and merging the results into
a single sorted sequence e.g. [merge_sort [1; 2; -1; 0; 3]]
{i = } [[-1; 0; 1; 2; 3]]*)
val merge_sort : 'a list -> 'a list
end

module Merge_sort : MERGESORT = struct

let rec take k l =
match (k, l) with
| n, _ when n <= 0 -> []
| _, [] -> []
| n, (x :: xs) -> x :: take (n - 1) xs

let rec drop k l =
match (k, l) with
| n, xs when n <= 0 -> xs
| _, [] -> []
| n, (_ :: xs) -> drop (n - 1) xs

let rec split k l = take k l, drop k l

let rec merge l m =
match (l, m) with
| [], ys -> ys
| xs, [] -> xs
| ((x :: xs) as t), ((y :: ys) as s) ->
if x <= y then x :: (merge xs s) else y :: (merge t ys)

let rec merge_sort l =
let i = (List.length l) / 2 in
if i = 0 then l
else
let u, v = split i l in
let xs, ys = merge_sort u, merge_sort v in
merge xs ys

end

We can test our little program in the top-level like this:


# let l = Merge_sort.merge_sort [-1; 2; 0; 4; 3; 5];;
val l : int list = [-1; 0; 2; 3; 4; 5]

Here are the functions in C++ phrased as range algorithms.


//"c:/users/fletch/desktop/setenv.cmd"
//cl /EHsc /Femerge.exe /I c:/project/boost_1_55_0 merge.cpp

#include <boost/next_prior.hpp>
#include <boost/range.hpp>
#include <boost/range/algorithm/copy.hpp>

#include <vector>
#include <cstdlib>

namespace algo
{
//take

template <class S, class D>
D take (std::size_t n, S src, D dst)
{
typedef boost::range_iterator<S>::type it_t;
it_t curr = boost::begin (src), last = boost::end (src);
if (n <= 0) return dst;
if (boost::empty (src)) return dst;
take (n-1, S (boost::next (curr), last), *dst++ = *curr);
return dst;
}

//drop

template <class S, class D>
D drop (std::size_t n, S src, D dst)
{
typedef boost::range_iterator<S>::type it_t;
it_t curr = boost::begin (src), last = boost::end (src);
if (n <= 0) return boost::range::copy (src, dst);
if (boost::empty (src))return dst;
drop (n-1, S (boost::next (curr), last), dst);
return dst;
}

//split

template <class S, class D1, class D2>
void split (int n, S src, D1 dst1, D2 dst2)
{ take (n, src, dst1); drop (n, src, dst2); }

//merge

template <class S1, class S2, class D>
D merge (S1 src1, S2 src2, D dst)
{
typedef boost::range_iterator<S1>::type it1_t;
typedef boost::range_iterator<S2>::type it2_t;
typedef std::iterator_traits<it1_t>::value_type value1_t;
typedef std::iterator_traits<it2_t>::value_type value2_t;
if (boost::empty (src1)) return boost::copy (src2, dst);
if (boost::empty (src2)) return boost::copy (src1, dst);
it1_t curr1 = boost::begin (src1), last1 = boost::end (src1);
it2_t curr2 = boost::begin (src2), last2 = boost::end (src2);
value1_t x = *curr1;
value2_t y = *curr2;
if (x <= y)
merge (S1 (boost::next (curr1), last1), src2, *dst++ = x);
else
merge (src1, S2 (boost::next (curr2), last2), *dst++ = y);
return dst;
}

//merge_sort

template <class S, class D>
D merge_sort (S src, D dst)
{
typedef boost::range_iterator<S>::type it_t;
typedef std::iterator_traits<it_t>::value_type value_t;
std::size_t i = boost::size (src)/2;
if (i == 0) return boost::range::copy (src, dst);
std::vector<value_t> u, v;
split (i, src, std::back_inserter (u), std::back_inserter (v));
std::vector<value_t> xs, ys;
merge_sort (u, std::back_inserter (xs));
merge_sort (v, std::back_inserter (ys));
merge (xs, ys, dst);
return dst;
}

}//namespace algo

The following usage example provides a lightweight test.


#include <boost/assign/list_of.hpp>

#include <utility>
#include <iterator>
#include <iostream>

int main ()
{
using std::pair;
using std::make_pair;
using std::ostream_iterator;
using boost::assign::list_of;

int ord[] = {1, 2, 3, 4};

auto src=make_pair(ord, ord + 4);
auto dst=ostream_iterator<int>(std::cout, ", ");

std::cout << "\ntake ():\n";

algo::take (0u, src, dst); std::cout << std::endl;
algo::take (1u, src, dst); std::cout << std::endl;
algo::take (2u, src, dst); std::cout << std::endl;
algo::take (3u, src, dst); std::cout << std::endl;
algo::take (4u, src, dst); std::cout << std::endl;
algo::take (5u, src, dst); std::cout << std::endl;

std::cout << "\ndrop ():\n";

algo::drop (5u, src, dst); std::cout << std::endl;
algo::drop (4u, src, dst); std::cout << std::endl;
algo::drop (3u, src, dst); std::cout << std::endl;
algo::drop (2u, src, dst); std::cout << std::endl;
algo::drop (1u, src, dst); std::cout << std::endl;
algo::drop (0u, src, dst); std::cout << std::endl;

std::cout << "\nsplit ():\n\n";

algo::split (0u, src, dst, dst); std::cout << std::endl;
algo::split (1u, src, dst, dst); std::cout << std::endl;
algo::split (2u, src, dst, dst); std::cout << std::endl;
algo::split (3u, src, dst, dst); std::cout << std::endl;
algo::split (4u, src, dst, dst); std::cout << std::endl;
algo::split (5u, src, dst, dst); std::cout << std::endl;

std::cout << "\nmerge ():\n\n";

std::vector <int> l=list_of(-1)(2), r=list_of(0)(1);
algo::merge (l, r, dst); std::cout << std::endl;

std::cout << "\nmerge_sort ():\n\n";

int unord[] = {-1, 2, 0, 4, 3, 5};
algo::merge_sort (make_pair (unord, unord + 6), dst);

return 0;
}
The above program produces the following output.

take ():

1,
1, 2,
1, 2, 3,
1, 2, 3, 4,
1, 2, 3, 4,

drop ():


4,
3, 4,
2, 3, 4,
1, 2, 3, 4,

split ():

1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,

merge ():

-1, 0, 1, 2,

merge_sort ():

-1, 0, 2, 3, 4, 5,

by Shayne Fletcher (noreply@blogger.com) at July 19, 2014 05:05 PM