OCaml Planet

September 29, 2014

Jane Street

What is gained and lost with 63-bit integers?

Almost every programming language uses 64-bit integers on typical modern Intel machines. OCaml uses a special 63-bit representation. How does it affect OCaml?

OCaml int memory representation

Most of OCaml's types are in memory represented as a header followed by data. The header is a 64-bit integer containing the length of the data and a tag. Tag is a rough classification of the type. The only OCaml's types which differ from this are ints and sometimes floats.

Floats normally have header and data, data being the value of the float itself. This representation is called "boxed". If a record's field is float, record's data will actually contain the pointer to the float data. The only exceptions are records with only floats and float arrays, whose data instead of pointers contain the values of floats. This representation is called "unboxed".

Values of type int are never stored as header and data (boxed). Int x is stored as (x << 1) | 1, where << is left shift and | is bitwise or, hence its least significant bit is always set. Pointers are word aligned, so they will never have this bit set, hence how ints and pointers are discerned. It is assumed that much of typical data is integers, so this is done to significantly improve performance:

  • there's no need to dereference a pointer when getting an int
  • no memory allocation is needed when creating ints
  • less work for the garbage collector
  • less memory fragmentation
  • no memory is needed for int headers

Distinguishing whether a value is int or pointer is as simple as testing x & 1, so this feature doesn't slow down garbage collector, polymorphic hash, polymorphic compare and whatever else structurally inspects data. One should note that this doesn't apply to the types int32 and int64, which are always boxed.

Penalty

Having the extra bit comes with a price - arithmetic operations are more complicated. For example

  • x + y is translated to CPU instructions x + y - 1
  • x * y is translated to CPU instructions (x >> 1) * (y - 1) + 1
  • x / y is translated to CPU instructions (((x >> 1) / (y >> 1)) << 1) + 1
  • x lsl y is translated to CPU instructions ((x - 1) << (y >> 1)) + 1

Sometimes this penalty is small or nonexistent. For instance there's no need to fix the bit in x + y - z. Only one bit fixing is needed for all five additions in x + y + z + w + u + v.

Another help is the Intel CPU instruction LEA, which can compute the sum of three integers with a single instruction, like x + y - 1. Unfortunately, LEA became very slow in the recent generations of CPUs. Intel doesn't suggest this will change.

This benchmark (test.ml) tries to estimate the difference in the performance. The results from Sandy Bridge show about 2 times speed difference in arithmetic operations. Assembly can be examined by compiling using "ocamlopt -S test.ml".

speed(ns)       63-bit   64-bit   slowdown
add independent 0.327588 0.121502 2.696156
add   dependent 0.328160 0.169375 1.937477
mul independent 0.614824 0.328060 1.874120
mul   dependent 1.094343 0.328872 3.327565
lsl independent 0.359828 0.166088 2.166489
lsl   dependent 0.762251 0.177468 4.295151
xor independent 0.249350 0.122900 2.028886
xor   dependent 0.404255 0.170380 2.372668

Agner's instruction tables show that the difference is even bigger with later generations of CPUs. For instance, Haswell can do four integer adds per cycle versus one LEA.

Conclusion

The benefits of unboxed ints are amazing. On the other hand, arithmetic operations are significantly slower. How much do arithmetic operations affect an average program? Could we have a solution which would keep ints unboxed but have fast arithmetic operations?

by Vladimir Brankov at September 29, 2014 01:52 PM

Github OCaml jobs

Full Time: Software Developer (Functional Programming) at Jane Street in New York, NY; London, UK; Hong Kong

Software Developer (Functional Programming)

Jane Street is looking to hire great software developers with an interest in functional programming. OCaml, a statically typed functional programming with similarities to Haskell, Scheme, Erlang, F# and SML, is our language of choice. We've got the largest team of OCaml developers in any industrial setting, and probably the world's largest OCaml codebase. We use OCaml for running our entire business, supporting everything from research to systems administration to trading systems. If you're interested in seeing how functional programming plays out in the real world, there's no better place.

The atmosphere is informal and intellectual. There is a focus on education, and people learn about software and trading, both through formal classes and on the job. The work is challenging, and you get to see the practical impact of your efforts in quick and dramatic terms. Jane Street is also small enough that people have the freedom to get involved in many different areas of the business. Compensation is highly competitive, and there's a lot of room for growth.

You can learn more about Jane Street and our technology from our main site, janestreet.com. You can also look at a a talk given at CMU about why Jane Street uses functional programming (http://ocaml.janestreet.com/?q=node/61), and our programming blog (http://ocaml.janestreet.com).

We also have extensive benefits, including:

  • 90% book reimbursement for work-related books
  • 90% tuition reimbursement for continuing education
  • Excellent, zero-premium medical and dental insurance
  • Free lunch delivered daily from a selection of restaurants
  • Catered breakfasts and fresh brewed Peet's coffee
  • An on-site, private gym in New York with towel service
  • Kitchens fully stocked with a variety of snack choices
  • Full company 401(k) match up to 6% of salary, vests immediately
  • Three weeks of paid vacation for new hires in the US
  • 16 weeks fully paid maternity/paternity leave for primary caregivers, plus additional unpaid leave

More information at http://janestreet.com/culture/benefits/

September 29, 2014 12:20 PM

September 23, 2014

OCaml Labs compiler hacking

Eighth OCaml compiler hacking evening (at Mill Lane, by the river)

Update: This session will be a joint F#/OCaml hacking event, beginning with a talk from Don Syme about F# compiler and language development!

For the eighth Cambridge OCaml compiler hacking evening we'll be meeting in the University Postdoc centre at 16 Mill Lane (near the river, next door to Makespace) on 6.30pm Tuesday 30th September.

If you're planning to come along, it'd be helpful if you could indicate interest via Doodle and sign up to the mailing list to receive updates:

Where: Postdoc Centre
Basement, 16 Mill Lane
Cambridge, CB2 1SB
United Kingdom

When: 6.30pm, Tuesday 30th September

Who: anyone interested in improving OCaml. Knowledge of OCaml programming will obviously be helpful, but prior experience of working on OCaml internals isn't necessary.

What: fixing bugs, implementing new features, learning about OCaml internals. This time we'll be focusing on code quality: refactoring, adding test cases, reviewng existing proposals and updating packages after the recent release of OCaml 4.02.

Wiki: https://github.com/ocamllabs/compiler-hacking/wiki

We're defining "compiler" pretty broadly, to include anything that's part of the standard distribution, which means at least the standard library, runtime, tools (ocamldep, ocamllex, ocamlyacc, etc.), ocamlbuild, the documentation, and the compiler itself. We'll have suggestions for mini-projects for various levels of experience (see also some things we've done on previous evenings), but feel free to come along and work on whatever you fancy.

We'll also be ordering pizza, so if you want to be counted for food you should aim to arrive by 6.45pm.

by OCaml Labs (cl-ocamllabs@lists.cam.ac.uk) at September 23, 2014 05:00 PM

September 22, 2014

OCaml-Java

New alpha version

A new alpha version has been released, fixing several blocking / annoying bugs.

by Xavier Clerc (xclerc@ocamljava.org) at September 22, 2014 09:00 PM

Alpha2

The new release, namely 2.0-alpha2, is mainly a bugfix relase, fixing the following issues:

by Xavier Clerc (xclerc@ocamljava.org) at September 22, 2014 09:00 PM

September 21, 2014

Shayne Fletcher

Correlation Coefficient

Correlation coefficient

It's been a while since we looked at anything from the domain of statistics so here's another little bite-sized piece - a function to compute Pearson's "product moment correlation coefficient".

It's a measure of dependence between two data sets. We'll express it in terms of unbiased standard deviation which I didn't write out before so I'll include that function too.


let unbiased_standard_deviation t =
(*http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

In statistics and in particular statistical theory, unbiased
estimation of a standard deviation is the calculation from a
statistical sample of an estimated value of the standard deviation
(a measure of statistical dispersion) of a population of values,
in such a way that the expected value of the calculation equals
the true value.

*)
let av = arithmetic_mean t in
let squared_diffs =
List.fold_left (fun acc xi -> ((xi -. av) *. (xi -. av)) :: acc) [] t
in sqrt ((sum squared_diffs)/.((float_of_int (List.length t)) -. 1.0))

let correlation_coefficient x y =
(*http://en.wikipedia.org/wiki/Correlation_and_dependence

The most familiar measure of dependence between two quantities is
the Pearson product-moment correlation coefficient, or "Pearson's
correlation coefficient", commonly called simply "the correlation
coefficient". It is obtained by dividing the covariance of the two
variables by the product of their standard deviations.
*)

let x_b = arithmetic_mean x in
let y_b = arithmetic_mean y in
let s_x = unbiased_standard_deviation x in
let s_y = unbiased_standard_deviation y in

if s_x = 0. || s_y = 0. then 0.
else
let f acc x_i y_i =
acc +. ((x_i -. x_b) *. (y_i -. y_b)) in
let n = float_of_int (List.length x) in
let s = List.fold_left2 f 0.0 x y in
s/.((n -. 1.) *. s_x *. s_y)

by Shayne Fletcher (noreply@blogger.com) at September 21, 2014 02:17 PM

September 18, 2014

OCamlCore Forge News

Lambdoc 1.0-beta1 released

Lambdoc is a library providing support for semantically rich documents in web applications, built with Ocsigen/Eliom integration in mind. It includes parsers for four different markup languages: Lambtex, Lambwiki, Lambxml, and Markdown (via OMD). The library also offers the possibility of outputing any Lambdoc document as an Ocsigen HTML5 value.

by Dario Teixeira at September 18, 2014 03:54 PM

Dario Teixeira

Announcing Lambdoc 1.0-beta1

I'm happy to announce release 1.0-beta1 of Lambdoc, a library providing support for semantically rich documents in web applications. Lambdoc was designed with Ocsigen/Eliom integration in mind, though you may of course use it with other frameworks (it does not actually depend on the Ocsigen server or Eliom).

A brief overview of Lambdoc's features

  • A rich set of supported document features, including tables, figures, math, and source-code blocks with syntax-highlighting.
  • Built-in support for multiple input markups (see below), and easy integration of additional custom markups.
  • Runtime customisation of available document features. You may, for instance, declare that a certain class of users may not format text passages in bold.
  • Detailed error messages for mistakes in input markup.
  • A simple macro mechanism.
  • An extension mechanism.
  • The CLI application lambcmd, which allows conversion from any input markup to any output markup from the comfort of the command line.
  • Ships with decent looking CSS, easily customisable to your needs. Note that you'll need CCSS (available on OPAM) if you wish to modify the source for the CSS.

Supported input markups

This first beta of Lambdoc ships with built-in support for four different input markup languages:

  • Lambtex: Shamelessly inspired by LaTeX, Lambtex is my take on what LaTeX should look like if one were to get rid of all legacy baggage and gear it towards publishing on the web. Lambtex supports all of Lambdoc features, and even has a complete manual (which by the way I also recommend if you want to get a comprehensive list of all document features supported in Lambdoc).
  • Lambwiki: Largely inspired by the Wiki Creole syntax, Lambwiki is a light-weight markup language. Though it does not support some of Lambdoc's more advanced features, it is veritably light and its syntactic conventions are IMHO more memorable than Markdown's. Moreover, it also has a complete manual.
  • Lambxml: An XML-markup largely compatible with HTML. I don't find XML to be particularly human-friendly, but Lambxml might prove useful as a gateway for external XML-outputting tools.
  • Markdown: Love it or hate it, Markdown is ubiquitous, and as such supporting it is practically mandatory. Lambdoc supports Markdown via the OMD library, and therefore you should refer to OMD's documentation to learn about the supported flavour of Markdown. Note that Lambdoc's integration of OMD is still experimental, and there are still some issues to be resolved before the final 1.0 release. Prominently, OMD does not currently preserve location information, which is required for Lambdoc's error reporting mechanism. Fortunately, this issue has been acknowledged upstream.

Supported output markups

The only supported output markup is HTML5 via Tyxml. However, the functorial implementation used allows easy integration with Eliom.

Developer documentation

Unfortunately, developer documentation for this beta release is still sparse. Ocsigen/Eliom users are advised to take a look at the four-part tutorial included in the examples directory. The first step of the tutorial is a very minimalistic and straightforward illustration of how Lambdoc can be integrated in Eliom applications. Each subsequent step builds upon this foundation by introducing one new feature. Hopefully this will be enough to get you started.

About the extension mechanism

The extension mechanism is the latest addition to Lambdoc. It allows for the attachment of custom hooks to the processing of inline links, inline and block images, and the generic extern block. It is still somewhat experimental, but hopefully flexible enough to cover most use cases. Check out the last step of the tutorial for a basic example, or the source of lambcmd for a more complex real-world example which uses Bookaml to enable the special protocol isbn for links to books.

On the betaness of this release

Besides the aforementioned issues with the OMD integration, the lack of proper documentation, and the experimental character of the extension mechanism, the beta moniker for this release is also justified by the somewhat ad-hoc build system (I'm not sure OASIS even supports a project using module packs internally). Fortunately, using OPAM should spare you the trouble of worrying about this issue.

One important caveat: though I have no plans for further changes to the API, the betaness of this release also means I'll have no compunction in making them should the need arise.

Concluding remarks

The package is now available on OPAM. It has a tonne of dependencies, but since they are all packaged in OPAM, this shouldn't be a hassle. Note that some of the dependencies (Lwt, Ocsigenserver, Bookaml) apply only to the lambcmd CLI utility, and not the library itself. (Yes, I'm considering simplifying lambcmd for subsequent releases.)

Your comments/suggestions/criticisms are of course welcome. Feel free to send me an email or to open a ticket on the project's page at Github. I'll be particularly thankful if you find any bugs.

by Dario at September 18, 2014 10:42 AM