Sunday, October 30, 2011

Starting out with OCaml

There are many things which nobody tells you about OCaml when you start out. This is a stumbling block for many, and in this post I will cover a few facts which might help the beginner a bit.

Toplevel

The first look at the toplevel is disheartening.

# let f = fun x : x + 1^[[D^[[D^[[D

The toplevel does not support readline functionality out of the box. This leads to considerable frustration (note that I accidentally wrote a ':' instead of '->' and pressed the left arrow to try to fix it).

However, ledit can be used to give the REPL some sensible behavior, albeit nothing like IPython's auto-completion.

$ aptitude install ledit
$ ledit ocaml

Other readline wrappers, such as rlwrap, provide similar functionality.

Working with external libraries

OCaml handles libraries in a very low-level fashion, and getting things to work correctly from the toplevel can be particularly painful. Here are a few tips which may help if you are working with external libraries. Your mileage may vary.

Godi OCaml

Use Godi's distribution of OCaml instead of the Debian/Ubuntu packages:

$ aptitude install ocaml      # Not a very good idea

After Godi's installation, it'll take a little effort to set up the path/soft-links to execute the scripts initially, but after that hurdle, the rest is a very smooth ride.

Godi's installation provides some excellent OCaml packages out of the box and does not require root privileges. Also, one of the clinchers is:

$ godi_console

for easy installation of many commonly used packages.

Loading these packages into the toplevel just to play with them can be a pain, but there is a remedy.


Topfind

Use topfind to load modules in the top-level and handle dependencies.

# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

- : unit = ()
# #require "ZMQ";;
/home/utkarsh/godi/lib/ocaml/site-lib/uint: added to search path
/home/utkarsh/godi/lib/ocaml/site-lib/uint/uint64.cma: loaded
/home/utkarsh/godi/lib/ocaml/site-lib/ZMQ: added to search path
/home/utkarsh/godi/lib/ocaml/site-lib/ZMQ/ZMQB.cma: loaded
#

Finally, there is a way of making your own top-level which loads certain default modules when it starts:

$ ocamlmktop -o my_top_level my_module.cmo your_module.cmo our_module.cmo

Ocamlbrowser

This is available with the godi-ocaml-labltk package, which can be installed using godi_console. It is a near-absolute necessity when working with most OCaml modules, even if they are very well documented. It certainly beats going through the .mli files of the packages.

I still have to try out Eclipse's OCaml IDE to see whether its IntelliSense-style completion can replace ocamlbrowser.


Notes on tuning the JVM

Notes from a talk about tuning the JVM by Attila Szegedi, Twitter.

Know the problem and options

Use these options to find out whether GC happens often:

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution

A larger eden is always better. Old generation collectors:

  1. Throughput collectors:
        -XX:+UseSerialGC
        -XX:+UseParallelGC
        -XX:+UseParallelOldGC
  2. Low-pause collectors:
        -XX:+UseConcMarkSweepGC
        -XX:+UseG1GC
  3. Adaptive threshold collectors:
        -XX:+UseAdaptiveSizePolicy (with throughput collectors only)
        -XX:MaxGCPauseMillis=100 (ms)
        -XX:GCTimeRatio=19 (GC time : application time = 1 : 19, i.e. a goal of at most 5% of time in GC)
  4. If all else fails, use concurrent mark-and-sweep, which collects all the time. Hence, having a spare CPU can help avoid stop-the-world latency!
        -XX:CMSInitiatingOccupancyFraction (set to 75-80, triggers GC)

Fat data

Data might be taking too much memory.

Object headers

Object header = 2 machine words = 128 bits on a 64-bit machine = 16 bytes

Similarly, array headers take 24 bytes.

  • new java.lang.Object() takes 16 bytes
  • new byte[0] takes 24 bytes

Padding:

Object sizes are rounded up to multiples of 8 bytes. The padding is applied at each level of subclassing.

class A { byte x; };
class B extends A { byte y; }
  • new A() takes 24 bytes (add 1 byte -> get 7 more bytes for padding)
  • new B() takes 32 bytes (same here)

No inline structures

class C { Object obj = new Object(); }
  • new C() takes 40 bytes.
  • 2 object headers and 1 pointer (with padding) = 16 + 16 + 8 = 40
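
The arithmetic in this section can be reproduced with a small sketch. It only encodes the rules from these notes (16-byte header, 8-byte pointer, rounding up to 8 bytes after each class level) and does not measure a real JVM; the class and method names are made up for illustration.

```java
// A minimal sketch of the size rules in these notes, assuming a 64-bit JVM
// without compressed pointers. ObjectSizeSketch, pad8 and sizeOf are
// hypothetical names, not part of any JDK API.
class ObjectSizeSketch {
    static final int HEADER = 16;  // 2 machine words
    static final int POINTER = 8;  // uncompressed reference

    // Round up to the next multiple of 8 bytes.
    static int pad8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated size of an object, given the bytes of fields declared at
    // each level of its class hierarchy (superclass first). Padding is
    // applied after every level, as described in the notes.
    static int sizeOf(int... fieldBytesPerLevel) {
        int size = HEADER;
        for (int fieldBytes : fieldBytesPerLevel) {
            size = pad8(size + fieldBytes);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println("new Object(): " + sizeOf());
        System.out.println("new A():      " + sizeOf(1));
        System.out.println("new B():      " + sizeOf(1, 1));
        // C holds one pointer and also allocates the Object it points to.
        System.out.println("new C():      " + (sizeOf(POINTER) + sizeOf()));
    }
}
```

For actual measurements, a profiler or an instrumentation agent is the better tool; this only checks that the numbers above are consistent with one another.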

Compressed pointers

-XX:+UseCompressedOops

Pointers become 4 bytes long and can be used below 32 GB heap size. On by default from JDK 6, update 21.

There is a valley in performance when going from a 30 GB heap to a 40 GB heap, as pointers grow from 4 to 8 bytes.

Primitive wrappers

Boxing can happen without you knowing it (say, in Scala 2.7.7; Scala 2.7.8 fixed it):

  • Seq[Int] stores java.lang.Integer => needs 24 + 32 * length bytes
  • Array[Int] stores int => needs 24 + 4 * length bytes

Moral: Surprises exist. Profile everything.
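
To get a feel for the difference, here is a sketch of the two formulas in Java. The breakdown of the 32 bytes per boxed element into an 8-byte pointer plus a 24-byte java.lang.Integer (16-byte header and 4-byte int, padded) is my reading of the numbers above, and the class and method names are hypothetical.

```java
// A minimal sketch of the two formulas above, assuming a 64-bit JVM without
// compressed pointers. BoxingCostSketch and its methods are hypothetical names.
class BoxingCostSketch {
    static final int ARRAY_HEADER = 24;
    static final int POINTER = 8;
    // Assumed breakdown: 16-byte header + 4-byte int, padded to 24 bytes.
    static final int BOXED_INTEGER = 24;

    // Bytes needed when every element is a pointer to a java.lang.Integer.
    static long boxedBytes(long length) {
        return ARRAY_HEADER + (long) (POINTER + BOXED_INTEGER) * length;
    }

    // Bytes needed when the elements are stored as raw 4-byte ints.
    static long primitiveBytes(long length) {
        return ARRAY_HEADER + 4L * length;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println("boxed:     " + boxedBytes(n) + " bytes");
        System.out.println("primitive: " + primitiveBytes(n) + " bytes");
    }
}
```

For a million elements that is roughly 32 MB against 4 MB, an eightfold difference for the same data.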

Thread locals hurt

Threads in pools don't die, so whatever they hold in thread locals stays alive with them. It is often better to just create new objects.

Summary

                  Uncompressed   Compressed   32-bit
Pointer                      8            4        4
Object header               16          12*        8
Array header                24           16       12
Superclass pad               8            4        4

  * Compressed: an object can have 4 bytes of fields and still take only 16 bytes.

There are a lot of other interesting things he talked about as well, like Apache Thrift, Guava MapMaker libraries, etc.

~ musically_ut

Thursday, October 27, 2011

Verbosity of programming languages

It is not often that one sees the same functionality implemented in more than one language, other than in toy examples or for mere bragging rights. However, the AI challenge's starter packages:

  1. Each implement the same algorithm in different languages
  2. With the same I/O infrastructure
  3. Were not written intentionally to be better than the other (no bias)

So I thought it would be a good data-set to look at how verbose the languages were while controlling for the task at hand:


Lines of code it takes to implement the same functionality
in different programming languages.
(Red bars likely to be higher than shown)

Thursday, October 20, 2011

Software Reliability: 3 general problems

Introduction

While creating a critical piece of software (e.g. a Smart Home Controller for my Master's Thesis), one of the main emphases is on reliability.

Broadly speaking, reliability means not crashing or, failing that, exiting gracefully and then restarting without requiring any supervision.

Hence, I set about writing a tiny wrapper to execute applications in. It waits for heartbeats from the application, restarts it if it misses too many heartbeats or if it crashes (exits with a non-zero return value), and listens to the outside world, restarting the application when asked to.

Wrapper ensuring some level of reliability
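
The restart decision at the heart of such a wrapper can be sketched in a few lines of Java. This is not the wrapper from my thesis, only an illustration of the three restart conditions; the class name, the method names and the idea of passing timestamps in explicitly are all assumptions made for the example.

```java
// A minimal sketch of the restart rule described above. HeartbeatMonitor is
// a hypothetical name; process handling and the external control channel
// are left out on purpose.
class HeartbeatMonitor {
    private final long intervalMillis; // expected time between heartbeats
    private final int maxMissed;       // how many missed beats are tolerated
    private long lastBeatMillis;

    HeartbeatMonitor(long intervalMillis, int maxMissed, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.maxMissed = maxMissed;
        this.lastBeatMillis = startMillis;
    }

    // Record a heartbeat received from the application.
    void beat(long nowMillis) {
        lastBeatMillis = nowMillis;
    }

    // Number of whole heartbeat intervals that passed without a beat.
    long missedBeats(long nowMillis) {
        return Math.max(0, (nowMillis - lastBeatMillis) / intervalMillis);
    }

    // exitCode is null while the application is still running.
    boolean shouldRestart(long nowMillis, Integer exitCode, boolean restartRequested) {
        boolean crashed = exitCode != null && exitCode != 0;
        boolean unresponsive = missedBeats(nowMillis) > maxMissed;
        return crashed || unresponsive || restartRequested;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(1000, 3, 0);
        System.out.println("quiet for 2.5 s: " + monitor.shouldRestart(2500, null, false));
        System.out.println("quiet for 4.5 s: " + monitor.shouldRestart(4500, null, false));
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the rule easy to test without sleeping.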


Wednesday, October 19, 2011

Humble "FrozenSynapse" Bundle

The Humble Bundle came back again, this time with just one new game, FrozenSynapse, plus the previous FrozenByte bundle if you paid more than the average.

This run of the Bundle was not as peppered with events as the previous bundles were: there were only two mid-air additions to the bundle, compared with half a score of interesting events last time, and nothing (to the best of my knowledge) was made open source:

Humble "FrozenSynapse" Bundle's economic performance