Thursday, October 27, 2011

Verbosity of programming languages

It is not often that one sees the same functionality implemented in more than one language, outside of toy examples written for bragging rights. However, the AI Challenge's starter packages:

  1. Each implement the same algorithm in a different language
  2. Share the same I/O infrastructure
  3. Were not intentionally written to outdo one another (no bias)

So I thought they would make a good data set for comparing how verbose the languages are while controlling for the task at hand:

Figure: Lines of code needed to implement the same functionality in different programming languages (red bars are likely to be higher than shown).

I used the latest version of cloc to count the lines of code in each language. However, as a few languages are not yet supported by it (e.g., CoffeeScript), they are missing from the analysis.
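cloc reports blank, comment and code lines separately for each language. To make the idea of a "code line" concrete, here is a rough stand-in for its code column, written for illustration only: the function name `count_code_lines` and the single comment-prefix rule are assumptions of this sketch, and unlike cloc it ignores block comments and comment markers that appear inside strings.

```python
def count_code_lines(source: str, comment_prefix: str) -> int:
    """Count lines that are neither blank nor whole-line comments.

    A simplified approximation of cloc's "code" count (hypothetical helper):
    block comments are not recognized, and a line with code followed by a
    trailing comment is still counted as code.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # whole-line comment
        count += 1
    return count


# A made-up Lua fragment, used only to exercise the counter.
example_lua = """
-- read the map size
local rows, cols = 0, 0

function parse(line)
  return line:match("(%w+) (%d+)")  -- has code, so it is counted
end
"""

print(count_code_lines(example_lua, "--"))  # prints 4
```

Only the four lines carrying code are counted; the blank lines and the whole-line comment are skipped.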

A surprise here is Go, which manages to beat Java at its own game (verbosity)!
However, the package for Go also includes tonnes of debugging code.
C# and C++ immediately bring the verbosity down to (almost) manageable levels.
Another surprise is Lua, with only 200 lines of code. OCaml's design might be much cleaner, but it fails to show off its succinctness here and ends up in the same league as C/C++. Haskell does not fare much better either, possibly because it has to do a lot of pretty-printing and parsing. Every language from D to Lua looks about the same, including Scala.
There is something for the PHP v/s Perl debate here too: PHP takes 33% fewer lines of code than Perl, even though the primary task is lexing and parsing. (Doesn't feel right, does it?)

Also, the red bars mark the packages that do not yet implement the Hills feature; their line counts are likely to grow once they do. Another source of discrepancy might be debugging code that is present in some packages but not in others. Nevertheless, I believe that the numbers here pretty much reflect how verbose these languages are.

That said, do not read too much into the charts; the comparison is not very objective anyway.

Now for something ...

Another interesting, though often neglected, aspect of a language is the disk-spread it brings with it: the number of different files one has to deal with while coding in it. This disk-spread is usually enforced not by the language itself but by the conventions surrounding it (declarations in header files v/s definitions in source files, each class in its own file, etc.).

I understand that defining each class in its own file reflects a clean modular design, but as I do not generally use an IDE, it is also a slight pain (esp. if the language is not recognized by ctags).
Figure: Number of files needed to implement the same code in different languages (red bars are likely to go higher).

The Scala package uses a whopping 12 files for the task, despite its remarkably low line count. It manages this by having several class files with just one line of definition each. That is the kind of disk-spread I am talking about. Java, C# and C++ also have a large disk-spread.

The Makefiles are not included in the count, though.
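Measuring disk-spread amounts to counting files per language package while leaving Makefiles out. A minimal sketch, assuming each language's package lives in its own top-level directory; the function name `disk_spread` and the file listing are hypothetical and do not reproduce the real starter packages.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable


def disk_spread(paths: Iterable[str]) -> Counter:
    """Count files per top-level directory, skipping Makefiles.

    Hypothetical helper: assumes the first path component names the
    language package (e.g. "java/MyBot.java" belongs to "java").
    """
    spread = Counter()
    for p in paths:
        path = PurePosixPath(p)
        if path.name == "Makefile":
            continue  # build files are left out of the count
        spread[path.parts[0]] += 1  # first component = language package
    return spread


# Illustrative listing only; these are not the actual package contents.
files = [
    "java/MyBot.java", "java/Ants.java", "java/Tile.java", "java/Makefile",
    "cpp/MyBot.cc", "cpp/Bot.h", "cpp/Bot.cc", "cpp/State.h", "cpp/State.cc",
    "lua/MyBot.lua", "lua/ants.lua",
]

for package, n_files in disk_spread(files).most_common():
    print(package, n_files)
```

Here the header/source split gives the C++ package the largest spread even though each file may be short, which is the effect described for Scala above.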



Harsh Pareek said...


But since you're comparing small programs, you're mostly comparing their initial overhead, i.e., initializations and import statements.

This is not directly related to how hard it is to program in the language. There are lots of APIs that are implemented in several languages; those could be a good starting point.

musically_ut said...

Hi Harsh,

Thanks for the comment.
Yes, this is a rather small data set to work with, but 200+ lines ought to cover the boilerplate well enough. Don't you think?

Besides, it was just a harmless look at the languages, not meant for any conclusions regarding how hard it is to code in them. :)

Yes, APIs are just as good a place to start from, and I did eye the API for ZeroMQ for a while. However, there is a catch: if the API bindings are generated automatically (even partially), with SWIG or a similar tool, the comparison may lead one astray.