Sunday, September 11, 2011

What Perl got right and R got wrong

R tries to do the right thing by having very short names for functions one uses often:

c     Creating vectors
t     Transpose a matrix
q     Quitting R
is    Generic isInstanceOf
by    Apply after grouping
as    Generic type cast

Table: Common primitive functions in R

As much as I like not having to type extra characters to get to these functions, I have always had to be extra cautious when naming my variables, for fear of accidentally overwriting one of them. Interestingly, R selectively ignores such overrides: if the name is rebound to something that is not a function, the primitive still prevails when the name is called:

> R.version.string
[1] "R version 2.12.1 (2010-12-16)"

> c <- function(...) { 42 }   # Accidentally overriding c()
> c(45, 67, 78)               # Expected behavior
[1] 42

> c <- "42"                   # Now it should be a scalar, right?
> c
[1] "42"

> c(45, 67, 78)               # Magical return from the dead!
[1] 45 67 78

Letting one identifier act as both a variable and a primitive function is grossly inconsistent, especially since functions are first-class objects, the same as any vector or string. If I override an identifier, even a primitive, I expect it to be really overridden.
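The session above can be read as follows: when a name appears in call position, R looks for a function of that name and skips any binding that is not a function. Here is a minimal sketch of that lookup, assuming a fresh session; the variable names v, w and x are only illustrative.

```r
# Bind the name `c` to a non-function value.
c <- "42"

# Used as a plain variable, `c` is the string.
is.character(c)                                   # TRUE

# In call position R searches for a *function* named `c`, skips the string,
# and finds the primitive in the base package.
v <- c(45, 67, 78)
length(v)                                         # 3

# get() with mode = "function" performs the same function-only lookup.
identical(get("c", mode = "function"), base::c)   # TRUE

# Binding a function to `c` really does shadow the primitive...
c <- function(...) 42
w <- c(45, 67, 78)                                # 42

# ...until that binding is removed again.
rm(c)
x <- c(45, 67, 78)                                # the primitive is back
```

So the override is honoured only when the new value is itself a function, which is exactly the asymmetry complained about above.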

Though the intention of cutting keystrokes is clearly good here, this inconsistency feels avoidable.

On the other hand
This highlights how well Perl does the same thing. Larry Wall, the creator of Perl, was a linguist by education, and one of his tenets while creating Perl was to make it as succinct as possible. To this end, Perl ventured into corners of the keyboard that only APL had gone beyond. On an unrelated note, he won the International Obfuscated C Code Contest twice, and some say that after Perl became mainstream, there wasn't much point left in obfuscation contests.

Perl also provides shortcuts for its most commonly used functions and type qualifiers:

my      Variable definitions
@       Array variable prefix
%       Hash variable prefix
$       Scalar variable prefix
s///    Substitute function
m//     Match function

Table: Common primitive functions in Perl

However, in Perl, the abbreviations look completely different from identifiers; it is impossible to even imagine confusing them. In R, identifiers and built-in primitives look uniform, but their treatment is actually inconsistent.

And we have a winner ...

I think between R and Perl, Perl is the one which got it completely right.

The design principle to be learned here is perhaps that of having different keystrokes for different folks. The onus of preserving the (abbreviated) primitive functions should lie with the language, and it can be done:

  • the right way by syntactical obviousness as Perl does it,
  • the good ol' way by reserving keywords as Python does it or,
  • the wrong way by allowing overrides and resuscitating primitives as R does it.
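Since R leaves the onus on the programmer, one defensive habit is to name the namespace explicitly when it matters. This is a minimal sketch, assuming the accidental override from the R session earlier in the post; `base::c` is standard R, while the names shadowed and explicit are only illustrative.

```r
# An accidental override, as in the session earlier in the post.
c <- function(...) 42

# A plain call now reaches the override.
shadowed <- c(45, 67, 78)          # 42

# Qualifying the name with its namespace bypasses the override.
explicit <- base::c(45, 67, 78)    # the numeric vector 45, 67, 78

# Clean up so that later code sees the primitive again.
rm(c)
```

This works, but it gives back the keystrokes the short names were meant to save.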


Shreyes said...

Though I haven't used Perl, I can vouch for these problems with R :-)

musically_ut said...

Yes, there are some problems with R.

However, there is something which Perl sacrifices to deal with this problem: it becomes a "write once and forget" language. It is seldom possible to understand, let alone reuse, any Perl code written more than a week ago.

R, with all its caveats, can at least be reused to some level. :)

MK said...

Hey, I used your c() example in one of the R sessions I am taking at my workplace!!


musically_ut said...

Glad to know that!