Friday, April 4, 2014

I have moved

I started posting on blog.musicallyut.in as an experiment and I like blogging there.

I will not migrate the posts from this blog over, but all new entries will be posted there. :)

Friday, November 22, 2013

Political orientation (summary)

I find it difficult to immediately grasp what views a person holds when they tell me their political orientation, e.g. that they are Conservative or Socialist. I found the following short list on a LessWrong (LW) survey and I liked the concise descriptions associated with each label:

Libertarian, for example like the US Libertarian Party: socially permissive, minimal/no taxes, minimal/no distribution of wealth
Conservative, for example the US Republican Party and UK Tories: traditional values, low taxes, low redistribution of wealth
Liberal, for example the US Democratic Party or the UK Labour Party: socially permissive, more taxes, more redistribution of wealth
Socialist, for example Scandinavian countries: socially permissive, high taxes, major redistribution of wealth
Communist, for example the old Soviet Union: complete state control of many facets of life

Friday, November 8, 2013

Hide overflowing labels on pie charts

I have seen many questions related to occlusion of labels on pie charts and have always suggested that the problem should be solved by moving the labels outside the pie chart, or by rotating them to align with the axis.

The solution I have most often suggested is to rotate the labels, but it has never quite satisfied me. Part of it was the horrible font rendering done by some browsers and the loss in legibility that brings, and part was the weird flip when a label crosses over the 180° line. In some cases, e.g. when the labels were too long, the results were acceptable and unavoidable, but in most cases they just looked clumsy.

I would usually consider using a different form of visualization for such data, but out of curiosity, I wanted to see how difficult it actually is to hide the labels which do not fit.

The results are on this Plunker.
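The Plunker itself is not reproduced here. As a rough, hypothetical sketch of the kind of geometric test involved (not the Plunker's code), one can hide a label when its axis-aligned bounding box, centred on the slice's mid-angle at some fraction of the radius, does not fit inside the slice. Checking only the four corners is valid when the slice is convex, i.e. spans at most π radians; the names `label_fits` and `label_radius_ratio` are my own.

```python
import math


def point_in_slice(x, y, radius, start, end):
    """True if (x, y) lies inside the pie slice spanning angles [start, end].

    The pie is centred at the origin; angles are in radians, measured
    counter-clockwise from the positive x-axis, with 0 <= start < end <= 2*pi.
    """
    if math.hypot(x, y) > radius:
        return False
    angle = math.atan2(y, x) % (2 * math.pi)
    return start <= angle <= end


def label_fits(width, height, radius, start, end, label_radius_ratio=0.6):
    """True if an axis-aligned label box of the given size fits inside the slice.

    The label is centred on the slice's mid-angle at label_radius_ratio * radius.
    Checking the four corners is sufficient only for convex slices (spanning
    at most pi radians), so larger slices are rejected.
    """
    if end - start > math.pi:
        raise ValueError("corner test is only valid for slices spanning <= pi radians")
    mid = (start + end) / 2
    cx = label_radius_ratio * radius * math.cos(mid)
    cy = label_radius_ratio * radius * math.sin(mid)
    corners = [(cx + sx * width / 2, cy + sy * height / 2)
               for sx in (-1, 1) for sy in (-1, 1)]
    return all(point_in_slice(x, y, radius, start, end) for x, y in corners)
```

A renderer would call `label_fits` once per slice and simply not draw the labels for which it returns `False`.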

Book Review: 2013

Book reviews

I read (and listened to) a lot of books in the past few months and just couldn't find the time to write about them. Finally, I do have some time to pen down my thoughts.

The Pleasure of Finding Things Out

This one is a classic: a collection of Feynman's assorted lectures. It would probably qualify as a pop-science book. I listened to it on my daily commute to and from work, and it has to be one of the best books I have listened to. In it, we catch a glimpse of Feynman at the most human level.

I especially loved the anecdotes which showed him to be a human first and a Nobel-prize-winning scientist second, e.g. the valve anecdote, where he candidly admits that he had no idea what the little boxes with X symbols in them meant on the blueprints of the new nuclear plant. He risked his credibility by pointing to one of them and asking "What if this valve doesn't work?", fully expecting one of the engineers to say "Sir, that is not a valve, but a window."

Then there were stories which touched deep, like the moment when he realized that his father, his very own role model, could neither explain nor understand everything about the world. Similarly, he got his son to develop an inquisitive mind by inventing stories about little people encountering everyday objects at extraordinary sizes, but the very same stories never worked on his daughter, who always wanted the same stories from the same book read back to her.

His unabashed, sincere approach to the supernatural stood out: does faith healing get diluted if administered to more than one person at a time? Does it deplete over time?

So did his love of the correct experimental procedure: can mice use the vibrations on the floor to guide themselves? How about the lights of the room?

Overall, a very satisfying, relaxing and inspiring read.

Accelerando

This book by Charles Stross was a whopper. I read the freely available eBook version after hearing about it somewhere on LessWrong (or maybe somewhere else), and I am very glad that I did. The book is fast-paced; something or other keeps happening and the story keeps moving forward. However, the story is not what kept me captivated; it is largely incidental. What held my rapt attention was, first, the science fiction and, second, the characters.

The science fiction part is very lucid, and I often surprised myself by wondering why we do not already have some of the things mentioned in the book. It glosses over a lot of the details, but it makes sense. Spawning copies of yourself to research a question and then assimilating the experiences from it feels so natural! And useful! Mining the angular momentum of planets! Yes, do want! Dyson spheres! Yes, please! Economics 2.0! Uh, what? Aliens! Umm ... okay. Civilizations trying to hack the universe they are living in! Whoa. I need to sit down.

All the characters in the story are unashamedly flawed, and irreparably so: some right from the outset, others as the story progresses. Reality is broken.

This book took effort to write and it shows:

How often do you hear of authors taking time off from a short story to write a novel, because writing the novel is easier?

Overall, I loved it and strongly recommend it to any science-fiction fan.

Another Fine Myth

This was another book recommended by LessWrong in a very tangential way:

Quigley: "Beware, demon, I am not without defenses."

Aahz: "Oh yeah? Name three."

It was a very light-hearted, easy-to-read fantasy book. It progresses at an easy pace and would qualify as a children's book, save for a few moments. The characters were too gullible, the magic was not as awe-inspiring as it is in HPMoR, and there were quite a few kinks in the story. In ridiculousness and aphorisms, Robert Asprin did not come close to Terry Pratchett's Discworld novels, but it was a good, relaxing book to listen to anyhow. I liked Gleep. I think I'll test the Myth Adventures series (this is the first book from the series) with a few more books before delivering my final verdict on it. The Color of Magic was not very different from this book, though I remember much less of Rincewind from that book than of Skeeve from this one.

This book alone would stand at 3.5 / 5.

Other stuff

I have read a lot of other stuff in this period as well. I am about 40% into Charles Dickens's Bleak House and have finally managed to catch up with Harry Potter and the Methods of Rationality (strongly recommended, along with the audio podcast). I am also listening to A Short History of Nearly Everything by Bill Bryson, another pop-science book, but one which concentrates more on the lives of the scientists and is steeped in British humour. I often have to cover my grin while standing in a bus or train listening to it. For some reason, it feels obscene to be that happy. Or maybe:

“And those who were seen dancing were thought to be insane by those who could not hear the music.” ~ Friedrich Nietzsche

Other than that, I have been preoccupied with trying to learn German and have started using Duolingo on some of my commutes!

Wednesday, September 11, 2013

Interactive Storytelling in The Last of Us.

I watched the walk-through of The Last of Us. If you haven't seen or played a Naughty Dog game through to the end, I bet the ending will catch you by surprise.

I loved it.

I like to categorise and keep a record of things which I am doing. I religiously scrobble my songs to Last.fm. I record every book I read with the finish date on Shelfari. I will need to create a new category here: interactive story watched as a movie. Or maybe as a TV series.

I will add BioShock Infinite to it. Perhaps Psychonauts too: I watched David play it (almost) to the end while sitting next to him.

I would add Dear Esther to it as well, but I actually played that game. It was a good storytelling experience, but it gave me motion sickness.

I prefer the non-interactive storytelling experience (like a movie), but with the kind of stories that are being told via an interactive medium (video games) today.

It reminds me of the time I used a Kindle as a bookmark in a physical book I was reading. I know that there is irony in there somewhere; I just cannot put my finger on it.

Tuesday, March 19, 2013

mitmproxy as a reverse proxy

It turns out that thanks to the powerful scripting interface, mitmproxy can be used as a reverse proxy to redirect calls to different servers easily:

revProxy.py

import os

# Uses the older (2013-era) mitmproxy script API, where the request hook
# received a context object and the flow, here called r.
proxyHost = os.environ['PROXYHOST']

def request(ctx, r):
    if r.request.path.startswith('/api'):
        # Proxy calls with /api prefix to proxyHost over HTTPS
        r.request.headers['Host'] = [proxyHost]
        r.request.scheme = 'https'
        r.request.host = proxyHost
        r.request.port = 443
    else:
        # Proxy other calls to localhost:8000, serving index.html for directory URLs
        if r.request.path.endswith('/'):
            r.request.path += 'index.html'

        r.request.scheme = 'http'
        r.request.host = 'localhost'
        r.request.port = 8000
    # end if
# end def request

Then use mitmproxy as:

PROXYHOST="my.apps.domain" mitmproxy -s revProxy.py
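Newer mitmproxy releases changed the script interface: the request hook takes a single flow argument instead of (ctx, flow). As a hedged sketch assuming that newer API (a module-level `request(flow)` hook, with `flow.request` exposing `scheme`, `host`, `port`, `path` and `headers`), the same routing might look like this. The helper `rewrite` is my own name, split out so the routing can be exercised without a running proxy.

```python
import os


def rewrite(req, proxy_host):
    """Route /api calls to proxy_host over HTTPS and everything else to localhost:8000."""
    if req.path.startswith("/api"):
        # Proxy calls with the /api prefix to the remote host.
        req.scheme = "https"
        req.host = proxy_host
        req.port = 443
        req.headers["Host"] = proxy_host
    else:
        # Serve everything else from a local dev server, mapping directory URLs to index.html.
        if req.path.endswith("/"):
            req.path += "index.html"
        req.scheme = "http"
        req.host = "localhost"
        req.port = 8000


def request(flow):
    # mitmproxy calls this hook once per client request (newer, single-argument API).
    rewrite(flow.request, os.environ["PROXYHOST"])
```

It would be started the same way as before, with PROXYHOST set in the environment and the script passed via `-s`.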

Friday, January 18, 2013

Two room problem

I had a problem.

I fractured my foot last month, on 6th December, and am now walking on crutches. I have largely recovered, but one excruciating problem I face is having no free hands while going from room to room. That means I cannot bring food to the table from the kitchen, nor carry used plates, glasses, etc. back to the kitchen. However, those problems were easy to solve: just eat in the kitchen and drink from reusable bottles.

The real problem was carrying my work around. I sit in the living room (which is large, airy and has a nice window) and lie down in my bedroom (which is a little cloistered but comfortable and snug). I had to constantly pack my work laptop in my backpack and carry it from one place to another since I like to work at both places. This annoyed me to no end, until I found a solution.

The solution was simple: use two laptops. I keep my personal laptop in the bedroom and my work laptop in the living room, and travel light between the two. Work-related things are synced using Google Apps and git, while my personal things are synced using Dropbox.

On top of that, I use my phone to make calls and my Kindle (which also fits in my pocket) to read, so I don't have to carry anything in my hands.

I got my PC as a present from one of my uncles in 1998. I stuck with it (replacing it part by part) for the next decade (!) before I got my first laptop in 2008, and that was a luxury. It has only been 5 years since, and the computing power I have in my life has increased about fivefold. I am suddenly living the first-world technophilic life.

When did that happen?

2012: Books

I read surprisingly few books this year and did not keep my implicit promise of reviewing all of them. Though few in number, they made up for it in quality. It is hard to pick the best one from among the likes of Museum of Innocence, Flowers For Algernon, The Long Earth and The Glass Bead Game.

The Museum of Innocence was a behemoth, taking me through the vicissitudes of Kemal Bey, and has to be noted for the massive closure its very last line brought me. However, that does not make a book good, it just makes it worthwhile; the quality of the book was in its calmness.

Flowers for Algernon was a book which hit deep. The subject matter and the plot lie very close to my heart. However, before it became a novel, it was a short story, and the slightly sketchy characters and slightly incomplete threads end up showing in the book. Also, though the perspective of Charlie Gordon accentuates the book's poignancy, at times it makes it difficult to enjoy the story as a whole.

The Long Earth was more a treatise on evolution, humanism, politics and life. However, it contained just enough surprises, emotions, and nuclear bombs to keep it from becoming a little drab like The Glass Bead Game. Joshua and Lobsang, an unlikely team, ended up becoming one of the best partnerships I have ever encountered in fiction, reminding me of Rupert Birkin and Gerald Crich from D. H. Lawrence's classic Women in Love. Another USP of this book was that it was co-written by Terry Pratchett and plays with the inchoate boundary between "hard" Sci-Fi and Fantasy. This is the book which gave me the latest name of my laptop: EarthWestOne, and now I am eagerly awaiting the sequel: The Long War. I cannot wait to meet First Person Singular, Mark Trine, and the next iteration of Lobsang.

But the book which stole the show for me was Seabiscuit: An American Legend. This book made my heart race and was far better than most works of fiction I have read when it came to suspense, character development, story, crescendo, feints, and climax. However, don't read the epilogue in the same sitting as the rest of the book.

Karna: Part One

Another interesting and special book I read last year was Karna, the debut novel of my friend Kartik Kaipa. It is based on the great Indian epic: Mahabharata. The characters are the same, the events are the same, the story is the same, the setting is the same and even the supernatural elements are the same. What changes are the narration and some relatively minor details. Contrary to the usual pious tone reserved for the religious epic, Kartik gives the story a more down-to-earth, and at times blasphemous, mien. His narration is sharp, at times humorous, and honest to the point of making one's bones ache. There is a point where he even breaks character to say a few words about equality between women and men and about Indian culture.

The story is well researched, with enough accuracy to allow him to bend the details his way. The protagonist of the story, Karna, has seen his ascension and is about to set out on his journey as this book ends. I am eagerly awaiting the next two books planned in the series.

Saturday, August 25, 2012

Book Review: Captain Blood

Captain Blood by Rafael Sabatini

While browsing through the audio books on LibriVox, I came across this book. I had just read some Sci-Fi (Permutation City, which was good) and I wanted something more epic in scale, like one of Dickens's novels. This was the best fit, and it turned out to be a good choice: it kept me good company for the next week.

I listened to this book in twenty-minute chunks during my commute to and from the office. All the voices who read the story to me were clear and precise, though I did find it a little difficult to relate to the Captain's character after hearing multiple renditions of his voice.

The book is well written, even though a bit clichéd. The story is fairly simple, but with some twists and turns which kept me on the edge of my seat, and it is coherent and well told. The narrator has gathered these stories of the Captain from the logs kept by one of his buccaneering companions (Jim). The primary theme of the novel is the love between Arabella Bishop and Captain Blood, and the narrator breaks character to give the reader glimpses of certain future events which concern it. Otherwise, the narration remains chronological.

Pre-allocate your vectors

Or else welcome your good ol' friend, O(n^2), back into your life.

There are three common ways people add elements to vectors in R. First way:

f1 <- function (n) {
    l <- list()
    for(idx in 1:n) {
        l <- append(l, idx)
    }
    return(l)
}

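This is bad. It does what it looks like: append() builds a new list holding all the old elements plus the new one, and the result is assigned back to the same variable 'l'. Every iteration copies everything added so far, so building a list of n elements costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 element copies, which is clearly O(n^2).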

The second method is this:

f2 <- function (n) {
    l <- list()
    for(idx in 1:n) {
        l[[length(l) + 1]] <- idx
    }
    return(l)
}

This approach looks decent because, coming from a Python and Java background, I would expect a good implementation to allow appending elements to vectors in amortized O(1) time. However, it holds an unpleasant surprise: another good ol' O(n^2) is lurking in the shadows here.
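To see why copying on every append is quadratic while the growth strategy Python and Java use is amortized O(1), here is a small Python sketch that only counts element copies under the two strategies. It models the strategies, not R's internals, and the function names are made up for illustration.

```python
def copies_copy_on_append(n):
    """Element copies when every append allocates a new vector of size k + 1."""
    total = 0
    for k in range(n):      # the vector currently holds k elements
        total += k          # copy the k existing elements into the new vector
    return total


def copies_geometric_growth(n, factor=2):
    """Element copies when capacity is multiplied by `factor` only when full."""
    total, capacity = 0, 1
    for k in range(n):      # the vector currently holds k elements
        if k == capacity:   # full: reallocate and copy the k existing elements
            total += k
            capacity *= factor
    return total
```

For n = 1000 the first strategy performs 499,500 copies, while doubling performs 1,023 (1 + 2 + 4 + ... + 512), i.e. fewer than 2n in total.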

Time taken to create a vector of size n

The method f2 is arguably better than f1, but neither is any match for f3, which simply uses pre-allocation:

f3 <- function (n) {
    l <- vector("list", n)
    for(idx in 1:n) {
        l[[idx]] <- idx
    }
    return(l)
}

Of course, these are just cooked-up examples, but it is not difficult to imagine situations where such incremental creation of a list is inevitable (think reading from DB cursors). For this particular example, the fastest solution involves sapply, but it is not much faster than f3:

f4 <- function (n) {
    return(as.list(sapply(1:n, function (idx) idx)))
}

So in most cases, it is better to overestimate the number of items in a list and then truncate it to the right size than to expand it dynamically.
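Python lists already over-allocate internally, so the following is only an illustration of the "overestimate, then truncate" pattern, e.g. for rows coming from a DB cursor whose length is not known in advance. `collect` and `estimate` are hypothetical names; when the estimate turns out too small, the buffer is doubled rather than grown one slot at a time.

```python
def collect(rows, estimate):
    """Fill a pre-allocated list from an iterable of unknown length, then truncate."""
    buf = [None] * max(estimate, 1)   # generous guess at the final size
    count = 0
    for row in rows:
        if count == len(buf):
            buf.extend([None] * len(buf))   # estimate exceeded: double the capacity
        buf[count] = row
        count += 1
    del buf[count:]   # drop the unused tail
    return buf
```

With a good overestimate, the doubling branch never runs and the only extra work is the final truncation.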

And don't forget to smile and wave at the hidden O(n^2)!

~
musically_ut

PS: If you want to run the profile yourself, here's the gist of the functions and the profiling code.

Colors in R console

Update 2: The colorout package has moved again and is now available on GitHub.

install.packages('devtools')
library(devtools)

install_github('jalvesaq/colorout')

Update: The colorout package on CRAN has not been updated to be compatible with R version 3.x.x yet. However, if you compile and install it yourself, it still works.

download.file("http://www.lepem.ufc.br/jaa/colorout_1.0-1.tar.gz", destfile = "colorout_1.0-1.tar.gz")
install.packages("colorout_1.0-1.tar.gz", type = "source", repos = NULL)

R console

Let's face it, the R console is one of the more uninviting things I have seen, perhaps second only to the Ocaml toplevel, which comes without readline support.

This is what my (and probably your) R console looks like:

R console without color

What I see on the console is a single color which, firstly, makes it a challenge to separate stderr, output, warnings and errors, and, secondly, is just boring. I did not know what I could do about it and I stuck with it because the only other option seemed to be moving to a GUI (JGR or RStudio). This is not to say that there is something wrong with the GUIs, but I prefer working with only my keyboard and, hence, shun GUIs almost as a rule. (Eclipse for Java is an exception.)

That changed when I discovered the package colorout on CRAN, which makes my console look like this:

R console with color (using colorout)

This makes it remarkably easier to differentiate different forms of output.
The coloring is not perfect (notice that the second '-' in the interval outputted for tmp30 is assumed to be a negative sign instead of being a separator), but I would choose it any given day over the drab single color.
I did need a little tweak to my .Rprofile file since I did not like the way the default settings display the error:

~
musically_ut

Monday, June 11, 2012

Book Review: Seabiscuit - An American legend

This book is about a horse and four people around it: the owner, the trainer and two riders.

To be honest, I did not start the book with high hopes. To begin with, it was non-fiction, and then it was about a horse. About 10 pages in, I had an inkling that Laura Hillenbrand was luring me into a trap. I distinctly remember thinking "This is not how one writes a historical account", but I could not figure out what was amiss. The facts were all there, the cross-references seemed to match up, and I could find nothing wrong apart from some poetic exaggerations which brought forth gentle chuckles. I circumspectly read through the account of the earthquake in San Francisco without realizing anything.

It was just after the first race in Tijuana that I realised what was wrong with the book: It wasn't boring!

Book review: The Museum of Innocence

The first 100 pages or so made me wonder many times why I was reading the book at all (it came highly recommended) and why Kemal Bey didn't just move on. Why stay there? And moreover, why call it innocence when nothing could have been further away? Why color something purportedly black and white like carnal desires in shades of gray by waving brushes of grandeur on it? Why not just admit it for what it was, why not just smack the monkey, massage the seal, walk the cobra and then do something else?

However, by the time his first year of misery was over, I understood him better. His misery was not a choice. His misery was a way of life. His innocence was blindness and denial, but it still reeked of innocence. He lived in Turkey, he lived the life of a rich man, and though the poor there thought that wealth would cure them of their illnesses, he lived not very differently.

Umlauts and accents in Gnome 3

The extra characters can usually be typed with the help of a compose key. In Gnome, one can choose one of the keys on the keyboard to be a compose key in this way.

Rate songs in notifications with Rhythmbox

Update: Keeping in mind the feedback, I have enabled instant update of the notification once the ratings are changed in my new patch to the notification plugin of Rhythmbox.

When do you rate your songs?

I have a huge music library: enough to keep me engaged for more than a month if I listen to every song back to back with no repeats. Hence, my preferred time for rating a song is while I am listening to it.

Currently, this process is somewhat complicated. I have to go to Rhythmbox's workspace, rate the song, then figure out which workspace I just left and continue my previous work. This interrupts my workflow.

However, Gnome 3 allows actions in its notifications and one can already do the Previous, Play/Pause, Next actions from within the notification.

Line segment intersection

It looks like an easy problem, there is a trove of information on it, and Bryce Boe even has a two-line solution for it. However, his program does not handle improper intersections, i.e. when an endpoint of one segment lies on the other segment. Such an endpoint forms a zero-area (degenerate) triangle with the other segment, so the orientation test does not place the two endpoints on opposite sides, and his program does not count it as an intersection.

The highest-voted answer on the Stack Overflow question needs special care when handling vertical lines (and it is a general outline of a solution, not actual code).

Now there are multiple methods of correcting these problems, and this is one of them:

Use this function to test the implementation:
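The original code for the fix and for the test did not survive in this copy. As a sketch of one standard way to handle the improper cases (not necessarily the approach this post used), the orientation test can be extended with an on-segment check whenever a triangle has zero area. It uses only cross products, so vertical segments need no special handling; the function names are my own.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 = counter-clockwise, -1 = clockwise, 0 = collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def on_segment(p, q, r):
    """Assuming p, q, r are collinear, True if r lies within the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """True if segment ab and segment cd share at least one point (touching counts)."""
    o1, o2 = orientation(a, b, c), orientation(a, b, d)
    o3, o4 = orientation(c, d, a), orientation(c, d, b)
    # Proper intersection: each segment's endpoints lie strictly on opposite sides of the other.
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True
    # Improper cases: an endpoint lies on the other segment (zero-area triangle).
    return ((o1 == 0 and on_segment(a, b, c))
            or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a))
            or (o4 == 0 and on_segment(c, d, b)))
```

A T-junction such as `segments_intersect((0, 0), (2, 0), (1, 0), (1, 5))` is reported as an intersection, as are overlapping collinear segments.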

Tuesday, December 20, 2011

Numerically stable standard deviation calculation and code perforation

How would you implement a class which has an append(double x) function to collect values, and a get_std_dev() function which returns the standard deviation of the values collected?

It sounds like an easy problem. Here is my guess at what your code would look like:

This looks like a standard implementation, but it contains some subtle bugs. These bugs do not show up in regular usage, but they lie patiently in wait. Remember the tricky binary search implementation bug which escaped detection in the JDK for about a decade and is now staple fodder of all technical interviews?
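The original snippet is not reproduced here. Assuming the "standard" implementation is the textbook one-pass formula based on a running sum and sum of squares (my guess, not necessarily the post's), here it is next to Welford's online algorithm, a well-known numerically stable alternative. Class names are illustrative; both return the population standard deviation.

```python
import math


class NaiveStdDev:
    """One-pass formula var = E[x^2] - E[x]^2 (prone to catastrophic cancellation)."""

    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def append(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def get_std_dev(self):
        mean = self.total / self.n
        # Subtracting two nearly equal large numbers can lose most significant
        # digits, or even go negative, in which case math.sqrt raises ValueError.
        return math.sqrt(self.total_sq / self.n - mean * mean)


class WelfordStdDev:
    """Welford's algorithm: updates the mean and the sum of squared deviations incrementally."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def append(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the updated mean

    def get_std_dev(self):
        return math.sqrt(self.m2 / self.n)
```

Both agree on small inputs, but for values around 1e9 the naive version works with squares near 1e18, where adjacent doubles are 128 apart, so a variance of order 10 is swamped by rounding. Welford's update only ever handles deviations from the running mean.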

Handling NULLs and NAs

Real world data always has missing and blatantly incorrect values.

This becomes a painful issue when building predictive models. While there are multiple ways of imputing data, it is difficult to figure out whether one is doing a good enough job. To make matters worse, the rows missing data might not be random. For example, all incomes above a certain threshold might be deliberately made NA to preserve anonymity. However, the model developer might not be aware of this censoring. Imputing data using any central measure will not only fail to capture this bit of information, but will actually make predictions worse.

Similar encoding might be present when a column has values outside its natural limits. For example, a column holding the number of questions answered (out of 5) in a test might use the value -1 to mark absentees.

In the worst case, a model developed by completely dropping the offending parameter might perform better than an imputed data-set.

In most cases, we can do better.
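One common way to do better (a standard technique, not necessarily the one this post went on to describe) is to decode sentinels as missing and keep an explicit missingness indicator next to the imputed column, so that a model can still learn from the fact that a value was absent. A minimal sketch in plain Python; -1 as the sentinel, median imputation and the name `decode_with_indicator` are all assumptions for illustration.

```python
import statistics


def decode_with_indicator(values, sentinels=(-1,)):
    """Return (imputed, was_missing) for one numeric column.

    None and any sentinel value (e.g. -1 for "absent") are treated as missing.
    Missing entries are filled with the median of the observed ones, and a
    0/1 indicator column records which rows were missing.
    """
    was_missing = [1 if v is None or v in sentinels else 0 for v in values]
    observed = [v for v, m in zip(values, was_missing) if not m]
    fill = statistics.median(observed) if observed else 0
    imputed = [fill if m else v for v, m in zip(values, was_missing)]
    return imputed, was_missing
```

For the test-score example, `decode_with_indicator([5, 3, -1, None, 4])` gives the imputed column `[5, 3, 4, 4, 4]` and the indicator `[0, 0, 1, 1, 0]`.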

Starting out with Ocaml

There are many things which they do not tell you about Ocaml when you start out. This is a stumbling block for many, and in this post I will cover a few facts which just might help the beginner a tad.

Toplevel

The first look at the toplevel is disheartening.

# let f = fun x : x + 1^[[D^[[D^[[D

The toplevel does not support readline functionality out of the box. This leads to considerable frustration (note that I accidentally wrote a ':' instead of '->' and then tried pressing the left arrow to fix it).

However, there is ledit, which can be used to give the REPL some sensible behavior, albeit nothing like IPython's auto-completion.

$ aptitude install ledit
$ ledit ocaml

There are other readline wrappers around too which can provide similar functionality (rlwrap, etc.)

Working with external libraries

Ocaml handles libraries in a very, very low-level fashion, and getting things to work correctly from the toplevel can be particularly painful. Here are a few tips which may help if you are working with external libraries. Your mileage may vary.

Godi Ocaml

Use Godi's distribution of Ocaml instead of the Debian/Ubuntu packages:

$ aptitude install ocaml      # Not a very good idea

After Godi's installation, it'll take a little effort to set up the path/soft-links to execute the scripts initially, but after that hurdle, the rest is a very smooth ride.

Godi's installation provides some excellent ocaml packages out-of-the-box and does not require root privileges. Also, one of the clinchers is:

$ godi_console

for easy installation of many commonly used packages.

However, getting these packages loaded into the top-level just to play with could be a pain, but there is a remedy.

Topfind

Use topfind to load modules in the top-level and handle dependencies.

# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

- : unit = ()
# #require "ZMQ";;
/home/utkarsh/godi/lib/ocaml/site-lib/uint: added to search path
/home/utkarsh/godi/lib/ocaml/site-lib/uint/uint64.cma: loaded
/home/utkarsh/godi/lib/ocaml/site-lib/ZMQ: added to search path
/home/utkarsh/godi/lib/ocaml/site-lib/ZMQ/ZMQB.cma: loaded
#

Finally, there is a way of making your own top-level which loads certain default modules when it starts:

$ ocamlmktop -o my_top_level my_module.cmo your_module.cmo our_module.cmo

Ocamlbrowser
This is available with the godi-ocaml-labltk package which can be installed using godi_console.
This is a near-absolute necessity while working with most Ocaml modules, even if they are very well documented. It certainly beats going through the .mli files of the packages.

I still have to try out the Eclipse's Ocaml IDE to see whether it can replace ocamlbrowser with intellisense.

Notes on tuning JVM

Notes from a talk about tuning the JVM by Attila Szegedi, Twitter.

Know the problem and options

Use these options to find out whether GC happens often:

-verbosegc
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution

A larger eden is always better. Old-generation collectors:

Throughput collectors:

    -XX:+UseSerialGC
    -XX:+UseParallelGC
    -XX:+UseParallelOldGC

Low-pause collectors:

    -XX:+UseConcMarkSweepGC
    -XX:+UseG1GC

Adaptive sizing options:

    -XX:+UseAdaptiveSizePolicy (with throughput collectors only)
    -XX:MaxGCPauseMillis=100 (ms)
    -XX:GCTimeRatio=19 (application time : GC time = 19 : 1, i.e. aim for at most 5% of time in GC)

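If all else fails, use the concurrent mark-and-sweep collector, which collects all the time (concurrently with the application). Hence, having extra CPU can help avoid stop-the-world latency!

    -XX:CMSInitiatingOccupancyFraction=75 (set to 75-80; starts a CMS cycle when the old generation is that % full)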

Fat data

Data might be taking too much memory.

Object headers

Object header = 2 machine words = 128 bits on 64 bit machine = 16 bytes

Similarly, Array headers take 24 bytes.

new java.lang.Object() takes 16 bytes
new byte[0] takes 24 bytes

Padding:

Objects are padded to 8-byte blocks, and the padding is applied at each level of the class hierarchy.

class A { byte x; };
class B extends A { byte y; }

new A() takes 24 bytes (add 1 byte -> get 7 more bytes for padding)
new B() takes 32 bytes (same here)

No inline structures

class C { Object obj = new Object(); }

new C() takes 40 bytes.
2 object headers and 1 pointer (w/ padding) = 16 + 16 + 8 = 40
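The sizes above follow from simple arithmetic. Here is a back-of-the-envelope model of it, assuming a 64-bit JVM with uncompressed pointers, 8-byte object alignment and the per-class padding described in these notes (modern JVMs may pack fields differently); `align` and `object_size` are my own names.

```python
def align(size, to=8):
    """Round size up to the next multiple of `to` (objects are 8-byte aligned)."""
    return (size + to - 1) // to * to


HEADER = 16   # object header, 64-bit JVM, uncompressed pointers
POINTER = 8   # reference size with uncompressed pointers


def object_size(field_bytes_per_class):
    """Shallow size of an object whose class hierarchy declares the given field bytes.

    Each class's fields are padded separately, so [1, 1] (class B extends A,
    one byte field each) costs more than a single class with two byte fields.
    """
    size = HEADER
    for fields in field_bytes_per_class:
        size = align(size + fields)
    return size
```

`object_size([])` gives 16 for `new Object()`, `object_size([1])` gives 24 for A, `object_size([1, 1])` gives 32 for B, and `object_size([POINTER]) + object_size([])` gives 40 for C plus the Object it points to.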

Compressed pointers

-XX:+UseCompressedOops

Pointers become 4 bytes long; this works for heaps below 32 GB. On by default since JDK 6 update 23.

Expect a valley in performance when going from a 30 GB heap to a 40 GB one, as pointers grow from 4 to 8 bytes.

Primitive wrappers

This can happen without you knowing it (e.g. in Scala 2.7.7; Scala 2.7.8 fixed it):

Seq[Int] stores java.lang.Integer => needs 24 + 32 * length bytes
Array[Int] stores int => needs 24 + 4 * length bytes

Moral: Surprises exist. Profile everything.
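The Seq[Int] vs Array[Int] figures can be checked with the same kind of back-of-the-envelope model: each boxed element costs a reference plus a separate java.lang.Integer object (16-byte header + 4-byte int, padded to 24). This sketch assumes uncompressed 64-bit pointers and ignores the Seq wrapper object itself; the function names are mine.

```python
ARRAY_HEADER = 24   # array header, 64-bit JVM, uncompressed pointers
HEADER = 16         # object header
POINTER = 8         # reference size
INT = 4             # size of a Java int


def align(size, to=8):
    """Round size up to the next multiple of `to`."""
    return (size + to - 1) // to * to


def boxed_int_array_bytes(n):
    """Array of n references, each pointing to its own java.lang.Integer."""
    integer_object = align(HEADER + INT)                    # 16 + 4 -> padded to 24
    return ARRAY_HEADER + n * (POINTER + integer_object)    # 24 + 32 * n


def int_array_bytes(n):
    """Primitive int[] of length n, padded to 8 bytes."""
    return align(ARRAY_HEADER + INT * n)
```

For a million elements that is roughly 32 MB boxed versus 4 MB unboxed.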

Thread locals hurt

Pooled threads don't die, so anything held in a thread local lives as long as the pool. Just create new objects instead.

Summary

Size (bytes)     Uncompressed   Compressed   32-bit
Pointer          8              4            4
Object header    16             12*          8
Array header     24             16           12
Superclass pad   8              4            4

* With compressed pointers, an object can have 4 bytes of fields and still take only 16 bytes (12-byte header + 4 bytes of fields).

There are a lot of other interesting things he talked about as well, like Apache Thrift, Guava MapMaker libraries, etc.

~ musically_ut