Introduction
Update: R's by function can be used for the same task, since my data is stored in a single data.frame. I test that later in the post.
Recently I had to run through a large data.frame in R, selecting values from one column based on an equality constraint on another column, in this format:
for (i in seq_along(values)) {
  # Select the row2 entries whose row1 entry equals the i-th value
  t <- my.var$row2[my.var$row1 == values[i]]
  do.something.interesting(t)
}
This is a fairly common operation, and I had thought that beyond the vectorization I had done (a term borrowed from MATLAB), R would take care of the optimizations under the hood.
At this point, I thought about how I would do this on a larger database, and it struck me that perhaps I could do better than R if I manually indexed the data and found the relevant intervals myself. How much faster can that be?
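To make the idea of "manually indexing the data and finding the relevant intervals" concrete, here is a minimal sketch of one way it could be done: sort by the key column once, then record where each key's run of rows starts and ends. The toy data and the helper name lookup are assumptions made for illustration only, and this is not necessarily the approach measured later in the post.

```r
# Hypothetical toy data; the column names row1 and row2 follow the loop above.
my.var <- data.frame(row1 = c(3, 1, 2, 1, 3, 3),
                     row2 = c(10, 20, 30, 40, 50, 60))

# Sort once by the key column so that equal keys occupy contiguous positions.
ord  <- order(my.var$row1)
keys <- my.var$row1[ord]
vals <- my.var$row2[ord]

# Run-length encode the sorted keys to get each key's [start, end] interval.
runs   <- rle(keys)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# lookup() is a hypothetical helper: it finds the key's interval and slices
# the sorted values, instead of comparing the key against every row.
lookup <- function(key) {
  j <- match(key, runs$values)
  if (is.na(j)) return(vals[0])  # key absent: empty vector of the same type
  vals[starts[j]:ends[j]]
}

# Same result as the equality filter used in the loop.
lookup(3)
my.var$row2[my.var$row1 == 3]
```

The sorting cost is paid once up front, after which each lookup touches only the rows that belong to the requested key. Whether that actually beats the plain equality filter depends on the number of rows and of distinct keys, which is the question the rest of the post sets out to answer.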