Anyhow, I got the data downloaded finally, only to see that my computer's memory was the next bottleneck. Things looked hopeless till I started looking at it as an exercise in online algorithms.
Then I saw one thin silver lining: the data in the csv file was presorted by the user_ids.
There was my window! If I could pre-segment the data by user_id, then I could run through one user at a time instead of reading all the data into memory. It was not much, but the contiguous grouping of the users already saved me a lot of effort. My next hope was that, within each user, the data would also be ordered by timestamp. Sadly it wasn't, but one cannot have everything in life anyway. :-)
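The one-user-at-a-time idea can be sketched roughly as follows. This is a minimal Python illustration, not the perl script I actually used, and the column names (user_id, event, timestamp) are assumptions made for the example rather than the real names in the csv file. Because the file is presorted by user id, itertools.groupby can hand over one user's rows at a time, and only those rows need sorting by timestamp.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def iter_users(lines, key="user_id"):
    """Yield (user_id, rows) one user at a time.

    Assumes the rows are already grouped by user id, as in the presorted file.
    The column names are hypothetical.
    """
    reader = csv.DictReader(lines)
    for user_id, rows in groupby(reader, key=itemgetter(key)):
        # Only this user's rows are held in memory. Timestamps are not
        # ordered within a user, so sort them here.
        yield user_id, sorted(rows, key=lambda r: int(r["timestamp"]))


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "user_id,event,timestamp\n"
    "1,BROWSER_SHUTDOWN,500\n"
    "1,BROWSER_STARTUP,100\n"
    "2,BROWSER_STARTUP,50\n"
)
for user_id, rows in iter_users(sample):
    print(user_id, [r["event"] for r in rows])
```

With a real file, an open file handle can be passed in place of the StringIO object, so memory use is bounded by the largest single user rather than by the whole data set.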
- Extracting the data
After a quick perl script to confirm the ordering of user_ids, I wrote a script to calculate the time intervals between a BROWSER_STARTUP event and the following BROWSER_SHUTDOWN event in the data. There were various instances where a BROWSER_STARTUP event was present without a BROWSER_SHUTDOWN event. Probably the browser was started while the system was online (most uses of Firefox are online, after all) but the system was not online when the browser was shut down (or crashed). So, I skipped all the orphaned start-up events. Some start-up events might also have occurred while the host was offline, and ignoring these would bias the intervals towards being longer. I cleaned up the data (there were a few instances, in the tens, where the session apparently lasted beyond 7 days) and plotted the intervals (on a semi-log scale) in R.
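For the curious, the pairing logic looks roughly like the sketch below. It is a Python approximation under my own assumptions, not the original script: events arrive as (timestamp in ms, event name) tuples for a single user, a start-up is treated as orphaned if another start-up or the end of the data comes before a shutdown, and the 7-day cut-off is exposed as a hypothetical max_ms parameter.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def session_intervals(events, max_ms=SEVEN_DAYS_MS):
    """Pair each BROWSER_STARTUP with the next BROWSER_SHUTDOWN for ONE user.

    events: iterable of (timestamp_ms, event_name) tuples, in any order.
    Returns (intervals_ms, skipped_startups).
    """
    intervals = []
    skipped = 0
    start = None
    for ts, name in sorted(events, key=lambda e: e[0]):
        if name == "BROWSER_STARTUP":
            if start is not None:
                skipped += 1  # earlier start-up never saw a shutdown
            start = ts
        elif name == "BROWSER_SHUTDOWN" and start is not None:
            length = ts - start
            if length <= max_ms:  # drop implausibly long sessions
                intervals.append(length)
            start = None
        # Shutdowns with no pending start-up are ignored.
    if start is not None:
        skipped += 1  # start-up still open at the end of the data
    return intervals, skipped


events = [
    (100, "BROWSER_STARTUP"),
    (400, "BROWSER_SHUTDOWN"),
    (500, "BROWSER_STARTUP"),   # orphaned: followed by another start-up
    (900, "BROWSER_STARTUP"),
    (1000, "BROWSER_SHUTDOWN"),
]
print(session_intervals(events))
```

Running this per user and concatenating the interval lists gives the numbers that go into the plot.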
Now I am a user who generally carries his laptop around and has his browser open almost all the time. I was in for a small surprise, enough to make me go over the code twice before I realized that the code was correct and my intuition was wrong.
- Visualizing the data
This data extraction took a whopping 30 minutes on my meager laptop. The stats are here (all numerical values in milliseconds):
Maximum interval = 534616265 ms (6 days)
Minimum interval = 31 ms
Mean = 4286667.42334182 ms (about 1.2 hours)
Median = 896310 ms (15 minutes!)
Skipped 54748 out of 633212 BROWSER_STARTS
50% of the sessions were less than 15 minutes long, and this is while the intervals are biased towards being longer!
Of course, there are some very long sessions, which pull the mean up to over an hour, but considering that there might be outliers in the data, the median is a far more robust measure. In case it is not apparent, this is a beautiful log-normal distribution staring one in the face.
It took me a while to accept that, for most people, most operations with a browser end rather quickly; they do not live their lives in a browser. In other words, I am an outlier here!
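To make the mean-versus-median point concrete, here is a small Python sketch (I did the real analysis in R, and the numbers below are invented purely for illustration). It also computes the geometric mean: for a truly log-normal distribution the median and the geometric mean coincide, so comparing the two is a quick sanity check on the log-normal eyeballing.

```python
import math
import statistics


def summarize(intervals_ms):
    """Return mean, median and geometric mean of positive intervals."""
    logs = [math.log(x) for x in intervals_ms]
    return {
        "mean": statistics.fmean(intervals_ms),
        "median": statistics.median(intervals_ms),
        # exp(mean of logs); equals the median for an exact log-normal
        "geo_mean": math.exp(statistics.fmean(logs)),
    }


# Made-up intervals in ms: four short sessions and one multi-day outlier.
toy = [60_000, 300_000, 900_000, 2_700_000, 500_000_000]
stats = summarize(toy)
print(stats)
```

The single long session drags the mean far above the median, while the median and the geometric mean stay in the range of the typical sessions.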
Figure: How long do Firefox sessions last?
I am sure most people at Mozilla know this and, if this is indeed correct, they are probably making sure that Firefox is well suited for short operations rather than extended week-long use. I am not sure whether these are competing demands; perhaps making the start-up quicker, pre-indexing search files, etc. also helps in extended sessions. However, it appears (with more than just a pinch of salt) that Firefox should focus on the quick-draw-and-fire users to gain market share, in the short term at least. This is the exact opposite of the direction Chrome's development is taking, by the way.
I will keep digging, hoping to find something interesting.
Update: A few more important factors are:
- What were the average session times per user?
- How many Browser sessions were still open when the week ended?
These have been addressed in this post.