
Monday, December 27, 2010

Bar charts with icons and most experimenting users

Showing icons instead of the boring colors in charts is an oft-used Infographics technique. I too used it to draw the lower part of a graph.

Now, with the PictureBar class, making such charts in Processing is easy. The output of the code is akin to this (the icons are only samples and can be substituted with your own):
Sample output of the PictureBar program

Sample output after some post processing

The program
The source code of the class is well documented, but here are a few pointers on how the code (in particular the draw() function) works; a rough sketch of the same logic follows the list:
  1. Given the number of icons to be present in a bar and the distance desired between bars, it determines the width of bars:
    • This also fixes the icon size because we know the number of icons to put at each level
  2. Select the bar which has the maximum value and determine how many rows of icons will fit in the provided height (the entire height is used)
  3. Decide how many icons to put in the last row of the bar with maximum value:
    • The total number of icons in this bar will determine the value (in percentage) of each icon
    • The decision is based on which number minimizes the difference between the desired and represented percentages across bars.
  4. The bars are drawn using the given number of icons either from top to bottom or from bottom to top (default)
    • However, the rows are always filled from left to right.
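
To make these steps concrete, here is a rough Python sketch of the same layout logic. This is not the actual PictureBar source (that class is written in Processing) and the function and parameter names are mine; it only illustrates how the per-icon value and the icon counts could be chosen:

def icon_layout(values, icons_per_row, rows_for_max):
    """Rough sketch (not the real PictureBar code): pick the per-icon value
    and the icon count of every bar so that the represented percentages
    stay close to the desired ones."""
    vmax, total = max(values), sum(values)
    best = None
    # Try every possible icon count for the last row of the tallest bar.
    for last_row in range(1, icons_per_row + 1):
        n_max = (rows_for_max - 1) * icons_per_row + last_row   # icons in the tallest bar
        per_icon = vmax / n_max                                 # value one icon stands for
        counts = [round(v / per_icon) for v in values]
        # Total mismatch between desired and represented percentages across bars.
        err = sum(abs(v / total - c * per_icon / total) for v, c in zip(values, counts))
        if best is None or err < best[0]:
            best = (err, counts, per_icon)
    return best[1], best[2]

# Example: the four browser counts from the chart, 5 icons per row,
# and 4 rows of icons available for the tallest bar.
counts, per_icon = icon_layout([114, 243, 37, 47], icons_per_row=5, rows_for_max=4)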

Example usage of the class is given in the picturebars.pde file. As I am learning Processing as well as playing with the Mozilla Test Pilot data, this is the result of combining the two (and then post-processing the output with Gimp).

A word about the data
The data used here is the user survey conducted as part of the Test Pilot suite. The survey was optional and a total of 4,081 users answered it. Out of these, 3,361 users used Firefox as their primary browser or used only Firefox, and 279 people either did not answer this question or marked 'other'. The chart here shows the distribution of the remaining 441 pagan beta testers. The exact numbers are given to the right of the graph.

Digging a little deeper
Though the graphic shown here is only for demonstration purposes, after some analysis there is one interesting observation which can be made.

We already know that the user share of different browsers (as of 27th December 2010) is approximately:

  1. Internet Explorer: 46 % of users
  2. Firefox: 30 % of users
  3. Chrome: 12 % of users
  4. Safari: 6 % of users
  5. Opera: 2 % of users
  6. Mobile browsers (others): 4 %
This can safely be taken as the prior distribution of web users. It means that an average user (about whose browser preference we know nothing) will be an IE user with 46% probability, a Firefox user with 30% probability, and so on. Further, if we know that they are not a Firefox user (that is, they honestly marked on the survey they voluntarily submitted that Firefox is not their primary browser), then by Bayes' rule the probability of being in each remaining class is jacked up by 1/(1 - 0.3), or by about 1.43:
  1. 65% of users should be IE users
  2. 17% should be Chrome users
  3. 9% should be Safari users
  4. 3% should be Opera users
By these estimates, those 441 users who were beta testing Firefox should have been divided as follows (the arithmetic is reproduced in the sketch after the list):
  1. 286 IE users instead of 114
  2. 75 Chrome users instead of 243 (!)
  3. 37 Safari users (Bang on!)
  4. 13 Opera users instead of 47
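
For reference, here is a minimal sketch reproducing the arithmetic above; the shares and the 441-user total are taken from the text, and rounding accounts for the small differences from the numbers listed:

# Approximate browser shares (December 2010), excluding Firefox's 30%.
shares = {"IE": 0.46, "Chrome": 0.12, "Safari": 0.06, "Opera": 0.02, "Other": 0.04}
firefox_share = 0.30
non_firefox_testers = 441   # survey respondents whose primary browser is not Firefox

for browser, share in shares.items():
    # Bayes' rule: P(browser | not Firefox) = P(browser) / (1 - P(Firefox))
    posterior = share / (1.0 - firefox_share)
    expected = posterior * non_firefox_testers
    print(f"{browser:<8} {posterior:6.1%}  expected ~ {expected:5.1f} users")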
Before we conclude anything from this: 441 is not a large number, and the 279 people who did not answer this question (or answered 'other') could have completely changed the picture.

With this pinch of salt, there are some interesting hypotheses which can be formed, keeping in mind that these users were beta testing Firefox:
  1. IE users either love their browser very much, or they do not experiment a lot
  2. Chrome and Opera users love to experiment with other browsers (Firefox)
Of course, one can make lofty claims about the attrition rates of other browsers, but I do not think that the data is sufficient to conclude that.

Any way to test these?

Appendix
  1. All logos are taken from the respective Wikipedia entries of the browsers. 
  2. Large Firefox logo taken from here.
  3. All graphics shared under Creative Commons Attribution-ShareAlike 3.0 license

Thursday, December 16, 2010

How long will your browser session last?

Browser sessions

This is a follow-up to my last post, and the analysis takes a different direction in the next post, where I talk about those beta testers who are not Firefox users.

First, a short recap (a rough sketch of these steps follows the list):
  1. Extracted the BROWSER_STARTUP and BROWSER_SHUTDOWN events from this data set.
  2. Sorted them by user_ids and then timestamps.
  3. Preserved only alternating startup/shutdown events for each user.
    • Discarded about 10% of the data here (578,496 entries remained)
  4. Ignoring the user, found out the distribution of the session times and plotted it.
  5. Was surprised.
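For concreteness, here is a rough Python sketch of these steps. The file name and the user_id / event / timestamp column names are my assumptions, not the actual schema of the data set, and everything is loaded into memory here for brevity (the original processing went one user at a time):

import csv
from collections import defaultdict

events = defaultdict(list)   # user_id -> [(timestamp, event type), ...]
with open("testpilot_events.csv") as f:                 # hypothetical file name
    for row in csv.DictReader(f):
        if row["event"] in ("BROWSER_STARTUP", "BROWSER_SHUTDOWN"):
            events[row["user_id"]].append((int(row["timestamp"]), row["event"]))

sessions_ms = []
for user, evts in events.items():
    evts.sort()                          # order each user's events by timestamp
    last_start = None
    for ts, kind in evts:
        if kind == "BROWSER_STARTUP":
            last_start = ts              # a repeated startup discards the orphaned one
        elif last_start is not None:     # a shutdown that follows a startup
            sessions_ms.append(ts - last_start)
            last_start = None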
Unterminated sessions

One of the concerns was that the longer browser sessions might have still been 'on' when the study period ended. However, there were only about 10,000 browser sessions open at the end of the week, which is less than 2% of the total browser sessions in the data set. Hence, the long-lasting browser sessions would not have affected the end results much.

User sessions

Also, it is clear (actually, only in hindsight) that users who open their browser only for short periods will open it often in a given fixed period. This is a classic problem in Palm calculus: as we are looking at time-limited data (one week long), the shorter browser sessions have a greater propensity to occur. However, this does not invalidate the previous results: from the browser's point of view, it will still be closed in under 15 minutes 50% of the time.
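
A toy example of this effect, with two hypothetical users observed for one week:

# Two hypothetical users observed for one week.
week_minutes = 7 * 24 * 60
short_sessions = week_minutes // 15   # back-to-back 15-minute sessions: 672 of them
long_sessions = 7                     # one day-long session per day: 7 of them

total = short_sessions + long_sessions
print(f"share of sessions that are short: {short_sessions / total:.0%}")   # ~99%
print(f"share of users with a short average: {1 / 2:.0%}")                 # 50%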

Browser's point of view of session times


Or, when stated more aesthetically:
Firefox session time distribution

However, from the user's point of view, the scenario is a little different. Looking at the average length of browser sessions per user (more than 25,000 users have at least one open/close event pair and 95% have more than two such events), it clearly stands out that the number of people whose average session time falls between 15 seconds and 15 minutes is not very high:
Number of users who have the given average session time (log scale)

Note that the graph ticks are not aligned to the bin divisions.

Difference

Hence this visualization, which makes clear the difference between how many users experience a given average session length and how often the browser experiences a session of a given length:
The distribution of users and Firefox sessions against session times.
This is close to what will be my final entry to the Mozilla Open Data Visualization Competition.

Update:


I did not like the cramped feel of the objects on the graph, so I sacrificed some accuracy (the 5% and 3% bars are the same length in pixels now; on the other hand, they do not even have error bars).

Hence, I condensed the graphs, changed a little text and decided to go with this:

The data is the same, but the Firefox bar lengths and the user bar lengths are comparable in size now. Even though comparing them does not make much sense, it is slightly better to have the percentage sizes nearly equal, I think.

Conclusion


So what can we take away from this? Perhaps which improvement Firefox should aim at next. Consider the following feature from two different points of view:

  1. If the average Firefox session is shorter than 15 minutes for only 10% of users, would Firefox taking 5 seconds less to start make a difference?
  2. If 45% of the time Firefox is opened and closed within a span of 15 seconds to 15 minutes, would shaving 5 seconds off the startup time make a difference?
Should the priority be more satisfied users or better software?
Which features / improvements will appeal to users more, and which are minor updates?
Which ones should you advertise?
Which point of view should the development team take? 

This is just one trade-off; there may be more trade-offs involved in making things better for long-term uses and users than for short-term ones. Knowing how the scenario looks from the user's and from the browser's point of view would certainly help in making these decisions and in deciding when a feature is a killer one.

Update: The visualization, along with several other excellent entries, is featured here: https://testpilot.mozillalabs.com/testcases/datacompetition

~
musically_ut

Epilogue:
  1. Test pilot Visualization taken from here, designed by mart3ll
  2. Mozilla Logo from here.
  3. All graphics shared under Creative Commons Attribution-ShareAlike 3.0 license

Sunday, December 12, 2010

Are Firefox sessions really this short?

I have been meaning to play with the Test Pilot data for quite a while now. The primary problem was, err ..., my hard disk space.

Anyhow, I got the data downloaded finally, only to see that my computer's memory was the next bottleneck. Things looked hopeless till I started looking at it as an exercise in online algorithms.

Then I saw one thin silver lining: the data in the CSV file was presorted by user_ids.

There was my window! If I could pre-segment the data based on the user_ids, then I could run through one user at a time, instead of reading all the data at once. Though not much by itself, this linear grouping of users already saved me a lot of effort. My next hope was that, after ordering by user_ids, the data would also be ordered by timestamps. Sadly it wasn't, but one cannot have everything in life anyway. :-)
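
A minimal sketch of that window, again with hypothetical file and column names: because the rows are presorted by user_id, itertools.groupby hands over one user's rows at a time without reading the whole file into memory.

import csv
from itertools import groupby
from operator import itemgetter

with open("testpilot_events.csv") as f:                 # hypothetical file name
    for user_id, rows in groupby(csv.DictReader(f), key=itemgetter("user_id")):
        # Rows within one user are NOT presorted by time, so sort only this small chunk.
        user_rows = sorted(rows, key=lambda r: int(r["timestamp"]))
        # ... pair up the BROWSER_STARTUP / BROWSER_SHUTDOWN events here ...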


  • Extracting the data

After a quick Perl script to confirm the ordering of user_ids, I wrote a script to calculate the time intervals between a BROWSER_STARTUP event and a BROWSER_SHUTDOWN event in the data. There were various instances when a BROWSER_STARTUP event was present without a matching BROWSER_SHUTDOWN event. Probably the browser was started when the system was online (most uses of Firefox are online, after all) but the system was not online when the browser was shut down (or crashed). So, I skipped all the orphaned startup events. Also, some startup events might have happened while the host was offline, and ignoring these would bias the intervals towards being longer. I cleaned up the data (there were a few instances, ~10, where the session lasted beyond 7 days?) and plotted the intervals (on a semi-log scale) in R.


Now I am a user who generally carries his laptop around and has his browser open almost all the time. I was in for a small surprise, enough to make me go over the code twice before I realized that the code was correct and my intuition was wrong.

  • Visualizing the data

This data extraction took a whopping 30 minutes on my meager laptop. The stats are here (all numerical values in milliseconds):



Maximum interval = 534616265 ms (about 6 days)
Minimum interval = 31 ms
Mean = 4286667.42334182 ms (about 1.2 hours)
Median = 896310 ms (15 minutes!)
Skipped 54748 out of 633212 BROWSER_STARTS




50% of sessions were less than 15 minutes long, and this is while the intervals are biased towards being longer!


Of course, there are some very long sessions, which pull the mean up to over an hour, but considering that there might be outliers in the data, the median is a far more robust measure. In case it is not apparent, this is a beautiful log-normal distribution staring one in the face.
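
One way to see why a log-normal fits: its median is exp(μ) while its mean is exp(μ + σ²/2), so a wide spread in the log-durations pulls the mean far above the median. A rough back-of-the-envelope check using the reported numbers (my own calculation, assuming log-normality):

import math

median_ms = 896310          # reported median
mean_ms = 4286667.42        # reported mean
mu = math.log(median_ms)                          # log-normal: median = exp(mu)
sigma = math.sqrt(2 * (math.log(mean_ms) - mu))   # mean = exp(mu + sigma**2 / 2)
print("implied spread (sigma) of the log-durations:", round(sigma, 2))   # ~1.77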

It took me a while to accept that most operations with a browser end rather quickly: most people do not live their lives in a browser. In other words, I am an outlier here!


How long do Firefox sessions last?

  • Conclusion

I am sure most people at Mozilla know this and, if it is indeed correct, they are probably making sure that Firefox is well suited for short operations and not just extended, week-long uses. I am not sure whether these are competing demands; perhaps making the start-up quicker, pre-indexing search files, etc. also helps in extended sessions. However, it appears (with more than just a pinch of salt) that Firefox should focus on the quick-draw-and-fire users for gaining market share, in the short term at least. This is the exact opposite direction from Chrome's development, by the way.

However, alternative explanations are possible. I am not sure how many of these were voluntary terminations; it might be that these are regular restarts. Or it might be an indication of a bigger problem, as this comment indicates: frequent cold restarts forced by the user to reduce the memory footprint of the browser. Or it might be that people are using Firefox only as a compatibility browser: when something (like Java applets or broken Javascript code) does not work correctly in other browsers, they fire up Firefox, do the task and go back. This last question can be answered using the survey data.

Will keep digging and hoping to find something interesting.

Feedback welcome.

Update: A few more important factors are:

  1. What were the average session times per user?
  2. How many Browser sessions were still open when the week ended?
These concerns have been addressed in this post.


~
musically_ut