
Monday, December 27, 2010

Bar charts with icons and the most experimenting users

Showing icons instead of plain colored bars is an oft-used infographics technique. I too used it to draw the lower part of a graph.

Now, with the PictureBar class, making them in Processing is easy. The output of the code looks like this (the icons are only samples and can be replaced with your own):
Sample output of the PictureBar program

Sample output after some post processing

The program
The source code of the class is well documented, but here are a few pointers on how the code (in particular the draw() function) works; a sketch of the core layout logic follows the list:
  1. Given the number of icons to be present in a bar and the distance desired between bars, it determines the width of bars:
    • This also fixes the icon size because we know the number of icons to put at each level
  2. Select the bar which has the maximum value and determine how many rows of icons will fit in the provided height (the entire height is used)
  3. Decide how many icons to put in the last row of the bar with maximum value:
    • The total number of icons in this bar will determine the value (in percentage) of each icon
    • The decision is based on which number minimizes the difference between the desired and the represented percentages across the bars.
  4. The bars are drawn using the given number of icons either from top to bottom or from bottom to top (default)
    • However, the rows are always filled from left to right.
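
To make these steps concrete, here is a minimal sketch of the same layout logic in Processing. This is not the PictureBar code itself: the names (values, iconsPerRow, gap) are my own, plain rectangles stand in for icon images, and simple rounding stands in for the error-minimizing choice in the last row.

    // Minimal sketch of the layout steps above; not the actual PictureBar class.
    float[] values = {45, 25, 18, 12};  // bar values, in percent (made up)
    int iconsPerRow = 3;                // number of icons at each level of a bar
    float gap = 20;                     // desired distance between bars
    float chartW = 400, chartH = 300;

    void setup() {
      size(440, 340);
      noLoop();
    }

    void draw() {
      background(255);
      // Step 1: the bar width follows from the gap; it also fixes the icon size.
      float barW = (chartW - gap * (values.length - 1)) / values.length;
      float iconSize = barW / iconsPerRow;
      // Step 2: the bar with the maximum value uses the entire height,
      // which determines how many rows of icons fit.
      int maxRows = floor(chartH / iconSize);
      // Step 3: the total number of icons in the tallest bar fixes the
      // value (in percent) represented by each icon.
      float valuePerIcon = max(values) / (maxRows * iconsPerRow);
      // Step 4: draw each bar from bottom to top, filling rows left to right.
      for (int b = 0; b < values.length; b++) {
        int icons = round(values[b] / valuePerIcon);
        float x0 = 20 + b * (barW + gap);
        for (int i = 0; i < icons; i++) {
          float x = x0 + (i % iconsPerRow) * iconSize;
          float y = height - 20 - (i / iconsPerRow + 1) * iconSize;
          rect(x, y, iconSize - 2, iconSize - 2);  // stand-in for image(icon, x, y, ...)
        }
      }
    }

Swapping rect() for image() with your own PImage icons gives the picture-bar effect.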

Example usage of this class is given in the picturebars.pde file. As I am learning Processing as well as playing with the Mozilla Test Pilot data, this is the result of combining the two (and then post-processing the output with Gimp).

A word about the data
The data used here comes from the user survey conducted as part of the Test Pilot suite. The survey was optional, and a total of 4,081 users answered it. Of these, 3,361 used Firefox as their primary (or only) browser, and 279 either did not answer this question or marked 'other'. The chart here shows the distribution of the remaining 441 pagan beta testers. The exact numbers are given to the right of the graph.

Digging a little deeper
Though the graphic shown here is only for demonstration purposes, some analysis yields one interesting observation.

We already know that the market share of different browsers (as of 27 December 2010) is approximately:

  1. Internet Explorer: 46% of users
  2. Firefox: 30% of users
  3. Chrome: 12% of users
  4. Safari: 6% of users
  5. Opera: 2% of users
  6. Mobile browsers (others): 4%
This can safely be taken as the prior distribution of web users: an average user (about whose browser preference we know nothing) will be an IE user with 46% probability, a Firefox user with 30% probability, and so on. Further, if we know that they are not a Firefox user (that is, they honestly marked on the voluntarily submitted survey that Firefox is not their primary browser), then by Bayes' rule the probability of each remaining class is scaled up by 1/(1 - 0.3), or about 1.43:
  1. 65% should be IE users
  2. 17% should be Chrome users
  3. 9% should be Safari users
  4. 3% should be Opera users
By these estimates, the 441 non-Firefox users who are beta testing Firefox should have been divided as follows (a short sketch of this computation follows the list):
  1. 286 IE users instead of 114
  2. 75 Chrome users instead of 243 (!)
  3. 37 Safari users (Bang on!)
  4. 13 Opera users instead of 47
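
For completeness, here is a tiny Processing sketch of the arithmetic; the names are mine. (The post rounds the percentages to whole numbers before multiplying, which is why its counts differ slightly from the unrounded ones.)

    // Minimal sketch of the Bayes arithmetic above; names are assumed.
    String[] browsers = {"IE", "Chrome", "Safari", "Opera"};
    float[] prior     = {0.46, 0.12, 0.06, 0.02};  // market shares (priors)
    float pFirefox    = 0.30;
    int testers       = 441;  // non-Firefox users in the survey

    void setup() {
      for (int i = 0; i < browsers.length; i++) {
        // P(browser | not Firefox) = P(browser) / (1 - P(Firefox))
        float posterior = prior[i] / (1 - pFirefox);
        println(browsers[i] + ": " + nf(posterior * 100, 0, 1) + "% -> "
                + round(posterior * testers) + " expected beta testers");
      }
    }
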
Now, before we conclude anything from this, note that 441 is not a large number, and the 279 people who did not answer this question (or gave 'other' as an answer) could have completely changed the picture.

With this pinch of salt, there are some interesting hypotheses which can be formed, keeping in mind that these users were beta testing Firefox:
  1. IE users either love their browser very much, or they do not experiment a lot
  2. Chrome and Opera users love to experiment with other browsers (Firefox)
Of course, one can make lofty claims about the attrition rates of other browsers, but I do not think the data is sufficient to support them.

Any way to test these?

Appendix
  1. All logos are taken from the respective Wikipedia entries of the browsers. 
  2. Large Firefox logo taken from here.
  3. All graphics shared under Creative Commons Attribution-ShareAlike 3.0 license

Thursday, December 16, 2010

How long will your browser session last?

Browser sessions

This is a follow-up to my last post, and the analysis takes a different direction in the next post, where I talk about some beta testers who are not Firefox users.

First, a short recap (a sketch of the pipeline follows the list):
  1. Extracted the BROWSER_STARTUP and BROWSER_SHUTDOWN events from this data set.
  2. Sorted them by user_id and then by timestamp.
  3. Preserved only alternating startup/shutdown events for each user.
    • Discarded about 10% of the data here (578,496 entries remained)
  4. Ignoring the user, computed the distribution of session times and plotted it.
  5. Was surprised.
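
For the record, here is a minimal sketch of such a pipeline in Processing. The event names come from the data set, but the CSV layout and the column names (user_id, timestamp, event) are assumptions of mine, not the actual schema.

    // Minimal sketch of the recap above; assumes a CSV already sorted by
    // user_id and timestamp (ms), with an 'event' column -- not the actual schema.
    void setup() {
      Table t = loadTable("events.csv", "header");
      FloatList sessions = new FloatList();  // session lengths, in minutes
      String lastUser = "";
      long lastStart = -1;
      for (TableRow row : t.rows()) {
        String ev = row.getString("event");
        if (!ev.equals("BROWSER_STARTUP") && !ev.equals("BROWSER_SHUTDOWN")) continue;
        String user = row.getString("user_id");
        long ts = Long.parseLong(row.getString("timestamp"));
        if (!user.equals(lastUser)) { lastUser = user; lastStart = -1; }
        if (ev.equals("BROWSER_STARTUP")) {
          lastStart = ts;  // a repeated startup simply replaces the previous one
        } else if (lastStart >= 0) {
          sessions.append((float) ((ts - lastStart) / 60000.0));
          lastStart = -1;  // one way of keeping only alternating pairs (step 3)
        }
        // shutdowns without a matching startup are discarded
      }
      println(sessions.size() + " session lengths extracted");
    }
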
Unterminated sessions

One concern was that the longer browser sessions might still have been 'on' when the study period ended. However, only about 10,000 browser sessions were open at the end of the week, which is less than 2% of the total browser sessions in the data set. Hence, the long-lasting browser sessions would not have affected the end results much.

User sessions

Also, it is clear (actually, only in hindsight) that users who open their browser only for short periods will open it more often in a given fixed period. This is a classical problem of Palm calculus. As we are looking at time-limited data (one week long), the shorter browser sessions have a greater propensity to occur in the sample; a toy example follows below. However, this does not invalidate the previous results: from the browser's point of view, it will still be closed within 15 minutes 50% of the time.
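
A toy example of this bias, with made-up numbers, comparing two users who have the same total Firefox time in the week but very different session lengths:

    // Toy illustration (made-up numbers): equal total usage per week,
    // very different session counts.
    float week = 7 * 24 * 60;   // minutes in a week
    float usage = week * 0.25;  // say each user runs Firefox 25% of the time
    println("15-min sessions: " + round(usage / 15) + " sessions");        // 168
    println("8-hour sessions: " + round(usage / (8 * 60)) + " sessions");  // 5
    // From the browser's point of view, 168 of the 173 sessions (about 97%)
    // are short ones, even though the two users are symmetric.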

Browser's point of view of session times


Or, when stated more aesthetically:
Firefox session time distribution

However, from the user's point of view, the scenario is a little different. Looking at the average length of browser sessions for each user (more than 25,000 users have at least one open/close event pair, and 95% have more than two such events), it clearly stands out that the number of people whose average session time falls between 15 seconds and 15 minutes is not very high (a small sketch of this computation follows below):
Number of users who have the given average session time (log scale)

Note that the graph ticks are not aligned to the bin divisions.
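
Per-user averages are a small extension of the pipeline sketch above; userSessions and addSession below are assumed names, and the sample data is made up:

    // Sketch of per-user average session lengths; in the pipeline above,
    // userSessions would be filled during the same scan over the events.
    HashMap<String, FloatList> userSessions = new HashMap<String, FloatList>();

    void setup() {
      addSession("user-a", 12);   // made-up data standing in for the scan
      addSession("user-a", 18);
      addSession("user-b", 300);
      FloatList averages = new FloatList();  // one average per user
      for (FloatList s : userSessions.values()) {
        float total = 0;
        for (int i = 0; i < s.size(); i++) total += s.get(i);
        averages.append(total / s.size());
      }
      println(averages);  // a log-scale histogram of these gives the chart above
    }

    void addSession(String user, float minutes) {
      if (!userSessions.containsKey(user)) userSessions.put(user, new FloatList());
      userSessions.get(user).append(minutes);
    }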

Difference

Hence, this visualization, which makes clear the difference between how many users experience a given average session length and how many times the browser experiences a given session length:
The distribution of users and Firefox sessions against their distribution times.
This is close to what will be my final entry to the Mozilla Open Data Visualization Competition.

Update:


I did not like the cramped feel of the objects on the graph, so I sacrificed some accuracy (the 5% and 3% bars are the same length in pixels now; on the other hand, they do not even have error bars).

Hence, I condensed the graphs, changed the text a little, and decided to go with this:

The data is the same, but the Firefox bar lengths and the user bar lengths are comparable in size now. Even though comparing them does not make much sense, it is slightly better to have the percentage sizes nearly equal, I think.

Conclusion


So what can we take away from this? The next improvement which Firefox should aim for. Consider the following feature from the two different points of view:

  1. If only 10% of users have an average Firefox session shorter than 15 minutes, and Firefox takes 5 seconds less to start, would it make a difference?
  2. If 45% of the time Firefox is opened and closed within a span of 15 seconds to 15 minutes, would shaving 5 seconds off the startup time make a difference?
Should the priority be more satisfied users or better software?
Which feature / improvements will appeal to users more and which are minor updates? 
Which ones should you advertise?
Which point of view should the development team take? 

This is just one trade-off; there may be more involved in serving long-term users better than short-term users. Knowing how the scenario looks from the user's and the browser's point of view would certainly help in making these decisions and in deciding when a feature is a killer one.

Update: The visualization, along with several other excellent entries, is featured here: https://testpilot.mozillalabs.com/testcases/datacompetition

~
musically_ut

Epilogue:
  1. Test pilot Visualization taken from here, designed by mart3ll
  2. Mozilla Logo from here.
  3. All graphics shared under Creative Commons Attribution-ShareAlike 3.0 license