This is GroundTruthTrek's Typepad Profile.
GroundTruthTrek
Recent Activity
Certainly a terrible use of bubbles here, but I think there are cases where using circle area to communicate quantity is useful. It works well when values vary by too large a factor to express using bars. It can also be used to compare a subset to a whole. And on occasion you might have a gigantic number that you want to let run off the chart area, where the curve of the circle's edge allows the reader to complete the circle. I think the recent carbon bomb graphic is a decent expression of the first couple of ideas: http://www.vancouverobserver.com/blogs/climatesnapshot/2012/03/08/confused-tar-sands-climate-threat-take-look
Commented Mar 12, 2013 on Blowing the whistle at bubble charts at Junk Charts
I think logarithmic scales are "better" than linear scales when you want to emphasize proportional difference rather than absolute difference. For example, many people prefer to know the % difference between the prices of things, rather than the absolute difference. If you wanted to capture prices on a graph and express the difference in this way, you'd want to plot log(price) rather than just price. Alternatively you could plot the % difference on a linear scale, but that doesn't give the reader access to information on the absolute prices, which may be of interest as well. Obviously in many cases the simple fact that readers wouldn't know how to read the log axis might incur a greater cost than the benefit of the more appropriate representation of the data, but I think that decision should be considered on a case-by-case basis.

In the case of electromagnetic radiation, it doesn't make much sense to me at all to plot on a linear scale... Such a scale would suggest that the difference between visible light and gamma rays is irrelevant in comparison to the difference between the microwaves in your oven and those coming out of your internet router. This case is a bit different from the more subtle case I outlined above. In this system the number (whether frequency or wavelength) is not really a very direct expression of the nature of what is measured... if we had a name for log(wavelength) like we have a name for log(earthquake energy) [moment magnitude] then we'd rather use that and conceal the fact that any log math was used at all.

I used a log axis on a plot for public consumption recently. I did it partly because I felt there was some intrinsic value in challenging the audience to read it... In the blog post that features this plot, I included a sidebar explanation of log axes: http://www.groundtruthtrekking.org/blog/?p=2651
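To make the price point above concrete, here is a minimal sketch in Python. The prices are made up for illustration, and `log_gap` is a hypothetical helper name, not something from the linked post. It shows that two pairs of prices with the same 50% ratio sit the same distance apart on a log axis, even though their absolute gaps differ by a factor of 100.

```python
import math

def log_gap(a, b):
    """Distance between two positive values on a log10 axis."""
    return math.log10(b) - math.log10(a)

# Hypothetical prices: within each pair, the second is 1.5x the first.
cheap_pair = (2.0, 3.0)
dear_pair = (200.0, 300.0)

# On a linear axis the gaps look wildly different (1 vs 100).
linear_gaps = [b - a for a, b in (cheap_pair, dear_pair)]

# On a log axis both gaps equal log10(1.5), so they look the same.
log_gaps = [log_gap(a, b) for a, b in (cheap_pair, dear_pair)]

print("linear gaps:", linear_gaps)
print("log gaps:   ", log_gaps)
```

The absolute prices are still readable from the log axis tick labels, which is the advantage over plotting the % difference directly.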
Following that line of reasoning along, a very simple utility would be in identifying cases where more explicit statistics are needed to pre-empt poor intuition on the part of a data observer. If people are great at picking out the mean of un-analyzed data, then there's not much need to calculate that mean on the fly. But (for example) people aren't necessarily that great at identifying significant clumpiness in data, so if that's a relevant question it might be good to report data along with some quantification of that clumpiness so that the observer doesn't jump to the conclusion that there's something going on when there isn't. Specific example: A business that sells high-value items might see very few sales per week, and in this situation clumps due to random chance will be common. However, real clumps, related to factors the business operators aren't yet aware of, might well happen too. So if they had a little helper utility to watch the data stream and estimate the chances that a given clump resulted from random variability, that might be valuable. That's one step beyond the research you're talking about here, but I can see how the research might suggest approaches like this.
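A rough sketch of what such a helper might compute, in Python. This is only one possible approach and rests on assumptions not stated in the comment above: sales are independent, arrive at a constant average rate, and so weekly counts follow a Poisson distribution. The function names and the example numbers are hypothetical.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam): chance of k or more sales in one week."""
    cdf_below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf_below

def chance_of_clump_somewhere(k, lam, n_weeks):
    """Chance that at least one of n_weeks independent weeks has k or more sales."""
    p = poisson_tail(k, lam)
    return 1.0 - (1.0 - p) ** n_weeks

# Hypothetical business: on average one sale every two weeks.
rate_per_week = 0.5
clump_size = 3  # a week with three or more sales looks like a "clump"

p_single_week = poisson_tail(clump_size, rate_per_week)
p_within_year = chance_of_clump_somewhere(clump_size, rate_per_week, 52)

print(f"chance in a given week:  {p_single_week:.3f}")
print(f"chance somewhere in 52:  {p_within_year:.3f}")
```

The second number is the one intuition tends to miss: a clump that is rare in any particular week can still be quite likely to show up somewhere in a year of data, purely by chance.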
It would be interesting to explore this topic graphically rather than numerically. A simple trial would be to present a scatterplot and ask people to draw a best-fit line. Simpler still would be a number line inviting the subject to plot the mean. The subject could also be asked to match distributions of points they believed arose from the most similar probability functions... Sounds like a kind of fun study to be a participant in. When I took cognitive psych back in college I did something sort of similar: I asked people to add an additional "random" point to a cloud of previously generated pseudorandom points. The results suggested that the process the subject went through to select a point was complex: it combined an overall tendency toward certain spaces that was unique to each person with a tendency toward certain areas (e.g. those far from other points) that everyone preferred for a given arrangement of seed points. Overall I find it very interesting, but I'm not sure what the paths are from results of studies like these to applications. It may be that it could help make choices about where to lead the reader's eye to statistically significant patterns, and where to rely on them to see those patterns without help? Or perhaps it could help analysts inoculate themselves against human weaknesses in the interpretation of data?