This is Garth Snyder's Typepad Profile.
Garth Snyder
Recent Activity
May I suggest a few quibbles with your data and analysis, Jeff? I'll explain here, but you can follow along with my updated spreadsheet if you like.

My first observation is that fully 58% of your respondents scored one sample a 1, one sample a 2, etc., all the way through 5. In other words, they most likely misread the instructions and ranked the samples by quality rather than rating them independently. I don't think you have 3511 sets of ratings; you have 2045 sets of ranking data and 1466 sets of rating data. Both data sets are useful and informative, but you can't freely intermix the numbers and analyze them together. The data types are not commensurate, and they require different statistical techniques for analysis.

As it happens, though, removing the ranked data and looking only at the 1466 sets of presumed true ratings doesn't really change the patterns you mentioned above. A t-test matrix on the revised data shows that Feta (128kbps) is clearly distinct from all other samples. And no pairing from the pool of Cheddar (320kbps), Gouda (raw), and Brie (192kbps) shows a statistically significant difference.

But... Limburger (160kbps) is in fact rated higher than all other samples, and each of those pairings has a p value far smaller than 0.05. The largest p value is 0.00003. That is very strong and consistent statistical evidence, and you can't just wave it away because it's "clearly insane" (i.e., it doesn't agree with your preconceptions).

I agree with you that it's highly unlikely that 160kbps MP3s actually sound better than their higher- and lower-bit-rate counterparts. My theory is that you've demonstrated that the order of presentation of the samples influences the responses. In other words, respondents may tend to interpret the first-heard sample as a reference baseline. As near as I can tell, you didn't randomize the order of presentation in your original post (or at least, the samples keep coming up in the same order for me...).
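For anyone who wants to reproduce the split and the t-test matrix without the spreadsheet, here is a rough Python sketch. Several things in it are assumptions rather than anything taken from the original data: each response is assumed to be a list of five integer scores, the function names are made up for illustration, the paired form of the t-test is a guess (the variant actually used isn't stated above), and the p value uses a normal approximation, which is only reasonable for large samples like the 1466 rating sets.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, stdev


def is_ranking(row):
    """True if the five scores are exactly 1, 2, 3, 4, 5 in some order."""
    return sorted(row) == [1, 2, 3, 4, 5]


def split_responses(rows):
    """Separate presumed rankings from presumed independent ratings.

    A respondent who genuinely rated the samples 1 through 5 with no
    repeats is indistinguishable from a ranker and lands in `rankings`.
    """
    rankings = [r for r in rows if is_ranking(r)]
    ratings = [r for r in rows if not is_ranking(r)]
    return rankings, ratings


def paired_t(a, b):
    """Paired t statistic and an approximate two-sided p value.

    ASSUMPTION: paired test, normal approximation for the p value.
    Raises ZeroDivisionError if every difference is identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    std_err = stdev(diffs) / sqrt(len(diffs))
    t = mean(diffs) / std_err
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def t_matrix(ratings, names):
    """Run paired_t on every pair of sample columns."""
    columns = list(zip(*ratings))
    return {
        (names[i], names[j]): paired_t(columns[i], columns[j])
        for i, j in combinations(range(len(names)), 2)
    }
```

Calling `split_responses` on the full response list and then `t_matrix` on the ratings half, with the five cheese names in column order, would give one (t, p) pair for each of the ten pairings.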
I bet that if you reran the test with Gouda (raw) listed first, the results would directly (though probably erroneously) contradict your original thesis.

It would be interesting to take a look at the ranked data as well to see what it has to say. This is getting beyond my level of statistical knowledge, but I suspect that something like the following treatment would be appropriate:

1) Restrict the data set to the 1-5 rankings.
2) Drop the Feta column (since we all agree that Feta is distinguishable; we want to see if Limburger can be distinguished from the others).
3) Recode the rankings to the range 1-4; in other words, for each person's rankings, assign the lowest remaining value a 1, the next value up a 2, etc.
4) Prepare a summary table of cheese vs. 1-4 with a count of the number of the appropriate responses in each cell.
5) Prepare a reference table similar to #4, but with 2045/4 in each cell (even distribution, since the ranked subset has 2045 respondents).
6) Use a chi-squared test to test whether the distribution observed in #4 is distinguishable from the reference distribution in #5.

As Barbie says, survey design is hard...
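The six steps above could be sketched roughly like this in Python. Treat it as an illustration under assumptions, not a vetted analysis: the column order in `CHEESES` is hypothetical (only Limburger being presented first is implied above), the function names are invented, the input is assumed to be already restricted to the 1-5 rankings (step 1), and the code stops at the chi-squared statistic without looking up a p value.

```python
# ASSUMPTION: column order of the five samples; only "Limburger first"
# is suggested by the comment, the rest is made up for illustration.
CHEESES = ["Limburger", "Cheddar", "Gouda", "Brie", "Feta"]


def recode_without(row, drop_index):
    """Steps 2 and 3: drop one column, then re-rank the rest as 1-4."""
    kept = [v for i, v in enumerate(row) if i != drop_index]
    order = sorted(kept)  # values are distinct because row is a ranking
    return [order.index(v) + 1 for v in kept]


def count_table(rankings, drop_index):
    """Step 4: for each remaining cheese, count how often it got rank 1-4."""
    names = [c for i, c in enumerate(CHEESES) if i != drop_index]
    table = {name: [0, 0, 0, 0] for name in names}
    for row in rankings:
        for name, rank in zip(names, recode_without(row, drop_index)):
            table[name][rank - 1] += 1
    return table


def chi_squared_statistic(table):
    """Steps 5 and 6: compare observed counts with an even distribution.

    The expected count in every cell is (number of respondents) / 4.
    """
    respondents = sum(next(iter(table.values())))
    expected = respondents / 4
    return sum(
        (observed - expected) ** 2 / expected
        for counts in table.values()
        for observed in counts
    )
```

With `rankings` holding the ranked responses, `chi_squared_statistic(count_table(rankings, CHEESES.index("Feta")))` gives the statistic to compare against a chi-squared distribution. One caveat worth keeping in mind: each respondent contributes four linked entries to the table, so the cells are not independent observations, and the resulting p value should be read with caution.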
Jun 27, 2012