Hosting the Discovery Channel's MythBusters must seem like every teenager's dream job. Money, fame, and most importantly, you get to blow stuff up. As is often the case with dreams, however, the reality of the job is comparatively quite ugly. Using a rubber band on the gas button of a lighter until it explodes is one thing, but doing so in an attempt to prove something under the critical gaze of millions of viewers (and thousands of nerds) is quite another. Sometimes you might not get to blow anything up at all - and your success may depend on math of all things.
As this article shows, the MythBusters episode addressing whether or not a yawn is contagious is just one such instance. Sure, the hosts got to do more exciting things while the lackeys carried out a pyro's worst nightmare of an experiment, but in the end, it was Adam and Jamie who had to interpret the results. Little did they know their accuracy was about on par with... well, a teenager's.
Be careful what you wish for, adolescents of the world: you might one day get it and be held accountable to OmniNerd.
This article is being challenged on Slashdot.
You do NOT use descriptive statistics to study a sample, you need a completely different way of approaching things, namely, statistical analysis.
What you really need to do is hypothesis testing (http://en.wikipedia.org/wiki/Statistical_hypothesis_testing), and test the hypothesis that more people yawn when seeded than when not seeded.
Your analysis is completely flawed, ask any statistician.
Well, it is flawed, but not totally wrong. I coded the data set that was used on the web page and ran it through a chi-square/McNemar+Risk Estimate test (appropriate tests for dichotomous treatment variables + dichotomous outcome variables). No significant difference alpha=.744. But shame on you for using a straight up correlation. --chris
I received a number of emails concerning the statistical method I used (Pearson's correlation coefficient), which provided some insight but does not sufficiently address the issue of causation in the results. Personally, I don't understand how two variables can so obviously lack a correlation and still leave room for causation, but in the interest of statistical appropriateness, I have included a number of alternative statistical methods below.
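One way to see how the correlation relates to a significance test: for two 0/1 variables, Pearson's r is the phi coefficient, and n times r squared equals the uncorrected Pearson chi-square statistic with one degree of freedom. The following is a minimal sketch in R, assuming the counts reported for the experiment (4 of 16 non-seeded and 10 of 34 seeded subjects yawned). The variable names are illustrative only and are not taken from the original analysis.

```r
# 0/1 coding of the 50 subjects (assumed layout: non-seeded first, then seeded)
seeded <- rep(c(0, 1), times = c(16, 34))
yawned <- c(rep(c(1, 0), times = c(4, 12)),    # non-seeded: 4 yawned, 12 did not
            rep(c(1, 0), times = c(10, 24)))   # seeded: 10 yawned, 24 did not

# Pearson's r on two binary variables is the phi coefficient
r <- cor(seeded, yawned)

# n * r^2 is the uncorrected Pearson chi-square statistic (1 degree of freedom)
n <- length(seeded)
chi_sq <- n * r^2
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(c(r = r, chi_sq = chi_sq, p_value = p_value))
```

A tiny r with only 50 subjects therefore translates directly into a large p-value: the correlation and the chi-square test are two views of the same weak association.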
Association Test
An association test such as Fisher's Exact Test is appropriate. This method is specifically for determining any non-random association between two categorical (discrete) variables - which is exactly what we have in this instance. Its use, then, removes any issues there may have been in the Pearson analysis having to do with the data set not being continuous.
For those interested, the calculations are described in the link above. The results are easy to come by, however, using online tools such as this calculator at Matforsk.com. Inserting the MythBusters' data results in the following:
TABLE = [ 4 , 10 , 12 , 24 ]
Left : p-value = 0.5127817757319189
Right : p-value = 0.7416878304307283
2-Tail : p-value = 1
This corresponds to there being 4 non-seeded subjects who yawned, 10 seeded who yawned, 12 non-seeded who didn't yawn, and 24 seeded who didn't yawn. The resulting p-values are all well above the commonly accepted threshold of .05 for significance.
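The same test can be run locally rather than through an online calculator. Here is a minimal sketch using `fisher.test` from base R, assuming the 2x2 layout described above (rows are outcomes, columns are groups). The object names are illustrative, and the mapping of "Left" and "Right" to the `less` and `greater` alternatives is my reading of the calculator output, not something it documents here.

```r
# 2x2 table, filled column by column:
#        noSeed withSeed
# yawn        4       10
# none       12       24
yawns <- matrix(c(4, 12, 10, 24), nrow = 2,
                dimnames = list(outcome = c("yawn", "none"),
                                group = c("noSeed", "withSeed")))

# One-sided and two-sided Fisher's exact tests
left_p  <- fisher.test(yawns, alternative = "less")$p.value
right_p <- fisher.test(yawns, alternative = "greater")$p.value
two_p   <- fisher.test(yawns)$p.value

print(c(left = left_p, right = right_p, two_tail = two_p))
```

Under that reading, the three values should agree with the Left, Right and 2-Tail figures listed above.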
Confidence Interval for the Difference in Rates
This method was recommended via email by Max Kuhn, a "Ph.D. statistician who works in industry." Max provided a very thorough and helpful analysis of the data, which I've included below:
Here is what I would do: create a confidence interval for the difference in rates. I would do this because 1) it can be used to evaluate whether the difference in rates is equal to zero, 2) the width of the interval helps characterize the uncertainty in the data, which is directly related to sample size and 3) p-values alone do not provide people with enough information. Here is a link to a summary of the calculations.
I use the R statistical language a lot, and here is how I got the result:
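# Yawn rates: seeded (10 of 34) and non-seeded (4 of 16)
p1 <- 10/34
p2 <- 4/16
n1 <- 34
n2 <- 16
q1 <- 1-p1
q2 <- 1-p2
# Observed difference in rates
p1 - p2
# [1] 0.04411765
# Standard error of the difference
sqrt((p1*q1/n1) + (p2*q2/n2))
# [1] 0.1335103
# One-sided lower 95% confidence bound (normal approximation)
p1 - p2 + (qnorm(0.05) * sqrt((p1*q1/n1) + (p2*q2/n2)))
# [1] -0.1754872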
This means that if they were to repeat the same experiment a large number of times, we could feel confident that the difference might be as low as 18% in the other direction. This doesn't give us a good feeling that the seed did anything.
One caveat: the number of events is somewhat low here and many people would tell you that the statistical theory may not be valid for our data. To investigate this a little more, I bootstrapped the simple difference in rates 5,000 times. This lets us estimate the empirical distribution of the difference in rates for our data instead of relying on distributional assumptions and approximations to give us a confidence bound. The empirical distribution of the difference in rates looks fairly symmetric and Gaussian. Using a "bootstrap-t" interval, the lower 95% confidence bound was -0.28, which gives us more reason to doubt that there is a difference. Code for this analysis is below. ...
# Statistic to bootstrap: difference in yawn rates (seeded minus non-seeded)
testStat <- function(index, data)
{
  bootSample <- data[index,]
  p1 <- mean(bootSample[bootSample$group == "withSeed", "outcome"] == "yawn")
  p2 <- mean(bootSample[bootSample$group == "noSeed", "outcome"] == "yawn")
  p1 - p2
}
# Rebuild the 50 individual observations from the published counts
noSeed <- rep(c("yawn", "none"), times = c(4, 12))
withSeed <- rep(c("yawn", "none"), times = c(10, 24))
mythBusters <- data.frame(
  outcome = factor(c(noSeed, withSeed), levels = c("yawn", "none")),
  group = factor(rep(c("noSeed", "withSeed"), times = c(16, 34))))
# Sanity check: the observed difference on the full sample
testStat(1:50, mythBusters)
# Bootstrap-t lower 95% confidence bound (requires the bootstrap package)
library(bootstrap)
set.seed(1)
results <- boott(1:50, theta = testStat, nboott = 5000, data = mythBusters, perc = 0.05)
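For readers without the `bootstrap` package, a rough cross-check is possible in base R. Note that this sketch swaps in a different technique: a stratified percentile bootstrap (resampling within each group and taking the 5% quantile), not the bootstrap-t interval used above, so its lower bound need not match the -0.28 quoted. The names `no_seed`, `with_seed` and `boot_diff` are illustrative.

```r
set.seed(1)

# 1 = yawned, 0 = did not yawn, rebuilt from the published counts
no_seed   <- rep(c(1, 0), times = c(4, 12))
with_seed <- rep(c(1, 0), times = c(10, 24))

# Resample each group with replacement and recompute the difference in rates
boot_diff <- replicate(5000, {
  mean(sample(with_seed, replace = TRUE)) - mean(sample(no_seed, replace = TRUE))
})

# Percentile-style lower 95% confidence bound and bootstrap standard error
lower_bound <- quantile(boot_diff, probs = 0.05)
boot_se <- sd(boot_diff)

print(c(lower_bound = unname(lower_bound), boot_se = boot_se))
```

The bootstrap standard error should land close to the 0.13 from the normal approximation, and a lower bound below zero points the same way as the analysis above: the data do not rule out a difference of zero, or even one in the other direction.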
Logistic Regression to Show Sample Size Needed for Significance
I received yet another very friendly and helpful email from Zinj Boisei, who pointed out I was too hasty in dismissing the use of an increased sample size. By using a more appropriate analysis, logistic regression in this case, Zinj confirmed there was little significance at the sample size of 50 - and even went on to find out how large a sample of the same makeup would need to be for the results to be significant:
[T]he low correlation [found in the article] indicates that only a small amount of the variability may be explained by "yawn seed", but the real question is "is the effect real, whatever the size?" A larger sample size would certainly answer this question.
I took the data you prepared, and constructed a linear model for a logistic regression analysis which could be used to test such a question. As expected, I found that at n=50 the results were nowhere near significant (probability of .75 that we would see these results by chance alone). I then searched for the multiple of the sample size that would reject a null hypothesis of no effect at 95% confidence. I found that this occurred at 37 times the original sample size, or n=1850.
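Zinj's code was not included, so the following is only a sketch of how such a search might look in R. It assumes a binomial `glm` on the grouped counts, a Wald test on the group coefficient, and a sample that grows by multiplying every cell by the same integer `k`. The helper `wald_p` and all variable names are hypothetical.

```r
# Wald p-value for the seed effect when every cell count is multiplied by k
wald_p <- function(k) {
  counts <- data.frame(
    group = factor(c("noSeed", "withSeed")),
    yawn  = k * c(4, 10),    # subjects who yawned
    none  = k * c(12, 24)    # subjects who did not
  )
  fit <- glm(cbind(yawn, none) ~ group, family = binomial, data = counts)
  # Row 2 of the coefficient table is the withSeed-vs-noSeed effect
  summary(fit)$coefficients[2, "Pr(>|z|)"]
}

# p-value at the original sample size (n = 50)
p_original <- wald_p(1)

# Smallest multiplier at which the effect becomes significant at the 5% level
k <- 1
while (wald_p(k) >= 0.05) k <- k + 1

print(c(p_original = p_original, multiplier = k, n = 50 * k))
```

Under these assumptions the p-value at n=50 comes out near .75 and the search stops at a multiplier of 37, in line with the figures quoted above. A likelihood-ratio test would be a reasonable alternative to the Wald test here.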
Conclusion Addendum
While the statistical method used in the article was sufficient to show the yawn seed was responsible for a negligible amount of the variance, methods such as association tests, confidence interval analysis and logistic regression provide more appropriate insight into the causation involved. In this case, all tests lend credence to the original conclusion: the results of the MythBusters' yawn experiment did not support their conclusion.





Correlation correlation by gnifyus :: NR7
O.K., here's a question about correlation coefficients in general.
Say you somehow repeated this same experiment 50 times, and 45 out of 50 times the percentages came out with a correlation coefficient of something below the .10 needed for a weak correlation, but at the same time in favor of the yawns being contagious. In other words the correlation coefficient of whether each experiment's results were majority or minority would show a high correlation. What would this say about yawn contagiousness then?
Or is this considered statistically impossible to actually happen, given the CC's obtained from the first experiment?