More (and perhaps more appropriate) statistical analysis

Comment by Brandon on 24 April 2007

I received a number of emails concerning the statistical method I used (Pearson's correlation coefficient), which provided some insight but does not sufficiently address the issue of causation in the results. Personally, I don't understand how two variables can so obviously lack correlation while there is still a chance that causation is involved, but in the interest of statistical appropriateness, I have included a number of alternative statistical methods below.

Association Test

An association test such as Fisher's Exact Test is appropriate. This method is specifically for determining any non-random association between two categorical (discrete) variables - which is exactly what we have in this instance. Its use, then, removes any issues there may have been in the Pearson analysis having to do with the data set not being continuous.

For those interested, the calculations are described in the link above. The results are easy to come by, however, using online tools such as this calculator at Matforsk.com. Inserting the MythBusters' data results in the following:

TABLE = [ 4 , 10 , 12 , 24 ]
Left   : p-value = 0.5127817757319189
Right  : p-value = 0.7416878304307283
2-Tail : p-value = 1

This corresponds to there being 4 non-seeded subjects who yawned, 10 seeded who yawned, 12 non-seeded who didn't yawn, and 24 seeded who didn't yawn. The resulting p-values are all well above the commonly accepted limit of .05 for significance.
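The same test can also be run locally in R with the built-in fisher.test function instead of the online calculator. The following is only a sketch: the table labels and variable names are illustrative, and it assumes the calculator's "Left" and "Right" tails correspond to R's "less" and "greater" alternatives for the top-left cell.

```r
# 2x2 table of the MythBusters counts (row/column labels are illustrative)
yawn_table <- matrix(c(4, 10, 12, 24), nrow = 2,
                     dimnames = list(group = c("noSeed", "withSeed"),
                                     outcome = c("yawn", "none")))

# One-sided and two-sided Fisher's Exact Test p-values
p_left  <- fisher.test(yawn_table, alternative = "less")$p.value
p_right <- fisher.test(yawn_table, alternative = "greater")$p.value
p_two   <- fisher.test(yawn_table, alternative = "two.sided")$p.value

print(c(left = p_left, right = p_right, two_tail = p_two))
```

If the assumption about the tails holds, the three printed values should line up with the Left, Right and 2-Tail figures quoted above.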

Confidence Interval for the Difference in Rates

This method was recommended via email by Max Kuhn, a "Ph.D. statistician who works in industry." Max provided a very thorough and helpful analysis of the data, which I've included below:

Here is what I would do: create a confidence interval for the difference in rates. I would do this because 1) it can be used to evaluate whether the difference in rates is equal to zero, 2) the width of the interval helps characterize the uncertainty in the data, which is directly related to sample size and 3) p-values alone do not provide people with enough information. Here is a link to a summary of the calculations.

I use the R statistical language a lot, and here is how I got the result:

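          # Yawn rates and group sizes: seeded (1) and non-seeded (2)
          p1 <- 10/34
          p2 <- 4/16
          n1 <- 34
          n2 <- 16
          q1 <- 1-p1
          q2 <- 1-p2

          # Observed difference in rates
          p1 - p2
          # [1] 0.04411765

          # Standard error of the difference
          sqrt((p1*q1/n1) + (p2*q2/n2))
          # [1] 0.1335103

          # One-sided lower 95% confidence bound
          p1 - p2 + (qnorm(0.05) * sqrt((p1*q1/n1) + (p2*q2/n2)))
          # [1] -0.1754872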

This means that if they were to repeat the same experiment a large number of times, we could feel confident that the difference might be as low as 18% in the other direction. This doesn't give us a good feeling that the seed did anything.

One caveat: the number of events is somewhat low here and many people would tell you that the statistical theory may not be valid for our data. To investigate this a little more, I bootstrapped the simple difference in rates 5,000 times. This lets us estimate the empirical distribution of the difference in rates for our data instead of relying on distributional assumptions and approximations to give us a confidence bound. The empirical distribution of the difference in rates looks fairly symmetric and Gaussian. Using a "bootstrap-t" interval, the lower 95% confidence bound was -0.28, which gives us more reason to doubt that there is a difference. Code for this analysis is below. ...

          # Difference in yawn rates (seeded minus non-seeded) for the resampled rows
          testStat <- function(index, data)
          {
             bootSample <- data[index,]
             p1 <- mean(bootSample[bootSample$group == "withSeed", "outcome"] == "yawn")
             p2 <- mean(bootSample[bootSample$group == "noSeed", "outcome"] == "yawn")
             p1 - p2
          }

          # Rebuild the raw outcomes: 4 of 16 non-seeded and 10 of 34 seeded subjects yawned
          noSeed <- rep(c("yawn", "none"), times = c(4, 12))
          withSeed <- rep(c("yawn", "none"), times = c(10, 24))

          mythBusters <- data.frame(
             outcome = factor(c(noSeed, withSeed), levels = c("yawn", "none")),
             group = factor(rep(c("noSeed", "withSeed"), times = c(16, 34))))

          # Sanity check: the statistic on the original 50 rows
          testStat(1:50, mythBusters)

          # Bootstrap-t with 5,000 resamples; perc = 0.05 requests the lower 95% confidence point
          library(bootstrap)
          set.seed(1)
          results <- boott(1:50, theta = testStat, nboott = 5000, data = mythBusters, perc = 0.05)
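For readers without the bootstrap package, the resampling idea can be sketched in base R. Note that this is a simpler variant and not the analysis above: it resamples within each group (so the group sizes stay at 16 and 34) and reads off a plain percentile bound rather than a bootstrap-t bound, so its lower bound should not be expected to match the -0.28 reported. All names below are illustrative.

```r
# Rebuild the 50 MythBusters subjects as 0/1 yawn indicators
no_seed   <- rep(c(1, 0), times = c(4, 12))
with_seed <- rep(c(1, 0), times = c(10, 24))

set.seed(1)
n_boot <- 5000

# Resample each group with replacement and record the difference in yawn rates
boot_diff <- replicate(n_boot, {
  mean(sample(with_seed, replace = TRUE)) - mean(sample(no_seed, replace = TRUE))
})

observed_diff <- mean(with_seed) - mean(no_seed)

# Percentile-based lower 95% confidence bound (5th percentile of the resampled differences)
lower_bound <- unname(quantile(boot_diff, 0.05))

print(c(observed = observed_diff, lower_95 = lower_bound))
```

A histogram of the resampled differences, for example hist(boot_diff), gives a quick look at how symmetric the empirical distribution is.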

Linear Regression to Show Sample Size Needed for Significance

I received yet another very friendly and helpful email from Zinj Boisei, who pointed out I was too hasty in dismissing the use of an increased sample size. By using a more appropriate analysis, linear regression in this case, Zinj confirmed there was little significance at the sample size of 50 - and even went on to find out how large a sample size of the same makeup would need to be for the results to be significant:

[T]he low correlation [found in the article] indicates that only a small amount of the variability may be explained by "yawn seed", but the real question is "is the effect real, whatever the size?" A larger sample size would certainly answer this question.

I took the data you prepared, and constructed a linear model for a logistic regression analysis which could be used to test such a question. As expected, I found that at n=50 the results were nowhere near significant (probability of .75 that we would see these results by chance alone). I then searched for the multiple of the sample size that would reject a null hypothesis of no effect at 95% confidence. I found that this occurred at 37 times the original sample size, or n=1850.
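Zinj's code was not included, so the following is only one possible reconstruction of the search described above. It assumes a logistic regression fitted with glm on the grouped counts, a two-sided Wald test on the seed coefficient, and a simple loop over whole multiples of the original sample; the function and variable names are hypothetical.

```r
# Wald p-value for the seed effect when the original counts are multiplied by k
seed_p_value <- function(k) {
  counts <- data.frame(group = factor(c("noSeed", "withSeed")),
                       yawn  = k * c(4, 10),
                       none  = k * c(12, 24))
  fit <- glm(cbind(yawn, none) ~ group, family = binomial, data = counts)
  summary(fit)$coefficients["groupwithSeed", "Pr(>|z|)"]
}

# p-value at the original sample size (n = 50)
p_original <- seed_p_value(1)

# Smallest whole multiple of the sample at which the effect reaches p < 0.05
k <- 1
while (seed_p_value(k) >= 0.05) k <- k + 1

print(c(p_at_n50 = p_original, multiple = k, n = 50 * k))
```

Under these assumptions the search only scales the counts, so the estimated effect stays fixed and only its standard error shrinks as the sample grows.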

Conclusion Addendum

While the statistical method used in the article was sufficient to show the yawn seed was responsible for a negligible amount of the variance, methods such as association tests, confidence interval analysis and linear regression provide more appropriate insight into the causation involved. In this case, all tests lend credence to the original conclusion: the results of the MythBusters' yawn experiment did not support their conclusion.

A separate email concerning this post suggested that an easier way to determine the sample size needed for significance is to feed higher numbers into Fisher's Exact Test until the p-value is low enough. I did this using the online calculator mentioned previously and was able to confirm Zinj's linear regression results. The first three tables show proportionally increasing sample sizes that still do not indicate significant association. The last table shows sample sizes of 37 times those used by MythBusters and a 2-Tail p-value (which is most appropriate given the data set) very close to .05.

------------------------------------------
 TABLE = [ 40 , 120 , 100 , 240 ]
Left   : p-value = 0.17952061079853096
Right  : p-value = 0.8714850889314807
2-Tail : p-value = 0.33731352248492574
------------------------------------------
 TABLE = [ 80 , 240 , 200 , 480 ]
Left   : p-value = 0.08412887957942149
Right  : p-value = 0.9370868768109766
2-Tail : p-value = 0.1522036361266696
------------------------------------------
 TABLE = [ 120 , 360 , 300 , 720 ]
Left   : p-value = 0.04267499690899746
Right  : p-value = 0.9675062534995688
2-Tail : p-value = 0.08429297979286202
------------------------------------------
 TABLE = [ 148 , 444 , 370 , 888 ]
Left   : p-value = 0.027144911945084876
Right  : p-value = 0.9791738747189928
2-Tail : p-value = 0.05205262573928032
------------------------------------------
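The same scaling exercise can be scripted rather than typed into the calculator. The sketch below assumes R's fisher.test and uses hypothetical names; it multiplies the original counts by a few factors and reports the two-sided p-value for each.

```r
# Fisher's Exact Test p-value when the original 2x2 counts are multiplied by k
scaled_fisher_p <- function(k, alternative = "two.sided") {
  tab <- matrix(k * c(4, 10, 12, 24), nrow = 2)
  fisher.test(tab, alternative = alternative)$p.value
}

multiples <- c(1, 10, 20, 30, 37)
p_values  <- sapply(multiples, scaled_fisher_p)

print(data.frame(multiple = multiples, n = 50 * multiples, two_tail_p = p_values))
```

The cell order inside the matrix differs from the TABLE lines above, but transposing a 2x2 table does not change the test.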

No, you miss the point. The method used in the article was absolutely not sufficient to show anything other than the degree of linear correlation. As stated at the beginning of this post, you admit you still do not understand how R^2 and p-values (or your relevant test statistic) are associated, and how they differ from each other.

It is perfectly possible to obtain samples with extremely low R-squareds (like 0.00ish) that still easily pass linear F-tests. The reason is that, despite a low degree of correlation, an extremely high sample size can shrink the relevant variances. R-squared does not take sample size into account at all (look at your formulae: the (n-1)'s cancel out).
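A small R sketch can illustrate this with the yawn data itself. It assumes the 50 subjects are coded as 0/1 vectors and simply replicated k times (the coding and names are illustrative), then compares Pearson's r with the p-value from cor.test.

```r
# Pearson r and its p-value when the original 50 subjects are replicated k times
r_and_p <- function(k) {
  seeded <- rep(c(0, 1), times = k * c(16, 34))
  yawned <- c(rep(c(1, 0), times = k * c(4, 12)),   # non-seeded group
              rep(c(1, 0), times = k * c(10, 24)))  # seeded group
  test <- cor.test(seeded, yawned)
  c(n = 50 * k, r = unname(test$estimate), r_squared = unname(test$estimate)^2,
    p_value = test$p.value)
}

res_1  <- r_and_p(1)
res_37 <- r_and_p(37)

print(rbind(res_1, res_37))
```

Replicating the data leaves r and R-squared unchanged while the p-value falls, which is why the size of the correlation alone says nothing about significance.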

The only thing in your favor is that you *could* use R (the correlation coefficient) to determine a p-value, because Chi-squared (or Fisher) and R are so closely related. This would require adjusting the values for the sample size, at which point you could determine a "critical-R" for a specific alpha value (and specific sample size). Note that this is not usually how statisticians think of things, but it is perfectly valid. However, this is not at all what you did. Your idea that R > .10 shows significance is flat out wrong. The mere fact that you (you personally, that is; it is theoretically doable with a bit of arithmetic) cannot provide an alpha for this R threshold of yours pretty much proves it.
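The "critical-R" idea can be sketched in a few lines of R. This assumes the usual two-sided t-test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), solved for r; the function name is hypothetical.

```r
# Smallest |r| that is significant at level alpha (two-sided) for sample size n,
# obtained by inverting t = r * sqrt(n - 2) / sqrt(1 - r^2)
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

print(c(n_50 = critical_r(50), n_1850 = critical_r(1850)))
```

At n = 50 the threshold comes out at roughly 0.28, and it shrinks as n grows, so a fixed cutoff such as .10 does not correspond to any single alpha.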

So basically, the only reason the article agrees with the correct answer is via dumb luck. The statistics cited are totally incorrect, which seems to be the point you're totally missing.