An
open letter to Editors of journals, Chairs of departments, Directors of funding
programs, Directors of graduate training, Reviewers of grants and manuscripts,
Researchers, Teachers, and Students:
Statistical methods have been evolving
rapidly, and many people think it’s time to adopt modern Bayesian data analysis
as standard procedure in our scientific practice and in our educational
curriculum. Three reasons:
1. Scientific
disciplines from astronomy to zoology are moving to Bayesian data analysis. We
should be leaders of the move, not followers.
2. Modern Bayesian
methods provide richer information, with greater flexibility and broader
applicability than 20th century methods. Bayesian methods are intellectually
coherent and intuitive. Bayesian analyses are readily computed with modern
software and hardware.
3. Null-hypothesis significance
testing (NHST), with its reliance on p
values, has many problems. There is little reason to persist with NHST now that
Bayesian methods are accessible to everyone.
My
conclusion from those points is that we should do whatever we can to encourage
the move to Bayesian data analysis. Journal editors could accept, and actively
encourage, submissions that use Bayesian data analysis. Department chairpersons could
encourage their faculty to be leaders of the move to modern Bayesian methods. Funding
agency directors could encourage applications using Bayesian data analysis. Reviewers
could recommend Bayesian data analyses. Directors of training
or curriculum could get courses in Bayesian data analysis incorporated into the
standard curriculum. Teachers can teach Bayesian. Researchers can use Bayesian
methods to analyze data and submit the analyses for publication. Students can get
an advantage by learning and using Bayesian data analysis.
The goal is encouragement of Bayesian methods,
not prohibition of NHST or other methods. Researchers will embrace Bayesian analysis
once they learn about it and see its many practical and intellectual advantages.
Nevertheless, change requires vision, courage, incentive, effort, and
encouragement!
Now to expand on the three reasons stated
above.
1.
Scientific disciplines from astronomy to zoology are
moving to Bayesian data analysis. We should be leaders of the move, not
followers.
Bayesian methods are revolutionizing science.
Notice the titles of these articles:
Bayesian
computation: a statistical revolution.
Brooks, S.P. Philosophical Transactions of the Royal Society of London. Series
A: Mathematical, Physical and Engineering Sciences, 361(1813), 2681, 2003.
The
Bayesian revolution in genetics. Beaumont, M.A. and Rannala, B. Nature
Reviews Genetics, 5(4), 251-261, 2004.
A
Bayesian revolution in spectral analysis. Gregory, PC. AIP Conference
Proceedings, 557-568, 2001.
The hierarchical
Bayesian revolution: how Bayesian methods have changed the face of marketing
research. Allenby, G.M. and Bakken, D.G. and Rossi, P.E. Marketing Research, 16, 20-25,
2004.
The future of
statistics: A Bayesian 21st century.
Lindley, DV. Advances in Applied Probability, 7, 106-115,
1975.
There are many other articles that make analogous
points in other fields, but with less pithy titles. If nothing else, the titles
above suggest that the phrase “Bayesian revolution” is not an overstatement.
The Bayesian revolution spans many fields of
science. Notice the titles of these articles:
Bayesian
analysis of hierarchical models and its application in Agriculture. Nazir, N., Khan, A.A., Shafi, S.,
Rashid, A. InterStat, 1, 2009.
The Bayesian
approach to the interpretation of archaeological
data. Litton,
CD & Buck, CE. Archaeometry, 37(1), 1-24, 1995.
The promise of
Bayesian inference for astrophysics. Loredo TJ. In: Feigelson ED, Babu GJ, eds.
Statistical Challenges in Modern Astronomy. New York: Springer-Verlag; 1992, 275–297.
Bayesian methods in
the atmospheric sciences. Berliner LM, Royle JA, Wikle CK, Milliff RF. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds. Bayesian Statistics 6:
Proceedings of the sixth Valencia international meeting, June 6–10, 1998.
Oxford, UK: Oxford University Press; 1999, 83–100.
An introduction to
Bayesian methods for analyzing chemistry
data: Part II: A review of applications of Bayesian
methods in chemistry. Hibbert, DB and Armstrong, N. Chemometrics and
Intelligent Laboratory Systems, 97(2), 211-220, 2009.
Bayesian methods in conservation biology. Wade PR. Conservation Biology, 2000, 1308–1316.
Bayesian inference
in ecology. Ellison AM. Ecology Letters 2004, 7:509–520.
The
Bayesian approach to research in economic
education. Kennedy, P. Journal
of Economic Education, 17, 9-24, 1986.
The
growth of Bayesian methods in statistics and economics
since 1970. Poirier, D.J. Bayesian Analysis, 1(4),
969-980, 2006.
Commentary:
Practical advantages of Bayesian analysis of epidemiologic
data. Dunson
DB. Am J Epidemiol 2001,
153:1222–1226.
Bayesian inference
of phylogeny and its impact on evolutionary
biology. Huelsenbeck
JP, Ronquist F, Nielsen R, Bollback
JP. Science 2001, 294:2310–2314.
Geoadditive Bayesian models for
forestry defoliation data: a case
study. Musio, M. and Augustin,
N.H. and von Wilpert, K. Environmetrics. 19(6), 630—642, 2008.
Bayesian statistics
in genetics: a guide for the
uninitiated. Shoemaker, J.S. and
Painter, I.S. and Weir, B.S. Trends in Genetics, 15(9), 354-358, 1999.
Bayesian
statistics in oncology. Adamina, M. and Tomlinson, G. and Guller, U. Cancer, 115(23), 5371-5381, 2009.
Bayesian
analysis in plant pathology. Mila, AL and Carriquiry,
AL. Phytopathology, 94(9), 1027-1030, 2004.
Bayesian analysis
for political research. Jackman S. Annual Review of
Political Science, 2004, 7:483–505.
The list above could go on and on. The point
is simple: Bayesian methods are being adopted across the disciplines of
science. We should not be laggards in utilizing Bayesian methods in our
science, or in teaching Bayesian methods in our classrooms.
Why are Bayesian methods being adopted across
science? Answer:
2.
Bayesian methods provide richer information, with greater
flexibility and broader applicability than 20th century methods. Bayesian
methods are intellectually coherent and intuitive. Bayesian analyses are
readily computed with modern software and hardware.
To explain this point adequately would take
an entire textbook, but here are a few highlights.
* In NHST, the data collector must pretend to
plan the sample size in advance and pretend not to let preliminary looks at the
data influence the final sample size. Bayesian design, by contrast, involves no
such pretenses, because inference is not based on p values.
* In NHST, analysis of variance (ANOVA) has
elaborate corrections for multiple comparisons based on the intentions of the
analyst. Hierarchical Bayesian ANOVA uses no such corrections, instead
rationally mitigating false alarms based on the data.
* Bayesian computational practice allows easy
modification of models to properly accommodate the measurement scales and
distributional needs of observed data.
* In many NHST analyses, missing data or
otherwise unbalanced designs can produce computational problems. Bayesian
models seamlessly handle unbalanced and small-sample designs.
* In many NHST analyses, individual
differences are challenging to incorporate into the analysis. In hierarchical
Bayesian approaches, individual differences can be flexibly and easily modeled,
with hierarchical priors that provide rational “shrinkage” of individual
estimates.
* In contingency table analysis, the
traditional chi-square test suffers if expected values of cell frequencies are
less than 5. There is no such issue in Bayesian analysis, which handles small or
large frequencies seamlessly.
* In multiple regression analysis,
traditional analyses break down when the predictors are perfectly (or very
strongly) correlated, but Bayesian analysis proceeds as usual and reveals that
the estimated regression coefficients are (anti-)correlated.
* In NHST, the power of an experiment, i.e.,
the probability of rejecting the null hypothesis, is
based on a single alternative hypothesis. And the probability of replicating a
significant outcome is “virtually unknowable” according to recent research. But
in Bayesian analysis, both power and replication probability can be computed in
a straightforward manner, with the uncertainty of the hypothesis directly
represented.
* Bayesian computational practice allows easy
specification of domain-specific psychometric models in addition to generic
models such as ANOVA and regression.
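As one concrete illustration of the hierarchical "shrinkage" mentioned above, here is a minimal sketch in Python of a conjugate normal-normal posterior mean, a simplified stand-in for a full hierarchical fit. All of the numbers (group mean, variances, individual data) are my own invented values for illustration, not from any analysis discussed in this letter:

```python
# Shrinkage in a toy normal-normal model (all numbers assumed for illustration):
# each individual's estimate is pulled toward the group-level mean, with more
# shrinkage for individuals who contribute fewer (noisier) observations.
group_mean = 100.0   # higher-level (group) mean
tau2 = 25.0          # between-individual variance
sigma2 = 100.0       # within-individual (measurement) variance per observation

def shrunken_estimate(individual_mean, n_obs):
    """Posterior mean for one individual: a precision-weighted average of
    that individual's data mean and the group mean."""
    data_precision = n_obs / sigma2
    prior_precision = 1.0 / tau2
    w = data_precision / (data_precision + prior_precision)
    return w * individual_mean + (1 - w) * group_mean

print(round(shrunken_estimate(120.0, 2), 1))   # 106.7: few data, pulled toward 100
print(round(shrunken_estimate(120.0, 50), 1))  # 118.5: many data, stays near 120
```

The individual with only two observations is pulled strongly toward the group mean; the individual with fifty observations is barely moved. That rational weighting is what mitigates spuriously extreme individual estimates.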
Some people may have the mistaken impression
that the advantages of Bayesian methods are negated by the need to specify a prior
distribution. In fact, the use of a prior is both appropriate for rational
inference and advantageous in practical applications.
* It is inappropriate
not to use a prior. Consider the well
known example of random disease screening. A person is selected at random to be
tested for a rare disease. The test result is positive. What is the probability
that the person actually has the disease? It turns out, even if the test is
highly accurate, the posterior probability of actually having the disease is
surprisingly small. Why? Because the prior probability of the
disease was so small. Thus, incorporating the prior is crucial for
coming to the right conclusion.
* Priors are explicitly specified and must be
agreeable to a skeptical scientific audience. Priors are not capricious and
cannot be covertly manipulated to predetermine a conclusion. If skeptics
disagree with the specification of the prior, then the robustness of the conclusion
can be explicitly examined by considering other reasonable priors. In most
applications, with moderately large data sets and reasonably informed priors,
the conclusions are quite robust.
* Priors are useful for cumulative scientific
knowledge and for leveraging inference from small-sample research. As an
empirical domain matures, more and more data accumulate regarding particular
procedures and outcomes. The accumulated results can inform the priors of
subsequent research, yielding greater precision and firmer conclusions.
* When different groups of scientists have
differing priors, stemming from differing theories and empirical emphases, then
Bayesian methods provide rational means for comparing the conclusions from the
different priors.
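The disease-screening arithmetic in the first point above can be made concrete. Here is a minimal sketch in Python; the prevalence and test accuracies are numbers I have assumed for illustration, not part of the argument itself:

```python
# Bayes' rule for random disease screening, with illustrative (assumed) numbers.
prior = 0.001          # prevalence: 1 in 1000 people has the disease
sensitivity = 0.99     # P(test positive | disease)
specificity = 0.99     # P(test negative | no disease)

# P(positive result) by the law of total probability
p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)

# Posterior probability of disease, given a positive test
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # about 0.09
```

Even with a test that is 99% accurate in both directions, the posterior probability of disease is only about 9%, because the prior probability was so small. Ignoring the prior yields the wrong conclusion.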
To summarize, priors are not a problematic
nuisance to be avoided. Instead, priors should be embraced as appropriate for rational
inference and advantageous in real research.
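The robustness claim above can be checked with a tiny sketch; the data set and the candidate priors below are invented for illustration:

```python
# Robustness of a posterior to the choice of prior (illustrative numbers assumed):
# with a moderately large data set, several reasonable Beta priors on a
# proportion lead to nearly the same posterior mean.
z, N = 58, 100  # e.g., 58 successes observed in 100 trials

for a, b in [(1, 1), (2, 2), (5, 5)]:     # uniform and mildly informed priors
    post_mean = (z + a) / (N + a + b)     # mean of the Beta(a+z, b+N-z) posterior
    print(a, b, round(post_mean, 3))      # all close to 0.57-0.58
```

The three posterior means differ only in the third decimal place, so a skeptic who prefers a different reasonable prior reaches essentially the same conclusion.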
If those advantages of Bayesian methods are
not enough to attract change, there is also a major reason to be repelled from
the dominant method of the 20th century:
3.
20th century null-hypothesis significance testing (NHST),
with its reliance on p values, has many severe problems. There is little reason
to persist with NHST now that Bayesian methods are accessible to everyone.
Although there are many difficulties in using
p values, the fundamental fatal flaw
of p values is that they are ill-defined, because any set of data has many different p values.
Consider the simple case of assessing whether
an electorate prefers candidate A over candidate B. A quick random poll reveals
that 8 people prefer candidate A out of 23 respondents. What is the p value of that outcome if the
population were equally divided? There is no single answer! If the pollster
intended to stop when N=23, then the p
value is based on repeating an experiment in which N is fixed at 23. If the
pollster intended to stop after the 8th respondent who preferred candidate A, then the p
value is based on repeating an experiment in which N can be anything from 8 to
infinity. If the pollster intended to poll for one hour, then the p value is based on repeating an
experiment in which N can be anything from zero to infinity. There is a
different p value for every possible
intention of the pollster, even though the observed data are fixed, and even
though the outcomes of the queries are carefully insulated from the intentions
of the pollster.
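The pollster example can be verified directly. Here is a minimal sketch in pure Python computing the one-sided p value of the observed data (8 of 23 prefer candidate A) under two of the stopping intentions described above:

```python
from math import comb

# Same data -- 8 of 23 respondents prefer candidate A -- but two different
# stopping intentions yield two different one-sided p values under the null
# hypothesis that the population is equally divided (theta = 0.5).
z, N = 8, 23

# Intention 1: stop when N = 23 respondents have been polled.
# p = P(8 or fewer A-preferences in a fixed sample of 23).
p_fixed_N = sum(comb(N, k) for k in range(z + 1)) / 2**N

# Intention 2: stop at the 8th respondent who prefers A.
# p = P(needing 23 or more respondents), from the negative-binomial
# distribution of N given z = 8 successes.
p_fixed_z = 1 - sum(comb(n - 1, z - 1) * 0.5**n for n in range(z, N))

print(round(p_fixed_N, 3))  # 0.105
print(round(p_fixed_z, 3))  # 0.067
```

One intention yields p of about .105, the other about .067, from the identical observed data. The "significance" of the poll depends on an unobservable intention in the pollster's head.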
The problem of ill-defined p values is magnified for realistic
situations. In particular, consider the well-known issue of multiple
comparisons in analysis of variance (ANOVA). When there are several groups, we
usually are interested in a variety of comparisons among them: Is group A significantly different from group B? Is group C different
from group D? Is the average of groups A and B different from the average of
groups C and D? Every comparison presents another opportunity for a false
alarm, i.e., rejecting the null hypothesis when it is true. Therefore the NHST
literature is replete with recommendations for how to mitigate the “experimentwise” false alarm rate, using corrections such as
Bonferroni, Tukey, Scheffe, etc. The bizarre part of this practice is that the
p value for the single comparison of
groups A and B depends on what other groups you intend to compare them with.
The data in groups A and B are fixed, but merely intending to compare them with
other groups enlarges the p value of
the A vs B comparison. The p value grows because there is a different space of possible
experimental outcomes when the intended experiment comprises more groups.
Therefore it is trivial to make any comparison have a large p value and be nonsignificant;
all you have to do is intend to compare the data with other groups in the
future.
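The dependence of a corrected p value on intended comparisons can be illustrated with the simplest of the corrections mentioned above, Bonferroni; the uncorrected p value below is a number I have assumed for illustration:

```python
# Bonferroni-corrected p value for one fixed comparison (A vs B) as the
# number of *intended* comparisons grows.  The A-vs-B data never change.
p_raw = 0.03  # illustrative uncorrected p value for the A vs B comparison

for m in (1, 3, 6, 10):  # number of comparisons the analyst intends to make
    p_corrected = min(1.0, m * p_raw)
    print(m, round(p_corrected, 2))  # 0.03, 0.09, 0.18, 0.30
```

With a single intended comparison the result is "significant" at the .05 level; merely intending two or more comparisons renders the very same data nonsignificant.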
The literature is full of articles pointing
out the many conceptual misunderstandings held by practitioners of NHST. For
example, many people mistake the p
value for the probability that the null hypothesis is true. Even if those
misunderstandings could be eradicated, such that everyone clearly understood
what p values really are, the p values would still be ill-defined.
Every fixed set of data would still have many different p values.
To recapitulate: Science is moving to
Bayesian methods because of their many advantages, both practical and
intellectual, over 20th century NHST. It is time that we convert our research
and educational practices to Bayesian data analysis. I hope you will encourage
the change. It’s the right thing to do.
John K. Kruschke, Revised
14 November 2010, http://www.indiana.edu/~kruschke/