Lies, damned lies, and statistics. Hacking the P-value.
To get more out of the effort invested in the EuroNet Newsletters, we will also be featuring articles as blog posts on a regular basis, so that people who don’t read the entire newsletter all at once (there might be a few ;)) can still read (most of) it over time.
Also, don’t forget that the next issue is in the making and you can be part of it!
Hoping to inspire you, here is Damiano Cerasuolo’s article “Lies, damned lies, and statistics. Hacking the P-value”, from the last Euronet Newsletter:
In ‘Chapters from My Autobiography’, Mark Twain writes: “there are three kinds of lies: lies, damned lies, and statistics.” [1] Twain’s statement about the use and misuse of statistics couldn’t have been more farsighted. In February 2016, the 177-year-old American Statistical Association (ASA) released a statement [2] (followed by a scientific publication [3]) with principles on the use of the P-value, intended to improve the conduct and interpretation of quantitative science. The P-value is often misused.
The P-value is usually used to test (and possibly reject) the “null hypothesis”. If the statistical test comparing two groups or a pair of characteristics results in a P-value below 0.05, the null hypothesis is usually rejected (depending on the chosen level of significance): the result is taken as evidence of a relationship between the two groups (or the two characteristics) that is unlikely to be due to chance alone. In more practical terms, suppose we want to test the association between two factors, for example age and injectable drug use, in two comparable groups drawn from a specific population. If our statistical test results in a P-value of less than 0.05, the association between the two factors is usually declared statistically significant. However, a significant P-value provides no information about the strength of the relationship between the two factors, nor about its direction.
Criticism of the P-value is not new. On February 25th, 2015, the journal Basic and Applied Social Psychology issued an editorial [4] banning P-values and confidence intervals from all future papers. Such drastic steps may seem counterproductive, but they have the merit of starting the debate. Without proposing a ban on P-values, the American Statistical Association observes that “good statistical practice is an essential component of good scientific practice”.
Meanwhile its executive director, Ron Wasserstein, explains that “well-reasoned statistical arguments contain much more than the value of a single number and whether that number exceeds an arbitrary threshold. The ASA statement is intended to steer research into a ‘post p<0.05 era’”. In other words, the P-value should not replace scientific reasoning; it should be accompanied by numerical and graphical summaries of the data, and by interpretation and understanding of the phenomenon under study and of its results in context.
Poor P-value reporting helps “bad” science get published: when only P-values are reported, without supporting information, findings of little substance can easily make their way to publication. Stanford metaresearcher John Ioannidis and colleagues found an increasing number of articles reporting P-values over time [5]. Almost all articles and abstracts with P-values reported statistically significant results, while confidence intervals, Bayes factors, or effect sizes were rarely mentioned. The explanation for this phenomenon already has its own name: publication bias (for statistical significance). Daniël Lakens, in a 2015 paper published by PeerJ [6], defines publication bias as the ‘tendency to publish statistical significant results, both because authors are more likely to submit these results and reviewers and editors to evaluate more positively these manuscripts’. Because publication bias sacrifices reproducibility (the ability to recompute results) and replicability (the chance that other experimenters will achieve a consistent result) for the sake of publication itself [7], addressing this issue is urgent.
Statistics is a core part of Public Health, and although the P-value debate could be perceived as “pure statistics”, it is not. Public Health ranges from epidemiology to hygiene, from biostatistics to health promotion. It is not a unitary, monolithic discipline, and the “P-value gate” requires a multidisciplinary approach in order to provide the best answers for each subdiscipline.
Specialists in public health should join the debate, proposing solutions.
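The remark that a significant P-value says nothing about the strength of an association can be illustrated with a small sketch. The Python example below is only an illustration under stated assumptions: the group means, the common standard deviation and the sample sizes are invented numbers (not data from any study cited here), the function name two_sample_summary is hypothetical, and the test is a simple two-sided z-test that treats the common standard deviation as known, with the standardised mean difference (Cohen's d) as the effect size.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_summary(mean_a, mean_b, sd, n_per_group):
    """Return (p_value, cohens_d) for two equal-sized groups.

    Assumptions (illustrative only): both groups share the same known
    standard deviation `sd`, and the sampling distribution of the mean
    difference is approximately normal (two-sided z-test).
    """
    diff = mean_b - mean_a
    standard_error = sd * sqrt(2 / n_per_group)
    z = diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = diff / sd  # effect size: difference in units of SD
    return p_value, cohens_d


# Invented scenario 1: a tiny difference measured in a very large sample.
p_large, d_large = two_sample_summary(30.0, 30.2, sd=10.0, n_per_group=50_000)

# Invented scenario 2: a much larger difference measured in a small sample.
p_small, d_small = two_sample_summary(30.0, 35.0, sd=10.0, n_per_group=20)

print(f"large sample: p = {p_large:.4f}, Cohen's d = {d_large:.2f}")
print(f"small sample: p = {p_small:.4f}, Cohen's d = {d_small:.2f}")
```

With these made-up numbers, the tiny difference in the very large sample falls below the 0.05 threshold, while the much larger difference in the small sample does not. This is one reason why an effect size and a confidence interval belong next to the P-value.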
Damiano Cerasuolo
Euronet France
———
[1] https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics
[2] http://www.amstat.org/newsroom/pressreleases/P-ValueStatement.pdf
[3] Ronald L. Wasserstein, Nicole A. Lazar. The ASA’s statement on p-values: context, process, and purpose. The American Statistician. http://dx.doi.org/10.1080/00031305.2016.1154108
[4] David Trafimow & Michael Marks (2015) Editorial, Basic and Applied Social Psychology, 37:1, 1-2, DOI: 10.1080/01973533.2015.1012991
[5] Chavalarias D, Wallach J, Li A, Ioannidis JPA. Evolution of Reporting P Values in the Biomedical Literature, 1990-2015. JAMA. 2016;315(11):1141-1148. doi:10.1001/jama.2016.1952
[6] Lakens D. (2015) On the challenges of drawing conclusions from p-values just below 0.05. PeerJ 3:e1142. https://doi.org/10.7717/peerj.1142
[7] Jeffrey T. Leek and Roger D. Peng. Opinion: Reproducible research can still be wrong: Adopting a prevention approach. PNAS 2015 112 (6) 1645-1646; doi:10.1073/pnas.1421412111
Blog picture from: http://emcrit.org/pulmcrit/demystifying-the-p-value/