Article for Researchers. The p values. What they are/aren’t. Avoiding mistakes in Research.

Darko Medin
5 min read · Apr 22, 2024

There are many misconceptions about what p values are and how much weight they really carry. In this article, I will try to clarify in detail, for anyone working in research areas such as the Life Sciences, how not to use p values and how to use them accurately.

  1. The p values do not represent the overall significance of the result. Those without statistical expertise may be tempted to base their complete interpretation of a result, predictor, correlation, risk factor or any other statistical or biological parameter solely on the p value. Don’t do this. It is one of the biggest mistakes one can make in Research. The p value captures just one aspect of the overall significance of the results: sometimes it is important, sometimes it is not, depending on the context. Let's see what p values actually are.
  2. The p values are part of the hypothesis testing framework. In this sense, it is important to differentiate hypothesis tests, as statistical tests, from the Research hypothesis of your study, publication or other research. A statistical hypothesis is not the same concept as, let's say, the hypothesis of a Life Science research study. Although these concepts are different, one of the main goals of statisticians is often to align the Research hypothesis as closely as possible with the statistical hypothesis being tested. Having said that, it is still important to treat Research and statistical hypotheses as different concepts.
  3. The p values say more about how strongly the data contradict a null statistical hypothesis, over hypothetical repetitions of the study, than about the actual magnitude of the observed parameters. The p values tell a story about sampling and study repetition. In other words, if the null hypothesis were true and one repeated the study many times, a p value of 0.05 means that about 5% of those repetitions would produce a result (let's say a difference between two groups) at least as extreme as the one observed. So the lower the p value, the smaller the share of such repetitions that would produce a result at least as extreme as the one in the data, and the harder the data are to reconcile with the null hypothesis. That is where its main definition comes from: the p value is the probability, under long-run repeated sampling with the null hypothesis assumed true, of obtaining a result at least as extreme as the one observed.
  4. Basing Research conclusions solely on p values? Wrong (most of the time)! As mentioned above, statistical hypotheses and Research hypotheses are different concepts. Also, domain knowledge, the magnitude of the effects, the context around the data, study size, power, confounders, study design, references, the time aspect, causality analysis, mechanistic approaches, explainability, understanding and many other aspects are needed to reach a Research study conclusion. Now you can see how wrong it would be to base a Research conclusion solely on a statistical significance concept such as the p value. Having said that, p values can indeed be a part, but only a part, of the whole process of reaching a Research conclusion (re-read point 3 if needed).
  5. The p value is not the probability of a random result. Related to the previous point, the p value is neither the probability of a specific result nor the probability that the result occurred by chance; it is the probability of a result at least as extreme as the observed one, under very specific assumptions. The key assumption is that the null hypothesis is true: the p value describes how often, under that assumption, one would still observe a result at least as extreme as the one observed (let's say, a difference from the null value). Finally, rather than talking about the probability of our particular result, we talk about confidence: the level of confidence we would have if we were to repeat the study infinitely many times.
  6. Statistical significance is a universal concept, but the p value is not, even though the two terms are frequently interconnected and even used as synonyms. While they are closely related, there is a difference: statistical significance is a general term describing a principle, while the p value is a metric used to evaluate that principle.
  7. The p values are calculated differently, so not every p value is interpreted the same way. For example, Student’s t-test uses one formula to calculate the p value, Fisher’s exact test uses another, and the chi-square test of independence uses yet another. Since p values are calculated in different ways, their interpretation is bound to the method itself. Furthermore, p value calculations in different hypothesis tests depend on different statistical assumptions. Therefore, interpreting the p value as a single concept, just ‘the p value’, would be wrong; each p value should be interpreted in combination with its context, method, formula and data domain.
  8. The null hypothesis assumption. Although the null hypothesis may vary, p values are most of the time calculated by assuming a value for a parameter: assuming the null is true, what is the probability of getting a result at least as extreme as the observed one over long-run repetitions of the experiment? This is one of the reasons why the p value is not the probability that a hypothesis or a result is true. It is computed after assuming something, and once the parameter is fixed by that assumption, the p value cannot also tell us how probable the assumption itself is.
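
Points 3, 5 and 8 can be checked with a small simulation. This sketch is not from the original article: it assumes a hypothetical two-group study with 30 subjects per group, a normally distributed outcome with a known standard deviation of 1 (so a simple two-sided z-test stands in for the t-test), and an invented observed difference of 0.5. It repeats the study many times in a world where the null hypothesis is true, then compares the share of repetitions at least as extreme as the observed difference with the p value from the formula.

```python
import math
import random
import statistics

N_PER_GROUP = 30   # hypothetical study size per group
SIGMA = 1.0        # outcome SD, assumed known so a simple z-test applies


def two_sided_p(diff, n=N_PER_GROUP, sigma=SIGMA):
    """Two-sided z-test p value for a difference between two group means."""
    se = sigma * math.sqrt(2 / n)        # standard error of the difference
    z = abs(diff) / se
    return math.erfc(z / math.sqrt(2))   # P(|Z| >= z) for a standard normal Z


def null_differences(n_repeats=10_000, seed=42):
    """Repeat the study many times in a world where the null is TRUE (equal means)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_repeats):
        a = [rng.gauss(0, SIGMA) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0, SIGMA) for _ in range(N_PER_GROUP)]
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    return diffs


observed_diff = 0.5                      # hypothetical observed difference
p_formula = two_sided_p(observed_diff)

diffs = null_differences()
# Long-run share of null repetitions at least as extreme as the observed result
p_long_run = sum(abs(d) >= observed_diff for d in diffs) / len(diffs)
# Share of null repetitions that would be called "significant" at the 0.05 level
false_alarm_rate = sum(two_sided_p(d) < 0.05 for d in diffs) / len(diffs)

print(f"p value from the formula:         {p_formula:.3f}")
print(f"share at least as extreme (sim.): {p_long_run:.3f}")
print(f"null studies with p < 0.05:       {false_alarm_rate:.3f}")
```

With these assumed numbers the formula gives a p value of roughly 0.053, and the simulated share should land close to it. Roughly 5% of the null studies also cross p < 0.05 even though there is no real difference. None of this says how large or important a real difference would be.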
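
To illustrate point 7, the sketch below computes two different p values for the same invented 2x2 table (8 of 10 responders in one group versus 3 of 10 in the other). The first is Fisher’s exact test, summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one (the common two-sided convention). The second is Pearson’s chi-square test of independence without Yates’ continuity correction, using the chi-square distribution with one degree of freedom. It is a standard-library sketch for illustration; in practice one would normally rely on a tested statistics package.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables with the same margins that are no more likely than the observed one
    # (a tiny relative tolerance guards against floating-point ties)
    return sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square with 1 df: P(X >= chi2) = erfc(sqrt(chi2 / 2))
    return erfc(sqrt(chi2 / 2))


# Invented example: 8/10 responders in group 1 versus 3/10 in group 2
table = (8, 2, 3, 7)
p_fisher = fisher_exact_two_sided(*table)
p_chi2 = pearson_chi_square_2x2(*table)
print(f"Fisher's exact p = {p_fisher:.4f}")
print(f"Chi-square p     = {p_chi2:.4f}")
```

For this table the chi-square p value comes out near 0.025 and Fisher’s exact p value near 0.07, so the two tests land on opposite sides of the 0.05 threshold. With expected cell counts this small, the chi-square approximation is known to be rough, which is exactly why each p value has to be read together with its method and assumptions.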

9. The p value alternatives? The most logical alternative to p values is the 95% confidence interval. A confidence interval tells a similar Frequentist story about statistical significance, but in a more explainable way. With confidence intervals one can actually look into magnitudes (subject to some assumptions) and/or thresholds, which is not that intuitive with p values.

10. Summarized perspective. While p values may be important when interpreting results in terms of statistical significance (not overall research significance), it is not good practice to base the whole interpretation of the results on p values. Other perspectives, such as domain knowledge, context, the magnitude of the shift in the result, effect sizes and the mechanistic approach, should be taken into consideration in addition to p values. Don't confuse p values with the probability of a result from your sampled data; if you use them, understand that p values are based on potential long-run sampling, not on a single sample. The empirical interpretation of a p value depends on long-run repeated sampling and on specific assumptions. Using the term confidence instead of probability is more appropriate for uncertainty evaluation in the Frequentist framework.
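
Points 9 and 10 can be illustrated by reporting a 95% confidence interval next to the p value. The sketch below assumes two invented studies with a known outcome standard deviation of 1 (so a z-based interval is used rather than a t-based one): one with a tiny difference of 0.05 and 10,000 subjects per group, and one with a large difference of 0.6 and only 10 subjects per group. The function name `diff_summary` is hypothetical.

```python
import math
from statistics import NormalDist


def diff_summary(diff, n_per_group, sigma=1.0, level=0.95):
    """z-based confidence interval and two-sided p value for a difference in means.

    Assumes a known outcome SD `sigma` and equal group sizes (illustration only).
    """
    se = sigma * math.sqrt(2 / n_per_group)           # standard error of the difference
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    ci = (diff - z_crit * se, diff + z_crit * se)
    p = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return ci, p


# Two invented studies
tiny_effect_big_study = diff_summary(diff=0.05, n_per_group=10_000)
big_effect_small_study = diff_summary(diff=0.60, n_per_group=10)

for label, (ci, p) in [
    ("tiny effect, n = 10,000 per group", tiny_effect_big_study),
    ("large effect, n = 10 per group", big_effect_small_study),
]:
    print(f"{label}: p = {p:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the first study has a very small p value, yet its interval (roughly 0.02 to 0.08) shows the effect is tiny. The second has p of about 0.18, yet its interval (roughly -0.28 to 1.48) is compatible with anything from a small negative difference to a very large positive one. The p value alone hides both stories.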

Written by Darko Medin

Biostatistics Consultant / Data Scientist / Artificial Intelligence, Educator, darkomedin.com