Friday, March 24, 2006

To P or Not to P: Why Use a P Value, Anyway?

What is a P Value?

When doing a study, or when trying to decide whether some finding is significant, statisticians use a method called the "P value". P is short for probability: the probability of getting something more extreme than your result when there is no effect in the population. To test for statistical significance, you assume there is no effect in the population. Then you ask whether the value you get for the effect in your sample is the sort of value you would expect if there were no effect in the population. If the value you get is unlikely under the assumption of no effect, you conclude there is an effect, and you say the result is "statistically significant".
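That logic can be sketched with a small simulation. The commute times below are invented purely for illustration (they are not from the article); a permutation test shuffles the route labels many times to see how often a difference as large as the observed one turns up when, by construction, the route makes no difference:

```python
import random

random.seed(1)

# Hypothetical commute times in minutes for two routes.
# These numbers are made up for illustration only.
busy_road   = [22, 25, 21, 24, 23, 26, 22, 24]
backstreets = [25, 24, 27, 23, 26, 28, 25, 27]

observed = (sum(backstreets) / len(backstreets)
            - sum(busy_road) / len(busy_road))

# Permutation test: assume no real difference (the null hypothesis),
# shuffle the route labels repeatedly, and count how often a difference
# at least as extreme as the observed one appears by chance alone.
pooled = busy_road + backstreets
n = len(busy_road)
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[n:]) / n - sum(pooled[:n]) / n
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f} min, P = {p_value:.3f}")
```

Here the P value is simply the fraction of label-shuffles that produce a difference at least as extreme as the one actually observed.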

Andrew J. Vickers, PhD, has written an article in Medscape that may interest many physicians and anyone trying to decide whether a particular study is of any true value. We hear a lot these days about "evidence-based studies" whose findings are statistically significant rather than "anecdotal". This might apply to a treatment modality such as hyperbaric oxygenation, which may appear to benefit several patients but, after rigorous statistical analysis, fails to reach a significant "P value" and may not really have a beneficial effect across the board.

Dr. Vickers writes humorously about his statistical obsessions in a way that points out the ramifications of a statistically significant P value. The article is paraphrased below.

"Going home each night, he has a choice between a busy road or winding through the backstreets of Brooklyn. Being statistically obsessed, he records how long each route takes on a number of occasions and calculates means and standard deviations. He needs to know the quickest route and conducts a statistical analysis of his times: it turns out that the travel time for the busy road is shorter, but the difference between routes is not statistically significant (P = .4). Nonetheless, it would still seem sensible to take what is likely to be the quicker route home, even though it hasn't been proved that it will get him there fastest.

So he now decides to get more information and spends 2 years randomly selecting a route home and recording the times. When he analyzes the data, there is strong evidence that going home via the busy road is faster (P = .0001), but not by much (it saves him 57.3 seconds on average). So he decides he'll wind along the backstreets, simply because it is a more pleasant journey and well worth the extra minute. Pragmatic?

If P values should determine our actions, as most think, then in the case of a drug or hyperbaric clinical trial, for example, we would say: "P < .05: Rx effective; P ≥ .05: Rx not effective." Yet the bicycle trip home (above) shows the opposite: he chose the busy road when P was .4 but not when P was .0001. This suggests we need to think a little harder about what P values are and how we should use them.

The most important thing to remember about P values is that they are used to test hypotheses. This sounds obvious, but it is all too easily forgotten. A good example is the widespread practice of citing P values for baseline differences between groups in a randomized trial. The hypothesis being tested here is whether there are real differences between the groups. Yet we know that the groups were formed by random assignment, so any differences in characteristics such as age or sex must be due to chance alone.
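(Stepping outside the paraphrase for a moment: this point can be demonstrated with a toy simulation. Everything below, including the permutation-test helper, is an invented illustration, not from the article. If patients are allocated to two arms purely at random, roughly 5% of baseline comparisons will come out "significant" at P < .05 by chance alone.)

```python
import random

random.seed(2)

def permutation_p(a, b, trials=200):
    """Two-sided permutation P value for a difference in means."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    n = len(a)
    extreme = 0
    for _ in range(trials):
        random.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / len(b))
        if diff >= observed:
            extreme += 1
    return extreme / trials

significant = 0
experiments = 100
for _ in range(experiments):
    # 200 hypothetical patients with a baseline trait (age), allocated
    # to two arms purely at random -- so any imbalance is chance.
    ages = [random.gauss(60, 10) for _ in range(200)]
    arm_a, arm_b = ages[:100], ages[100:]
    if permutation_p(arm_a, arm_b) < 0.05:
        significant += 1

print(f"{significant}/{experiments} baseline comparisons were "
      f"'significant' at P < .05 by chance alone")
```

A "significant" baseline P value in a properly randomized trial tells you nothing about the trial; it is exactly the false-positive rate the threshold was designed to allow.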

Science is often said to be about testing ideas, but in many cases that is not what we want to do at all. When he needed to get home quickly, he wasn't interested in proving which was the quickest way home; he just needed to figure out which route would do what he needed: get him to an appointment on time (pragmatism?). Moreover, even when we do want to test ideas, the conclusion is often an insufficient reason for action. He eventually proved that taking the busy road was quickest but decided to choose a different route on the basis of considerations -- pleasure and quality of life -- that formed no part of the hypothesis test.

An even more difficult problem arises when our P value is > .05, that is, when we have failed to prove our hypothesis. This is often interpreted as proof that our hypothesis is false, an interpretation that could withhold beneficial therapy from many patients. Such an interpretation is not only incorrect, but Dr. Vickers feels it can also be dangerous; he will discuss this in a future column."
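Why "P > .05" is not proof of "no effect" can be shown with one more toy simulation (all numbers invented, not from the article): a treatment with a real but modest benefit is tested in a small trial. With so few patients the test is underpowered, so P frequently lands above .05 even though the effect genuinely exists.

```python
import random

random.seed(3)

def permutation_p(a, b, trials=300):
    """Two-sided permutation P value for a difference in means."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    n = len(a)
    extreme = 0
    for _ in range(trials):
        random.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / len(b))
        if diff >= observed:
            extreme += 1
    return extreme / trials

missed = 0
simulated_trials = 100
for _ in range(simulated_trials):
    # Small trial: 15 patients per arm; the treated arm really does
    # score 5 points higher on average (a genuine effect).
    control = [random.gauss(50, 10) for _ in range(15)]
    treated = [random.gauss(55, 10) for _ in range(15)]
    if permutation_p(control, treated) >= 0.05:
        missed += 1

print(f"{missed}/{simulated_trials} small trials returned P >= .05 "
      f"despite a real effect")
```

Most of these simulated trials "fail" the significance threshold even though the benefit is real by construction: absence of evidence is not evidence of absence.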

Andrew J. Vickers, PhD, Assistant Attending Research Methodologist, Memorial Sloan-Kettering Cancer Center, New York, NY

Dr. Omar Sanchez, Buenos Aires physician, has the following quotation that seems apropos here:
“Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.”

He also sends us a citation for an article in the Spectrum, a journal of the National Cancer Institute, titled "What's the Rush? The Dissemination and Adoption of Preliminary Research Results" (98/6/372).

This article emphasizes the gamble of taking early returns from studies and applying the promising results as a treatment modality.