It seems like we hear a lot about clinical trial results that are “significant.” Yet, in many cases it feels like the outcomes of certain diseases really are not changing all that much. ASH abstracts will be out in the next few weeks and it is always a time for news. Unfortunately, much of the news is poorly reported because the language of science is not always the same as the language of the rest of the world. Nowhere is that more important than with the word “significant.” When the headlines scream “significant” it really helps to understand what is actually being communicated.
We often trumpet a study as having achieved a “statistically significant” improvement, yet what that really means is that when two or more interventions were compared, the difference in outcome between them is unlikely to have occurred by chance (or, more accurately stated, if the interventions were truly equivalent, a difference this large would be expected to occur by random chance less than 5% of the time should the study be repeated multiple times under similar circumstances).
The problem with this definition is that a small difference between interventions (say an improvement in response rate from 33% to 38% or improvement in survival from 11 months to 12 months) can be “statistically significant” if it is observed in a large enough population whereas most patients might say – “who cares if it is such a small difference.” This is a key point, so I want to make sure it is clear. If you see a 5% difference in a study population of 70 patients, you might agree that there is a good chance the difference is purely random. On the other hand, if you see a 5% difference in a study population of 10,000 patients – chances are that is a real / reproducible difference. In the latter case, we would call that “statistically significant” even if the patient says, “so what.” Take a 50% difference in outcome however, and even if it is observed in a small population, it is a big enough difference to make you think it isn’t a random chance observation.
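To put rough numbers on this, here is a minimal sketch in Python. It assumes a simple two-proportion z-test with the patients split evenly between two arms, which is my simplification and not how any particular trial was analyzed. The function name, the response rates, and the arm sizes are all made up for illustration, and the normal approximation is only approximate for the small study.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(rate_a, rate_b, n_per_arm):
    """Two-sided p-value from a pooled two-proportion z-test (equal arm sizes)."""
    pooled = (rate_a + rate_b) / 2  # pooled rate when both arms are the same size
    std_err = sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = abs(rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical scenarios: (label, rate in arm A, rate in arm B, patients per arm)
scenarios = [
    ("33% vs 38%, 70 patients total", 0.33, 0.38, 35),
    ("33% vs 38%, 10,000 patients total", 0.33, 0.38, 5000),
    ("25% vs 75%, 70 patients total", 0.25, 0.75, 35),
]

for label, rate_a, rate_b, n_per_arm in scenarios:
    p = two_proportion_p_value(rate_a, rate_b, n_per_arm)
    verdict = "statistically significant" if p < 0.05 else "not statistically significant"
    print(f"{label}: p = {p:.2g} ({verdict})")
```

Under these assumptions, the same five-point improvement falls well short of the 5% threshold in the small study and clears it easily in the large one, while the 50-point difference clears it even with only 70 patients.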
When we design studies we go through an exercise known as “powering the study” which enables us to project a difference between two interventions and then calculate how many patients we will need to enroll to conclude that our difference is “statistically significant.” If we project that a new treatment improves response rate from 20% to 80% that is a huge number and we need few patients to prove our point. Similarly if we double the duration of response with a new treatment – that doesn’t take many patients either.
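For a feel of how that calculation behaves, here is a rough sketch using the standard textbook normal-approximation formula for comparing two response rates. The 5% two-sided threshold matches the definition above, but the 80% power target is my assumption (a common convention), and the function name is hypothetical. Real trial designs also account for dropouts, interim analyses, and other details that this ignores.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(rate_control, rate_new, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect rate_control vs rate_new.

    Uses the normal-approximation formula for two proportions with equal arms.
    alpha is the two-sided significance threshold; power is an assumed target.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    pooled = (rate_control + rate_new) / 2
    # Spread of the difference if the two treatments were truly equivalent
    null_term = z_alpha * sqrt(2 * pooled * (1 - pooled))
    # Spread of the difference if the projected improvement is real
    alt_term = z_power * sqrt(
        rate_control * (1 - rate_control) + rate_new * (1 - rate_new)
    )
    return ceil((null_term + alt_term) ** 2 / (rate_new - rate_control) ** 2)


print("20% -> 80% response rate:", patients_per_arm(0.20, 0.80), "patients per arm")
print("33% -> 38% response rate:", patients_per_arm(0.33, 0.38), "patients per arm")
```

With these assumptions, the 20% to 80% jump needs only a handful of patients per arm, while the 33% to 38% improvement needs more than a thousand per arm.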
When the difference is small though, the studies have to get very large. That is true when we already have very effective treatments (Hodgkin’s disease) and you don’t have a ton of room for improvement (i.e., you can’t cure 130% of patients) or when the incremental benefit is small (different hormone manipulations in breast cancer improving outcome by 1-2%). One good clue to how meaningful a result is: simply look at how many patients were enrolled. If you have more than 500 patients per arm, chances are the improvement is fairly modest.
Patients want “clinically meaningful” results such as “Dad survived 6 years instead of 6 months with his pancreatic cancer” or “everyone who takes the new drug feels better and responses are dramatically improved.” Who could blame patients for wanting this?
Over the past 50 years most of our advances have fallen into the “incremental gain” category. This is where we had huge studies to show that we could prolong pancreatic cancer survival by two weeks on average and this was trumpeted as “statistically significant” – yuck! We’ve had a bunch of these recently in colon cancer. See this link for a very good article about this.
Sadly, the route to approval of drugs requires a “statistically significant” result even if it is not “clinically significant.” Of course, a new drug is going to be very expensive and if you have to take $90,000 of treatment to prolong life by several months, you might think twice if you were paying for it (Provenge in prostate cancer). The British have a system that measures “clinical significance” as part of their approval process. I have to say that I can see some logic there – please look at this link for more.
I am pleased that many of the experimental treatments in CLL fit the category of “clinically meaningful.” It is important to note that randomized studies to measure the magnitude of difference have not been completed with ibrutinib, CAL-101/GS-1101, ABT-199, GA-101 and so forth – but they are underway. Many thought leaders feel these agents will be both “clinically significant” and “statistically significant” to boot. Hopefully we will gain broader access to these soon and patients will live longer, happier lives.
ASH abstracts are just around the corner. You will probably hear a lot about “significant” results. Pay close attention to the use of the terms “statistically significant” and “clinically significant” – they are different. Look for how large the sample size is in the study. Lymphoid studies tend to be smaller than breast / lung studies. A big lymphoma study or CLL study might be >500 patients. Keep in mind that you cannot define “statistically significant” unless you are comparing at least two groups – so they are either randomized studies or looking at subgroups within a larger study.
Hopefully we will have a lot of studies to discuss that really improve the quality of life for patients with these diseases.