Muddy waters

Clinical trials in cancer tend to differ from those in other fields. One difference that really stood out to me when I joined a cancer trials unit was the number of single-arm trials being conducted. They are rare in most other fields, but very common in cancer. In a review that we are currently conducting, in a specific subfield of cancer, over half of the trials we have included so far are single-arm non-randomised studies. That is a lot! Given their prevalence in cancer, it seems important that all these single-arm studies are actually doing something sensible, and are providing useful information to inform and improve patient care.

There are some scenarios where a single-arm trial design is completely appropriate, especially in oncology. However, my view is that single-arm trial designs are often used where they are not appropriate, and their results are frequently misinterpreted. In this post I will attempt to explain why I think that. First, it is useful to summarise what the appropriate uses of single-arm trials are.

The traditional use in cancer is in Phase 2 trials; you have a drug that seems acceptably non-toxic (based on Phase 1 data), so the next question is “Does it do anything useful?” Often, a good way to test that is to use a single-arm trial in a population of patients who don’t have any alternative treatment. This might be patients whose cancer has not responded to existing interventions. In this situation you know that without intervention the outlook is very poor; without treatment, tumours will continue to grow. A tumour shrinking spontaneously, without treatment, is extremely rare. So, if you give a new treatment and observe that some tumours shrink, you can be confident that this is an effect of the therapy, without needing to compare them to a non-treated control arm. The key point is that we know, at least qualitatively, what would happen in a non-treated control group (i.e. there would be zero tumour responses). So a single-arm phase 2 trial can give you sufficient information that a drug is worth evaluating further; the next step might be a randomised trial, comparing with standard care, in a less severe group of patients.

If you’re not in a situation where you know what will happen to patients without treatment, things are much murkier – and this is a situation in which many single-arm trials are conducted. Very often, new drugs (or other treatments) are tested in patients who will also receive other treatments alongside the trial intervention, or receive an alternative therapy if not recruited to the trial. For example, the interventional new drug may be added to an existing chemotherapy regimen, or replace one element of a combination. In this situation, it is very difficult to predict what would have happened to the trial patients if they had not been recruited to the trial. As they would have received some intervention, you cannot confidently attribute any good outcomes to the new therapy. The question about efficacy is therefore a comparative one: do patients who receive the new therapy do better than those who do not? Randomisation is of course the usual approach here, and would seem like the best option, but it may be possible in some situations to get a good estimate of comparative efficacy from a single-arm trial. In rare circumstances, there may be a sufficiently good prognostic model that can be used to estimate the counterfactual outcomes of the trial participants under standard care.

Where there is no prognostic model, an alternative is to construct a “control” group from data sources external to the trial, with which the trial participants can be compared. There are many ways of doing this: these papers (1, 2) describe some of the best, for example using propensity scores to match treated patients with controls. The EMA has issued an excellent reflection paper on the use of single-arm trials, which summarises the biases that these trials are subject to (3) and which need to be addressed to obtain reasonable treatment effect estimates from single-arm designs. The key message from this paper (3) is that getting a good treatment effect estimate from a single-arm trial is hard, which is a major reason we use randomisation.
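To make the propensity-score idea concrete, here is a minimal sketch on invented data. It assumes a simple logistic model for the probability of being a trial patient (rather than an external registry patient) and does 1:1 nearest-neighbour matching with replacement; real analyses along the lines of references (1, 2) would add covariate balance checks, matching without replacement or calipers, and careful handling of the outcome comparison. All variable names and numbers here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical data: trial patients (treated=1) and external registry
# patients (treated=0), with two baseline covariates (say, age and stage).
# The external cohort is deliberately a bit older and later-stage.
n_trial, n_ext = 100, 1000
X_trial = rng.normal([65, 2.5], [8, 0.6], size=(n_trial, 2))
X_ext = rng.normal([70, 3.0], [10, 0.8], size=(n_ext, 2))
X = np.vstack([X_trial, X_ext])
treated = np.r_[np.ones(n_trial), np.zeros(n_ext)]

# Step 1: propensity score = modelled probability of being a trial patient,
# given baseline covariates.
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
ps_trial, ps_ext = ps[:n_trial], ps[n_trial:]

# Step 2: for each trial patient, pick the external patient with the
# closest propensity score (1:1 nearest-neighbour, with replacement).
control_idx = np.abs(ps_ext[None, :] - ps_trial[:, None]).argmin(axis=1)

# The matched external patients form the constructed "control" group;
# outcomes would then be compared between the trial arm and this group.
print(f"matched {len(set(control_idx))} distinct external controls "
      f"for {n_trial} trial patients")
```

Matching with replacement, as here, means some external patients can serve as controls more than once; that is one of many design choices that affect the resulting estimate.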

Many single-arm trials, maybe most, perform a comparison with a threshold value of an outcome. This threshold is usually either a minimum level of clinical activity that is of interest, or a value that supposedly represents the outcome expected on standard care. For example, a trial might set a response rate of 20% as the minimum acceptable level of clinical activity, and would employ some criterion to determine a “positive” result, based on whether the new drug was sufficiently likely to beat this threshold (for example, a Bayesian posterior probability of 80%, or a p-value of less than 0.2).
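A decision rule of the Bayesian flavour described above can be sketched in a few lines. The counts, the uniform Beta(1, 1) prior, and the 80% probability cut-off are all illustrative assumptions, not taken from any particular trial.

```python
from scipy.stats import beta

# Hypothetical trial: 30 patients enrolled, 9 tumour responses observed.
n, responses = 30, 9
threshold = 0.20  # minimum acceptable response rate

# With a Beta(1, 1) (uniform) prior on the response rate, the posterior
# is Beta(1 + responses, 1 + non-responses).
posterior = beta(1 + responses, 1 + (n - responses))

# Posterior probability that the true response rate exceeds the threshold.
prob_above = posterior.sf(threshold)
print(f"P(response rate > {threshold:.0%}) = {prob_above:.3f}")

# Declare the trial "positive" if this probability exceeds 0.80.
print("positive" if prob_above > 0.80 else "negative")
```

Note that everything in this calculation is conditional on the threshold being the right benchmark for these patients, which is exactly the assumption questioned below.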

The problem is that many single-arm trials have the stated aim of estimating efficacy, yet perform a comparison with a single threshold value. This is not valid, because there is no guarantee that the threshold value applies to the patients in the trial. Patients in a trial are not a random sample of all patients; there is always selection bias in the trial eligibility and recruitment processes. Patients are not chosen by some random sampling mechanism; in fact, multiple non-random selections are made before patients are enrolled (the trial eligibility criteria, the doctors’ choices of which patients to approach, cultural barriers to trial participation, and so on). Selection bias will therefore affect the outcomes that are seen, and it may be hard to predict which direction it will operate in. With a binary outcome (such as tumour response) and an intervention that produces some responses, it is in theory possible, through patient selection, to obtain any response rate from 0% (by selecting only patients who will not respond) to 100% (by selecting only responders). The extremes may be unrealistic, but it may be fairly easy to shift a response rate from 30% to 50% with a bit of (conscious or unconscious) selection. Other biases may also be operating; for example, standard care may have improved over time, so a historical outcome rate might be worse than current patients would achieve, or diagnostic practice may have changed so that patients are diagnosed earlier in the course of their disease.
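A toy simulation illustrates how easily selection can move a response rate. The “prognostic score” and the relationship between score and response probability are invented for illustration; the point is only that enrolling the best-prognosis patients inflates the observed response rate with no change in the treatment at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each patient has a prognostic score in [0, 1],
# and the probability of responding to the drug rises with that score.
n_pop = 100_000
score = rng.uniform(0, 1, n_pop)
prob_response = 0.05 + 0.5 * score        # population average is 30%
responds = rng.random(n_pop) < prob_response

# Response rate in an unselected population: roughly 30%.
print(f"all patients:      {responds.mean():.1%}")

# A "trial" that (consciously or not) enrols only the best-prognosis
# quarter of patients sees a much higher rate, close to 50%.
selected = score > 0.75
print(f"selected patients: {responds[selected].mean():.1%}")
```

Comparing the second number against a threshold derived from the first would make an inactive degree of selection look like a large treatment effect.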

Arguably, it may be reasonable to use a single-arm design, even in situations where patients will receive other treatments and a comparative design seems more appropriate, if the aim is just to screen possible treatments and reject those that appear to do very little (though I’m not very convinced about this).

A related issue with estimating efficacy from a single-arm trial is the apparent belief that there is a single estimate of “the” efficacy of the treatment that applies to all patients. But in reality, the outcomes will depend on the patients. We know this: there are many prognostic factors that are known to be related to outcomes, and we routinely divide patients into high, medium and low risk based on their characteristics. If outcomes are related to patients’ characteristics, as they virtually always are, then it makes no sense to refer to “the” estimate of the treatment’s efficacy. Usually, the value that is assumed is some sort of average over all of the patients who are treated, but because of selection bias, this is unlikely to apply to the patients in the trial.

So those are my reasons for thinking that single-arm trials are often used inappropriately, over-interpreted, and lead to unjustified conclusions. I’m planning to give some concrete examples in future posts, so hopefully I’ll be able to post more in the near future.

References

  1. Thorlund K, Dron L, Park JJH, Mills EJ. Synthetic and External Controls in Clinical Trials - A Primer for Researchers. Clin Epidemiol. 2020 May 8;12:457-467.
  2. Schmidli H, et al. Beyond Randomized Clinical Trials: Use of External Controls. Clinical Pharmacology and Therapeutics. 2020; 107(4).
  3. EMA. Reflection paper on establishing efficacy based on single arm trials submitted as pivotal evidence in a marketing authorisation: Considerations on evidence from single-arm trials. https://www.ema.europa.eu/en/documents/scientific-guideline/reflection-paper-establishing-efficacy-based-single-arm-trials-submitted-pivotal-evidence-marketing-authorisation-application_en.pdf
