From covariates to confounding factors: the danger of having too many covariates

Clinical trials typically evaluate the efficacy of experimental therapies in heterogeneous patient populations, as patient characteristics vary significantly. These patient characteristics might be prognostic factors that ultimately induce variability in clinical trial data. An imbalance in these factors between treatment groups at baseline will increase the variability of the estimated treatment effect, ultimately compromising study power and decreasing the likelihood of demonstrating statistically significant differences.

As a simple example, clinical trials typically include patients within a wide age range, and – for many drugs – it is reasonable to expect that older patients may respond differently than younger patients. This inherent heterogeneity in patients will lead to increased data variability that is most commonly addressed by defining a baseline covariate.

Covariates enable scientists to determine the extent of patient improvement that is related to the treatment and not underlying baseline patient characteristics (e.g. age). In practice, those covariates are adjusted for in the statistical analysis using methods such as the analysis of covariance (ANCOVA).
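To make the adjustment concrete, here is a minimal sketch of an ANCOVA fitted as an ordinary least-squares regression on simulated data. Everything in it is an assumption made for illustration: the sample size, the true treatment effect of 2.0, the age effect of 0.3 per year, and the helper name ols_fit. It is not taken from any specific trial or guidance document.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and their model-based standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 200
treat = rng.permutation(np.repeat([0.0, 1.0], n // 2))  # 1:1 randomization
age = rng.normal(60, 10, n)                             # hypothetical baseline covariate

# Assumed data-generating model (illustrative values only)
outcome = 2.0 * treat + 0.3 * (age - 60) + rng.normal(0, 2, n)

ones = np.ones(n)
X_unadjusted = np.column_stack([ones, treat])
X_ancova = np.column_stack([ones, treat, age - age.mean()])

beta_u, se_u = ols_fit(X_unadjusted, outcome)
beta_a, se_a = ols_fit(X_ancova, outcome)

print(f"Unadjusted: effect = {beta_u[1]:.2f}, SE = {se_u[1]:.2f}")
print(f"ANCOVA:     effect = {beta_a[1]:.2f}, SE = {se_a[1]:.2f}")
```

In this simulated setting, age explains a large share of the outcome variance, so the adjusted model should report a smaller standard error for the treatment effect than the unadjusted one. That reduction is the gain in precision the text describes.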

Adjusting for baseline covariates is common practice, and both the EMA and the FDA have issued guidance on it. The two key rules are:

  1. Covariates should not be affected by the treatment. As such, only covariates that are defined at baseline should be used.
  2. The covariates and the mathematical form of the model should be precisely defined a priori in the protocol or SAP.

When these key rules are followed, the use of covariates carries no additional regulatory risk. Importantly, covariates can efficiently increase study power without increasing the risk of type I error (false positive).

Covariates that are used to analyze and interpret clinical trial data can become confounding factors; indeed, this is one of the most basic issues with clinical trials. Let’s imagine a simple example of a study with two arms, treatment and placebo, in which – by accident or poor design – all male patients are in the treatment group. The treatment effect will then be impossible to dissociate from the gender effect: it will be poorly estimated and its associated variance will be high. Fortunately, this rarely occurs in well-randomized studies, but it illustrates how covariates can become confounding factors.
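The effect of such an imbalance on the variance can be sketched numerically. In a linear model with an intercept, a treatment indicator and a sex indicator, the variance of the treatment coefficient is the residual variance multiplied by the corresponding diagonal entry of the inverse of X'X. The function name and the patient counts below are hypothetical and chosen only for illustration.

```python
import numpy as np

def treatment_variance_factor(n_per_arm, males_in_treatment, males_in_placebo):
    """Variance of the treatment coefficient per unit of residual variance.

    Builds the design matrix [intercept, treatment, male] and returns the
    diagonal entry of (X'X)^-1 that belongs to the treatment column.
    """
    treat = np.repeat([1.0, 0.0], n_per_arm)
    male = np.concatenate([
        np.repeat([1.0, 0.0], [males_in_treatment, n_per_arm - males_in_treatment]),
        np.repeat([1.0, 0.0], [males_in_placebo, n_per_arm - males_in_placebo]),
    ])
    X = np.column_stack([np.ones(2 * n_per_arm), treat, male])
    return np.linalg.inv(X.T @ X)[1, 1]

# Hypothetical trial with 50 patients per arm, 50 of whom are male overall
balanced = treatment_variance_factor(50, 25, 25)  # males split evenly
skewed = treatment_variance_factor(50, 45, 5)     # most males on treatment
extreme = treatment_variance_factor(50, 49, 1)    # nearly all males on treatment

for label, value in [("balanced", balanced), ("skewed", skewed), ("extreme", extreme)]:
    print(f"{label:>8}: variance factor = {value:.3f} ({value / balanced:.1f}x balanced)")
```

As the sex distribution drifts away from balance, the variance factor grows, and it grows quickly near the extreme. In the limiting case where every male is on treatment and every female on placebo, the two columns are identical, X'X is singular, and the treatment effect cannot be estimated separately at all.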

It is difficult to determine which covariates are confounding factors and which are not; in fact, to some extent, every covariate is at least a slight confounding factor with treatment. Beyond this, confounding factors could also arise from combinations of covariates, which can increase the variance of the estimated treatment effect.

The EMA Guidance stipulates that only a few covariates should be used, due in part to the fact that increasing the number of covariates multiplies the chances of creating such confounding factors and can decrease the power of the study.
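A small simulation can illustrate why a long covariate list may cost precision. The sketch below is an assumption-laden toy example, not an analysis from the guidance: it simulates trials of 40 patients with one truly prognostic covariate, then refits the same model with 20 additional covariates that are unrelated to the outcome. The function name, sample size and effect sizes are all hypothetical.

```python
import numpy as np

def simulate_treatment_sd(n, n_noise, n_sims=500, seed=0):
    """Empirical SD of the adjusted treatment estimate across simulated trials."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sims)
    for i in range(n_sims):
        treat = rng.permutation(np.repeat([0.0, 1.0], n // 2))  # 1:1 randomization
        prognostic = rng.normal(size=n)              # covariate that drives the outcome
        noise_covs = rng.normal(size=(n, n_noise))   # covariates unrelated to the outcome
        # Assumed model: treatment effect 1.0, prognostic effect 1.0, unit noise
        y = 1.0 * treat + 1.0 * prognostic + rng.normal(size=n)
        X = np.column_stack([np.ones(n), treat, prognostic, noise_covs])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[i] = beta[1]                       # adjusted treatment estimate
    return estimates.std(ddof=1)

sd_few = simulate_treatment_sd(n=40, n_noise=0)
sd_many = simulate_treatment_sd(n=40, n_noise=20)

print(f"SD of treatment estimate, 1 covariate:   {sd_few:.3f}")
print(f"SD of treatment estimate, 21 covariates: {sd_many:.3f}")
```

The treatment estimate remains unbiased in both settings, but with many covariates each one is correlated with the treatment assignment by chance, and together they absorb part of the information needed to estimate the treatment effect. Under these assumptions the spread of the estimate should be visibly larger with 21 covariates than with one, which translates into lower power.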

To conclude, using covariates will help to characterize patient variability and improve the precision of the estimated treatment effect. Adding too many covariates, however, increases the risk that they will become confounding factors, with the exact opposite effect. One approach is to select the smallest number of covariates that are most likely to be strongly associated with the outcome. This approach works well for simple situations but fails when considering more complex scenarios. Placebo responsiveness, for example, is an inherent patient characteristic – similar to age and gender – that requires consideration of a myriad of measurable factors ranging from demographics to patient expectation and personality traits. Defining a baseline covariate that describes placebo responsiveness requires a more sophisticated mathematical approach. Placebell©™ characterizes each patient’s placebo responsiveness using a machine learning-based approach to define a composite covariate.