The uses and abuses of meta-analysis
Bruce G Charlton
Charlton BG. The uses and abuses of meta-analysis. Family Practice 1996; 13: 397-401.
Abstract
Meta-analysis is a quantitative process of summary and interpretation which involves pooling
information from independent studies concerning a single theme in order to draw conclusions.
Greatly increased employment of meta-analysis is currently being advocated for clinical
and policy decision making. However, the prestige of meta-analysis is based upon a false
model of scientific practice. Interpreting empirical research is an extremely complex activity
requiring clinical and scientific knowledge of the field in question; and teams of professional
'meta-analysts' with a primary skill base in information technology and biostatistics
cannot take over this role. Meta-analysis is not a hypothesis-testing activity, and cannot
legitimately be used to establish the reality of a putative hazard or therapy. The proper
use of meta-analysis is to increase the precision of quantitative estimates of health states
in populations. If used to estimate an effect, the reality of that effect should have been
established by previous scientific studies. But the summary estimate from a meta-analysis
can only be directly applied to a target population when the 'meta-protocol' and 'meta-population'
match the target situation in all relevant particulars. These constraints can rarely
be satisfied in practice, so the results of meta-analysis typically require adjustment—which
is a complex, assumption-laden process that negates many of the statistical power advantages
of a meta-analysis. Lacking any understanding or acknowledgement of the need for
adjustment, most meta-analyses must be regarded as abuses of the technique.
Introduction
Meta-analysis may conveniently be defined as a quantitative method of pooling information from independent studies concerning a single theme in order to draw conclusions. It is a two-stage process of summary and interpretation.
Opinion regarding the technique ranges between extremes of approbation and disdain. Many commentators agree with Olkin that a meta-analysis of randomized trials constitutes the best form of evidence regarding therapeutic effectiveness.(1) Others have argued that it is motivated by a quasi-alchemical urge to transmute the base metal of inadequate data into the gold (standard) of validated fact, suggested that it is mostly a rather mundane and second-rate kind of intellectual activity, and undeserving of high prestige, or simply erupted 'meta-analysis—schmeta-analysis!'(2,3,4)
I will argue that the critics of meta-analysis are closer to the truth than are the evangelists. Meta-analysis has its uses, and may occasionally be valid and applicable to real clinical situations, but these circumstances are so rare that most published instances of the technique must be regarded as abuses.
Meta-analysis based on a false model of science
All commentators emphasize the difficulty of performing a valid meta-analysis, but the reasons given usually reveal a false model of scientific practice.(5,6) Meta-analysis is often stated to be necessary due to the sheer amount of data generated by present-day research.(1,7) Scientific practice is implied to involve a process of pooling or combining evidence from independent studies, then drawing conclusions based on the weight of evidence. If this were the case, then summarization would indeed be crucial and valid inference would become more difficult as the volume of research increased. This justification for overviews and meta-analyses is principally one of enabling increased efficiency in data assimilation.(8) But this description of theoretical science is false.
In reality, the theoretical practice of science draws upon evidence from studies judged to be both relevant and valid—such studies are seldom numerous and are usually well known to practitioners. This highly selected evidence is then taken into account in constructing theoretical models which can be tested against experiment and observation.(9) Most would-be evidence tends to be judged irrelevant to this process, and is deservedly ignored—certainly bad evidence is not pooled with the good.
The ingredients which make up this process of qualitative judgement and inference have never adequately been described in explicit terms, and scientific practice includes much knowledge that is tacit and implicit, learned by apprenticeship to other scientists and from experience of working in the field. It can, however, be asserted with a high degree of certainty that the scientific process is not primarily a statistical one based upon summarization and combination of all relevant data.(3,11)
Implicit assumptions of meta-analysis
Proponents of meta-analysis make much of the 'objectivity' of the technique, which derives from the explicit nature of its procedures when compared with most editorials, reviews and commentaries.(6,12) The sheer quantity and range of sources of the cited literature in a meta-analysis may be very impressive. This is achievable partly because of advances in computer systems of information retrieval, but mostly by the employment of full-time research assistants whose job is to hand-search journals, network among researchers and (by other labour-intensive means) endeavour to unearth recondite and far-flung publications and projects.(1,7,13)
The accumulation of data into one place which precedes the statistical manipulations of meta-analysis is frequently unprecedented in a given field. This creation of a complete catalogue may be valuable in itself, especially if it reveals an obvious consistency or pattern to the data which was not previously noticed (although such an oversight is unlikely in a mature scientific discipline). Some authors regard this activity of 'overviewing' evidence as contributing most of the value of meta-analysis, and have suggested that analysis should not go further than identifying a qualitative consistency of results across relevant studies.(14) There is no methodological objection to this kind of elaborate and expensive literature survey, but when unaccompanied by original thought it constitutes a somewhat mediocre activity which bears the same relation to creative science that an undergraduate dissertation does to a PhD thesis.
However, the defining feature of meta-analysis is not enumeration but interpretation, and proponents of meta-analysis claim that it can perform this key task of selection and analysis of independent studies by means of algorithmic procedures and statistical summarization.
Meta-analysis makes the underlying assumption that when the results of relevant studies differ, the true value lies 'latent' within the existing data but concealed from investigators: first, by their failure to overview the whole data set (including unpublished studies); second, by excessive random error in studies examined one at a time (due to studies containing too few subjects); and third, by the lack of an optimal arrangement of evidence. In effect, the 'scientific truth' is conceptualized as a pattern that, once revealed, is unambiguous in its relevance and applicability, so that the implications of research are transparent to any observer.
Meta-analysis therefore assumes that the diversity (or 'heterogeneity') among relevant research studies is randomly distributed around the 'true' value, so that errors in one direction in one study will tend to be balanced by errors in the other direction in other studies, and therefore that appropriate statistical pooling and averaging will tend to produce an error-free (or at least error-reduced) estimate of the underlying, unbiased, 'true' value. Meta-analysis is thus indirectly but crucially predicated on a view of scientific truth as social consensus.
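To make the pooling assumption concrete: the standard fixed-effect scheme (the textbook formula, not one drawn from any of the papers cited here) treats each study estimate $\hat{\theta}_i$ with standard error $SE_i$ as the true value $\theta$ plus independent random error, and combines the studies as

\[ \hat{\theta}_{\mathrm{pooled}} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{SE_i^2}, \qquad SE_{\mathrm{pooled}} = \Bigl( \sum_i w_i \Bigr)^{-1/2}. \]

The pooled standard error is smaller than that of any single study, but only because independence of the errors has been assumed rather than demonstrated.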
But real scientific practice makes no such assumption about the random distribution of error between (or within) studies. Indeed, a more plausible assumption would be that most investigators tend to make the same errors in the same direction, and only a minority of the best scientists will perform studies to the highest standard. Instead of seeking consensus, the social structures of science have the effect (albeit an imperfect one) of subjecting studies to critical appraisal by the peer group, in order to winnow the wheat from the chaff.
The production of scientific knowledge is a process closer to 'trial by ordeal' than trial by opinion poll.
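The consequence of non-random, shared error is easily demonstrated. In the following sketch (Python; the true effect, shared bias, study count and standard errors are all invented for illustration), twenty studies share a common systematic error: pooling duly shrinks the random error, but delivers the biased answer wrapped in a deceptively narrow confidence interval.

    import numpy as np

    rng = np.random.default_rng(0)
    true_effect = 0.10        # hypothetical true risk reduction
    shared_bias = 0.05        # hypothetical systematic error common to every study
    n_studies, se = 20, 0.04  # invented study count and per-study standard error

    # Each study estimate = truth + shared bias + independent random error
    estimates = true_effect + shared_bias + rng.normal(0, se, n_studies)

    # Fixed-effect inverse-variance pooling (equal SEs here, so a plain mean)
    weights = np.full(n_studies, 1 / se**2)
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sum(weights) ** -0.5

    print(f"pooled: {pooled:.3f} +/- {1.96 * pooled_se:.3f}")
    # Prints roughly 0.150 +/- 0.018: the random error has been averaged
    # away, but the summary sits at truth plus bias, not at the truth.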
Meta-analysis usurps theoretical science
The meta-analytic view of science leads to an assertion that the relevant techniques for understanding evidence are essentially informational and statistical.
Therefore, meta-analyses tend to be organized, performed and published by teams with disciplinary backgrounds in epidemiology, computing and biostatistics—only secondarily supplemented by advice from workers in the substantive field being overviewed. This is in sharp contrast to the specific scientific and clinical expertise and experience considered a prerequisite for the actual performance of primary medical research.
The bizarre result is that meta-analysis implies that theoretical and empirical science should be done by two different sets of people with different disciplinary abilities. In effect, empirical research is to be done by scientists and clinicians, and the interpretation of this research is to be performed by the likes of epidemiologists and statisticians who will decide what inferences may be drawn from the evidence.
The above scenario would only be credible if advocates of meta-analysis could point to a successful track record of theoretical advance—which they cannot; or if the major difficulties in evaluating research were amenable to standardized evaluation of studies and adherence to correct statistical procedures—which they are not. The massive implausibility of the biostatistical approach to interpretation should be obvious to anybody who has experienced the difficulties of learning how to become a practising scientist. Interpretation is, perhaps, the hardest of all scientific skills to master. The ability to evaluate and compare research papers, and the capacity to use this to judge the current state of knowledge and frame hypotheses for future investigation, is a skill attained—if at all—only with effort and after a prolonged apprenticeship. The skill is also relatively specific with regard to subject matter.
The notion that scientific interpretation can be reduced to statistical considerations, checklists and step-by-step flow diagrams applicable to any problem at any time (1,8,13,17,19) would be laughable were it not becoming accepted practice in some circles. Inventories are not a substitute for substantive knowledge. Clinical experience and that partly trained, partly instinctive, understanding of causes and insight into mechanisms which comes from personally grappling with the primary process of research are both elements that have time and again proved crucial to medical science.(3, 4, 20-22)
Limitations of randomized trials
The limitations of a meta-analysis are dictated by the limitations of the epidemiological studies from which it has been assembled (on the basis of 'garbage in, garbage out'). Randomized trials are generally assumed to be the 'best' epidemiological evidence regarding therapeutic effectiveness, and the methodology most amenable to meta-analysis.(1,23) Methodological constraints which apply to the randomized controlled trial (RCT) will therefore, mutatis mutandis, also apply to meta-analyses of other epidemiological techniques such as cohort and case-control studies, and surveys.(9)
The major limitations characteristic of 'mega-trials' (large, multi-centred trials analysed by 'intention to treat') (23, 24) derive from poor experimental control and biased recruitment. (21, 25) Mega-trials employ a deliberately simplified experimental design in order to maximize recruitment and compliance, both of subjects and of collaborating trial centres. Due to logistic and ethical constraints, trials are performed on a study population that is typically unrepresentative of any actual 'target population' to which their results might be applied.
Inherent in mega-trial design is that experimental protocols do not attempt to exclude or hold constant all known sources of bias, but instead employ randomization of large numbers of subjects to distribute these potential biases equally between comparison groups. Comparisons between allocated treatments will be unbiased, but at the price of conflating several causal processes, and of measuring 'intention to treat' rather than the effect of treatment. For instance, if age is an important confounder, mega-trials do not control for age, but randomize large numbers of differently aged subjects. The result is that the age distribution will tend to be balanced between allocation groups; but the effects of age will be conflated with the causal variable under study. The measured association will only be directly applicable to a target population with the same age structure as the study population.(21-23)
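A toy simulation (Python; the populations, risks and effect sizes are all invented) makes the point concrete: the treatment effect is constructed to differ by age, randomization balances age perfectly well within the trial, yet the single reported number is simply an average over the study's particular age mix, and it transfers only to a target with the same mix.

    import numpy as np

    rng = np.random.default_rng(1)

    def trial_effect(p_old, n=200_000):
        # Simulate a simplified mega-trial on a population in which a
        # proportion p_old of subjects is elderly. Hypothetical effects:
        # treatment cuts absolute risk by 0.02 in the young, 0.10 in the old.
        old = rng.random(n) < p_old
        treated = rng.random(n) < 0.5          # randomization balances age
        base_risk = np.where(old, 0.30, 0.10)
        reduction = np.where(old, 0.10, 0.02)
        outcome = rng.random(n) < base_risk - treated * reduction
        return outcome[~treated].mean() - outcome[treated].mean()

    print(trial_effect(p_old=0.2))  # trial population: about 0.036
    print(trial_effect(p_old=0.7))  # older target population: about 0.076
    # Both comparisons are internally unbiased, but each answer is an
    # average over one particular age structure and applies only to it.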
Mega-trials should therefore be considered as descriptive and epidemiological in nature, rather than analytical and scientific.(9,14-21) Indeed, although it is an experiment, a mega-trial can most easily be understood and interpreted as if it were a special kind of survey, designed to compare the outcomes when two or more protocols are allocated to a group of subjects. Randomization ensures that the comparison groups have equivalent population characteristics, and the large number of subjects allows a high degree of precision in estimating the therapy-outcome association. Generalizing from a mega-trial also resembles generalizing from a survey, because both procedures depend crucially on the study population being representative of the target population. A mega-trial does not, as a scientific experiment would, aim to isolate and measure a single causal variable linking a therapy and an outcome; the measured relationship between therapy and outcome is therefore an estimate of the magnitude of an association, not of a causal process. Consequently, mega-trials are not hypothesis-testing studies,(21) and a secondary mathematical summarization of trials, such as a meta-analysis, cannot be hypothesis-testing either.
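The survey analogy extends to precision. For two arms of sizes $n_1$ and $n_2$ with outcome proportions $p_1$ and $p_2$, the standard error of the estimated risk difference is the familiar survey-sampling quantity

\[ SE(\hat{p}_1 - \hat{p}_2) = \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} }, \]

which shrinks as the arms grow, whether or not the association being estimated is causal.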
Meta-protocols and meta-populations
We can now begin to delineate the legitimate uses of meta-analysis. The 'overview' stage is neither distinctive nor sufficient to define meta-analysis—quantitative interpretation is the crucial feature. Meta-analysis is essentially a method for pooling data in order to increase the precision of estimates. The summary statistic of a meta-analysis of RCTs therefore describes the (average) outcome of allocating a meta-protocol to a meta-population. Interpreting the summary statistic of a meta-analysis (i.e. 'applying' the estimate of effect) involves establishing that the meta-protocol and meta-population are comparable to the proposed intervention and the target population.
The nature of a meta-protocol is defined by the methodological parameters of the pooled individual therapeutic interventions of constituent mega-trials. In other words, the meta-protocol is a 'virtual intervention' in an experiment whose experimental rigour is the lowest common denominator defined by the pooled deficiencies of its component studies (the level of control being defined by the lowest permitted level of control, not the average level of control). The meta-population is defined as that virtual group of subjects which has emerged after the overview population has been pooled from the component studies (with or without statistical weighting of individual studies).
In order for the estimate of the therapeutic effect of a meta-protocol to be applicable to a target population, the meta-population must be a representative sample of the target population. This requires either that the meta-population be a randomly selected sample of the target population, or that the meta-population be created from a balanced blend of individual study populations where relevant causal variables have been measured and assembled in their proper proportion.
Clearly, the vast majority of meta-populations in published meta-analyses are not representative of the target population, or indeed of any real-world population, because meta-analyses are assembled from a group of individual RCTs, the populations of which are each unrepresentative (biased) to a significant and undetermined extent.(25) Estimates cannot then be generalized to any actual population without adjustment. Adjustment will need to involve quantification and subtraction of biases. For instance, if an estimate has been confounded by biases in the age structure of the meta-population compared with the target population, then the magnitude of confounding by age will need to be investigated, quantified and its effects removed from the analysis.
It is insufficiently appreciated that the process of 'adjustment for confounding' is not a purely mathematical manipulation, but a form of quantitative modelling of the consequences of uncontrolled causal influences on the study. Adjustment introduces new assumptions into the analysis—causal assumptions which themselves require validation in independent studies. Adjustment will therefore diminish the precision of the estimate, somewhat defeating the object of the meta-analysis.
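A sketch of what adjustment entails, even in the simplest possible case (direct standardization over a single measured confounder, with invented numbers throughout): stratum-specific estimates from the meta-population are reweighted to the target population's age structure, and the standard error grows because the reweighting leans on strata that the meta-population sampled thinly.

    import numpy as np

    # Invented stratum-specific effect estimates (young, old) from the
    # meta-population, with their standard errors.
    effects = np.array([0.02, 0.10])
    ses = np.array([0.010, 0.015])

    meta_weights = np.array([0.8, 0.2])    # age mix of the meta-population
    target_weights = np.array([0.3, 0.7])  # age mix of the target population

    crude = meta_weights @ effects         # what the meta-analysis reports
    adjusted = target_weights @ effects    # standardized to the target
    crude_se = np.sqrt(meta_weights**2 @ ses**2)
    adjusted_se = np.sqrt(target_weights**2 @ ses**2)

    print(f"crude:    {crude:.3f} +/- {1.96 * crude_se:.3f}")
    print(f"adjusted: {adjusted:.3f} +/- {1.96 * adjusted_se:.3f}")
    # The adjusted estimate (0.076) differs materially from the crude one
    # (0.036) and carries a wider interval; it is also valid only on the
    # added assumption that age is the sole relevant difference between
    # the meta-population and the target.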
Conclusion
Meta-analyses of mega-trials yield estimates that apply only to group averages, not to individual patients, owing to the high level of within-group heterogeneity of subjects in mega-trials and other epidemiological studies.(21-23) This, in itself, means that a meta-analysis does not necessarily have any relevance to clinical practice. A bad meta-analysis, like any bad piece of research, may be useless or harmful; and, unfortunately, bad research tends to be more common than good.
But even accepting the population level of validity, a meta-analysis should be performed on independent studies, each of which employs a qualitatively similar and therapeutically credible study design, and where the pooled trial population is representative of the target population. Such a situation of between-study uniformity is extremely rare.(26)
Furthermore, meta-analysis should not be used for testing hypotheses, but only for obtaining a more precise estimate of an effect which is already known to be present from well controlled, hypothesis-testing studies. This means that most meta-analyses are misuses of the technique. For instance, it is wrong (although common) to employ meta-analysis to determine whether a putative health risk is a genuine hazard, or whether a putative therapeutic intervention is genuinely effective. Meta-analyses cannot make qualitative distinctions in cases where causation is doubtful. The epidemiological data from which meta-analyses are constructed measure association not causation, and are not sufficiently controlled to isolate and test hypotheses.
Moreover, there are no valid, general-purpose algorithms or statistical procedures for the interpretation of empirical research, so that most meta-analyses are underpinned by no more than the subjective opinion of investigators who are sometimes distinguished mainly by lacking the appropriate training, experience, approach and interest necessary to draw inferences from empirical research.
Meta-analysis, when all is said and done, is a technique with very restricted applicability to the clinical practice of medicine. In certain rare, well-understood and well-controlled circumstances it may provide an enhancement in the precision of estimates of group outcomes. But meta-analysis is always likely to mislead, owing to the mismatch between its high statistical precision and its low scientific validity.(3-9)
References
1 Olkin I. Meta-analysis: reconciling the results of independent studies. Stat Med 1995; 14: 457-472.
2 Shapiro S. Meta-analysis/Schmeta-analysis. Am J Epidemiol 1993; 138: 673 (abstract).
3 Rosendaal FR. The emergence of a new species: the professional meta-analyst. J Clin Epidemiol 1994; 47: 1325-1326.
4 Feinstein AR. Meta-analysis: statistical alchemy for the 21st century. J Clin Epidemiol 1995; 48: 71-79.
5 Charlton BG. Management of science. Lancet 1993; 342: 99-100.
6 Charlton BG. Practice guidelines and practical judgement. Br J Gen Pract 1994; 44: 290-291.
7 Chalmers T, Haynes B. Reporting, updating and correcting systematic reviews of effects of health care. Br Med J 1994; 309: 862-865.
8 Mulrow CD. Rationale for systematic reviews. Br Med J 1994; 309: 597-599.
9 Charlton BG. The scope and nature of epidemiology. J Clin Epidemiol 1996 (in press).
10 Cromer A. Uncommon sense: the heretical nature of science. Oxford: Oxford University Press, 1993.
11 Van Valen LM. Why misunderstand the evolutionary half of biology? In Saarinen E (ed.). Conceptual issues in ecology. Dordrecht, The Netherlands: Reidel, 1982.
12 Friedenreich CM. Methods for pooled analysis of epidemiologic studies. Epidemiology 1993; 4: 295-302.
13 Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. Br Med J 1994; 309: 1286-1291.
14 Thompson SG, Pocock SJ. Can meta-analyses be trusted? Lancet 1991; 338: 1127-1130.
15 Ahlbom A. Pooling epidemiological studies. Epidemiology 1993; 4: 283-284.
16 Ziman J. Reliable knowledge: an exploration of the grounds for belief in science. Cambridge: Cambridge University Press, 1978.
17 Thompson SG. Why sources of heterogeneity in meta-analysis should be investigated. Br Med J 1994; 309: 1351-1355.
18 Victor N. Indications and contra-indications for meta-analysis. J Clin Epidemiol 1995; 48: 5-8.
19 Oxman AD. Checklists for review articles. Br Med J 1994; 309: 648-651.
20 Julian D. Trials and tribulations. Cardiovasc Res 1994; 28: 598-603.
21 Charlton BG. Mega-trials: methodological issues and clinical implications. J R Coll Phys Lond 1995; 29: 96-100.
22 Horwitz RI. A clinician's perspective on meta-analysis. J Clin Epidemiol 1995; 48: 41-44.
23 Peto R, Collins R, Gray R. Large scale randomized evidence: large, simple trials and overviews of trials. J Clin Epidemiol 1995; 48: 23-40.
24 Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984; 3: 409-420.
25 Charlton BG. Randomized trials: the worst kind of epidemiology? Nature Med 1995; 1: 1101-1102.
26 West RR. A look at the statistical overview (or meta-analysis). J R Coll Phys Lond 1993; 27: 111-115.
Note added: I wrote the above 22 years ago, when I was a lecturer in Epidemiology and Public Health; and would judge that it is one of the best and most original things I have done in that line. The conclusion that meta-analysis is almost always bogus and misleading remains as correct as when it was published, but the relevance is now far greater, since ignorant pseudo-scientific meta-analysis has all but taken over the medical, and indeed bioscientific and psychological, literature; and is routinely misused to evaluate causality, to measure generalizable treatment and causal effect sizes, and as a basis for public policy and clinical guidelines. The hegemony of meta-analysis is thus an encapsulation of the corruption of science.
"Meta-analysis makes the underlying assumption that when the results of relevant studies differ, the true value lies 'latent' within the existing data but concealed from investigators", which runs entirely contrary to the ordinary method of establishing scientific consensus, which is to demonstrate that results are independently repeatable under rigorous conditions.
Properly applied to science, meta-analysis should be used to tell us whether the commonly obtained results are reproduced by the most rigorous studies, not to tell us what the "scientific consensus" is, but whether there really is one. But the very embarrassing answer to that question is already in. "Scientific" studies currently, for the most part, advertise results which are not reproducible by independent, rigorous studies.