Bruce G. Charlton. Journal of Clinical Epidemiology 1996; Vol. 49, No. 6, pp. 623-626.
Introductory comment for the memorial conference for Petr Skrabanek (1940-1994): My friendship with Petr Skrabanek was conducted entirely through the medium of the written word. I initially made contact after reading Follies and Fallacies in Medicine [1], and in the all too brief time before his death we exchanged letters on a regular basis - our correspondence fuelled by regular indulgence in the quaint academic habit of swapping publications. I never met the man, yet few individuals have had more influence on my intellectual development. It soon emerged that we shared a background of laboratory work in the biological sciences, a love of literature (both pursuing parallel academic activities in English studies) and an attitude of scepticism concerning the common claims of epidemiology and public health. The boldness, wit, and incisiveness of his papers gave fresh impetus and inspiration to my already established interest in group, or population, studies. The current essay will argue on methodological grounds that the abuses of epidemiology, so lucidly exposed by Petr Skrabanek, are a direct consequence of misunderstanding the scope and nature of epidemiology.
THE DEMAND FOR PRECISION
The value of epidemiology as an approach to understanding and improving health is frequently subject to exaggeration by its practitioners, and by those involved in health promotion and public health.
Partly as a consequence, epidemiology increasingly sees itself as an autonomous “scientific” discipline - with its own approach, techniques, departments, conferences, journals, and, most significantly, intellectual standards of proof. Proposals have been made in the United Kingdom for basing a comprehensive and detailed national program of preventive medicine and health promotion entirely on epidemiological evidence [2,3]. I will argue that such autonomy is impossible in principle, and the attempt to achieve it will only result in scientific artifacts and political abuses.
The standard definition of the subject describes epidemiology as the study of health in populations: it can therefore be considered the study of health at the group level of analysis.
Epidemiology comprises both observational and experimental methods, and includes megatrials (those “large, simple” randomized clinical trials with “pragmatic” aims [6,7]).
The methodological unity of such disparate epidemiology techniques as the survey, the case-control study, the cohort study, and the megatrial is derived from a characteristic mode of inference by induction, based on generalizing from a “sample” [8,9]. This mode of inference can be contrasted with a “scientific” mode of inference based on devising and testing causal hypotheses.
The specific impetus behind the rise of epidemiology seems to be a striving for ever increasing precision in the measurement of health states. As therapy has advanced, clinicians have come to seek quantitative rather than qualitative improvements in management.
A further, and perhaps more urgent, demand has come from those concerned with health policy, who need precise estimates for use in statistical models designed to monitor and control health service performance.
Epidemiology appears to offer a way of quantifying the magnitude of health risks and therapeutic interventions. Precision can be enhanced with a power seemingly limited only by the size of studies. Studies have grown progressively larger; and more recently there has been a fashion for aggregating trials in the process called meta-analysis [11].
THE ROOTS OF BIOLOGICAL VARIATION: RANDOM AND SYSTEMATIC ERROR
The major problem that besets quantification in medicine is the large variation between patients: in therapeutic terms this translates as excessive unpredictability in prognosis. The tacit assumption that lies behind the epidemiological practice of averaging populations in order to enhance precision is that the major barrier to attaining valid biological estimates is excessive random error in measurement.
Epidemiology presupposes that the underlying nature and quantity of a variable is obscured by “noise” that can be removed by averaging and other statistical adjustments, on the basis that - given adequate numbers of instances - errors in one direction will cancel errors in the other direction.
And here lies the root of the problem. The implicit assumption that noise, or random error, is the major obstacle to biological understanding is incorrect. The major difficulty in measuring the true value of biological phenomena, and the principal cause of variation between individuals, is typically not random but systematic error [8,9].
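The distinction can be illustrated with a small simulation. What follows is a minimal sketch rather than evidence: the true value, the noise level, and the size of the bias are hypothetical figures chosen for illustration, and a single constant offset is the simplest possible stand-in for systematic error.

```python
import random
import statistics

def mean_of_measurements(true_value, n, noise_sd, bias, seed=0):
    """Average n simulated readings: true value + constant bias + Gaussian noise."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)

TRUE_VALUE = 100.0  # hypothetical quantity being measured

for n in (10, 1_000, 100_000):
    noise_only = mean_of_measurements(TRUE_VALUE, n, noise_sd=15.0, bias=0.0)
    with_bias = mean_of_measurements(TRUE_VALUE, n, noise_sd=15.0, bias=5.0)
    print(f"n={n:>7}  noise only: {noise_only:7.2f}   noise plus bias: {with_bias:7.2f}")
```

As the sample grows, the first column settles on the true value while the second settles, just as firmly, on the true value plus the assumed bias: the noise averages away, the offset does not.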
Systematic error is due to qualitative differences between either the entities being compared or the causal processes operating on them. Such qualitative differences produce the problem of bias or distortion of comparisons between instances due to the “unlikeness” of instances.
And systematic error may be difficult to deal with, because the complexity of causal interactions at the level of biological phenomena is often extremely great. Even if all the relevant causes are understood, it may not be possible to control them. This difficulty is compounded in human studies by a mass of subjective factors (such as placebo effects), as well as by ethical constraints on study design.
It is this problem of intractable bias - rather than random noise - that accounts for the bulk of observed variation in medicine.
Another common error is to assume that strict randomization of large numbers of subjects is able to control all important forms of bias; yet randomization does nothing to eliminate systematic differences between subjects from the experiment, it merely distributes the systematic error equally between comparison groups.
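A short sketch may make the point about randomization concrete. The population, the two subtypes “A” and “B”, and the 30% proportion are all hypothetical; the helper names are introduced here only for illustration.

```python
import random

def randomize(subjects, rng):
    """Shuffle a copy of the subjects and split it into two equal-sized arms."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def fraction_of(arm, subtype):
    """Proportion of an arm that belongs to the given subtype."""
    return arm.count(subtype) / len(arm)

rng = random.Random(42)
# Hypothetical population: roughly 30% are subtype "A", who differ
# systematically from subtype "B" in ways the trial does not record.
population = ["A" if rng.random() < 0.3 else "B" for _ in range(10_000)]
treatment_arm, control_arm = randomize(population, rng)

print(f"subtype A in treatment arm: {fraction_of(treatment_arm, 'A'):.3f}")
print(f"subtype A in control arm:   {fraction_of(control_arm, 'A'):.3f}")
```

The two arms end up with nearly the same mixture of subtypes, so the comparison between arms is balanced; but each arm is still a mixture, and its average still summarizes unlike subjects.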
For averaging to have the effect of increasing precision, the instances averaged must differ only randomly so that errors will cancel. In other words, instances should be qualitatively identical, or homogeneous in all respects relevant to the circumstance. If averaging is to be used to increase precision, each subject in a study should be de facto a “duplicate” of any other subject.
If, on the other hand, these assumptions do not hold, and instances within a group are heterogeneous, then averaging will not simply increase precision, but will also create an artifact - an entity that does not correspond to any individual instance and is real only at the group level of analysis. An artifact is the inevitable consequence of summarizing dissimilar instances in a single statistic.
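A toy calculation shows what such an artifact looks like. The figures are invented for illustration: ten subjects, of whom three respond to a treatment and seven do not respond at all.

```python
import statistics

# Hypothetical change in an outcome measure for ten treated subjects.
responders = [12.0] * 3       # three subjects improve by 12 units
non_responders = [0.0] * 7    # seven subjects do not change
heterogeneous_group = responders + non_responders

group_mean = statistics.fmean(heterogeneous_group)
# Subjects whose own experience lies within one unit of the group average.
resembling_the_mean = [x for x in heterogeneous_group if abs(x - group_mean) < 1.0]

print(f"group mean change: {group_mean:.1f}")
print(f"subjects within 1 unit of the mean: {len(resembling_the_mean)}")
```

The group average is a modest improvement, yet no subject in the group experienced anything like a modest improvement. The average is a fact about the group only.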
In the basic empirical biological sciences, great effort is expended to attain control of relevant causes and ensure homogeneity between instances - for instance, by inbreeding and identical rearing of laboratory animals and by subjecting them to rigorous experimental protocols. Each population or group studied is then composed of interchangeable subjects.
Averages of these “duplicate” instances are a valid way of enhancing precision in measurement because the process serves merely to reveal underlying uniformity in the data. In the human sciences, subjects within a population cannot usually be regarded as interchangeable and the use of averaged data is correspondingly beset with hazards: the poorer the control, the greater the hazard.
Yet epidemiological methods routinely involve creating statistical summaries of instances that are heterogeneous - indeed, in some cases the heterogeneity is an inevitable by-product of the need to recruit large numbers of subjects.
For instance, the megatrial methodology requires large numbers of subjects; and criteria for entry, experimental protocols, and the outcome measures are deliberately simplified with this aim in mind.
Such deliberate simplification corresponds to deliberate reduction in experimental control, and means that bias is wittingly introduced into the experimental situation by allowing the incorporation of systematic differences between subjects and the causes operating upon them.
The averaging of heterogeneous instances produces summary statistics of populations that are artifactual from the clinical perspective of individuals. Yet the assumption is routinely made that group data are predictive of individuals.
Indeed, the practice of “evidence-based medicine” regards megatrials as the “gold standard” of guidance for the treatment of individual patients. Such misapplication of group data to individual instances is sometimes called the ecological fallacy.
On formal methodological grounds, the estimate derived from an epidemiological study including heterogeneous subjects (such as a megatrial, or a meta-analysis of such trials) tells us nothing about the experience of individuals either within that study (internal validity) or in other circumstances (external validity).
This lack of generalizability arises because enhanced precision has been attained only at the cost of reduced validity: narrow confidence intervals around invalid estimates.
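The trade-off can be written down in a standard form. The notation below is not part of the original argument and makes simplifying assumptions: each observation $x_i$ equals the true value $\theta$ plus a constant systematic error $b$ plus independent noise $\varepsilon_i$ with variance $\sigma^2$, and $b$ does not shrink as the study grows.

```latex
\begin{align*}
\hat{\theta}_n &= \frac{1}{n}\sum_{i=1}^{n} x_i,
  \qquad x_i = \theta + b + \varepsilon_i,
  \quad \mathrm{E}[\varepsilon_i] = 0,\ \mathrm{Var}(\varepsilon_i) = \sigma^2 \\[4pt]
\mathrm{E}\big[(\hat{\theta}_n - \theta)^2\big]
  &= \underbrace{\frac{\sigma^2}{n}}_{\text{random error}}
   + \underbrace{b^2}_{\text{systematic error}} \\[4pt]
\text{95\% interval:}\quad
  &\hat{\theta}_n \pm 1.96\,\frac{\sigma}{\sqrt{n}},
  \qquad \mathrm{E}[\hat{\theta}_n] = \theta + b
\end{align*}
```

Increasing $n$ drives the first term toward zero and narrows the interval, but the interval closes in on $\theta + b$ rather than on $\theta$, and the total error can never fall below $b^2$.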
Misunderstandings of the megatrial methodology are due to the tacit conflation of systematic and random error so that a large, simple megatrial and a small, rigorously controlled study are regarded as equivalent when they share the same size of confidence interval.
But experiments with different protocols are different experiments!
MODES OF INFERENCE: SCIENTIFIC AND INDUCTIVE
The distinction between random and systematic error forms the crux of the argument that asserts that epidemiology is not a natural science. Science strives to describe the underlying structure of phenomena: the nature of entities and how they are causally interrelated.
But when epidemiology reduces random error at the cost of increasing systematic error, the “populations” that form the entities of epidemiological analysis become artifactual when interpreted at the level of individual instances, and the relationships between these entities are not unitary causes but incompletely controlled mixtures of causes that can be interpreted only as associations.
Science aspires to create “structured knowledge” in terms of causally linked real entities. Such scientific structures are of the nature of hypothetical models that can be tested against observation and experiment, or used to draw deductions, or make predictions.
Epidemiology, by contrast, provides a summary of a state of affairs rather than a description of an underlying structure, and therefore cannot use the same inferential procedures as science.
Instead, epidemiology makes generalizations on the basis that a specific study constitutes an estimate of the state of affairs in a larger notional population to which its results may be generalized. This style of inference is inductive and relies on the assumption of uniformity in the phenomena under investigation - the larger notional population should differ only randomly from the epidemiological sample.
The paradigm of epidemiological techniques is therefore the survey.
The ideal survey is a sample containing the full range of instances of a measured variable in their proper proportions, and selected in an unbiased fashion, such that the sample is a microcosm of the larger population.
In practice, this usually implies the need for a large, randomly selected sample (i.e., random with respect to the relevant causal processes). Assuming that bias can be eliminated, the survey can then be summarized purely on the basis of its statistical parameters.
Even epidemiological experiments, such as the megatrial, can be conceptualized as surveys: a megatrial yields estimates of health states in two or more populations that differ only in having experienced different protocols.
Like other forms of inductive reasoning, the validity of epidemiological inference depends on extraneous knowledge from scientific investigation of entities and causes to ensure that populations between which estimates are generalized are similar with regard to relevant causes.
If they are not, and populations differ systematically (as is commonly the case with megatrials), then epidemiology depends on science for an understanding of the magnitude of expected bias. Epidemiological studies are interpretable only when performed within a framework of existing knowledge.
It should also be emphasized that because epidemiological inference is of the nature of a statistical summary, its validity is entirely derived from the validity of its constituent data. Epidemiological models, in contrast to scientific models, are not analytic and do not contribute to understanding phenomena, being merely a “representation” in microcosm of a body of information.
EPIDEMIOLOGICAL PRACTICE AND POLICY
The above analysis may be used to underpin Petr Skrabanek’s forceful criticisms of epidemiological practice [17-20].
Epidemiological techniques, which yield merely statistical summaries of states of health, should not be used as methods of exploring the determinants of human health.
Epidemiology is signified by its population level of analysis and consequent limitation to an inductive mode of inference: it is a mistake to attempt to create an autonomous subject of epidemiology from such negative qualifications.
The reification of this non-causal level of analysis to the status of a discipline has been conspicuously unsuccessful in terms of generating “reliable knowledge” of disease, although highly successful at generating research funding, which for many people is justification enough.
Furthermore, the problem with using epidemiology as a “gold standard” or criterion reference becomes clear. “Evidence-based medicine” (EBM) explicitly regards epidemiological techniques such as the megatrial and meta-analysis as the ultimate foundation of clinical practice and hierarchically superior in nature to evidence from the natural sciences [14] (indeed, the proponents of EBM mistakenly believe that megatrials and meta-analyses are sciences).
Medical science can manage without epidemiology, but useful epidemiology cannot be done without medical science.
If we appreciate that the investigation of biological phenomena is beset by problems of uncontrolled bias, we can understand why the common epidemiological practice of seeking small real effects amidst large systematic error reliably leads to estimates of high precision but low validity.
For instance, using big case-control studies to investigate marginal relative risks to health is a recipe for false inference [19,20]. Such studies conflate random and systematic error in their quest for a spurious notion of accurate measurement.
The level of uncontrolled and residual unadjusted bias in such studies will result in systematic differences between the case and control populations that quite overwhelm any real or imagined causal effects of modest magnitude. Large case-control studies do not measure what they purport to measure, but they describe their artifacts with exquisite exactitude.
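The arithmetic behind this claim can be sketched with expected counts. The example assumes a hypothetical exposure that is entirely unrelated to disease (true odds ratio of 1), a true exposure prevalence of 20%, and cases who over-report exposure by 2 percentage points; the interval is the usual log-based (Woolf) approximation. All figures and function names are illustrative.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-based (Woolf) interval from a 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def biased_null_study(n_per_group, true_exposure=0.20, over_report=0.02):
    """Expected results when the true odds ratio is 1 but cases
    over-report exposure by a small, hypothetical margin."""
    a = n_per_group * (true_exposure + over_report)  # cases reporting exposure
    b = n_per_group - a
    c = n_per_group * true_exposure                  # controls reporting exposure
    d = n_per_group - c
    return odds_ratio_with_ci(a, b, c, d)

for n in (500, 50_000):
    odds_ratio, lower, upper = biased_null_study(n)
    print(f"n per group={n:>6}  OR={odds_ratio:.2f}  95% CI {lower:.2f} to {upper:.2f}")
```

In the smaller study the interval spans 1 and the bias passes unnoticed. In the larger study the same bias yields an interval that excludes 1: a tightly estimated “risk” where no causal effect exists.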
These basic mistakes in the approach of epidemiology are compounded by the search for multiple “risk factors,” the effects of selective publication in favor of positive results, and by an attitude to public policy that takes a one-sided view of risk assessment.
The criteria for ascribing causation to a relationship have been progressively weakened over recent decades by mainstream epidemiologists, to the point that any correlation is readily interpreted as an element in a vague “multifactorial” web of contributing “determinants.” The need for a biologically meaningful cause is explicitly rejected, and “black box” association is vaunted as a methodological advance.
It is ironic that the massive attention paid to the topic of “causality” in epidemiological texts and journals has served, after all, merely to reduce the rigor with which causation is attributed.
Weaknesses in epidemiological methods are compounded by the frequently moralistic or political aims that drive investigation in this field. The dangers are great, given the inherent lack of ability of epidemiological methods to discriminate real causes from biased associations.
Skrabanek documented, exhaustively and with pungent wit, the innumerable ways in which epidemiology is used to give a veneer of quasi-scientific respectability to the recommendations of government officials, managers, and those who have something to sell [1,23].
This activity trades on the prestige of mathematics and statistical analysis, combined with the impressive weight of evidence provided by large data bases. Merely because a research study is big, slow, and costly, and involves hard sums, does not confer validity on its conclusions.
The end result of these epidemiological abuses has been to transform risk-factor epidemiology into a highly effective, albeit expensive, mechanism for generating irrefutable health scares based on untestably vague pseudo-hypotheses.
And such is the degree of precision with which putative risk is defined that only further and larger epidemiological studies can investigate the question. The asserted autonomy of epidemiology is apparently confirmed because, once a scare has begun, epidemiological studies follow one on another, generating work for the investigators but failing to move any closer to settling the dispute, for the good reason that epidemiology is systematically incapable of resolving debates concerning causal mechanisms.
THE IDEAL EPIDEMIOLOGIST
So much for the down-side of epidemiology. However, the preceding analysis also allows us to make some suggestions for improvement.
The nature of epidemiology may be defined as that activity concerned with preparing and comparing statistical summaries of health states in populations. The scope of epidemiology is dependent on the parameters within which such summaries may legitimately be applied. Legitimate inference is contingent on the prerequisites for induction being present in a given situation.
Whether or not an estimate from an epidemiological study is a valid estimate of a state of affairs in a target population is therefore conditional on the study population being a microcosm of the target population. And determination of this critical attribute of “representativeness” is (mostly) a scientific matter of understanding systematic biases, not a statistical matter of dealing with random error.
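A simple numerical sketch shows why representativeness matters and why judging it requires knowledge of causes. The two strata, their event rates, and the population mixes below are hypothetical; the point is only that the same stratum-specific rates give different overall figures when the mix differs.

```python
def expected_rate(stratum_rates, mix):
    """Overall event rate for a population with the given stratum proportions."""
    return sum(stratum_rates[s] * mix[s] for s in stratum_rates)

# Hypothetical event rates in two causally distinct strata.
stratum_rates = {"younger": 0.05, "older": 0.20}

study_mix = {"younger": 0.8, "older": 0.2}   # who was recruited
target_mix = {"younger": 0.5, "older": 0.5}  # who the result is applied to

study_estimate = expected_rate(stratum_rates, study_mix)
target_value = expected_rate(stratum_rates, target_mix)

print(f"rate estimated in the study:    {study_estimate:.3f}")
print(f"rate in the target population:  {target_value:.3f}")
```

No increase in the size of the study would close this gap. Correcting it depends on knowing which attribute (here assumed to be age) is causally relevant, and that is a matter of biological or clinical knowledge rather than of statistics.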
One consequence is that epidemiology should be regarded as subordinate to, and contained by, science; because knowledge of causes is essential for establishing the legitimacy of generalizing from a specific epidemiological study.
The major role for epidemiological studies is therefore to enhance precision of estimates of states of health in situations where the nature and magnitude of systematic errors are known. So representativeness is always vital.
Furthermore, if epidemiology is to be used to inform clinical practice (i.e., if summary statistics derived from populations are to be applied to individual patients) then subjects in the populations studied should also be homogeneous in terms of the relevant causal variables. In this case it will be necessary to establish both representativeness and homogeneity of groups in an epidemiological study.
What are the implications for practice? The proper way to practice epidemiology is to regard it as merely one element in the investigation of pathology - as a set of tools, not as an autonomous discipline.
Epidemiological investigation should be subsumed within larger goals of either a biological or medical kind: seen as part of a repertoire of techniques brought to bear in understanding, explaining, and intervening to ameliorate disease.
It is striking that, with only a few counter-examples, the best epidemiology has usually been done by clinical scientists primarily interested in, and knowledgeable about, specific problems of pathology, rather than by specialist “epidemiologists” whose interest is primarily in statistics and methods. Collaborations between physicians who do not understand statistics and epidemiologists who do not understand disease are just as bad, due to the lack of any cohesive critical, integrative, and guiding intelligence.
Given these considerations, the current trend for recruiting “pure” epidemiologists from those whose skills are numerical, and whose approach and methods are concerned with noise reduction rather than bias elimination, is a mistake. Collecting these epidemiologists into specialist academic groupings (departments, units, schools, etc.) and regarding them as general purpose “guns for hire” whose expertise is impartially applicable to health problems (“Give us the data, and we will tell you what it means”), only compounds the problem and diminishes the likelihood of correcting error.
We cannot expect much more than a narrowly circumscribed facility from investigators who are “hands-off” designers and analyzers of population studies, and - only as an afterthought - try to learn enough biology and medicine for the job in hand. Such specialists are more akin to a technician running a blood analyzer than to the scientists and clinicians who use the machine as a tool for understanding natural phenomena.
The ideal epidemiologist would therefore be a generalist, primarily a scientist or a clinician who has an extra realm of skill in dealing with population health. Epidemiologists who aspire to this ideal would wish to develop a profound interest and knowledge of particular diseases (or health states) and their determinants. This is vital because most epidemiological studies are riddled with systematic errors.
The major problems in interpretation are not statistical (size, power, confidence intervals, etc.), but systematic (representativeness, homogeneity, degree of control, etc.). Adjusting and interpreting the results to compensate for inevitable biases is a matter for the numerate scientist, not the abiological statistician.
The ideal epidemiologist should be de facto a theoretical medical scientist, grounded in the kind of biological and clinical “common sense” that allows the possibility of working on a problem in an interdisciplinary fashion. Fragmentary evidence from a range of disciplines may need to be gathered and combined to produce plausible and testable hypothetical models.
Much of the work would involve participating in the surveying, planning, and critical evaluation of clinical and laboratory studies; but the epidemiologist’s distinctive task would be to fine-tune estimates of quantity and define the scope of their applicability, by establishing the nature and magnitude of adjustments required for generalizing study results to specific target populations.
Epidemiology does not, of itself, increase our understanding of the world, although it may increase the ability to make predictions.
Inductive inference tells us that because the sun has always risen in the morning, we may assume that it will do so tomorrow.
Furthermore, humankind has known the calendrical procession of daybreak for tens of thousands of years.
But the parameters within which we can apply such inductive knowledge can be set only by scientific knowledge of causes, and it is this understanding that enables us to unlock the hidden potential of nature.
Until we grasped that the solar system was heliocentric, and discovered the pathways and rotations of the heavenly bodies, we could never have predicted the sunrise on another planet.
REFERENCES
1. Skrabanek P, McCormick J. Follies and Fallacies in Medicine. Tarragon, Glasgow, Scotland, 1989.
2. Rose G. The Strategy of Preventive Medicine. Oxford University Press, Oxford, 1992.
3. Charlton BG. A critique of Geoffrey Rose’s ‘population strategy’ for preventive medicine. J R Soc Med 1995; 88: 607-610.
4. Last JM. A Dictionary of Epidemiology. Oxford University Press, New York, 1988.
5. Yusuf S, Collins R, Peto R. Why do we need some large, simple controlled trials? Stat Med 1984; 3:409-420.
6. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutic trials. J Chron Dis 1967; 20: 637-648.
7. Charlton BG. Understanding randomized controlled trials: Explanatory or pragmatic? Fam Pract 1994; 11: 243-244.
8. Van Valen LM. Why misunderstand the evolutionary half of biology? In: Conceptual Issues in Ecology (Saarinen E, ed.). Reidel, Dordrecht, Holland, 1982.
9. Charlton BG. Mega-trials: Methodological issues and implications for clinical effectiveness. J R Coll Physicians Lond 1995; 29: 96-100.
10. Charlton BG. Management of science. Lancet 1993; 342: 99-100.
11. Feinstein AR. Meta-analysis: Statistical alchemy for the 21st century. J Clin Epidemiol 1995; 48: 71-79.
12. Rosenberg A. Instrumental Biology or The Disunity of Science. University of Chicago Press, Chicago, 1994.
13. Bernard C. An Introduction to the Study of Experimental Medicine. Dover, New York, 1957. Reprint of 1865 edition.
14. Rosenberg W, Donald A. Evidence based medicine: An approach to clinical problem solving. Br Med J 1995; 310: 1122-1126.
15. Bronowski J. Science and Human Values. Harper Colophon, New York, 1975.
16. Crick F. What Mad Pursuit: A Personal View of Scientific Discovery. Weidenfeld and Nicolson, London, 1989.
17. Skrabanek P. Risk factor epidemiology: Science or non-science? In: Health, Lifestyle and Environment. Social Affairs Unit, London, 1991, pp. 47-56.
18. Skrabanek P. The poverty of epidemiology. Perspect Biol Med 1992; 35: 182-185.
19. Skrabanek P. The epidemiology of errors. Lancet 1993; 342: 1502.
20. Skrabanek P. The emptiness of the black box. Epidemiology 1994; 5: 553-555.
21. Ziman J. Reliable Knowledge: An Exploration of the Grounds for Belief in Science. Cambridge University Press, Cambridge, 1978.
22. Charlton BG. Attribution of causation in epidemiology: Chain or mosaic? J Clin Epidemiol 1995; 39: 146-149.
23. Skrabanek P. The death of humane medicine and the rise of coercive healthism. Social Affairs Unit, London, 1994.