Tuesday 28 May 2013

Reaction times in a 'perfectly matched' Victorian versus modern population sample: A litmus test of honesty and competence


Michael A. Woodley, Jan te Nijenhuis & Raegan Murphy have followed up their long-term analysis of reaction times as evidence of a rapid and significant decline in average intelligence since Victorian times


by publishing a rapid and robust refutation of the criticism that this result could be explained-away by differential sampling.


They present a modern population sample from 1989 which is (as-near-as-dammit in this imperfect world) perfectly matched with Francis Galton's sample from the late 19th century.

The 1989 sample had an average reaction time of 245 milliseconds, much slower than Galton's 1889 average of 194 ms - confirming Woodley et al's identification of a decline in general intelligence of approximately one standard deviation, or 15 IQ points.


This unusual compression of the time-scale of scientific debate presents a litmus test of the honesty and competence of the commentators who rejected the original paper on the micro-methodological grounds of serious concerns about sampling issues; grounds which I argued were inappropriate, incompetent and - in their effect - anti-scientific.


Thus we are now in a position to observe whether such critics understand and acknowledge that they have in fact been refuted; or else whether they reveal the existence of some hidden agenda by maintaining their rubbishing and rejection of the Woodley et al paper by ignoring this refutation, or shifting the grounds for criticism. 


ORIGINAL PAPER: A high-quality replication of Galton’s study one century later: Wilkinson & Allison (1989)

Michael A. Woodley, Jan te Nijenhuis & Raegan Murphy

In Woodley, te Nijenhuis, and Murphy (2013, in press) we argue that intelligence has declined substantially since Victorian times, based on a meta-analysis of simple reaction time. An exchange of ideas started at several blogs. We hereby reply to the blogposts of Scott Alexander and HBD Chick, reacting to an earlier post made by us.

A paper has come to our attention that provides strong evidence against the supposed representativeness problem across cohorts (e.g. Alexander, 2013). The study in question is that of Wilkinson and Allison (1989) using a sample of 5,324 visitors to the London Science Museum, which is situated at the exact site of Galton’s 19th century Anthropometric Laboratory in South Kensington.  All visitors undertook psychophysical testing on a simple reaction time-measuring apparatus, just as the people in Galton’s study did. Of these mixed-sex participants 1,189 were aged between 20 and 29, and are thus highly similar to the age range employed in our own study. Their simple RT mean was substantially slower than the weighted 1889 RT mean (245 ms vs. 194.06 ms), and furthermore the mean of this sample falls very close to the meta-regression-estimated mean across studies for the late 1980s (approximately 250 ms, see: Figure 1 in Woodley, te Nijenhuis & Murphy, 2013). The remarkable features of this study are the ways in which it replicates virtually every significant demographic aspect of Galton’s study.
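The size of the gap between the two means can be checked with a few lines of arithmetic. This is only an illustrative sketch: the three figures are the ones quoted in the paragraph above, and the variable names are our own.

```python
# Means quoted in the text above, in milliseconds.
rt_1889 = 194.06            # weighted mean for Galton's sample
rt_1989 = 245.0             # Wilkinson and Allison (1989), ages 20-29
rt_meta_late_1980s = 250.0  # approximate meta-regression estimate

# Absolute and relative slowing between the two samples.
gap_ms = rt_1989 - rt_1889
relative_slowing = gap_ms / rt_1889

# How far the 1989 mean sits from the meta-regression estimate.
offset_from_meta = rt_meta_late_1980s - rt_1989

print(f"gap between samples: {gap_ms:.2f} ms")
print(f"relative slowing:    {relative_slowing:.1%}")
print(f"offset from estimate: {offset_from_meta:.1f} ms")
```

On these figures the 1989 mean is about 51 ms slower than the 1889 mean, while it sits only about 5 ms from the approximate meta-regression estimate.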

There is the issue of a participation fee. Galton is known to have requested a participation fee of 3 pennies (approximately £5 in modern UK currency). The London Science Museum required the payment of an admissions fee right up until December 2001. Furthermore it still requires the payment of fees of £6 to £10 for access to some special exhibitions (London Science Museum, 2013a). The Wilkinson and Allison (1989) study was in fact conducted as part of a special exhibition entitled Medicines for Man, which was hosted by the Museum from the early 1980s (Medicines for Man Organizing Committee, 1980). Therefore participation fees were employed in the case of both studies.

There is strong evidence for the demographic convergence between the two studies. Johnson et al. (1985) indicate that whilst Galton’s sample included persons from all occupational and socioeconomic groups in Victorian London, it was nonetheless skewed towards students and professionals, and both groups could fairly be described as solidly White and middle class. In the last decades of the 20th century, museum attendance in the UK exhibited precisely the same skew in terms of sociodemography. Eckstein and Feist (1992) for example noted that most UK museum visitors are drawn from White and upper-middle-class populations. Furthermore Hooper-Greenhill (1994) observed that the largest minority ethnic groups in the UK (i.e. Asians and Afro-Caribbeans) are underrepresented amongst museum visitors. In acknowledging this issue, a House of Commons report in 2002 stated that free admission to museums was unlikely to ‘… be effective in attracting significant numbers of new visitors from the widest range of socio-economic and ethnic groups’ (House of Commons report, 2002, p. 23).

The presence of this self-selection amongst visitors strongly harmonizes the studies of Galton and Wilkinson and Allison. Add to this the fact that participation fees were employed in both cases, the fact that the geographical locations were exactly the same and finally the fact that the age demographic of interest (i.e. twenty-somethings) was intensively sampled in both cases (i.e. 3,410 in the case of Silverman’s subset of Galton’s sample and 1,189 in the case of Wilkinson and Allison). The net of this is that the studies become even more strongly convergent in terms of comparing like with like. Thus the argument of more heterogeneous samples visiting museums in the 1980s compared to more restricted samples visiting museums in the 1880s is critically weakened. The principal objections that can be leveled against this are as follows.

Firstly there is the issue of tourism. Most tourists to the UK are from the US and Europe (Tourism 3B), meaning that they are likely to be both ethnically and socioeconomically matched to the majority of the participants in this study (i.e. UK citizens). In fact, international arrivals in the United Kingdom in 1990 show that of the 439 million inbound tourists, 60% were European in origin and 21% emanated from the Americas. Hence, 81% of the tourist population came from groups which are highly ethnically similar to the British. Only 12% came from Asia and the Pacific with a meager 3% coming from the Middle East and 2% from Africa (Tourism 3B). In sum, it is unlikely that tourists being tested in the 1989 study were substantially ethnically different from the typical UK museum visitor. Based on current statistics from the Science Museum, the preponderance of visitors hail from the UK (69%) and the preponderance of those are from Greater London (44%; London Science Museum, 2013b). Historically, especially prior to the 1990s this figure would have been much higher, owing to far lower levels of tourism to the UK (in 1990 international tourism levels were less than half the current levels,  >940 million per year, BBC, 2013). This means that in all likelihood well over 70% of the participants in Wilkinson and Allison’s study would have been British, and the overwhelming majority of these would have been White, upper middle-class and from London. The overwhelming majority of the international visitors would have been ethnically and broadly socioeconomically matched to the British visitors.

Secondly there is the issue of instrumentation. Galton utilized a pendulum chronoscope with a temporal resolution of around a centi-second (i.e. 1/100th of a second, or 0.01 seconds). The electronic apparatus employed by Wilkinson and Allison in all likelihood had a higher resolution (post-1908 chronoscopy at least had the potential to be accurate to a single milli-second; Haupt, 2001); however, a resolution of only a centi-second in Galton’s apparatus cannot account for the substantial discrepancies between these two studies.

Thirdly, Galton’s sample was single person-single trial, whereas Wilkinson and Allison’s study employed two practice trials followed by 10 trials per person for the purposes of averaging. This protocol would almost certainly have enhanced the reliability of Wilkinson and Allison’s data relative to Galton’s (Jensen, 1980); however, in both cases we are dealing with aggregates. Strong biases (i.e. jumping the gun vs. slow to start) have the potential to cancel each other out when such very large datasets are employed, as these sources of error are distributed in a Gaussian fashion. This means that aggregate-level mean-wise comparisons are appropriate between datasets exhibiting different coefficients of reliability, provided they are coupled with very large Ns.
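The second and third points can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of either dataset: the true mean, the between-person and trial-to-trial standard deviations, the sample size and the function name `simulate_means` are all hypothetical values chosen for illustration. It measures the same simulated people in two ways, once with a single trial read at 10 ms resolution and once as the average of 10 trials read at 1 ms resolution, and compares the aggregate means.

```python
import random
import statistics


def simulate_means(n_people=3000, true_mean=200.0, person_sd=30.0,
                   trial_sd=40.0, seed=0):
    """Return (single-trial coarse mean, ten-trial fine mean) in ms.

    All parameter values are illustrative assumptions, not estimates
    taken from Galton's or Wilkinson and Allison's data.
    """
    rng = random.Random(seed)
    single_trial = []
    ten_trial = []
    for _ in range(n_people):
        # Each person has a stable underlying reaction time.
        person_rt = rng.gauss(true_mean, person_sd)

        # Single-trial protocol: one noisy trial, rounded to 10 ms.
        one = rng.gauss(person_rt, trial_sd)
        single_trial.append(round(one / 10.0) * 10.0)

        # Ten-trial protocol: mean of 10 noisy trials, rounded to 1 ms.
        trials = [round(rng.gauss(person_rt, trial_sd)) for _ in range(10)]
        ten_trial.append(statistics.fmean(trials))

    return statistics.fmean(single_trial), statistics.fmean(ten_trial)


coarse_mean, fine_mean = simulate_means()
print(f"single trial, 10 ms resolution: {coarse_mean:.1f} ms")
print(f"ten trials,    1 ms resolution: {fine_mean:.1f} ms")
print(f"difference between protocols:   {abs(coarse_mean - fine_mean):.1f} ms")
```

Under these assumptions the two aggregate means differ by far less than the gap of roughly 51 ms between the studies, because symmetric trial noise and rounding error average out over thousands of participants. Note that the sketch assumes the error is unbiased; a systematic offset in one apparatus would not average out, although an offset arising from truncation to the nearest centi-second would be bounded by 10 ms.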

On this basis Wilkinson and Allison’s (1989) study must be considered an excellent replication of Galton’s study. Its mean reaction time for the relevant age cohort is almost precisely where our meta-regression predicts it should be. This is clearly strong supporting evidence for the robustness of the increase in simple RT latency produced to date and so puts even more nails in the coffin of those who argue that the trend can be accounted for by lack of representativeness across cohorts.

Alexander, S. S. (2013). The wisdom of the ancients. Slate Star Codex. URL: http://slatestarcodex.com/2013/05/22/the-wisdom-of-the-ancients/ [retrieved on 24/05/13]
BBC. (2013). GCSE Bitesize. Geography tourism trends. http://www.bbc.co.uk/schools/gcsebitesize/geography/tourism/tourism_trends_rev1.shtml
Eckstein, J. & Feist, A. (1992). Cultural Trends, 1991. London, Policy Studies Institute.
Haupt, E. J. (2001). Laboratories for experimental psychology: Gottingen’s ascendancy over Leipzig in the 1890s. In: Rieber, R. W., & Robinson, D. K. (Eds.), Wilhelm Wundt in History. The Making of a Scientific Psychology. (pp. 205-250). New York: Kluwer Academic.
Hooper-Greenhill, E. (1994). Museums and their Visitors. London, Routledge.  
House of Commons, Culture, Media and Sport Committee (2002). National Museums and Galleries: Funding and free admission. House of Commons, United Kingdom.
Jensen, A. R. (1980). Bias in Mental Testing. New York: Free Press.
Johnson, R. C., McClearn, G., Yuen, S., Nagoshi, C. T., Ahern, F. M., & Cole, R. E. (1985). Galton's data a century later. American Psychologist, 40, 875–892.
Medicines for Man Organizing Committee. (1980). Medicines for Man: A Booklet Based on an Exhibition at the Science Museum about Medicines - how They are Discovered and how They Work, how They are Made and Tested, how They are Prescribed and Dispensed, and how Laws Control Their Use. London, Science Museum.
No author (no date). Tourism 3 SB. Oxford University Press
London Science Museum. (2013a). http://www.sciencemuseum.org.uk/visitmuseum/prices.aspx [retrieved on 27/05/2013]
London Science Museum. (2013b). http://www.sciencemuseum.org.uk/about_us/history/facts_and_figures.aspx [retrieved on 27/05/2013]
Wilkinson, R. T., & Allison, S. (1989). Age and simple reaction time: Decade differences for 5,324 subjects. Journal of Gerontology, 44, 29–35.
Woodley, M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence. doi:10.1016/j.intell.2013.04.006




dearieme said...

Forgive me for going O/T, but I wondered if you might have anything to say about this quotation from Richard Lehman's blog at the BMJ?

"This is the most shocking and shaming medical paper I’ve read for a long time: a survey of the gap in life expectancy for preventable physical illness in psychiatric patients. It comes from Western Australia, which presumably enjoys the same standards of psychiatric care as the rest of the rich world. “When using active prevalence of disorder (contact with services in previous five years), the life expectancy gap increased from 13.5 to 15.9 years for males and from 10.4 to 12.0 years for females between 1985 and 2005. Additionally, 77.7% of excess deaths were attributed to physical health conditions, including cardiovascular disease (29.9%) and cancer (13.5%). Suicide was the cause of 13.9% of excess deaths.” This is a scandal which needs urgent further investigation: are we becoming indifferent to physical illness and early death in the mentally ill? And are the newer drugs like olanzapine and quetiapine that we hand out so liberally actually contributing to an epidemic of cardiovascular disease in these patients??"

Bruce Charlton said...

@d - I haven't looked at this particular study; but the conclusion has been amply shown by both David Healy and Robert Whitaker and in the published literature generally.

It's the drugs - especially the new/ atypical antipsychotics, and their truly vast over-prescription - almost certainly.

This isn't rocket science! I mean, you just have to open your eyes and see what happens to people; and stop lying that it would have happened anyway.

dearieme said...

Oh well, here's a flyer. It's not (just) dysgenic breeding that's making us stupider it's (partly) the medicines we take.

Bruce Charlton said...

@d - too recent - but such drugs would certainly reduce performance in many kinds of cognitive test, plus the long term brain damage.

JohnK said...

A bit late to this party, sorry, but I've just come across something. What if it's all true (quicker RT and higher IQ among Victorians), but it was due to PHENOTYPIC changes?

The assumption has been that the Victorians couldn't possibly have bested 'modern' times in terms of phenotypic enhancements to RT and IQ.

This surprising 2009 paper casts some doubt on that bland assumption.

"...we argue in this paper, using a range of historical evidence, that Britain and its world-dominating empire were supported by a workforce, an army and a navy comprised of individuals who were healthier, fitter and stronger than we are today. They were almost entirely free of the degenerative diseases which maim and kill so many of us, and although it is commonly stated that this is because they all died young, the reverse is true; public records reveal that they lived as long – or longer – than we do in the 21st century."

Bruce Charlton said...

@JK - The observation is that RT gets worse, while IQ tests go in the opposite direction and improve.

I don't really see what the paper has to do with RT - I personally think this paper is wrong pretty much across the board (from what I know from other sources, including experience of meeting some Victorians) - but it could be correct and not have implications for RT.

Could you spell out the link further?