Saturday, 23 October 2010

My alpha MSH RIA study - The biggest waste of time of my life?...


Arguably, the work leading to the paper published here:

and referenced:

Charlton BG, Ferrier IN, Gibson AM, Biggins JA, Leake A, Wright C, Edwardson JA. A preliminary study of plasma alpha MSH concentrations in depressed patients and normal subjects. Biological Psychiatry. 1987 Oct;22(10):1276-9.

was my biggest waste of time - at least in my early career (in the days when I had a career).


It was a huge waste of time since this alpha MSH study was something I intended to 'knock off' in six weeks, but it ended up consuming 18 months (although not full time). It led to a small brief report paper (just a couple of pages) with a largely negative result that has received (I think) one citation in 23 years...


But I did have an interesting experience as a result - on the nature of scientific 'evidence' and the effect of thresholds of skepticism.

I need to be careful not to tell this story in a self-glorifying way, since here I was a skeptic, and (as is usual, over the long term) skepticism was justified - yet of course the essence in science is not about being skeptical as such (which would lead nowhere), but about being skeptical-enough.


The whole thing hinged on detecting a little peptide hormone called alpha MSH in human blood, and the question was firstly whether or not this hormone was actually present in human blood, and secondly if it was present whether there was a circadian pattern of variation (daily rise and fall).

The MSH was detected using a radio-immunoassay (RIA), and the whole thing hinged on the bottom limit of detection in the assay - since the levels being recorded in the previous literature were right at the lower end of assay sensitivity (this is a normal situation in science - the methods are only just good enough to do whatever is trying to be done - because easy stuff in science has already been done and what remains is on the limit of do-ability).

To measure MSH in the blood sample involved using the assay to create a standard curve: known amounts of MSH were added to buffer, and measured with the assay which detected amounts of radioactivity. By measuring radioactivity in the blood samples, the amount of MSH could then be 'read off' the standard curve by interpolating the unknown level into the curve.
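To make the read-off concrete, here is a toy sketch in Python. All the numbers are invented for illustration; a real RIA standard curve is sigmoidal, and counts actually *fall* as added hormone rises (the hormone competes with the radio-labelled tracer for antibody), but the interpolation logic is the same.

```python
# Toy sketch of reading an unknown sample off a standard curve by
# linear interpolation between neighbouring standards. All numbers
# are invented for illustration.

# (added MSH in pg/ml, measured radioactivity counts) - invented
standards = [(0, 5000), (5, 4600), (10, 4000), (25, 3000), (50, 2100)]

def read_off(counts):
    """Interpolate a sample's radioactivity counts into the curve."""
    pts = sorted(standards, key=lambda p: p[1])  # order by counts
    for (c_lo, n_lo), (c_hi, n_hi) in zip(pts, pts[1:]):
        if n_lo <= counts <= n_hi:
            frac = (counts - n_lo) / (n_hi - n_lo)
            return c_lo + frac * (c_hi - c_lo)
    return None  # outside the range spanned by the standards

print(read_off(3500))  # falls between the 25 and 10 pg/ml standards
```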

But the standard curve has a bottom limit of detection, and could only yield results from the point where a sample containing a certain amount of added MSH gave a radioactivity count significantly higher than adding zero MSH. Everything hinged on distinguishing zero from the point at which the MSH became above-zero.

The problem is that the assay should not be too large - or it would not work (big RIAs did not work - I never really knew why); and also a large assay would use up the blood samples and antibody, both of which were essentially irreplaceable.

To get the best results the assay should be as large as necessary to generate reliable results, but no larger.


What this boiled down to, was how many 'replicates' (identical repetitions of the assay - how many test-tubes of reagents) should be made for each measurement.

Originally the assays had been done by measuring a single sample for each point on the standard curve and for each unknown sample.

Then people began averaging duplicate samples, using the operational criterion that different concentrations would count as different if the duplicate measurements did not overlap. The bottom limit of detection was then the lowest concentration of MSH on the standard curve whose duplicates did not overlap with the duplicates for zero.
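The non-overlap criterion itself is simple enough to sketch (the counts below are invented for illustration):

```python
# A minimal sketch of the duplicate non-overlap criterion: two
# concentrations count as different only if the ranges spanned by
# their duplicate measurements do not intersect. Counts invented.

def ranges_overlap(pair_a, pair_b):
    """True if the ranges spanned by two duplicate pairs intersect."""
    return min(pair_a) <= max(pair_b) and min(pair_b) <= max(pair_a)

zero = (4950, 5080)             # duplicate counts for zero added MSH
lowest_standard = (5100, 5210)  # duplicates for the lowest standard

# The lowest standard is 'detectable' only if its duplicates clear
# the zero duplicates entirely.
detectable = not ranges_overlap(zero, lowest_standard)
```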

But duplicates doubled the amount of material consumed by each experiment!

However, I found that duplicates could be fairly widely spaced - there was a fair bit of 'noise' in the assay. So I began averaging triplicates to get a better estimate. This, however, meant that the amount of precious material consumed by each measurement went up by another fifty percent!

Then I became concerned specifically about the detectability threshold, so I used sextuplicates (six measures) for the zero point, and calculated the mean and standard deviation of these six - the threshold of the assay was then set at two standard deviations above the mean zero-point radioactivity (i.e. any sample registering higher radioactivity than this threshold counted as having alpha MSH present at above-zero levels).
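The mean-plus-two-standard-deviations threshold amounts to a couple of lines of arithmetic (the six zero-tube counts below are invented for illustration):

```python
from statistics import mean, stdev

# Sketch of the mean-plus-two-standard-deviations threshold, using
# invented counts for the six replicate 'zero' tubes.
zero_counts = [5030, 4950, 5110, 4890, 5075, 4980]

# Any sample whose radioactivity exceeds this threshold is scored
# as containing MSH at above-zero levels.
threshold = mean(zero_counts) + 2 * stdev(zero_counts)
```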


While sorting this out, months were passing in the world!

The assay was temperamental, and often did not work at all for reasons which were seldom really clear - but this often happens with biological systems.

Eventually I was able to perform a couple of satisfactory assays which showed (confirmed previous reports, really) that alpha MSH was indeed present in human plasma at just above the limit of detectability - but that it did not vary in any interesting way - at least not that I could detect.

Plasma MSH was, I thought, probably doing nothing functional; just overflowing from MSH produced in the pituitary gland perhaps.

But it was not, by any stretch, an interesting result.


The reason I mention this is to show that science can depend upon extraordinarily tiny decisions - like how many replicates of a measurement can be made.

And that there is no 'right answer' to these questions - just degrees of tolerance, or of skepticism perhaps, or differences in your prior hypotheses - but it is a very personal thing.

Or rather it is a mixture of personal (including differences in personal truthfulness - because in these areas of uncertainty slight differences in the degree of personal honesty or scrupulosity can lead to big differences in apparent-outcome); and also the social - because it was essentially a consensus within the group of people engaged in measuring MSH and similar hormones that led to the practice of how many replicates were necessary.

If you went too far from the consensus, the work would not get published - or if it did get published it would be ignored (which usually happened anyway). 

So within the field there were those who were building elaborate theories of function and disease on the basis of detecting consistent patterns of MSH variation in human plasma; there were those like me who said that MSH was present but didn't seem to be doing anything much; and those who said that MSH was either not present at all - or present at such low concentrations as to have no effect - and that the apparent detection was due to technical imperfections (such as cross-reactivity between the antibodies and a variety of hormones).

These were big, and potentially important, differences in conclusions - and at root they hinged on subjective decisions about how many replicates were necessary or desirable, and how to handle the construction of standard curves.


NB: As an interesting further point - I would always, on principle, draw standard curves by hand - doing the curve-fitting by eye rather than using the 'least squares' statistical method of line drawing, which is how curves were generated by statistical programs.

I saw no reason why the least squares statistic was intrinsically valid - to use it was itself an arbitrary and subjective decision; I preferred to apply subjective judgment to line-fitting on each specific curve, rather than as a broad brush decision embodied in a standardized statistical technique.

I think I was right - and this is after all how scientists proceeded during the golden age - but nobody would be able to do this nowadays. Statistical conventions rule, and the subject is not open to debate.
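For what it is worth, the 'least squares' method being contrasted here is mechanical enough to state in a few lines. This is a sketch with invented data points; a real RIA curve would normally be linearized first (e.g. by a logit transform) before fitting a straight line.

```python
# For contrast: the ordinary least-squares straight-line fit that a
# statistical program would apply to a (linearized) standard curve.
# Data points invented for illustration.
xs = [0, 5, 10, 25, 50]              # added MSH, pg/ml
ys = [5000, 4600, 4000, 3000, 2100]  # radioactivity counts

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
```

The point of contention in the post is not the arithmetic, which is fixed, but the prior decision to let this one formula stand in for judgment on every curve.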



dearieme said...

Much of science does boil down to questions of how measurements, or measurement-substitutes, are made. I was pondering the "epidemic" of Type 2 Diabetes recently. There's lots of epidemiology, but it occurred to me that the discussions that the layman like me sees don't touch on measurement - in this context, that would mean that they don't answer the question "how stable is the diagnosis of diabetes?". Is it diagnosed by some measurement exceeding a threshold? Has the threshold been held constant over time? Has the mode of measurement been held constant? Are the diagnoses subject to medical fashion, or to financial incentives offered to doctors?

(P.S. Someone opined that the "epidemic" was as important as Global Warming. I hope that I managed to suppress my snort.)

Bruce Charlton said...

In asking that question you have answered it.

I was a professional academic epidemiologist and public health physician for three years and saw for myself the indifference to the actual validity of measurements upon which everything is based.

There is an intrinsic but unspoken assumption that volume of data can compensate for poor quality of data - this runs through the social sciences.

This misunderstanding seems to arise from a confusion of random noise with systematic bias.

I allude to this here:

But I now realize that my main writing on the subject (The scope and nature of epidemiology) is not available online without subscription - so I will post it onto this blog later today.

dearieme said...

Excellent. Meantime I have stumbled across this

It seems to be a verbose version of my own dictum:

"All medical research is rubbish" is a better approximation to the truth than almost all medical research.

Bruce Charlton said...

Yes - The Atlantic piece is semi-correct; it gets the diagnosis right, but for the wrong reason, based on faulty understanding - so the prescription is likely to be dangerous.

Medical research has been roughly doubling in size every decade for a long time.

Naturally therefore (to look no further for causes), by now medical research is almost all rubbish so it is perfectly rational to operate on the basis that it is *all* rubbish - unless specifically proven otherwise by clear, commonsensical criteria (preferably confirmed by personal experience).