Monday 19 December 2011

Statistical assumptions are not scientific assumptions

*

It is worth noting that frequentist statistics are built on the assumption of no difference between groups (that two groups are assumed to be random samples from a single population). 

From this assumption, which has nothing whatsoever to do with reality (and is essentially an historical accident derived from the work of Ronald Fisher on crop yields), we tend to assume there is no difference between groups unless 'proven' otherwise. 

*

Yet, in the case of human groups separated by scores of generations, and when looking at traits (such as 'g', or general intelligence, and personality) which 

1. substantially affect reproductive success, and 

2. are substantially heritable 

- then this assumption of sameness is irrational. 

*

In other words, it would make more sense, scientifically (as opposed to statistically) to expect to find important differences in cognitive abilities and dispositions (including their magnitude and distribution) between separated human populations. 

*

Indeed, that was pretty much always the case in the past - people expected that 'strange' people would be different from themselves - often exaggerating the degree of difference to an absurd extent in travellers' tales. 

*

We have gone crazily far in the opposite direction: we not only expect, but statistically assume, that there are zero differences in the mean and standard deviation of traits, and that apparent differences are due to sampling error - except when the probability of such a chance result is very (albeit arbitrarily) low.

*

In practice, as we observe, there is never conclusive evidence to reject the 'null hypothesis' that all human populations everywhere are actually one population varying randomly and apparent differences are due to biased sampling - the null hypothesis can always be saved by ever more attention on real or imagined sampling errors - when people really want to save it.

*

And failing to reject the null hypothesis is falsely assumed to be 'proving' no difference - yet it is nothing of the sort. It is merely the default assumption of statistics, which is an arbitrary - indeed non-scientific - assumption. 
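
The point that failing to reject is not the same as showing 'no difference' can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two populations, the real gap of 0.3 standard deviations, the ten people per group and the use of a permutation test are all made-up choices for illustration, not anything taken from real data.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=500, rng=None):
    """Two-sided permutation test for a difference in group means."""
    rng = rng or random.Random(0)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        # Reshuffle the group labels, as the null hypothesis says we may.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def non_rejection_rate(true_diff=0.3, n=10, trials=200, alpha=0.05, seed=1):
    """Share of small studies that fail to reject the null although the
    two populations really do differ by true_diff (hypothetical values)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        if permutation_p_value(a, b, rng=rng) > alpha:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    rate = non_rejection_rate()
    print(f"Share of studies failing to reject the null: {rate:.2f}")
```

With small groups like these, most of the simulated studies fail to reject the null hypothesis even though the difference was built in from the start. 'Not significant' here reflects the size of the study, not sameness of the populations.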

*

(Bayesian statistics claims to overcome this problem of frequentist statistics, but I think it leads to other problems and disagreements. In fact, common sense/ built-in human reason is enough to overcome the problem to the extent that it needs to be overcome, i.e. the common sense that if things seem to be different, it is reasonable to proceed on the assumption they are different, until proven otherwise. This assumption of difference should not automatically be inverted, as it is with Leftism/ political correctness.)

*

5 comments:

Proph said...

My understanding of the assumption of group equivalence/invariance was that it arose from the principle that the closest one can get to conclusively proving X is conclusively disproving not-X; except, statistically speaking, replace "proving/disproving" with "establishing/refuting within a certain arbitrary level of confidence."

When constrained within this assumption, statistics has much to offer -- just as the natural sciences have much to offer when not constrained by intellectual perversions like naturalism or positivism. The problem, as you say, is that what began as an arbitrary assumption (widely acknowledged as such) has morphed into a life of its own: an actual philosophical worldview.

For this reason I increasingly gravitate toward Cohen's view (who once described null-hypothesis significance testing as "Statistical Hypothesis Inference Testing" -- the initialism was intentional). Modern statistics should be more concerned with confidence intervals of effect sizes than arbitrary (and therefore meaningless) probability values.

Bruce Charlton said...

@Proph - confidence intervals are just the same assumptions differently expressed.

No, I think the solution is much more radical - which is to acknowledge that human judgement underlies all science (and all statistical validity), which means that science can only be done by competent, honest and truth seeking people who trust one another (and who have earned that trust).

Proph said...

Confidence intervals for effect sizes don't *necessarily* rely on significance testing; properly used, one could simply say "we observed an apparent effect of X on Y controlling for Z of between .18 and .25" or something similar.
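
As a rough sketch of that kind of reporting, the snippet below gives an observed difference in means together with an interval around it, and never computes a p-value. The function name `mean_difference_interval`, the two groups of numbers and the use of a normal (large-sample) quantile are all assumptions made for illustration; with samples this small a t quantile would normally be preferred.

```python
import math
import statistics


def mean_difference_interval(a, b, level=0.95):
    """Return (difference, lower, upper) for mean(b) - mean(a)."""
    diff = statistics.fmean(b) - statistics.fmean(a)
    # Standard error of the difference, allowing unequal variances.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    # Normal quantile: a large-sample approximation (an assumption of this sketch).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Made-up numbers, purely for illustration.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

diff, lower, upper = mean_difference_interval(group_a, group_b)
print(f"observed difference {diff:.2f}, interval {lower:.2f} to {upper:.2f}")
```

The output is read as an estimate of how large the effect is and how uncertain that estimate is, rather than as a yes/no verdict on whether the interval happens to include zero.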

You're right that no honest science can be done, though, without a basic level of responsibility, decency, and trustworthiness. And nothing in the professional ethics instruction of most college students equips them with that, nor do their statistics classes generally equip them with the proper method of interpretation of statistical results. This kind of don't-be-a-dick instruction shouldn't even need to be articulated -- the fact that it needs to be (and isn't) is a testament to how far south the modern scientific endeavor is going/has gone.

dearieme said...

The first chap to teach me statistics took the line that all stats was shot through with assumptions that were usually/essentially untestable. One great merit of the standard procedures, he said, was that two different workers, if equally honest and competent, would get the same results from the same data. A second was that the procedures were indeed standard, so that you could refer to them by a name, rather than having forever to write out lengthy explanations of what you had done.

I find it revealing that many "Climate Scientists" doctor data before treating them and then hide the data so that others may not re-analyse them. They often abjure standard methods in favour of lash-ups intended, presumably, to return the results they desire.

So in spite of the shortcomings of conventional methods, they do have real, if modest, merits when applied honestly and objectively.

Bruce Charlton said...

Proph - OK - confidence intervals *could* be used like that - with relatively few assumptions except normal distribution.

Some people were advocating this about 20 years ago when I used to teach epidemiology.

But although CIs were widely adopted, in practice they are used almost exactly like t-tests with the conventional (Fisherian) one in twenty significance level

- i.e. looking to see whether the CIs overlap ('non-significant' at p=0.05) or not ('significant' - p less than 0.05).

I don't suppose ethics can be formally taught nor legalistically imposed - rather they emerge from and are enforced by the groups of scientists working on particular problems (i.e. 'invisible colleges').