Monday, 6 August 2012

How should we measure general intelligence using IQ tests?


General intelligence (g) is a construct used to explain the finding that (in group studies) all cognitive abilities are positively correlated with one another - being good at one tends to go with being good at the others. The hypothesis is that this co-correlation of abilities is due to a single underlying ability of general intelligence or g, with specific abilities (at various levels) on top of g.

Since g cannot be measured directly, IQ is derived from measuring cognitive abilities and putting people into rank order for ability - for instance, measuring one, several or a lot of cognitive abilities in 100 people, marking the test, then putting the 100 people into rank order (best to worst marks) - highest to lowest IQ.

(The validity of IQ testing comes from the fact - and it is a fact - that rank order on an IQ test shows a highly statistically significant correlation with a wide range of outcomes including exam performance, job performance, health and life expectancy.)


So IQ is ultimately a matter of rank order in tests.

The actual IQ score a person gets comes from a statistical manipulation of the rank order data, to make the distribution into a 'normal' or Gaussian curve, and the average score of a 'representative' population into 100 with (usually) a standard deviation of 15.
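To make that manipulation concrete, here is a minimal sketch of one way it could be done, not a description of how any particular published test is normed. The function name ranks_to_iq, the mid-rank percentile formula and the example marks are all my own assumptions for illustration.

```python
from statistics import NormalDist


def ranks_to_iq(raw_marks, mean=100.0, sd=15.0):
    """Convert raw test marks to IQ-style scores using only rank order.

    Assumption: ties are ignored (each person gets a distinct rank),
    and a rank is turned into a percentile with (rank - 0.5) / n.
    """
    n = len(raw_marks)
    # Indices of people, ordered from lowest mark to highest mark.
    order = sorted(range(n), key=lambda i: raw_marks[i])
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    iq = [0.0] * n
    for rank, person in enumerate(order, start=1):
        percentile = (rank - 0.5) / n
        # Position on the Gaussian curve that matches this percentile.
        z = standard_normal.inv_cdf(percentile)
        iq[person] = mean + sd * z
    return iq


marks = [12, 30, 25, 18, 40]  # hypothetical marks for five people
iq_scores = ranks_to_iq(marks)
```

Notice that the size of the gaps between the raw marks plays no part: only the order of the marks decides the IQ scores.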

This is the 'standard curve' of IQ, since it is the standard against which individuals are measured.

The standard curve is constructed such that it describes the proportion of people that would get a particular IQ score - for example, an IQ of 115 is one standard deviation above the average and therefore about 16 percent of the population would have an IQ of 115 or above.
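The 16 percent figure can be checked directly from the Gaussian curve. This is a small sketch using Python's standard library; the name share_at_or_above is mine, and the mean of 100 and standard deviation of 15 are the usual values mentioned above.

```python
from statistics import NormalDist

# The standard curve: mean 100, standard deviation 15.
iq_curve = NormalDist(mu=100, sigma=15)


def share_at_or_above(score):
    """Proportion of the population expected to score at or above `score`."""
    return 1.0 - iq_curve.cdf(score)


share_115 = share_at_or_above(115)  # one standard deviation above average
share_130 = share_at_or_above(130)  # two standard deviations above average
```

One standard deviation above the average leaves a little under 16 percent of the curve, and two standard deviations leaves a little over 2 percent.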


But there are difficulties in generating an IQ score for individual people, and in moving between the rank order data generated in a group study (used to generate the 'standard curve') and the score of an individual person doing an IQ test.


The individual score in an IQ test ought to be measuring a fundamental property of human ability (a property of the brain, roughly speaking).

Yet many or most IQ tests in practice require non-g abilities such as good eyesight, the ability to read, and the ability to move hands and fingers quickly and accurately; they require concentration (that a person not be distracted by pain or other interferences); and many tests require stamina, and a degree of motivation and conscientiousness in completing them... and so on.

In other words there is a range of factors unrelated to g which might reduce the test score.

Since these factors can pull a score down, but cannot push it above what a person's g allows, the most valid measurement of intelligence is the highest measure of intelligence in a person.

So the best way to measure intelligence is for a person to do a series of IQ tests on different occasions and to take the highest score as the true-est score.


BUT this must also apply to the standard curve used to generate the IQ score.

The standard curve must be constructed from the highest IQ score of (say) 100 randomly chosen people - and these highest scores put into rank order and made into a normally distributed curve with the correct properties.


Yet this is not what happens.

The standard curve is typically generated using a one-off test on the representative sample, but the individual IQ is derived from the best performance in an IQ test - this systematically biases individual IQ scores towards being higher than they really are.
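The size of this bias depends on things I cannot measure, but the direction of it can be shown with a toy simulation. Everything in the sketch below is an assumption made for illustration: the model in which each sitting equals true ability minus a random impairment, the exponential shape of that impairment, the sample sizes, three sittings per person, and the helper names sitting and to_iq.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def sitting(true_ability):
    """One test sitting: non-g factors can only lower the mark (assumed model)."""
    impairment = rng.expovariate(2.0)  # always >= 0, average 0.5
    return true_ability - impairment


def to_iq(raw, norm_sample):
    """Score a raw mark against a sorted norming sample via its percentile."""
    n = len(norm_sample)
    below = bisect_right(norm_sample, raw)
    percentile = (below + 0.5) / (n + 1)  # kept strictly between 0 and 1
    return 100 + 15 * NormalDist().inv_cdf(percentile)


# Standard curve built the usual way: one sitting per person.
norm_sample = sorted(sitting(rng.gauss(0, 1)) for _ in range(5000))

single_iqs, best_iqs = [], []
for _ in range(2000):
    g = rng.gauss(0, 1)  # this person's underlying ability
    sittings = [sitting(g) for _ in range(3)]
    single_iqs.append(to_iq(sittings[0], norm_sample))
    best_iqs.append(to_iq(max(sittings), norm_sample))

mean_single = mean(single_iqs)  # one-off test, scored on one-off norms
mean_best = mean(best_iqs)      # best of three, scored on one-off norms
```

In this toy model the one-off scores average close to 100, as they should, while the best-of-three scores average higher, simply because they are being compared with a curve built from single sittings.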


Of course, there are great logistical difficulties in using multiple tests (on several occasions) and best performances to generate a standard curve - much easier to get a representative group together just once for testing.

But this emphasizes the imprecision of individual measures of IQ.

If an individual gets their IQ score from a single test, it is likely to underestimate their real g, if the test is done in a way or at a time when their performance is impaired.

Yet if the individual has several tries at an IQ test on different occasions, in order that their best possible level of performance be used to estimate their real, underlying g, then this will overestimate their IQ - because the standard curve they are measured against was built from one-off tests.

(Doing several tests and taking an average does not work, because the bad performances drag down the average.)


So, in practice and as things are - I do not feel that individual, one-off personal IQ measurements can be regarded as precise.

Probably individual IQ should be banded into roughly half-standard deviations.

Something like average as 96-104, above average as 105-114, high as 115-124, very high as 125-140, and above that we have the super-high and strange world of potential geniuses. (Above about 124, 'g' begins to break down as the component tests lose co-correlation.)

(Below average would probably be a mirror of this - but the meaning of low IQ is a bit more variable, and the levels may be very low.)

But IQ differences between individuals of less than half an SD (less than about 7 or 8) are uninterpretable - even around the average.
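As a sketch of how this banding might be applied in practice: the band edges below are the ones suggested above, but the function names, the rounding to whole numbers and the exact cut-off of 7.5 points (half of an SD of 15) are my own assumptions.

```python
HALF_SD = 7.5  # half of the usual standard deviation of 15 (assumed cut-off)


def iq_band(score):
    """Place a score into one of the rough bands suggested above."""
    s = round(score)
    if s > 140:
        return "potential genius"
    if s >= 125:
        return "very high"
    if s >= 115:
        return "high"
    if s >= 105:
        return "above average"
    if s >= 96:
        return "average"
    # Lower bands are left unspecified, as in the text.
    return "below average"


def difference_is_interpretable(score_a, score_b):
    """Treat gaps of less than half an SD as too small to interpret."""
    return abs(score_a - score_b) >= HALF_SD


band = iq_band(118)
gap_ok = difference_is_interpretable(100, 105)
```

So a person scoring 100 and a person scoring 105 would not be treated as differing in intelligence, even though they fall on either side of a band edge.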