Psychology 121, Lecture 11

Bias

by Hal S. Kopeikin, Ph.D. © 2000


Test Bias

A psychometric definition of bias: Bias is systematic, stable variation in test scores unrelated to the purpose of measurement. Random error is not bias. Bias consistently affects scores but is not what you intended to measure. For example, writing skills might create a bias when they affect scores on tests designed to measure knowledge of psychometrics.

Sources of bias include: (1) content bias and (2) atmosphere bias. Content bias arises when the test items or materials themselves favor some test takers for reasons unrelated to what the test is meant to measure. For example, questions on an IQ test that rely on knowledge about ingredients in Southern cooking would create a content bias.

Atmosphere bias arises when characteristics of the testing situation, rather than the test content, create a bias.

Sometimes bias is also associated with the consequences of using a test, rather than with properties of the test itself. Here are two types of use-related bias:

Use-prediction bias implies that predictions made from test scores are unfair. For example, consider Focused Example 19-7, p. 558 in your book. Whites get higher scores on the MCAT than some minorities, so using the MCAT for medical school admissions tends to favor Whites. The rationale for using the MCAT is its relation to performance in medical school: it is a good predictor of this. However, since it does not seem to predict performance in MEDICAL PRACTICE, using the test seems to exclude certain minorities from becoming MDs without sufficient justification. The claim is that the use of the test is unfair, since it inaccurately predicts that minority applicants will make poorer doctors.

Another example is that, in general, SAT scores overpredict the performance of Blacks in terms of college GPAs. The regression lines of the two groups are different, so when a single regression line incorporating both groups is used, the college GPAs of Blacks are overestimated from their SAT scores. For more on this, see figure 19-6, p. 539 and the first paragraphs on p. 540.
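To see how a single pooled regression line can overpredict for one group, here is a minimal Python sketch. The (test score, GPA) pairs are invented purely for illustration and are not real SAT data; the group labels and function names are also hypothetical. For simplicity the two groups' own regression lines differ only in intercept, which is just one way that lines can differ.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_residual(points, slope, intercept):
    """Average of (actual - predicted). A negative value means overprediction."""
    return mean(y - (slope * x + intercept) for x, y in points)

# Hypothetical (test score, GPA) pairs, invented for illustration only.
group_a = [(400, 2.4), (500, 2.8), (600, 3.2), (700, 3.6)]
group_b = [(400, 2.0), (500, 2.4), (600, 2.8), (700, 3.2)]

# Fit one regression line to both groups combined.
pooled = group_a + group_b
slope, intercept = fit_line([x for x, _ in pooled], [y for _, y in pooled])

resid_a = mean_residual(group_a, slope, intercept)  # positive: underpredicted
resid_b = mean_residual(group_b, slope, intercept)  # negative: overpredicted

print(f"pooled line: GPA = {slope:.4f} * score + {intercept:.2f}")
print(f"mean residual, group A: {resid_a:+.2f}")
print(f"mean residual, group B: {resid_b:+.2f}")
```

In this made-up data, the pooled line falls between the two groups' own lines, so it predicts GPAs that are too high for group B and too low for group A at every score level.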

Use-social consequence bias implies that using a test may cause or perpetuate social inequities. For example, using standard IQ tests in the assessment of mental ability leads to classifying a disproportionate number of Black children as retarded. Some argue that this classification is terribly destructive. (Those so labeled are put into "special" classes that become dead ends for them.) If labeling is bad, and the test tends to label more Blacks, then using the test is bad for Blacks. Note that if labeling were advantageous, the use of this test would have exactly the opposite consequences.

The idea is that the social consequences of using the test are bad. The question is not whether the test is fair or valid, but whether it has bad consequences when used in a certain way. Note that we are far beyond psychometric bias here. It's not the test per se that is biased; it's the social implications of using it that are problematic.

Is Content Bias a Problem for Minorities on Standardized Tests?

The BITCH and Chitling tests show that African-Americans know more about urban "Black Culture" than do Whites. There's no reason to believe these tests measure intelligence. Still, they highlight the possibility that knowledge sampled on standard IQ, achievement, and/or aptitude tests might be more familiar to some groups than others. If so, content bias is an issue: theoretically, content bias would be present if Blacks and Whites with equal underlying intelligence received different scores because of unequal familiarity with test content.

A great deal of evidence, reviewed in your book, suggests this is LESS of a problem than many imagine.

Does SOMPA Correct for Test Bias, or Contribute to it?

SOMPA (System of Multicultural Pluralistic Assessment) uses two sets of norms for interpreting test performance. The pluralistic component assumes that learning potential is the same for all ethnic groups, so IQ scores are normed within the group. Individuals are compared against their own ethnic group. This means that all groups have the same "estimated learning potential."

This approach is highly controversial.  It assumes group differences on the test do not reflect differences in "learning potential."  Essentially, it eliminates group differences by denying they are real.  That's fine if they're illusions, but not if they're real.
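The effect of norming within groups can be illustrated with a small Python sketch. This is a simplified illustration of within-group standardization in general, not SOMPA's actual scoring procedure. The raw scores and group labels are invented, and the target mean of 100 and standard deviation of 15 are assumed here as the conventional IQ scale.

```python
from statistics import mean, pstdev

def within_group_scores(raw_scores, target_mean=100.0, target_sd=15.0):
    """Rescale raw scores against their own group's mean and SD."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [target_mean + target_sd * (x - m) / sd for x in raw_scores]

# Hypothetical raw scores for two groups, invented for illustration only.
raw = {"group_a": [40, 50, 60], "group_b": [30, 40, 50]}

# Norm each group separately, then look at the resulting group averages.
normed = {g: within_group_scores(scores) for g, scores in raw.items()}
group_means = {g: mean(scores) for g, scores in normed.items()}

for g in raw:
    print(g, [round(s, 1) for s in normed[g]], "mean:", round(group_means[g], 1))
```

After within-group norming, both groups average 100 by construction, whatever their raw averages were. Note also that the same raw score of 50 becomes 100 in group A but a higher score in group B, which is why this approach is only defensible if the raw group difference does not reflect a real difference.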

Detecting Bias

Distinguishing bias from the detection of real differences is often complex and controversial. The controversy about gender issues related to using the SAT (or PSAT) for awarding scholarships exemplifies this. Consider the article posted at http://www.psych.ucsb.edu/~kopeikin/121satgenderbias.htm. As it mentions, the average for boys tends to be higher than the average for girls, especially on the Quantitative test. Since boys do not get higher average grades (in fact, girls do), it would appear the test overpredicts the Quantitative aptitude of boys relative to girls. However, research on grades suggests they are often strongly influenced by congenial, cooperative behavior in class, which favors girls, so the question becomes whether the SAT or grades (or both) are biased!

Use-consequences of the CBEST are another fascinating example. That test is used to ensure that teachers have basic knowledge of reading, writing, and arithmetic. The fact that pass rates differ substantially for different racial and ethnic groups means the test eliminates a disproportionate number of aspiring teachers from minority groups (check http://www.psych.ucsb.edu/~kopeikin/121cbest.htm for details).