Psychology 121, Lecture 5
by Hal S. Kopeikin, Ph.D. © 1998
Announcements
-
The midterm is almost upon us. Check the syllabus
for details, and review our discussion of it in your notes from our first
class meeting.
-
The test will be "hard" but not "tricky." Don't worry about computing formulas
we have not discussed in class. For the slope of a line, correlation coefficients,
KR20, etc., you should understand the formulas (What does the number mean?
When should you use it?) rather than memorizing them. DO memorize formulas
I present in class (e.g., mean, z scores). Also, learn the proportions
for a normal distribution in figure 2-7 (rounded to percents is fine).
Validity
-
Validity indicates how well a test measures what it purports to measure.
The meaning of test results depends on the test's validity. Validity tells
you what a test measures and how well.
-
Validity is not absolute. Tests are valid to degrees. The question is not,
"Is the test valid?" but, "How valid is it?"
-
Validity depends not only on the test, but on the circumstances
of its use. Test validity can vary tremendously across populations, situations,
and applications. Although validity is often expressed in numbers, no single
number fully captures it. For example, a personality test might be quite
valid when administered by a psychologist to a willing patient, but it
might be quite invalid if the person taking the test is interested in tricking
the administrator. Another example: The SAT might be a relatively valid
test for measuring verbal ability but it is not a very valid test for measuring
musical ability.
-
Conceptually, validity can be viewed as a fraction of the variability in
test scores. There are three possible sources of variation in people's
test scores:
-
Valid variance, due to differences between people on the dimension of interest.
-
Bias, stable differences on factors other than those you are trying to
measure.
-
Random error, unsystematic differences which are unstable and irrelevant.
-
Total variability = Valid variance + Bias variance + Random error variance
-
Different forms of validity are more or less relevant, depending on the
test and its use.
-
The forms of validity often overlap and are interrelated.
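The variance breakdown above can be illustrated with a small simulation. This is only a sketch under loud assumptions: the three components are drawn as independent, zero-mean normal values, and the function names and standard deviations (3, 2, 2) are made up for illustration, not taken from any real test. Under those assumptions, each component's share of total score variance should be close to its variance divided by the sum of the three.

```python
import random
import statistics


def simulate_components(n, sd_valid, sd_bias, sd_error, seed=0):
    """Draw three independent, zero-mean score components for n test takers.

    All values are synthetic; the standard deviations are illustrative only.
    """
    rng = random.Random(seed)
    valid = [rng.gauss(0, sd_valid) for _ in range(n)]
    bias = [rng.gauss(0, sd_bias) for _ in range(n)]
    error = [rng.gauss(0, sd_error) for _ in range(n)]
    return valid, bias, error


def variance_shares(valid, bias, error):
    """Return each component's variance as a fraction of total score variance."""
    # An observed score is modeled as the sum of the three components.
    total = [v + b + e for v, b, e in zip(valid, bias, error)]
    var_total = statistics.pvariance(total)
    return {
        "valid": statistics.pvariance(valid) / var_total,
        "bias": statistics.pvariance(bias) / var_total,
        "error": statistics.pvariance(error) / var_total,
    }


valid, bias, error = simulate_components(20000, sd_valid=3, sd_bias=2, sd_error=2)
for name, share in variance_shares(valid, bias, error).items():
    print(f"{name:>5} share of total variance: {share:.2f}")
```

The shares add up to roughly 1 only because the simulated components are independent. With real test scores the components cannot be observed separately like this; the simulation just makes the "fraction of variability" idea concrete.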
Types of Validity
Face Validity
-
This is not really a form of validity. Face validity is the degree to which
a test looks like it measures what it really does measure. In other words, what
does the test appear to measure?
-
Doesn't tell us much beyond how test takers are likely to view the test.
May affect motivation and dissimulation (impression management).
1. Content Validity
How well does the test represent a defined universe of information? E.g., a
test of arithmetic with high content validity will have questions on all
of the central aspects of arithmetic, and sampling will be proportional
(i.e., it will not over-represent one area vs. another).
Content validity is most relevant when the test is a sample representing
a well-defined domain. The quality of representation is the key.
Content validity is generally established by logical means; statistical
evidence is secondary. It is established by comparing items and patterns
of answers to a test plan and definition of the underlying domain.
2. Criterion-Related Validity
The concern is how well the test estimates or predicts something else.
The something else is a criterion, operationalized as a criterion measure.
It is important to distinguish three things: a predictor (e.g., the SAT);
a criterion, that is, what you are trying to predict (e.g., college
achievement); and a criterion measure, the standard by which you measure
the criterion (e.g., college GPA). Predictive validity involves
estimating a future criterion. Concurrent validity relates to current
status on a criterion.
Key Issues in Criterion-Related Validity
-
Generalizability should be demonstrated through cross validation. One can
never obtain validity information on "the current situation," and the basis
for application of previous results is always a leap of faith. That faith
is better justified by evidence of generalizability, but still must be
tempered by logic. One way to determine the degree of criterion-related
validity is to try the test in various situations, in different populations,
etc. The level of criterion-related validity may differ across situations.
-
Carefully consider the adequacy of criterion measures. There are usually
multiple ways to represent a criterion; rarely is one perfect. The book
thinks GPA is an appropriate measure of college success--is it? Is it the
only one worth considering? That is, some test may be a great predictor
of X, but X might not be the thing we are really interested
in knowing about.
-
Was the sample adequately large? Did it represent the population you are
interested in? What about attrition? Small samples have correlation coefficients
that tend to bounce around a great deal.
-
Consider the impact of range restrictions on predictive accuracy, and differential
accuracy. As the range and variability of scores on a test shrink, so does
the possible predictive power of the test. Think about a test that has
close to zero variability in scores: it will have very little predictive
power.
-
Beware: validity coefficients are correlations, not squared correlations.
There is no rational reason for this; it is just a historical accident.
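Two of the issues above, small samples and range restriction, can be seen in a short simulation. This is a sketch, not a standard procedure: the helper names, the assumed true predictor-criterion correlation of .5, the sample sizes, and the cutoff are all invented for illustration, and the data are synthetic. The first part draws many samples at two sizes and compares how far r wanders; the second keeps only test takers above a predictor cutoff (as happens when only admitted students have a GPA) and recomputes r.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pairs(n, rho, rng):
    """Synthetic predictor/criterion scores with true correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


def r_range(sample_size, rho, replications, rng):
    """Smallest and largest r seen across repeated samples of one size."""
    rs = [pearson_r(*simulate_pairs(sample_size, rho, rng))
          for _ in range(replications)]
    return min(rs), max(rs)


def restricted_r(xs, ys, cutoff):
    """Correlation computed only for cases whose predictor score >= cutoff."""
    kept = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    kept_x, kept_y = zip(*kept)
    return pearson_r(kept_x, kept_y)


rng = random.Random(42)

# Small samples: r bounces around far more at n = 10 than at n = 500.
print("n = 10  min/max r:", r_range(10, 0.5, 200, rng))
print("n = 500 min/max r:", r_range(500, 0.5, 200, rng))

# Range restriction: keep only high scorers on the predictor.
xs, ys = simulate_pairs(5000, 0.5, rng)
full = pearson_r(xs, ys)
print("full-range r:", round(full, 2), " r squared:", round(full ** 2, 2))
print("restricted r (x >= 1):", round(restricted_r(xs, ys, 1.0), 2))
```

The demo also prints r squared next to r as a reminder that a validity coefficient is reported as r itself, so the proportion of criterion variance it accounts for is smaller than the coefficient suggests.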
3. Construct Validity
What is the nature of the theoretical abstraction, and how well is it represented?
This is most relevant for tests measuring theoretical abstractions.
The test and the definition of what is measured are intimately linked.
Note: if the theory is no good, then the test can't be very good either.
The level of construct validity is based on how well the theory works and
how well the test measures the abstraction in question.
Establishing construct validity is an ongoing process which continues
as part of theory development.
Construct validity is established by a pattern of findings, showing
the relationship of what the test measures to other measurements and analyzing
those for fit with the theory.
Both convergent and divergent evidence are used. Convergent
evidence shows that the test is related to things that the theory says
it should be related to. Divergent evidence shows that a test is unrelated to things that
the theory says it is supposed to be unrelated to. A test should have both
convergent and divergent evidence. For instance, we can expect that IQ
scores will converge with verbal ability, grades in college, etc., and that
IQ scores will not converge with hair color or ring size.
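The pattern of convergent and divergent evidence can be sketched with synthetic data. Everything here is an assumption made for illustration: a made-up latent "ability" drives both the test and a theoretically related measure, while an unrelated measure (think ring size) is pure noise. The function names and noise levels are hypothetical and the numbers are not real IQ data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_measures(n, seed=0):
    """Synthetic scores: a test, a related measure, and an unrelated one."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # The test and the related measure both reflect the same latent ability.
    test = [a + rng.gauss(0, 0.7) for a in ability]
    related = [a + rng.gauss(0, 0.7) for a in ability]
    # The unrelated measure has nothing to do with ability.
    unrelated = [rng.gauss(0, 1) for _ in range(n)]
    return test, related, unrelated


test, related, unrelated = simulate_measures(5000)
print("convergent r (test vs. related measure):",
      round(pearson_r(test, related), 2))
print("divergent r (test vs. unrelated measure):",
      round(pearson_r(test, unrelated), 2))
```

A sizable first correlation together with a near-zero second one is the kind of pattern that fits the theory; either one alone would be weaker evidence.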
Some contend construct validity is the general case, relevant to all
psychological tests. They maintain that content- and criterion-related
validity are best considered as parts of the pattern necessary to establish
construct validity.
4. How valid is valid enough?
-
A test can never be too valid. But few tests are as valid as we might wish
(of course, our midterms and final are perfect).
-
Remember that, like reliability, a test's validity varies with the population,
purpose, and conditions under which it is employed.
-
Higher validity is needed when making decisions about individuals than
when selecting groups for research, since the cost and maximum magnitude
of individual errors are often greater with the former.
-
Whether a test is "valid enough" often depends on what alternatives are
available to aid decision making. Very imperfect tests may be our
best tools.