Psychology 121, Lecture 2
by Hal S. Kopeikin, Ph.D. © 2000
Overview
Today's lecture introduces some major concepts in measurement. Ideas broached
today will be developed, extended, and referred to throughout the quarter.
Psychological tests can generally be divided into measures of maximal
performance or typical performance. Reliability and validity
are central characteristics in judging the usefulness of tests. Norms
provide a context essential to the interpretation of test scores. Statistics
are mathematical techniques used to describe norms and test results.
Today's Lecture: Basic Concepts in Measurement
Two Types of Tests
-
Maximal Performance tests attempt to measure
what a person can do. Instructions for these tests include "do your best."
Although all tests measure current performance, these tests have different
temporal foci:
-
Achievement Tests attempt to quantify how much has been learned from
an educational experience or environment (past).
-
Ability Tests explore what a person is presently able to do (present).
-
Aptitude Tests estimate capacity for learning, what performance
might be after education and training (future).
-
Typical Performance tests examine what a person
is like. They usually include instructions to be truthful and objective.
Tests of career interests and personality traits are examples of typical
performance measures.
-
Objective or Structured measures rely on standardized content, administration,
and scoring. Multiple-choice items predominate. The MMPI and SII are examples.
-
Projective tests use ambiguous stimuli, assuming that the way subjects
structure and interpret the stimuli is revealing.
Reliability and validity
All good tests are reliable and valid. But some caveats are needed here:
First, reliability and validity are a matter of degree. The more
reliable and valid, the better the test. Second, tests are only good relative
to the purpose of the test, the people being tested, and
the circumstances of the test. Thus, a test's reliability and validity are
not fixed properties; they vary with these factors.
Reliability
-
Reliability refers to consistency in measurement. If the thing being
tested doesn't change, then a reliable test will give the same results
repeatedly. To the extent that the measurement is consistent, the test
is reliable.
-
Reliability usually refers to consistency over time. Hence, testing
and re-testing is a way to check for reliability.
-
Sometimes we refer instead to reliability across situations.
-
Reliability across items or test forms is another possible perspective.
Some tests have two or more forms, so reliability across forms can be measured.
Alternatively, a single form can be divided into two smaller subtests, and
we check for consistency between the two half-tests. Note, however, that
the subtests are shorter than the test as a whole, which generally reduces
their reliability, so we need to make a statistical adjustment (the
Spearman-Brown formula).
-
Reliability is necessary but not sufficient for validity.
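Reliability is usually expressed as a correlation coefficient. The sketch below assumes the Pearson correlation as the consistency index and uses an odd-even split for the split-half method; both are common choices, not the only ones. All function names and scores are hypothetical, made up for illustration. The Spearman-Brown step-up, r_full = 2r / (1 + r), estimates the reliability of the full-length test from the correlation between its two halves.

```python
"""Test-retest and split-half reliability on made-up scores (illustrative only)."""
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half, length_factor=2.0):
    """Projected reliability when a test is lengthened by length_factor.

    With length_factor=2 this steps a half-test correlation up to full length.
    """
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def split_half_reliability(item_scores):
    """Odd-even split; item_scores holds one list of item scores per person.

    Returns (half-test correlation, Spearman-Brown corrected reliability).
    """
    odd_totals = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, spearman_brown(r_half)


# Hypothetical test-retest data: same five people tested twice.
test = [12, 15, 9, 20, 17]
retest = [13, 14, 10, 19, 18]
r_retest = pearson_r(test, retest)

# Hypothetical right/wrong (1/0) answers of five people on a six-item test.
answers = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
]
r_half, r_full = split_half_reliability(answers)
print(f"test-retest r = {r_retest:.2f}")
print(f"half-test r = {r_half:.2f}, corrected = {r_full:.2f}")
```

The corrected value is always higher than the raw half-test correlation (for positive r), which is the adjustment for shortened subtests described above.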
Validity
-
The validity of a test tells us how well the test measures what it is supposed
to measure. Your height can be measured very reliably, but using it as a
measure of your intelligence would not be valid.
-
There are different forms of validity:
-
Content validity: The defined content domain is well sampled.
-
Criterion-Related validity: The test is a good predictor. E.g.,
the SAT is supposed to have a moderately high level of criterion-related
validity because it is a good predictor of achievement in college.
-
Construct validity: Theories are only models of the world. Construct
validity is relevant to assessing the adequacy of a measure intended to
capture a theoretical abstraction.
Norms
-
Because psychological measurement is relative, test scores are usually
interpreted by comparison to the scores of others. Norms
summarize the scores of a defined group and serve as the basis for comparison.
-
Usually, we are interested in a population, which is represented
by a sample. Norms are almost always derived from the scores
of the sample, to represent the scores of the population.
-
Local Norms describe the performance of a relatively small, specific
group which may be particularly relevant for comparison. Many tests have
local norms in addition to broader national or regional norms.
-
There is rarely a single, "appropriate" norm group, but some are clearly
better than others in a given situation. The selection of norms represents
a major source of controversy in testing today.
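One common way to locate a raw score within a norm group is its percentile rank. The sketch below assumes the "percent scoring below, plus half of those tied" convention; test manuals may define percentile rank differently. The function name and both norm groups are hypothetical, invented to show how the same raw score can look strong against one norm group and weak against another.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)           # scores strictly below
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm groups (made-up numbers).
national = [35, 40, 42, 45, 48, 50, 52, 55, 58, 60]
local = [45, 50, 55, 58, 60, 62, 65, 68, 70, 72]

raw = 50
print(percentile_rank(raw, national))  # relative to the broad group
print(percentile_rank(raw, local))     # relative to the local group
```

With these numbers the raw score of 50 sits at the 55th percentile nationally but only the 15th locally, which is why the choice of norm group matters so much.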
Statistics Describing Scores and Score Populations
Scales of Measurement and the meaning of numbers (cf. Table 2-1,
p. 32). Measurement scales are numbering systems specifying properties
which determine what the numbers signify. The properties involve concepts
such as magnitude, interval size, and absolute zero. Use and interpretation
of the numbers depend on these underlying properties. For example, while
it makes sense to describe me as twice as tall as my kid, it probably does
not to say I'm twice as worried. Here are four types of numbering systems,
ordered from least to most informative.
-
Nominal. These numbers represent categories, nothing more.
-
Ordinal. The magnitude of these is meaningful (higher is more than
lower), but only as an order.
-
Interval. The numeric distance between numbers is meaningful, although
the absolute size of the number is not.
-
Ratio. The order, intervals, and absolute size of numbers are significant.
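Temperature is a standard illustration of the interval/ratio distinction (it is not an example from the lecture itself). Celsius and Fahrenheit are interval scales with arbitrary zero points, so ratios of raw values change when you switch scales, while ratios of differences do not. Kelvin has a true zero, so it behaves as a ratio scale. The function names below are hypothetical helpers.

```python
"""Why ratios need a true zero: temperature on interval vs ratio scales."""


def c_to_f(c):
    """Celsius -> Fahrenheit: a linear rescaling that also moves the zero point."""
    return c * 9 / 5 + 32


def c_to_k(c):
    """Celsius -> Kelvin: shifts the zero to absolute zero (a ratio scale)."""
    return c + 273.15


a, b, c = 10.0, 20.0, 30.0

# Ratio of raw values: depends on where the arbitrary zero sits.
ratio_c = b / a                  # 20 / 10 in Celsius
ratio_f = c_to_f(b) / c_to_f(a)  # 68 / 50 in Fahrenheit

# Ratio of intervals: identical on both interval scales, so it is meaningful.
interval_ratio_c = (c - b) / (b - a)
interval_ratio_f = (c_to_f(c) - c_to_f(b)) / (c_to_f(b) - c_to_f(a))

# On the Kelvin (ratio) scale, 20 degrees C is only slightly "more" than 10.
ratio_k = c_to_k(b) / c_to_k(a)

print(ratio_c, ratio_f, interval_ratio_c, interval_ratio_f, ratio_k)
```

"20 degrees C is twice as hot as 10 degrees C" is an artifact of where the zero was placed; only statements that survive the change of scale carry meaning.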
Frequency Distributions (cf. Figures 2-2, 2-3, 2-4, pp. 35-37)
are graphs showing the incidence of scores in a sample or population.
Scores (the x axis) are divided into equal segments, called class intervals.
Histograms are traditionally used for discrete data, whereas frequency
polygons summarize continuous data; this distinction is often somewhat
arbitrary, but reflects whether the x axis indicates groups or amounts.
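Grouping scores into equal class intervals is the counting step behind a histogram or frequency polygon. The sketch below assumes intervals that include their lower limit and exclude their upper limit; other conventions exist. The function name, the starting point, the interval width, and the scores are all hypothetical.

```python
from collections import Counter


def frequency_table(scores, low, width):
    """Count scores in equal-width class intervals [low + k*width, low + (k+1)*width).

    Returns (lower_limit, upper_limit_exclusive, count) triples. Empty intervals
    between the first interval and the highest score are kept with count 0.
    """
    if not scores:
        raise ValueError("no scores")
    if min(scores) < low:
        raise ValueError("a score falls below the first class interval")
    counts = Counter((s - low) // width for s in scores)
    top = max(counts)
    return [(low + k * width, low + (k + 1) * width, counts.get(k, 0))
            for k in range(top + 1)]


# Hypothetical scores for 15 test takers.
scores = [52, 55, 61, 63, 64, 67, 70, 71, 72, 75, 78, 81, 84, 88, 93]
table = frequency_table(scores, low=50, width=10)

# A crude text histogram: one '#' per score in each interval.
for lower, upper, count in table:
    print(f"{lower}-{upper}: {'#' * count}")
```

Plotting the counts as bars gives a histogram; connecting the interval midpoints with lines gives a frequency polygon.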