Correlation and Regression (This really fits with the material
for Lecture 3, but I often run out of time in that lecture and continue
it here.)
Correlation coefficients summarize the strength and direction
of a linear relationship between two variables.
Direction describes whether there is a positive or negative relationship
between the variables. For instance, there might be a positive correlation
between calorie intake and weight, and a negative correlation between the
number of beers you have before a test and your score on the test.
The strength of the correlation describes how well you can predict one
variable knowing the other.
The most common correlation coefficient is the Pearson Product Moment Correlation.
This is often symbolized by an r.
It has a scale of -1 to 1 with the negative sign indicating a negative
correlation.
The absolute value of the number indicates the strength, with 1 being
the strongest correlation and 0 indicating no linear relationship.
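As a minimal sketch of how r is computed, the function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The beer and test-score numbers are made up for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: beers before a test vs. score on the test
beers = [0, 1, 2, 3, 4]
scores = [90, 85, 80, 70, 65]
r = pearson_r(beers, scores)
print(round(r, 3))  # negative sign = negative relationship
```

The sign of the result gives the direction and the absolute value gives the strength.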
There are many alternative coefficients, such as Spearman's rho (for rank
order) and Phi (for dichotomies). See pp. 82-83 for more details.
The significance of r is the probability of finding a correlation
that large by chance alone when the true correlation is zero. It is a joint
function of association strength and sample size.
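One common way to test r against zero is to convert it to a t statistic with n - 2 degrees of freedom. The sketch below (not taken from the textbook) shows how the same r produces a larger t as the sample grows, which is the sense in which significance depends on both strength and sample size.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Same correlation, different sample sizes
for n in (10, 30, 100):
    print(n, round(t_for_r(0.30, n), 2))
```

Each t would then be compared against a t table at n - 2 degrees of freedom. An r of .30 falls short of the usual two-tailed .05 cutoff with 10 people but exceeds it with 100.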
The Standard Error of Estimate summarizes the residuals. Interpret it as the
standard deviation of prediction errors (p.84). It tells you how much unreliability
there is in your estimate.
Coefficient of Determination = r^2,
indicating how much of the variability in one measure is related to the other
measure. It tells you what proportion of the variance in one set of scores can be
predicted from the other set of scores.
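The two quantities above can be sketched from a simple least-squares line. The data are invented, and the standard error of estimate here divides the residual sum of squares by n - 2, which is one common convention; the textbook's formula may differ slightly.

```python
import math

def simple_regression(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def fit_summary(x, y):
    """Return (r squared, standard error of estimate) for the fitted line."""
    a, b = simple_regression(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # prediction errors
    ss_res = sum(e ** 2 for e in residuals)
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot           # proportion of variance predicted
    see = math.sqrt(ss_res / (len(x) - 2))    # assumed n - 2 convention
    return r_squared, see

# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_squared, see = fit_summary(x, y)
print(round(r_squared, 2), round(see, 2))
```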
Cross Validation involves checking a regression equation derived from one
set of scores by applying it to another set of scores. In other words, test
prediction equations with a new sample.
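A minimal sketch of the idea: derive the line from one sample, then see how well its predictions hold up in a second sample. Both samples are invented, and the choice of summary statistics (correlation between predicted and actual scores, plus root-mean-square error) is an assumption for illustration.

```python
import math

def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical derivation sample
x_a = [1, 2, 3, 4, 5, 6]
y_a = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
# Hypothetical second sample, not used to fit the equation
x_b = [2, 3, 5, 7]
y_b = [3.5, 3.6, 6.4, 7.5]

a, b = fit_line(x_a, y_a)                   # equation from sample A only
predicted_b = [a + b * x for x in x_b]      # applied unchanged to sample B
cross_validity = pearson_r(predicted_b, y_b)
rms_error = math.sqrt(
    sum((p - y) ** 2 for p, y in zip(predicted_b, y_b)) / len(y_b)
)
print(round(cross_validity, 2), round(rms_error, 2))
```

With a single predictor the cross-validity correlation is the same as the correlation between X and Y in the new sample, so the prediction error is the more informative number here.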
Note: correlation does not prove causation: An XY correlation could mean
X causes Y
Y causes X
A third variable affects both X and Y
Something else entirely?
Sometimes multiple predictors are used. For instance, college GPA could
be predicted using both high school GPA (X1) and SAT scores (X2):
GPA = a + b1*X1 + b2*X2
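For two predictors, the weights can be found by solving the least-squares normal equations directly. The sketch below does this with sums of squares and cross-products of deviations. The function name and the GPA and SAT figures are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of deviations
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2   # zero if the predictors are perfectly correlated
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical students
hs_gpa = [2.5, 3.0, 3.5, 4.0, 3.2, 2.8]
sat = [1000, 1100, 1300, 1400, 1250, 1050]
college_gpa = [2.4, 2.9, 3.4, 3.8, 3.1, 2.6]

a, b1, b2 = fit_two_predictors(hs_gpa, sat, college_gpa)
predicted = a + b1 * 3.3 + b2 * 1200   # prediction for a new applicant
print(round(predicted, 2))
```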
Reliability: The Consistency of Measurement
Tests are reliable to the extent measurement is consistent or stable. Unreliability
results from random, unsystematic variation in test scores. Every psych
test has some degree of unreliability.
Classical Conceptualizations of Reliability
X = T + E, where X is Observed Score, T is True Score, and E is Error.
But these terms are misleading. For instance, True Score is really the
stable score, not the score that is true. Remember, reliability is not validity.
Better terms would be:
X = stable score + random/unsystematic error
Reliability is the proportion of stable variance in test scores.
Anything that produces instability in scores reduces reliability, including
characteristics of the subject (motivation, mood, health), the test (item
sampling, difficulty, length), and conditions of administration (noise,
crowding).
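The decomposition can be illustrated with a small simulation. Stable scores and random errors are generated separately, added to form observed scores, and reliability is then the stable variance divided by the observed variance. The means, standard deviations, and sample size are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
n = 10000

stable = [random.gauss(100, 15) for _ in range(n)]  # stable ("true") scores
error = [random.gauss(0, 5) for _ in range(n)]      # random, unsystematic error
observed = [t + e for t, e in zip(stable, error)]   # X = stable score + error

# Proportion of observed-score variance that is stable variance
reliability = statistics.pvariance(stable) / statistics.pvariance(observed)
print(round(reliability, 2))
```

With these settings the expected value is 15^2 / (15^2 + 5^2) = .90. Making the error standard deviation larger lowers the result.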
Common types of reliability include stability over time, forms, items,
or raters.
Forms: Two forms of a test. How reliable are they vis a vis one another?
Raters: How reliable are the raters?
Test-Retest Reliability: Test, wait a specified amount of time, and then
retest using the same test. Measured as the correlation between the
two test administrations (the correlation itself is the reliability estimate; it is not squared).
Parallel Forms Reliability
Measured as the correlation between two versions of
a test (again, not squared).
Split-Half Reliability
Divide the test in half, score each half, correlate the two half scores, then
use the Spearman-Brown formula to correct for shortening the test by halving
it.
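A rough sketch of the procedure follows, assuming an odd-even split (other ways of halving the test are possible) and a tiny invented response matrix with one row per person and one column per item.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd = [sum(person[0::2]) for person in item_scores]    # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in item_scores]   # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    # Spearman-Brown: step the half-test correlation up to full length
    return 2 * r_half / (1 + r_half)

# Hypothetical responses: 5 people, 4 items scored 1 (right) or 0 (wrong)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
split_half = split_half_reliability(responses)
print(round(split_half, 2))
```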
Internal Consistency
Represents the consistency of measurement across test items.
Use KR20 for dichotomous items (e.g., True or False), Coefficient
Alpha otherwise. Alpha is a conservative (stringent) estimate of reliability.
It assumes the test is homogeneous (covering highly intercorrelated domains throughout
the questions); a homogeneous test may be unidimensional (one thing is being measured
throughout the questions).
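Both coefficients compare the sum of the item variances with the variance of total scores. The sketch below uses population variances throughout (an assumption; some texts use sample variances) and the same kind of invented response matrix, one row per person.

```python
import statistics

def coefficient_alpha(item_scores):
    """Coefficient alpha from a list of per-person item-score lists."""
    k = len(item_scores[0])
    item_vars = [
        statistics.pvariance([person[i] for person in item_scores])
        for i in range(k)
    ]
    total_var = statistics.pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def kr20(item_scores):
    """KR-20 for items scored 0/1; item variance is p * (1 - p)."""
    k = len(item_scores[0])
    n = len(item_scores)
    pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in item_scores) / n   # proportion passing
        pq += p * (1 - p)
    total_var = statistics.pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - pq / total_var)

# Hypothetical responses: 5 people, 4 dichotomous items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
alpha = coefficient_alpha(responses)
print(round(alpha, 2), round(kr20(responses), 2))
```

For 0/1 items the two functions give the same value, since KR20 is the special case of alpha for dichotomous items.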
NOTE: Table 4-2, page 118 summarizes types of reliability and how they
are computed. You should know this table.
Beware of "r"
Typically correlation coefficients are referred to as r, but reliability
coefficients are often called r too, even though they represent a proportion of
variance (like r^2): conceptually, the squared correlation between observed scores
and stable scores.
For example, see page 105. Our book is better than most and often uses R for
the reliability coefficient.
Unfortunately, it doesn't always, nor do others, and R often represents
a multiple correlation coefficient.
Beware of Difference scores
They are much less reliable than single scores.
The higher the correlation between two scores, the less reliable their
difference. This is a serious practical problem.
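The effect can be seen with the standard classical-test-theory formula for the reliability of a difference score, which assumes the two measures have equal variances. The formula is not quoted in these notes, and the reliabilities and correlations below are made-up inputs.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of the difference X - Y, assuming equal variances.

    r_xx, r_yy: reliabilities of the two scores
    r_xy: correlation between the two scores
    """
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

# Two tests that are each quite reliable (.90)
for r_xy in (0.0, 0.5, 0.8):
    print(r_xy, round(difference_score_reliability(0.90, 0.90, r_xy), 2))
```

With two tests of reliability .90 that correlate .80, the difference score has a reliability of only .50.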
Using Reliability Estimates in Score Interpretation
Test scores are more like bands than points. Getting a particular score
(with a decently reliable test) suggests that you are likely to fall within
a certain range of scores if you continue taking the test.
The Standard Error of Measurement (SEM) expresses how much unreliability to expect,
that is, how wide the band is.
Sm = Sx * sqrt(1 - R): take the standard deviation of test scores (Sx) and
multiply by the square root of (1 - reliability).
Conceptually, Sm represents the standard deviation of random
error.
Assuming normally distributed error, scores are expected to fall within plus or
minus one Sm about 68% of the time, and within two Sm over 95% of the time.
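As a small worked sketch, the functions below compute the standard error of measurement and the resulting score band. The standard deviation of 15 and reliability of .91 are illustrative values, not figures from the textbook.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD times sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(score, sd, reliability, n_sem=1):
    """Band of plus or minus n_sem standard errors around an observed score."""
    width = n_sem * sem(sd, reliability)
    return score - width, score + width

# Hypothetical test: SD = 15, reliability = .91, observed score = 100
low1, high1 = score_band(100, 15, 0.91, n_sem=1)   # roughly the 68% band
low2, high2 = score_band(100, 15, 0.91, n_sem=2)   # roughly the 95% band
print(round(low1, 1), round(high1, 1))
print(round(low2, 1), round(high2, 1))
```

Here the standard error is 15 * sqrt(.09) = 4.5, so the one-SEM band runs from about 95.5 to 104.5 and the two-SEM band from about 91 to 109.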
How Reliable is Enough?
Depends on the purpose of measurement (applied work generally requires
more). For theoretical research, .70 is usually OK.
For professional applications, .90 or more is reassuring (and not typical).
To Improve Reliability, reduce the proportion of random error
Eliminate weak items, add good ones, adjust difficulty, increase administration
time, improve standardization, and factor analyze. See the Spearman-Brown prophecy
formula (page 122) for analysis of adding items.
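The prophecy formula referred to here is usually written as the Spearman-Brown formula, which projects reliability when a test is lengthened by a factor n with comparable items. The sketch below also inverts it to ask how much longer the test must be to reach a target; the function names and the .70 starting value are illustrative.

```python
def prophecy(r, n):
    """Projected reliability when test length is multiplied by n."""
    return n * r / (1 + (n - 1) * r)

def length_factor_needed(r, target):
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

current = 0.70
doubled = prophecy(current, 2)                 # double the number of items
factor = length_factor_needed(current, 0.90)   # length needed for .90
print(round(doubled, 2), round(factor, 1))
```

Doubling a test with reliability .70 projects to about .82, while reaching .90 would take a test almost four times as long. The projection assumes the added items are as good as the existing ones.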
For research, Correction for Attenuation can adjust correlations for unreliability.
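The usual correction divides the observed correlation by the square root of the product of the two reliabilities, giving an estimate of what the correlation would be with perfectly reliable measures. The numbers below are invented for illustration.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between X and Y if both were perfectly reliable.

    r_xy: observed correlation
    r_xx, r_yy: reliabilities of the two measures
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed r = .40, reliabilities of .80 and .70
corrected = correct_for_attenuation(0.40, 0.80, 0.70)
print(round(corrected, 2))
```

The corrected value is always at least as large as the observed one, and it can exceed 1 when the reliability estimates are too low, so it is best treated as a research estimate and not as a score-level adjustment.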