Psychology 121, Lecture 8

by Hal S. Kopeikin, Ph.D. © 1998


Overview


Judging tests as aids in decision-making

Since the purpose of testing is usually to help make decisions, one logical way of evaluating a test is to determine how much better the decisions made with it are than those made without it. Typically, this involves dividing test scores into predicted 'success' or 'failure,' and categorizing the behavior one is trying to predict as 'success' or 'failure.' You should be aware that the terms 'success' and 'failure' are a bit arbitrary. For example, when one is predicting suicide, a 'success' might be an actual suicide, although this is not a good thing. A finer classification of our predictions into hits and misses is as follows:
 
Positive predictions imply someone will succeed, has a characteristic, fits in a group, etc.

Negative predictions imply that he or she will not succeed, does not have the characteristic, does not fit in the group, etc.

  • True positives: Hits where success is predicted and in fact occurs.
  • True negatives: Hits where failure is predicted and in fact occurs.
  • False positives: Misses where success is predicted but where failure actually occurs.
  • False negatives: Misses where failure is predicted but where success actually occurs.
  • Notice that 'true' and 'false' refer to hits and misses while 'positive' and 'negative' refer to successes and failures.

    Ratios expressing Decision Making Accuracy

    We can use different ratios to express the accuracy of a particular test. Each ratio emphasizes a particular facet of accuracy.
     
    Total Hit Rate = (True Positives + True Negatives) / All results (True Positives + True Negatives + False Positives + False Negatives)

    This is a global measure of the decision making accuracy of the test. It counts all errors and all correct decisions equally.

    Positive Hit Rate = True Positives / (True Positives + False Positives)

    This answers the question, 'How accurate are positive predictions?' It is a good measure when false positives are particularly worrisome. For instance, you really don't want a child molester to falsely pass your test for becoming a baby-sitter.

    Negative Hit Rate = True Negatives / (True Negatives + False Negatives)

    This answers the question, 'How accurate are negative predictions?' In this case, the rate will go down with the number of false negatives. This measure might be very important for a test which measures likelihood of suicide. In this case, you do not want to get any false negatives which wrongly predict that the person will not commit suicide.

    Sensitivity = True Positives / (True Positives + False Negatives)

    This ratio answers the question, 'How good is the test at categorizing those who actually succeed?' In this and the next ratio, you start with the results and then look back to see how many the test got right. In this case you examine positives ("successes").

    Specificity = True Negatives / (True Negatives + False Positives)

    This ratio answers the question, 'How good is the test at categorizing those who actually fail?' In this case you examine negatives ("failures").
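
    The five ratios above can be sketched as one small Python function. This is only an illustration: the function name and the example counts are hypothetical, and the function assumes every denominator is nonzero.

```python
def decision_rates(tp, fp, tn, fn):
    """Return the five accuracy ratios for a 2x2 table of prediction outcomes.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives. Denominators are assumed nonzero.
    """
    total = tp + fp + tn + fn
    return {
        "total_hit_rate": (tp + tn) / total,
        "positive_hit_rate": tp / (tp + fp),
        "negative_hit_rate": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rates = decision_rates(tp=40, fp=10, tn=30, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

    Note that the two hit rates divide by the number of predictions of one kind, while sensitivity and specificity divide by the number of actual outcomes of one kind.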

    Data Illustrating Concurrent Validity of Depression Measures As Functions of Cutting Scores

    from Rapp SR; Parisi SA; Walsh DA; Wallace CE. Detecting depression in elderly medical inpatients. Journal of Consulting and Clinical Psychology, 1988 Aug, 56(4):509-13.
    Physician Detection of Depression

    RAW NUMBERS
                                       CRITERION MEASURE
                                  Depressed     Not Depressed     TOTAL
    Estimated Depressed               2               6              8
    Estimated Not Depressed          21             121            142
    TOTAL                            23             127            150

    PERCENTAGES
                                       CRITERION MEASURE
                                  Depressed     Not Depressed     TOTAL
    Estimated Depressed               1%              4%             5%
    Estimated Not Depressed          14%             81%            95%
    TOTAL                            15%             85%           100%

    DEFINITION OF TERMS     PREDICTOR     CRITERION
    -------------------     ---------     ---------
    TRUE POSITIVE             YES            YES
    FALSE POSITIVE            YES             NO
    TRUE NEGATIVE              NO             NO
    FALSE NEGATIVE             NO            YES
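
    As a rough check on the physician data, the definition of terms can be applied to the four cells of the raw-numbers table. The sketch below is illustrative only: the helper name classify and the dictionary layout are assumptions made for this example, while the counts are taken from the table.

```python
from collections import Counter


def classify(predictor, criterion):
    """Label one case from the predictor's call and the criterion's verdict."""
    truth = "TRUE" if predictor == criterion else "FALSE"       # hit or miss
    sign = "POSITIVE" if predictor == "YES" else "NEGATIVE"     # what was predicted
    return f"{truth} {sign}"


# Cell counts from the physician table: (predictor, criterion) -> patients.
physician_cells = {
    ("YES", "YES"): 2,
    ("YES", "NO"): 6,
    ("NO", "YES"): 21,
    ("NO", "NO"): 121,
}

tally = Counter()
for (predictor, criterion), n in physician_cells.items():
    tally[classify(predictor, criterion)] += n

tp, fp = tally["TRUE POSITIVE"], tally["FALSE POSITIVE"]
tn, fn = tally["TRUE NEGATIVE"], tally["FALSE NEGATIVE"]

sensitivity = tp / (tp + fn)                       # 2 / 23
specificity = tn / (tn + fp)                       # 121 / 127
total_hit_rate = (tp + tn) / (tp + fp + tn + fn)   # 123 / 150

print(f"sensitivity:    {sensitivity:.1%}")
print(f"specificity:    {specificity:.1%}")
print(f"total hit rate: {total_hit_rate:.1%}")
```

    On these counts the physicians' total hit rate is 82%, yet their sensitivity is only about 9%: most of the correct decisions are true negatives.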

    Effects of Cutting Scores

    You should be aware that some cutting scores will be optimal for some rates but not for other rates. There are very few cases in which all the rates are optimized by the same cutting score. Adjusting cutting scores will reduce some errors while raising others. For example, raising the cutting score will usually reduce sensitivity while raising specificity.
    Beck Depression Inventory as an estimate of Depression

    Cutting Score   Sensitivity   Specificity   Positive Hit Rate   Negative Hit Rate
          8            100%           50%              26%                100%
          9             91%           60%              29%                 97%
         10             83%           65%              30%                 95%
         11             78%           72%              33%                 95%
         12             74%           76%              36%                 94%
         13             70%           82%              41%                 94%
         14             70%           83%              43%                 94%
         15             70%           84%              44%                 94%
         16             70%           87%              49%                 94%
         17             65%           90%              54%                 93%
         18             65%           90%              54%                 93%
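
    The trade-off between sensitivity and specificity can be reproduced with a few lines of Python. The scores below are invented for illustration and are not the Rapp et al. data. The sketch also assumes that a score at or above the cutting score counts as a positive prediction.

```python
def sensitivity_specificity(depressed_scores, not_depressed_scores, cutoff):
    """Treat a score at or above the cutoff as a positive prediction."""
    tp = sum(score >= cutoff for score in depressed_scores)
    fn = len(depressed_scores) - tp
    tn = sum(score < cutoff for score in not_depressed_scores)
    fp = len(not_depressed_scores) - tn
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical test scores for people who are and are not depressed
# according to the criterion measure.
depressed = [9, 12, 14, 17, 19, 22, 25, 30]
not_depressed = [2, 3, 5, 6, 8, 9, 11, 13, 15, 18]

results = []
for cutoff in (8, 12, 16, 20):
    sens, spec = sensitivity_specificity(depressed, not_depressed, cutoff)
    results.append((cutoff, sens, spec))
    print(f"cutoff {cutoff:2d}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

    With these made-up scores, each increase in the cutting score lowers sensitivity and raises specificity, which is the same pattern as in the table.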
     

    Base Rates and Criterion-Related validity

    The base rate of a behavior or characteristic is its frequency in a particular population. For instance, if 90% of the class gets passing grades, then that would be the base rate for the class. And if 1/125 of Americans die in car accidents, then that would be the base rate of that behavior.

    Base rates can be used to make predictions. When the behavior is very common or very rare, the total hit rate of predictions based on the base rate can be better than that of many psychology tests. Nevertheless, psychology tests are still used to predict very common and very rare behavior because they make different kinds of errors than base-rate predictions do. For example, you could develop a test which produces fewer false negatives than a prediction based on base rates. This test would be useful if you were interested in catching all possible suicides. (Tests tend to overpredict the rare and underpredict the usual.) Test-based predictions are likely to be best overall when base rates approach 0.5.
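
    The comparison between base-rate predictions and test-based predictions can be sketched numerically. In the example below, the test's sensitivity and specificity (80% each) are hypothetical and are assumed to stay the same at every base rate. The function names are also made up for this illustration. A base-rate prediction here means always predicting the more common outcome.

```python
def base_rate_hit_rate(base_rate):
    """Total hit rate from always predicting the more common outcome."""
    return max(base_rate, 1 - base_rate)


def hit_rate_of_test(base_rate, sensitivity, specificity):
    """Total hit rate of a test: hits among actual successes plus
    hits among actual failures, weighted by how common each is."""
    return base_rate * sensitivity + (1 - base_rate) * specificity


# Hypothetical test with 80% sensitivity and 80% specificity.
for base_rate in (0.01, 0.10, 0.50, 0.90, 0.99):
    by_base_rate = base_rate_hit_rate(base_rate)
    by_test = hit_rate_of_test(base_rate, sensitivity=0.80, specificity=0.80)
    print(f"base rate {base_rate:.2f}: "
          f"base-rate guess {by_base_rate:.2f}, test {by_test:.2f}")
```

    Under these assumptions the base-rate guess has the higher total hit rate when the behavior is very rare or very common, and the test has the higher total hit rate when the base rate is 0.5. The base-rate guess at a low base rate never predicts a positive, however, so it misses every actual case.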


    Test Administration

    Administration is Standardized

    Examiner Effects and Examiner-Subject Interactions

    Race

    Rapport

    Expectancy Effects

    Response Incentives and Reinforcements

    Subject Variables: Anxiety, illness, hormones


    Interviews

    General Characteristics of Interviews

    Structure.

    Rapport

    Interviewees are typically more disclosing, honest, and responsive when their relationship with the interviewer is warm and comfortable. Interviewers therefore attempt to be overtly pleasant, respectful, interested, and nonjudgmental. They facilitate communication with appropriate responses and nonverbal behavior. The major exception to this is the stress interview, where examiners intentionally create a challenging or threatening interpersonal context to assess the interviewee's reactions to such situations.

    Interactive

    Interviews are interactions between two people, affected by both. Interview outcomes thus depend on characteristics of both participants and their interaction. Influence is reciprocal. Interviews are adaptive, so responses determine the direction of subsequent explorations. Until recently only interviews had this interactive component. Now computers can also tailor questions as you go along.