
Malingering Exposed

By Brett Trowbridge, PhD, JD and Philip Frank, PhD

This article supported by The Trowbridge Foundation

Presented at the European Society of Psychology and Law, Sept 2002, Belgium
and published in their Report of Proceedings

The issue of malingering, or "faking," of mental symptoms comes up in a number of legal contexts, including, but not limited to, criminal cases involving competency, insanity, or diminished capacity; personal injury cases; workers’ compensation claims; and Social Security Disability claims. A dictionary definition of “malinger” is, “To pretend illness, especially in order to shirk one’s duty or to avoid work, etc.”[1] Thus the lay connotation of the word has come to suggest a moral judgment that the malingerer is “bad”. This “bad” connotation continues in The Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV), used by psychologists and psychiatrists to make diagnoses, where malingering (V65.2) is defined as "the intentional production of false or grossly exaggerated physical or psychological symptoms, motivated by external incentives such as avoiding military duty, avoiding work, obtaining financial compensation, evading criminal prosecution, or obtaining drugs."[2]

Malingering is not an inconsequential problem. In a widely cited 1994 survey, 320 forensic psychologists estimated that malingering occurs in 15.7% of all their forensic cases.[3] An article reviewing the available literature on feigned neuropsychological impairment reported that practitioners subjectively estimated that as many as half of those being evaluated may be faking all or part of their cognitive disabilities.[4] In a study of criminal defendants at an inpatient forensic hospital who had been judged incompetent to proceed to trial, a 22% prevalence of malingering was reported.[5] In another study, 20.8% of those being evaluated for insanity were suspected of malingering or were definitely malingering.[6] One author estimated that 32% of the persons he evaluated for mild head injury were malingering.[7] Rogers and Cruise[8] suggested that the prevalence of malingering varies widely across forensic settings, but that it is likely to occur in approximately one-sixth of all forensic cases. Unfortunately, these figures are probably underestimates of the actual prevalence of malingering, because they do not include individuals who successfully feigned mental illness.

A meta-analysis reviewed 32 published studies on the relationship between financial compensation and the experience and treatment of chronic pain, and concluded that compensation is related to increased reports of pain and decreased treatment efficacy.[9]

This issue is complicated by the fact that most malingering is of the "hybrid" type.  For example, a plaintiff injured in an automobile accident often has true psychological symptoms.  Exaggeration of these symptoms is different from the "pure" malingering of the psychiatrically "disabled veteran" who never actually served in combat but claims battle symptoms.  In cases where there is actual psychological trauma, it will be even more important to use instruments that are able to separate real from feigned or exaggerated symptoms.

One problem in detecting malingering is that malingering is not dichotomous, that is, simply "present" or "absent". Because malingering probably cannot be conceptualized as a trait, there is no reason to believe that if a person malingers in one instance, all information obtained from that person is inaccurate. People often perform at their level of ability on some measures while malingering on others. In some cases of documented brain dysfunction, patients have performed in a manner consistent with malingering on some parts of the examination.[10] Thus, valid performances on some measures do not rule out malingering, and malingered performances on some measures neither rule out valid performances on others nor rule out genuine disability.

Definitions

The term “malingering” presents us with numerous definitional problems. There are varying degrees of truthfulness and deception, and people can be motivated to distort their responses in various directions depending on the purpose of the forensic interview. They may do so consciously and deliberately, or they may act out of unconscious motivations.

Obviously there are self-reports that are primarily valid and honest, and there are those that are primarily deceptive.  There are also inconsistent response sets, in which the interviewee gives conflicting reports.  Some persons may provide invalid or untrue information which they actually believe.  Some may make distortions in reporting in order to maintain a good impression or to defend themselves psychologically.

The general term which subsumes all forms of deliberate distortion or misrepresentation of psychological symptoms is “dissimulation”.[11] Dissimulation encompasses all forms of deliberately inaccurate, deceptive, and inconsistent response styles, including malingering, defensiveness, and inconsistent reporting. Malingering is distinguished from factitious disorder, in which the goal of the individual is to assume the role of patient. In malingering, the motivation for producing symptoms lies beyond the patient role: the individual is aiming at some secondary gain, such as money or the avoidance of punishment.

Also subsumed under dissimulation is defensiveness, which is the opposite of malingering, as it involves a deliberate effort to deny or minimize physical or psychological symptoms. It is certainly not uncommon for people to deny symptoms during a forensic interview. Oftentimes this denial has an unconscious basis, as when a person with schizophrenia maintains that there is nothing wrong with him. This paper does not concern itself with defensiveness, a response set often seen in parenting evaluations, evaluations to determine suitability for employment, and the like.

Inconsistent reporting can be distinguished from both honest reporting and dissimulation.                           

Figure 1

In an honest reporting response set the individual is making a good effort to respond truthfully and consistently.  In a malingering response set the individual is making a high effort to appear to have problems or symptoms that are not actually present.  In an inconsistent response set the individual is making little or no effort to respond either truthfully or untruthfully.  Subsumed under inconsistent response sets are careless responding (low effort to respond truthfully) and random or irrelevant responding (low effort to respond untruthfully).  Inconsistent response sets are sometimes not under conscious control, as in psychosis or extreme cognitive problems which prevent a consistent response set.  Obviously in these cases the response set is the result of bona fide symptomatology.  Inconsistent response sets may also occur because of special problems such as poor vision, inability to read, etc.  Since inconsistent response styles are inherently “unreliable” by definition, it is axiomatic that conclusions based on data generated from inconsistent responding cannot be “valid”.  As will be discussed later in this paper, Richard Frederick has developed measures of consistency in his Validity Indicator Profile (VIP), as he conceives of two independent factors: low effort versus high effort, and effort towards valid truthful responses versus effort towards untruthful responses, as schematized in Figure 1.  Thus, his VIP distinguishes between four possibilities: honest (“compliant”) responding, careless responding, random responding, and malingering.
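The two-factor scheme described above (amount of effort, and whether that effort is directed toward or away from truthful responding) can be expressed as a simple lookup table. The Python sketch below is a hypothetical illustration of that classification logic only; it assumes each factor has already been judged as a two-valued category, and it is not Frederick’s actual VIP scoring procedure. The names `Effort`, `Intent`, `RESPONSE_STYLES`, and `classify` are our own.

```python
from enum import Enum


class Effort(Enum):
    LOW = "low"
    HIGH = "high"


class Intent(Enum):
    # Direction of the effort: toward truthful responses or away from them.
    TRUTHFUL = "truthful"
    UNTRUTHFUL = "untruthful"


# Hypothetical mapping of the two factors onto the four response styles
# named in the text (compliant, careless, random/irrelevant, malingering).
RESPONSE_STYLES = {
    (Effort.HIGH, Intent.TRUTHFUL): "compliant (honest)",
    (Effort.LOW, Intent.TRUTHFUL): "careless",
    (Effort.LOW, Intent.UNTRUTHFUL): "random/irrelevant",
    (Effort.HIGH, Intent.UNTRUTHFUL): "malingering",
}


def classify(effort: Effort, intent: Intent) -> str:
    """Return the response style for one combination of the two factors."""
    return RESPONSE_STYLES[(effort, intent)]


if __name__ == "__main__":
    # Print every cell of the two-by-two scheme.
    for (effort, intent), style in RESPONSE_STYLES.items():
        print(f"{effort.value} effort, {intent.value} -> {style}")
```

Keeping the scheme as a table rather than nested conditionals makes it easy to see that the two factors are independent and that each of the four cells is covered exactly once.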

Figure 2 sets out the DSM-IV definition of Malingering. No actual diagnostic criteria are provided. Instead, DSM-IV states that malingering should be “suspected” in a medicolegal context, when there is a marked discrepancy between the person’s claimed stress or disability and the objective findings, when there is lack of cooperation with evaluation or treatment, and when antisocial personality disorder is present. It immediately becomes clear that this set of “criteria” for “suspecting” malingering is overbroad. Although many individuals with antisocial personality disorder malinger, in our experience a larger number do not. Indeed, the best available data suggest that most forensic subjects do not malinger.[12] There is no evidence that antisocial personalities are more effective malingerers.[13] Substance abusers frequently do not cooperate with treatment, so when should those persons be considered to be malingering? And what are the “objective” findings that we are to match with the person’s claimed “stress or disability”? In attempting to apply the DSM-IV definition, one immediately notices that the definition is essentially unhelpful. Furthermore, use of the DSM-IV criteria may result in a significant number of false-positive diagnoses.[14]

Figure 2 – DSM-IV

DSM-IV Criteria for Malingering

V65.2 Malingering

       The essential feature of Malingering is the intentional production of false or grossly exaggerated physical or psychological symptoms, motivated by external incentives such as avoiding military duty, avoiding work, obtaining financial compensation, evading criminal prosecution, or obtaining drugs.  Under some circumstances, Malingering may represent adaptive behavior – for example, feigning illness while a captive of the enemy during wartime.

       Malingering should be strongly suspected if any combination of the following is noted:

1. Medicolegal context of presentation (e.g., the person is referred by an attorney to the clinician for examination)

2. Marked discrepancy between the person’s claimed stress or disability and the objective findings

3. Lack of cooperation during the diagnostic evaluation and in complying with prescribed treatment regimen

4. The presence of Antisocial Personality Disorder

Malingering differs from Factitious Disorder in that the motivation for the symptom production in Malingering is an external incentive, whereas in Factitious Disorder external incentives are absent. Evidence of an intrapsychic need to maintain the sick role suggests Factitious Disorder. Malingering is differentiated from Conversion Disorder and other Somatoform Disorders by the intentional production of symptoms and by the obvious, external incentives associated with it. In Malingering (in contrast to Conversion Disorder), symptom relief is not often obtained by suggestion or hypnosis.

Problems with using the DSM-IV criteria are even more pronounced when dealing with children or adolescents, since DSM-IV does not permit a diagnosis of “antisocial personality disorder” until a person is 18 years old. Thus, children and adolescents are excluded by age alone from one of the four factors listed. It is true that antisocial personality disorder itself is defined, in part, by the presence of conduct disorder before the age of fifteen, and DSM-IV has drawn a tenuous connection between conduct disorder and antisocial personality disorder. However, as DSM-IV notes of conduct disorder, “in a majority of individuals the disorder remits by adulthood.” DSM-IV’s definition of malingering therefore seems to be even less useful for adolescents and children.[15]

Figure 3 – Rogers’ Model[16]

Rogers’ Proposed Model for the Classification of Malingering

I. A pattern of self-reported symptoms which would include at least one of the following:

A. Endorsement of an unusually high number of rare symptoms (i.e., symptoms which are very infrequent in bona fide patients).

B. Endorsement of an unusually high number of blatant symptoms (i.e., symptoms which are immediately recognizable by nonprofessionals as indicative of severe psychopathology). It is often useful to ask about symptoms which are not obvious signs of mental illness (e.g., early morning awakening) for the purposes of comparison.

C. Nonselective endorsement of symptoms which appear to be improbable based on the sheer number.

D. Endorsement of absurd and preposterous symptoms. This criterion should be applied only to individuals who appear coherent and relevant in their speech, since some grossly psychotic patients may also endorse absurd responses.

II. Corroboration of dissimulation through one or more of the following:

A. Collateral interviews which suggest that the individual’s self-report is strongly indicative of feigning (e.g., family provides evidence of relatively good adjustment in contrast to self-described “gross impairment”).

B. Pronounced differences between reported prior episodes and their clinical documentation. Differences should be dramatic and strongly suggestive of feigning (e.g., claims of multiple suicide attempts requiring medical interventions while hospitalized, when there is no evidence in the clinical record of any suicidal ideation or gestures).

C. Unequivocal evidence of feigning on standardized measures such as the MMPI and the SIRS.

III. Evidence based on self-report or collateral interviews that the individual’s motivation for feigning was not exclusively a desire to be a patient or an attention-getting device in a borderline patient.

The first step in Rogers’ model is to identify a pattern of self-reported improbable symptoms. These include endorsement of an unusually high number of rare symptoms, endorsement of a high number of blatant symptoms, endorsement of a high number of symptoms overall, and endorsement of absurd and preposterous symptoms. We will find that all of these strategies for identifying malingerers work well clinically, and that these same strategies can be used to develop tests to identify malingerers. Rogers’ second step is corroboration through reading records, talking to relatives, etc., which is always important in any case in which malingering is suspected.

We know that clinical evidence by itself is usually less reliable and valid than evidence from scientifically derived instruments. A variety of studies have shown clinicians to be poor at detecting malingering.[17] A number of studies have shown that neuropsychologists are essentially ineffective in identifying child malingerers.[18] Similarly, without specific malingering tests, neuropsychologists have trouble identifying adolescent malingering.[19] Those analyzing scores on conventional neuropsychological tests of memory such as the Wechsler Memory Scale-Revised (WMS-R) have not been able to demonstrate an ability to distinguish true “amnestic disorder” (DSM-IV 294.0), a severe mental disorder caused by brain trauma and disease, from simulated or feigned amnesia.[20] However, neuropsychologists blindly evaluating the results of standard neuropsychological tests were able to identify relatively unsophisticated malingerers when MMPIs were available.[21]

Figure 4

     

Heilbrun’s (1992) Guidelines for Test Selection in Forensic Evaluations[22]

  • Commercial Availability – accompanied by technical manual and peer reviewed.
  • Reliability – recommend .80 or higher or explicit justification.
  • Relevant Test – relevance to legal issue and appropriate validation research.
  • Standard Administration Procedures
  • Applicability to population being tested.
  • Actuarial Data – objective tests preferred.
  • Response Style Evaluation – some method for determining validity of test results.
         
               

If we are interested in developing tests for forensic psychology that will be more useful than (or supplemental to) our clinical judgment, what attributes do we want our tests to have? Kurt Heilbrun’s criteria are set forth in Figure 4. The test must be commercially available, have a manual, and be peer reviewed. It must be reliable, relevant, and validated through appropriate research. Administration must be standard. There should be some way of detecting dissimulation, or “response set”. Interpretation should be done through a pre-specified quantitative analysis.

Rogers established three criteria for the validation of strategies for detecting malingering. The first two criteria involve the convergence of findings across (1) research designs and (2) methods of assessment (e.g., structured interviews and multiscale inventories). The third criterion is that detection strategies must be cross-validated on clinically diverse samples.[23] At the present time, most of our detection strategies do not fulfill at least the third of Rogers’ criteria.

Methodological Considerations in Test Validation Research

The scientific study of malingering is in its infancy. It is only in the last ten to fifteen years that much serious research has been done on the problem. Malingering is a difficult topic to investigate, because a scientific study must compare malingerers with non-malingerers on some test or procedure to see whether the test or procedure can distinguish between the two groups with an acceptable accuracy rate. The same conundrum arises throughout science: in order to study something, you have to define it and know something about it. Here, we need to be able to assemble a group of malingerers in order to develop a test designed to detect malingerers!

It has been empirically demonstrated that clinical methods of detecting conditions are usually not as accurate as formal standardized testing procedures using actuarial decision methods.[24] However, formal standardized testing procedures usually have to be developed from clinical notions, and that is what has happened in the area of malingering.  Based on clinical experience researchers hypothesized that compared to non-malingerers, malingerers would provide more inconsistent responses, would endorse overall a higher number of symptoms, would endorse rare or blatant symptoms, would endorse rare combinations of symptoms, or would make errors not frequently made even by seriously disabled people.  

One research paradigm (called the “known-groups” design) is to have clinicians choose a group of people from their practices who they think are malingering, and to choose another group that they believe is made up of those who are not malingering, and to compare them on a proposed test or measure.  However, we already know that many clinicians do a relatively poor job of distinguishing between these groups – indeed that is why we are trying to develop a test in the first place.

Another popular approach is to instruct one group of subjects (often college students) to attempt to malinger psychopathology, brain damage, or some other condition. This approach is relatively easy, much easier than attempting to locate a group of true malingerers. However, there is no way of knowing for sure whether these “simulated” malingerers are actually behaving as real malingerers would in a forensic situation. Indeed, it is not clear to what extent these “simulated” malingerers even follow the instructions,[25] since there is usually little or no financial incentive for them to do so. Some experiments have provided small amounts of money as rewards for successful malingering, but it is unclear whether these small rewards bear much relationship to the much larger incentives true malingerers often have, such as winning their own freedom or receiving a lifetime pension. Many experts believe that “simulated” malingering research designs are appropriate for initial research, but are not sufficient as validation experiments for psychological tests to be used in court.[26] Such simulation studies do not resolve whether people who are presumed to be following instructions to fake respond in the same way as forensic examinees who are not following the instructions, given to them during testing, to perform optimally.

Obviously, a better approach would be to assemble a pool of “true” or known malingerers, but this is fairly difficult to do. A few malingerers are so unsophisticated that they are easily detected. For example, a person who responds nonsensically to all questions, denies knowing his own name, and never utters a coherent sentence may be “caught” writing a sophisticated letter or playing a game of chess. Most people would agree that this person is malingering, and formal testing would probably not be needed to make that determination. A person who presents as catatonic or severely brain damaged might be videotaped using computers, driving, and carrying on animated conversations. If formal tests were given to these people, they would do very poorly on virtually all of the items, so it is not difficult to distinguish these malingerers from non-malingerers on virtually any test. The point here is that these “stupid”, or unsophisticated, or easily caught malingerers may not be very much like the more sophisticated malingerers that we are really setting out to detect. Since we can never know for sure exactly which people are malingerers and which are not, the only “known” malingerers we have are those who have been caught. We run the risk of developing instruments that will only detect unsophisticated malingerers. These crude tests may be acceptable as screening instruments for identifying obvious malingerers, but may not be sufficiently powerful to help us detect many sophisticated malingerers.

Some studies compare groups of people claiming brain dysfunction who are in litigation with groups of people claiming brain dysfunction who are not in litigation, under the assumption that those who are in litigation are malingering and those who are not in litigation are not malingering. This approach is called the “differential prevalence” design. Clearly these groups may be “impure”, in the sense that some grouped as malingerers may not be malingering, and some grouped as non-malingerers may indeed be malingering. The problem is that there is no “gold standard” for malingering, so any “known-group” comparison with a non-malingering group may be impure, thus limiting the utility of any test validated with those research designs. Another approach is to have professionals, such as psychologists and psychiatrists, “malinger” for the so-called “known group” condition. However, there is no reason to believe that malingering professionals will appear the same as true malingerers in the real world.

Significance of Base-rates

Determining the validity of any assessment procedure involves examining sensitivity, specificity, and false-positive and false-negative error rates.  Sensitivity refers to the correct detection of the condition by the test or procedure, while specificity relates to accurately determining those persons without the condition.  No test or diagnostic procedure is ever perfect, so there is always an error rate, which will be of one of two types.  The first, often called Type I error, is the false positive error, which is when the procedure shows that the person has the condition when in actuality he does not.  The second, often called Type II error, is the false negative error, which is when the condition is present, but the procedure does not identify it.  In terms of malingering research, Type I error would be calling people who are not malingering malingerers, and Type II errors would be identifying people as non-malingerers who are truly malingering.

Base rates for any condition are based on a ratio[27]:

$$\text{Base rate} = \frac{\text{Number of cases with the condition}}{\text{Number of cases in the population } (N)}$$

As an example, for a condition that exists for 50 people out of every 1,000, the base rate for that condition is 50/1000, or five percent.

Base rates are extremely important in the detection of conditions, and thus directly applicable to psychological evaluations, although clinicians often overlook them. Psychologists validate their psychological tests and procedures by doing research. A typical validation experiment will have a number of individuals with the condition, and often an equal number of controls, or people who do not have the condition. The test or procedure is given to both groups. A two-by-two contingency table can then be created to evaluate the test’s performance.

Two-by-two contingency table (rows: true state of the individual; columns: indication of the test)

True Condition, test indicates +: "Hit" (true positive); Sensitivity

True Condition, test indicates -: "Miss" (false negative); Type II Error

No Condition, test indicates +: "False Alarm" (false positive); Type I Error

No Condition, test indicates -: "Correct Reject" (true negative); Specificity

All individuals involved in the validation study are placed in the appropriate category based on determining whether that case was a true positive (TP), false positive (FP), false negative (FN) or true negative (TN).

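The base rate is calculated as:

$$\text{Base rate} = \frac{TP + FN}{N}$$

The sensitivity, or true positive rate, is calculated as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

The specificity, or true negative rate, is calculated as:

$$\text{Specificity} = \frac{TN}{FP + TN}$$

Positive predictive power (PPP) is calculated as:

$$\text{PPP} = \frac{TP}{TP + FP}$$

Negative predictive power (NPP) is calculated as:

$$\text{NPP} = \frac{TN}{FN + TN}$$

Overall diagnostic power is calculated as:

$$\text{Overall diagnostic power} = \frac{TP + TN}{N}$$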
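The six count-based formulas can be collected into a small helper. The following is a minimal Python sketch; the `Contingency` class and its method names are our own illustrative choices, not part of any published scoring software. It assumes every denominator is nonzero, that is, the validation sample contains people both with and without the condition and the test produced both positive and negative results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Cell counts from a validation study's two-by-two table."""

    tp: int  # true positives ("hits")
    fp: int  # false positives (Type I errors, "false alarms")
    fn: int  # false negatives (Type II errors, "misses")
    tn: int  # true negatives ("correct rejects")

    @property
    def n(self) -> int:
        """Total number of cases in the study."""
        return self.tp + self.fp + self.fn + self.tn

    def base_rate(self) -> float:
        return (self.tp + self.fn) / self.n

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    def ppp(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npp(self) -> float:
        return self.tn / (self.fn + self.tn)

    def overall_power(self) -> float:
        return (self.tp + self.tn) / self.n


if __name__ == "__main__":
    # Counts from the 50% base-rate example (Example One) in this paper.
    table = Contingency(tp=450, fp=50, fn=50, tn=450)
    print(table.base_rate(), table.sensitivity(), table.specificity())
    print(table.ppp(), table.npp(), table.overall_power())
```

With the Example One counts (TP = 450, FP = 50, FN = 50, TN = 450), every statistic comes out at .9 except the base rate, which is .5. Note that all four cells must be filled in before any of these values can be computed, which is exactly the information an evaluator in practice does not have.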

Notice that these formulas can only be applied if the frequency count for each of the four cells is available, as is the case after doing a validation experiment.  A forensic psychologist attempting to assess someone to determine if that individual has a condition does not have all the information available to him to fill in the four cells. One thing he does not know is the base-rate of the condition among the people he sees in his practice.  If the psychologist has validation research available to him he knows how accurately the test identified the condition under the experimental situation’s base-rate, which is usually at or close to 50%.  However, to determine the positive predictive power (PPP) and the negative predictive power (NPP), an estimate must be made of the condition’s base-rate among the people the psychologist sees, and that base rate may not be close to the 50-50 situation typically used for validation research.

Test manuals usually publish sensitivity and specificity information based on the validation studies, and sensitivity and specificity do not vary according to the base rate of the condition in the group of people the psychologist sees. Sensitivity is the chance, knowing that the person is malingering, that the test will pick it up; specificity is the chance, knowing that the person is not malingering, that the test will show malingering to be absent. Notice that these statistics are not directly useful in an evaluation situation. The useful statistics in an evaluation situation, positive predictive power (PPP) and negative predictive power (NPP), do vary greatly depending upon the base rate of the condition in the evaluator’s practice.

Figure 5

Practical Interpretation of Diagnostic Efficiency Statistics[28]

Statistic: Meaning of statistic

Base rate/prevalence: Percentage of the population that has the disorder/condition

Sensitivity: Knowing the person has the condition/disorder; the likelihood the test will pick it up

Specificity: Knowing the person does not have the condition/disorder; the likelihood the test will show it to not be present

Positive predictive power: Knowing the test is positive; the likelihood the person actually has the condition/disorder

Negative predictive power: Knowing the test is negative; the likelihood the person actually does not have the condition/disorder

Overall diagnostic power: Percentage of correct classifications by the test

Figure 5 explains in practical terms what these statistics mean.  Notice that sensitivity data will only be available when the psychologist knows for sure whether or not the person has the condition, which is usually not true in a typical situation for a psychologist doing an evaluation.  Thus sensitivity and specificity are really only relevant in the experimental situation, where it is already known whether each individual has the condition or disorder.  In an evaluation situation positive predictive power and negative predictive power will be the most relevant, because all that is known in the evaluation situation is whether the test or procedure shows positive or negative. 

To illustrate the problems associated with base rates, let us consider three examples of a hypothetical test used to try to detect malingering. In the first example the base rate of malingerers is 50%, in the second 10%, and in the third 90%. In each example N is 1,000, and, using established cut-off scores, the hypothetical test to detect malingering is 90% accurate: the specificity is .9 and the sensitivity is .9, regardless of the base rate. In the 50% base-rate situation the PPP and the NPP are also both .9. However, in the 10% base-rate situation NPP is very good (.988), but PPP is only .5, which means that if the psychologist knows the test shows positive, the chance that the individual truly is malingering is only .5, or 50%! Conversely, in the high (90%) base-rate situation PPP is very good at .988, but NPP is only .5, so if the psychologist knows the test shows negative, the chance that the person is truly not malingering is only 50%! Thus, in high base-rate situations NPP is relatively low, and in low base-rate situations PPP is relatively low.
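The three examples can be reproduced by building the expected cell counts for each base rate and then applying the count-based formulas. The Python sketch below assumes, as the examples do, N = 1,000 and a sensitivity and specificity of .9, and it treats the expected counts as exact (no sampling error); `expected_counts` and `predictive_power` are hypothetical helper names.

```python
def expected_counts(n, base_rate, sensitivity, specificity):
    """Expected (TP, FP, FN, TN) for n examinees at a given base rate."""
    malingerers = n * base_rate
    non_malingerers = n - malingerers
    tp = malingerers * sensitivity      # malingerers the test flags
    fn = malingerers - tp               # malingerers the test misses
    tn = non_malingerers * specificity  # honest examinees the test clears
    fp = non_malingerers - tn           # honest examinees the test flags
    return tp, fp, fn, tn


def predictive_power(n, base_rate, sensitivity, specificity):
    """Return (PPP, NPP) computed from the expected cell counts."""
    tp, fp, fn, tn = expected_counts(n, base_rate, sensitivity, specificity)
    return tp / (tp + fp), tn / (fn + tn)


if __name__ == "__main__":
    for rate in (0.5, 0.1, 0.9):
        ppp, npp = predictive_power(1000, rate, 0.9, 0.9)
        print(f"base rate {rate:.0%}: PPP = {ppp:.3f}, NPP = {npp:.3f}")
```

For the 10% base rate the counts are TP = 90, FP = 90, FN = 10, and TN = 810, giving a PPP of .500 and an NPP of about .988; for the 90% base rate the two values are reversed.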

Notice that the equations provided earlier to determine PPP and NPP will only work in the experimental situation, in which the number of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) are known, which is not true in a normal evaluation situation. 

In a normal evaluation situation all that is known is the sensitivity and specificity numbers from the published research, but the prevalence (base-rate) of the condition among those seen in the evaluator’s practice is not known and must be estimated.  The psychologist obviously needs to have at least some basis for making this estimate.  Positive predictive power (PPP) and negative predictive power (NPP) can then be calculated as follows:

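Positive Predictive Power (PPP):

$$\text{PPP} = \frac{\text{Prevalence} \times \text{Sensitivity}}{\text{Prevalence} \times \text{Sensitivity} + (1 - \text{Prevalence}) \times (1 - \text{Specificity})}$$

Negative Predictive Power (NPP):

$$\text{NPP} = \frac{\text{Specificity} \times (1 - \text{Prevalence})}{\text{Specificity} \times (1 - \text{Prevalence}) + \text{Prevalence} \times (1 - \text{Sensitivity})}$$

[29]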
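One way to see why these prevalence-based formulas must agree with the count-based formulas given earlier is to write the expected cell counts in terms of the prevalence p, the sensitivity Se, and the specificity Sp (single-letter symbols introduced here only for brevity) and substitute them. The total N cancels, which is why the evaluator needs only an estimate of prevalence, not the actual counts:

$$TP = Np\,Se, \qquad FN = Np\,(1 - Se), \qquad TN = N(1 - p)\,Sp, \qquad FP = N(1 - p)(1 - Sp)$$

$$\text{PPP} = \frac{TP}{TP + FP} = \frac{Np\,Se}{Np\,Se + N(1 - p)(1 - Sp)} = \frac{p\,Se}{p\,Se + (1 - p)(1 - Sp)}$$

$$\text{NPP} = \frac{TN}{FN + TN} = \frac{N(1 - p)\,Sp}{Np\,(1 - Se) + N(1 - p)\,Sp} = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + p\,(1 - Se)}$$

This is simply Bayes’ theorem applied to a positive or a negative test result.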

Notice that in each of the three examples PPP and NPP have also been calculated using these formulas, and they come out the same as when the original equations were used.

We see, then, that it makes a dramatic difference to the usefulness of the test whether the evaluator’s practice consists mostly of malingerers or mostly of non-malingerers. If 90% of the evaluator’s practice is malingering, about 98.8% of the people the test identifies as malingerers will truly be malingering, but only 50% of those the test identifies as non-malingerers will truly not be malingering! Conversely, if only 10% of the evaluator’s practice is malingering, about 98.8% of those the test identifies as non-malingerers will truly not be malingering, but only 50% of those the test identifies as malingerers will truly be malingering.

To assess a test’s true “efficiency,” these PPP and NPP values must be compared with the results that could be obtained by chance, without using any test or procedure whatsoever. In the 90% base-rate condition the best strategy would be to say that everyone being evaluated is malingering, as that call would be correct 90% of the time. Conversely, in the 10% base-rate condition the best strategy would be to say that no one being evaluated is malingering, and that call would also be correct 90% of the time. Notice that in our examples the PPP was .988 for the 90% base-rate example, somewhat better than the .90 that would be expected by chance. Similarly, in the 10% base-rate example the NPP was .988, again somewhat better than would be expected by chance. In the high base-rate example the NPP that could be established by chance would be only 10% (the chance that the person is not malingering, determined by the base rate alone), but the NPP using the test is 50%, considerably better than would be expected by chance alone. Conversely, in the low base-rate example the PPP that could be established by chance alone would be only 10% (the chance that the person is malingering, determined by the base rate alone), but the PPP using the test is 50%, considerably higher.
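The comparison with chance can also be sketched in code. The Python example below computes PPP and NPP with the prevalence-based formulas given earlier and sets them beside the “chance” values obtained from the base rate alone (calling everyone a malingerer, or calling no one a malingerer). The function names are illustrative, and the sensitivity and specificity of .9 are the hypothetical values used in the examples.

```python
def ppp_from_prevalence(prev: float, sens: float, spec: float) -> float:
    """Positive predictive power from prevalence, sensitivity, specificity."""
    return prev * sens / (prev * sens + (1 - prev) * (1 - spec))


def npp_from_prevalence(prev: float, sens: float, spec: float) -> float:
    """Negative predictive power from prevalence, sensitivity, specificity."""
    return spec * (1 - prev) / (spec * (1 - prev) + prev * (1 - sens))


def compare_with_chance(prev: float, sens: float, spec: float) -> dict:
    """Set the test's PPP/NPP beside what the base rate alone would give.

    Without a test, calling everyone a malingerer is right with probability
    `prev`, and calling no one a malingerer is right with probability
    `1 - prev`; these serve as the "chance" PPP and NPP.
    """
    return {
        "ppp_test": ppp_from_prevalence(prev, sens, spec),
        "ppp_chance": prev,
        "npp_test": npp_from_prevalence(prev, sens, spec),
        "npp_chance": 1 - prev,
    }


if __name__ == "__main__":
    for prev in (0.5, 0.1, 0.9):
        r = compare_with_chance(prev, sens=0.9, spec=0.9)
        print(
            f"base rate {prev:.0%}: "
            f"PPP {r['ppp_test']:.3f} vs chance {r['ppp_chance']:.2f}; "
            f"NPP {r['npp_test']:.3f} vs chance {r['npp_chance']:.2f}"
        )
```

In all three cases both of the test’s values exceed their chance counterparts, which is the sense in which the hypothetical test is “effective”. Lowering the sensitivity or specificity, or pushing the base rate further toward 0% or 100%, narrows that margin.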

We see, then, that in all three examples both the PPP and NPP are “effective”, in the sense that the test does better than could be achieved by chance alone. However, as base rates rise or fall, or as test accuracy varies, this will not always be true. Consider an example offered by Faust and Nurcombe[30].

Assume N = 1000; test cut-off is 90% accurate

Example One

500 Malingerers (+)

500 Non-Malingerers (-)

(TP) True Positive = 450 – Number of those identified as malingerers that truly are malingerers

(FP) False Positive = 50 – Number of those identified as malingerers that are truly not malingerers

(TN) True Negative = 450 – Number of those identified as non-malingerers that are truly non-malingerers

(FN) False Negative = 50 – Number of those identified as non-malingerers that truly are malingerers

Prevalence = (TP+FN)/N = (450+50)/1000 = .5

Sensitivity – Knowing the person is a malingerer, chances the test will pick it up

                   TP/(TP+FN) = 450/(450+50) = .9

Specificity – Knowing the person is not a malingerer, chances the test will not show malingering

                   TN/(FP+TN) = 450/(450+50) = .9

(PPP) Positive Predictive Power – Knowing the test shows malingering, the chances the person is actually a malingerer

                   TP/(TP+FP) = 450/(450+50) = .9

(NPP) Negative Predictive Power – Knowing the test shows the absence of malingering, the chances the person is actually not malingering

                   TN/(FN+TN) = 450/ (50+450) = .9

Overall Diagnostic Power = (TP+TN)/N = (450+450)/1000 = .9

Positive Predictive Power (PPP) =

Prevalence x Sensitivity / [Prevalence x Sensitivity + (1 - Prevalence) x (1 - Specificity)]

= (.5 x .9) / [(.5 x .9) + (.5 x .1)] = .45 / (.45 + .05) = .9

Negative Predictive Power (NPP) =

Specificity x (1 - Prevalence) / [Specificity x (1 - Prevalence) + Prevalence x (1 - Sensitivity)]

= (.9 x .5) / [(.9 x .5) + (.5 x .1)] = .45 / (.45 + .05) = .9

 

Assume N = 1000; test cut-off is 90% accurate

Example Two

100 Malingerers (+)

900 Non-Malingerers (-)

(TP) True Positive = 90 – Number of those identified as malingerers that truly are malingerers

(FP) False Positive = 90 – Number of those identified as malingerers that are truly not malingerers

(TN) True Negative = 810 – Number of those identified as non-malingerers that are truly non-malingerers

(FN) False Negative = 10 – Number of those identified as non-malingerers that truly are malingerers

Prevalence = (TP+FN)/N = (90+10)/1000 = .1

Sensitivity – Knowing the person is a malingerer, chances the test will pick it up

                   TP/(TP+FN) = 90/(90+10) = .9

Specificity – Knowing the person is not a malingerer, chances the test will not show malingering

                    TN/(FP+TN) = 810/(810+90) = .9

(PPP) Positive Predictive Power – Knowing the test shows malingering, the chances the person is actually a malingerer

                   TP/(TP+FP) = 90/(90+90) = .5

(NPP) Negative Predictive Power – Knowing the test shows the absence of malingering, the chances the person is actually not malingering

                   TN/(FN+TN) = 810/(10+810) = .988

Overall Diagnostic Power = (TP+TN)/N = (90+810)/1000 = .9

Positive Predictive Power (PPP) =

Prevalence x Sensitivity / [Prevalence x Sensitivity + (1 - Prevalence) x (1 - Specificity)]

= (.1 x .9) / [(.1 x .9) + (.9 x .1)] = .09 / (.09 + .09) = .5

Negative Predictive Power (NPP) =

Specificity x (1 - Prevalence) / [Specificity x (1 - Prevalence) + Prevalence x (1 - Sensitivity)]

= (.9 x .9) / [(.9 x .9) + (.1 x .1)] = .81 / (.81 + .01) = .988

Assume N = 1000; test cut-off is 90% accurate

Example Three

900 Malingerers (+)

100 Non-Malingerers (-)

(TP) True Positive = 810 – Number of those identified as malingerers that truly are malingerers

(FP) False Positive = 10 – Number of those identified as malingerers that are truly not malingerers

(TN) True Negative = 90 – Number of those identified as non-malingerers that are truly non-malingerers

(FN) False Negative = 90 – Number of those identified as non-malingerers that truly are malingerers

Prevalence = (TP+FN)/N = (810+90)/1000 = .9

Sensitivity – Knowing the person is a malingerer, chances the test will pick it up

                   TP/(TP+FN) = 810/(810+90) = .9

Specificity – Knowing the person is not a malingerer, chances the test will not show malingering

                   TN/(FP+TN) = 90/(10+90) = .9

(PPP) Positive Predictive Power – Knowing the test shows malingering, the chances the person is actually a malingerer

                   TP/(TP+FP) = 810/(810+10) = .988

(NPP) Negative Predictive Power – Knowing the test shows the absence of malingering, the chances the person is actually not malingering

                   TN/(FN+TN) = 90/ (90+90) = .5

Overall Diagnostic Power = (TP+TN)/N = (810+90)/1000 = .9

Positive Predictive Power (PPP) =

Prevalence x Sensitivity / [Prevalence x Sensitivity + (1 - Prevalence) x (1 - Specificity)]

= (.9 x .9) / [(.9 x .9) + (.1 x .1)] = .81 / (.81 + .01) = .988

Negative Predictive Power (NPP) =

Specificity x (1 - Prevalence) / [Specificity x (1 - Prevalence) + Prevalence x (1 - Sensitivity)]

= (.9 x .1) / [(.9 x .1) + (.9 x .1)] = .09 / (.09 + .09) = .5
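The arithmetic in the three examples can also be checked directly from the raw counts. In the following Python sketch, `TwoByTwo` is a hypothetical helper and is not part of any published tool. It recomputes each statistic from the 2 x 2 table. For Example Three it gives PPP = 810/820, or about .988.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2 x 2 classification table (test call vs. true status)."""
    tp: int  # called malingering, truly malingering
    fp: int  # called malingering, truly not malingering
    tn: int  # called not malingering, truly not malingering
    fn: int  # called not malingering, truly malingering

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def summary(self) -> dict:
        """Every statistic in the examples, computed from the counts alone."""
        return {
            "prevalence": (self.tp + self.fn) / self.n,
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppp": self.tp / (self.tp + self.fp),
            "npp": self.tn / (self.tn + self.fn),
            "overall": (self.tp + self.tn) / self.n,
        }


examples = {
    "Example One": TwoByTwo(tp=450, fp=50, tn=450, fn=50),
    "Example Two": TwoByTwo(tp=90, fp=90, tn=810, fn=10),
    "Example Three": TwoByTwo(tp=810, fp=10, tn=90, fn=90),
}

for name, table in examples.items():
    stats = table.summary()
    print(name, {k: round(v, 3) for k, v in stats.items()})
```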

In Faust and Nurcombe’s example, Dr. Jones and Dr. Smith are trying to diagnose Dissociative Identity Disorder (DID), which is assumed to have a base rate of only one in every 1000 people in the population. Dr. Smith always answers “no” and says the person does not have DID. He would never identify the one case of DID if he made calls on 1000 people, but his specificity would be perfect and his overall accuracy very high: by saying that no one had DID he would be right 999 times out of 1000. Dr. Jones, by contrast, makes the diagnosis with perfect sensitivity (he always finds the one DID case in each 1000 he evaluates), but his specificity is imperfect, as he makes one false-positive error for every 100 diagnostic judgments. In the final analysis Dr. Jones makes 10 times as many errors as Dr. Smith (about 10 versus one per 1000), even though he does identify the one case of DID in every 1000 evaluated. Notice that in this example Dr. Jones’s test is clearly “valid”, but Dr. Jones still has a much higher error rate than Dr. Smith. Even if a test or procedure is “valid”, its “effectiveness” must still be computed. A test or procedure is “effective” only in the sense that it increases overall accuracy. For a condition with a base rate below 50%, if the base rate (prevalence) exceeds the test’s combined error rate (false positives plus false negatives), the test is effective; if the base rate is less than the test’s combined error rate, the evaluator will be more accurate using base rates alone.[31]
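The error counts in Faust and Nurcombe’s example can be reproduced with a few lines of Python. This sketch is not their calculation. It assumes that Dr. Jones’s one-in-100 false-positive rate applies to the 999 people who do not have DID. On that assumption he makes 9.99 (about 10) errors per 1000, compared with Dr. Smith’s one. The function `expected_errors` is hypothetical.

```python
def expected_errors(n: int, base_rate: float, sensitivity: float,
                    false_positive_rate: float) -> float:
    """Expected misclassifications among n evaluations.

    Misses (false negatives) come from the true cases; false alarms
    (false positives) come from the non-cases.
    """
    cases = n * base_rate
    non_cases = n - cases
    misses = cases * (1 - sensitivity)
    false_alarms = non_cases * false_positive_rate
    return misses + false_alarms


N, BASE_RATE = 1000, 1 / 1000

# Dr. Smith always says "no DID": he misses every case but never false-alarms.
smith = expected_errors(N, BASE_RATE, sensitivity=0.0, false_positive_rate=0.0)
# Dr. Jones catches every case but errs on about 1 in 100 of the rest.
jones = expected_errors(N, BASE_RATE, sensitivity=1.0, false_positive_rate=0.01)

print(f"Smith: {smith:.2f} errors per {N}")
print(f"Jones: {jones:.2f} errors per {N}")
```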

Obviously Dr. Jones’s approach to diagnosing DID has some merit, and the evaluator must weigh the costs and benefits of both Type I (false-positive) and Type II (false-negative) errors. Suppose that the test were designed to determine which persons were going to commit suicide. Dr. Jones’s approach would probably be considered preferable to Dr. Smith’s, since it would identify all of those who were going to commit suicide, although it would falsely identify about 10 individuals out of every 1000 as suicidal who were not.

Detection of Malingering of Cognitive Disorders

          Floor Effect

One test frequently used to assess the validity of cognitive memory complaints, the Rey Visual Memory Test (a.k.a. the Rey Fifteen Item Test), illustrates the detection strategy called the “Floor Effect”. This test was developed by the noted Swiss psychologist Andre Rey and was popularized in the United States by Lezak.[32] The task requires the memorization of 15 different items arranged in three columns and five rows.

The idea underlying the Rey 15 Item Test is a very simple one: malingerers often score worse than truly disabled people on many instruments, because they do not know which items even very disturbed or disabled persons can complete correctly, and so they score below the “floor” scores of disturbed or disabled persons. Lezak stated that the principle underlying this test is that the malingering individual can be misled into performing poorly on a simple task that “all but the most severely brain-damaged or retarded patients perform easily.”

Figure 6 - Rey 15 Item Test:

A    B    C
1    2    3
a    b    c
I    II   III

The 15 items are shown to the subject for ten seconds and are then taken away, after which the subject is asked to write down on a sheet of paper the items he can remember. Subjects are told the test is a test of memory that many people may find difficult. The ethics of making this false statement are debatable, since few people (even among the seriously brain-damaged) make large numbers of errors on this relatively easy test. The test is scored for the number of correct items. Lezak suggests a cut-off score of nine or more correct items as indicating a low probability of malingering, with eight or fewer correct items as suggestive of malingering. Thus, if the subject can correctly reproduce three or more of the five rows, he is thought to be unlikely to be malingering. As the test costs virtually nothing and takes only a short time to administer and score, it has become a widely used instrument for detecting malingering of cognitive symptoms. However, there is limited published research showing evidence of its utility.
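Lezak’s cut-off amounts to a single comparison. The following minimal Python sketch assumes the raw score is the number of the 15 items reproduced correctly; the function name is hypothetical.

```python
LEZAK_CUTOFF = 9  # 9 or more correct: low probability of malingering


def rey15_interpretation(items_correct: int) -> str:
    """Classify a Rey 15 Item Test score with Lezak's suggested cut-off.

    items_correct is the number of the 15 items reproduced correctly.
    """
    if not 0 <= items_correct <= 15:
        raise ValueError("score must be between 0 and 15")
    if items_correct >= LEZAK_CUTOFF:
        return "low probability of malingering"
    return "suggestive of malingering"


# Three complete rows (9 items) is the lowest score on the "low probability" side.
for score in (14, 9, 8, 1):
    print(score, rey15_interpretation(score))
```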

Figure 7 shows that its sensitivity is poor, ranging from 14.5% to 22.5%, with specificity in the low 80s. This really does appear to be a test that detects only a small number of unsophisticated malingerers, but for those malingerers objective verification is useful. One certainly cannot say that if the test shows malingering, the individual probably is malingering, but we might have some confidence in saying that a person identified as not malingering is probably not malingering.

As with any cut-off, very extreme scores are more conclusive than scores close to the cut-off. For example, a person who gets 14 of the 15 items right is very unlikely to be malingering, while a person who gets only one item right is very likely to be malingering.

Obviously any number of existing tests can be utilized as floor-effect tests.  The evaluator should be suspicious any time an individual scores significantly below scores that severely disturbed or brain-damaged patients usually obtain.

Figure 7

Effectiveness of Rey's 15-Item Memory Test in the Detection of Feigned Neuropsychological Deficits[33]

Study | Feigners | Comparison Groups | Sens. (%) | Spec. (%)
Goldberg & Miller (1986) | None used | 50 psychiatric & 16 MR adults | Unk. | 90.9(a)
Bernard & Fowler (1990) | None used | 18 neurological & 16 normals | Unk. | 94.1(b)
Schretlen, Brandt, Krafft, & Van Gorp (1991) | 69 sim.(c), 7 mal. | 148 mixed patients(d) | 14.5 | 82.5
Davidson, Suffield, Orenczuk, Nantau, & Mandel (1991) | 40 sim. | 127 mixed patients & 40 normals(e) | 22.5 | 82.0(f)
Lee, Loring, & Martin (1992) | None used | 140 neurological patients(g) | Unk. | 95.7(h)

a Specificity was 100.0% for psychiatric inpatients and 62.5% for adults with mild mental retardation.

b Specificity was 88.9% for neurological inpatients and 100.0% for normal adults.

c Simulators consisted of 47 normals asked to feign an amnestic syndrome and 22 inpatient substance abusers asked to feign “insanity.”

d Patients were composed of 10 amnesics, 54 patients with psychiatric disorders, 20 patients with mixed or unclear neuropsychiatric conditions, 55 brain-injured patients, and 9 demented patients.

e Patients were composed of 52 brain-injured, 25 spinal cord injured, and 50 chronic pain subjects.

f Specificity was 59.6% for brain injured, 76.0% for spinal cord injured, 70.0% for chronic pain, and 90.0% for normals.

g The study is a differential prevalence design with 16 litigating patients, of whom an unknown percentage are likely to be feigning.

h Patients were composed of 100 inpatients with temporal lobe epilepsy and 40 brain-injured outpatients.

Note. The commonly used cutting score was <9 for feigning; the only exceptions are Davidson et al. (1991) and Lee et al. (1992), both of whom employed <8. Sens. = sensitivity (percentage of feigners accurately classified); Spec. = specificity (percentage of honest responders accurately classified); Unk. = unknown; MR = mild mental retardation; sim. = simulators (i.e., normals instructed to fake); mal. = malingerers (i.e., individuals in clinical settings who are determined to be feigning).

One group examined neuropsychological test performance across groups of probable malingerers, patients with head injuries, and patients with depression and somatization disorders. They found that probable malingerers performed worse on a variety of neuropsychological tests than the other groups.[34] Similar findings have recently been reported by Van Gorp et al.[35]

Figure 8 lists a number of neuropsychological tests and research supporting their use to detect malingering with a floor-effect strategy. The same detection strategy can be used in competency to stand trial evaluations. Research shows that very low scores on the Georgia Court Competency Test are indicative of malingering.[36]

Clinicians are often intuitively aware of how floor effect is useful in detecting malingerers.  It is the rare severely disturbed or brain-damaged individual who cannot state his name, age, birth date, telephone number, address, his mother’s and father’s names, etc., but many malingerers report they do not know this type of basic information.  Tests have been developed using such basic information as test items and some success has been achieved by attempting to identify malingerers in that fashion.[37]

IMPROBABLE PATTERNS

Another strategy adopted by those attempting to identify malingering is to look for improbable patterns on tests, patterns that a patient group would rarely show. For example, it is a well-known principle of learning that recognition is much easier than recall. Any individual whose recognition scores are much lower than his recall scores could thus be suspected of malingering. Rey used this principle by administering the Rey Auditory Verbal Learning Test, a recall task, and comparing the results with those of the Rey Word Recognition Test.

Figure 8

List of Neuropsychological Measures and Relevant Citations of Studies Supporting their Effectiveness in Identifying Malingering.

From Sweet, J.J. (1999). Malingering: Differential diagnosis. In J.J. Sweet (Ed.), Forensic Neuropsychology: Fundamentals and Practice. Exton, PA: Swets and Zeitlinger, where the complete citations may be found.

California Verbal Learning Test

 

Coleman, Rapport, Millis, Ricker, and Farchione (1998)

Millis, Putnam, Adams, and Ricker (1995)

Millis and Putnam (1997)

Sweet, Wolfe et al. (in press).

Trueblood and Schmidt (1993)

Trueblood (1994)

Category or Booklet Category Test

 

Bolter, Picano, and Zych (1985)

Ellwanger, Tenhula, Sweet, and Roesenfeld (1997)

Tenhula and Sweet (1996)

Digit Span

 

Beetar and Williams (1995)

Binder and Willis (1991)

Greiffenstein, Baker, and Gola (1994)

Heaton, Smith, Lehman, and Vogt (1978)

Iverson and Franzen (1994)

Iverson and Franzen (1996)

Martin, Hayes, and Gouvier (1996)

Meyers and Volbrecht (1998)

Mittenberg, Theroux-Fichera, Zielinski, and Heilbronner (1995)

Rawling and Brooks (1990)

Suhr, Tranel, Wefel, and Barrash (1997)

Trueblood and Schmidt (1993)

Finger Tapping

 

Binder and Willis (1992)

Heaton, Smith, Lehman, and Vogt (1978)

Halstead-Reitan Neuropsychological Battery

 

Heaton, Smith, Lehman, and Vogt (1978)

Mittenberg, Rotholc, Russell, and Heilbronner (1996)

Reitan and Wolfson (1996)

Trueblood and Schmidt (1993)

Trueblood and Binder (1997)

Knox Cube Test

 

Iverson and Franzen (1994)

Luria-Nebraska Neuropsychological Battery

 

Mensch and Woods (1986)

McKinzey, Podd, Krehbiel, Mensch, and Conley Trombka (1997)

Memory Assessment Scales

 

Beetar and Williams (1995)

Rey Auditory Verbal Learning Test

 

Bernard (1990)

Bernard (1991)

Bernard, Houston, and Natoli (1993)

Binder, Villanueva, Howieson, and Moore (1993)

Chouinard and Rouleau (1997)

Cradock, Gfeller, and Falkenhain (1994)

Greiffenstein, Baker, and Gola (1994)

Greiffenstein, Baker, and Gola (1996a)

Suhr, Tranel, Wefel, and Barrash (1997)

Rey Complex Figure Test

 

Chouinard and Rouleau (1997)

Seashore Rhythm Test

 

Gfeller and Cradock (1998)

Trueblood and Schmidt (1993)

Sensory-Perceptual Exam (selected portions)

 

Binder and Willis (1992)

Heaton, Smith, Lehman, and Vogt (1978)

Trueblood and Schmidt (1993)

Warrington Recognition Memory Test

 

Cradock, Gfeller, and Falkenhain (1994)

Iverson and Franzen (1994)

Iverson and Franzen (1998)

Millis (1992)

Millis (1994)

Millis and Putnam (1994)

Wechsler Adult Intelligence Scale-Revised

 

Mittenberg, Theroux-Fichera, Zielinski, and Heilbronner (1995)

Trueblood (1994)

Rawling and Brooks (1990)[38]

Wechsler Memory Scale

 

Greiffenstein, Baker, and Gola (1994)

Iverson and Franzen (1996)

Rawling and Brooks (1990)

Wechsler Memory Scale-Revised

Bernard (1990)

 

Bernard, Houston, and Natoli (1993)


Denney (1999)

Greiffenstein, Baker, and Gola (1994)

Greiffenstein, Baker, and Gola (1996a)

Martin, Franzen, and Orey (1998)

Mittenberg, Azrin, Millsaps, and Heilbronner (1993)

Wisconsin Card Sorting Test

 

Bernard, McGrath, and Houston (1996)

Rey Word Recognition Test

The Rey Word Recognition Test consists of a list of fifteen words that is read to the subject at the rate of about one a second.  After that the examiner reads a list of 30 words, half of which were on the original list, and the subject should respond “yes” or “no” as to whether the word was on the original list.   Figure 9 shows both lists of words.

An easy way of scoring the test, if you have no recall test to compare it with, is to add the number of correctly recognized words (true positives) to the number of correctly rejected words (true negatives), which yields a total correct score. The best possible score is 30. This is a type of floor-effect test, since even very disabled individuals usually get most of these easy recognition items right. Frederick recommends a cutting score of 18, with those scoring lower suspected of malingering.

Figure 9

Study list (read to the subject):
Half, Camel, Mistake, Toy, Morning, Hair, Wax, Grain, Cookie, Fly, Place, Cherry, Door, Knee, State

Recognition list:
Hello     Camel     Half      Today     Power
Style     Door      Grass     Cookie    Thread
Hair      Horse     Light     Fly       Cherry
Gift      Morning   Airplane  Grain     Toy
Wax       Concert   Bottle    Place     Wall
Cheese    Smile     Knee      Mistake   State
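The total-correct scoring described above can be sketched in Python using the Figure 9 word lists. The function names are hypothetical. The sketch assumes the subject’s answers are recorded as the set of recognition-list words to which he said “yes”.

```python
STUDY_LIST = {
    "half", "camel", "mistake", "toy", "morning", "hair", "wax", "grain",
    "cookie", "fly", "place", "cherry", "door", "knee", "state",
}

RECOGNITION_LIST = [
    "hello", "camel", "half", "today", "power",
    "style", "door", "grass", "cookie", "thread",
    "hair", "horse", "light", "fly", "cherry",
    "gift", "morning", "airplane", "grain", "toy",
    "wax", "concert", "bottle", "place", "wall",
    "cheese", "smile", "knee", "mistake", "state",
]

CUTTING_SCORE = 18  # Frederick: scores below 18 are suspect


def word_recognition_total(yes_responses: set[str]) -> int:
    """Total correct = hits (true positives) + correct rejections (true negatives).

    yes_responses holds the recognition-list words the subject said "yes" to.
    """
    hits = sum(1 for w in RECOGNITION_LIST
               if w in STUDY_LIST and w in yes_responses)
    correct_rejections = sum(1 for w in RECOGNITION_LIST
                             if w not in STUDY_LIST and w not in yes_responses)
    return hits + correct_rejections


def is_suspect(total_correct: int) -> bool:
    return total_correct < CUTTING_SCORE
```

A subject who says “yes” to every word gets all 15 hits but no correct rejections. His total is therefore 15, which is below the cutting score.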

In his validation study of the Validity Indicator Profile (VIP), Frederick compared his test with several commonly used measures, including the Rey tests:

Figure 10: Frederick’s Results when Validating the VIP[39]

Measure | Sensitivity % | Specificity % | Overall Classification Rate %
VIP Nonverbal | 47.7 | 98.8 | 74.0
VIP Verbal | 42.0 | 98.1 | 71.0
PDRT | 17.0 | 99.4 | 67.7
RMT | 4.9 | 97.5 | 61.4
WRT | 8.8 | 100.0 | 64.6
DCT | 11.8 | 97.5 | 64.1

VIP - Validity Indicator Profile; PDRT - Portland Digit Recognition Test; RMT - Rey Memory Test; WRT - Word Recognition Test; DCT - Dot Counting Test

As can readily be seen, on the Rey Memory Test (Rey 15 Item Test) Frederick found a sensitivity of 4.9% and a specificity of 97.5%. The Rey Word Recognition Test was a slight improvement, with a sensitivity of 8.8% and a specificity of 100%. However, such low sensitivities mean that these Rey tests miss the great majority of malingerers, which is an unacceptably high error rate. The figures for the Rey Dot Counting Test are also poor, with a sensitivity of 11.8% and a specificity of 97.5%.
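The practical meaning of these sensitivities depends on the base rate. This Python sketch applies the PPP/NPP formulas to the Figure 10 values. It assumes a base rate of one in six, following the Rogers and Cruise estimate cited at the start of this article. That base rate is an illustrative choice and was not part of Frederick’s study.

```python
# Sensitivity and specificity (as proportions) from Figure 10.
FIGURE_10 = {
    "VIP Nonverbal": (0.477, 0.988),
    "VIP Verbal": (0.420, 0.981),
    "PDRT": (0.170, 0.994),
    "RMT": (0.049, 0.975),
    "WRT": (0.088, 1.000),
    "DCT": (0.118, 0.975),
}


def predictive_power(sens: float, spec: float,
                     prevalence: float) -> tuple[float, float]:
    """Return (PPP, NPP) for the given sensitivity, specificity and base rate."""
    tp = prevalence * sens
    fp = (1 - prevalence) * (1 - spec)
    tn = (1 - prevalence) * spec
    fn = prevalence * (1 - sens)
    ppp = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npp = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return ppp, npp


BASE_RATE = 1 / 6  # assumed base rate of malingering (illustrative)

for measure, (sens, spec) in FIGURE_10.items():
    ppp, npp = predictive_power(sens, spec, BASE_RATE)
    print(f"{measure:14s} PPP={ppp:.2f}  NPP={npp:.2f}")
```

With 100% specificity, every positive WRT result is a true positive. However, the very low sensitivity keeps the NPP close to the 5/6 that could be achieved by calling everyone honest.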

Rey Dot Counting Test

The original principle of the Dot Counting Test is that a normal subject should count smaller numbers of dots faster than larger numbers of dots. Thus, Rey would show subjects cards with ungrouped dots, ranging from four to ten dots per card. The cards were presented in random order, and Rey timed how long the subject took to count the dots on each card. He would then arrange the cards in order from the fewest dots to the most and check whether the counting times increased progressively from card to card. If some cards with fewer dots required more time than cards with more dots, malingering was suspected.

The other principle used on the Dot Counting Test is that it should take less time to count patterned or grouped dots than to count ungrouped dots. The number of dots ranges from seven to 28. There are 12 cards, six showing grouped dots and six showing ungrouped dots. Rey believed that malingerers would take much longer than truly disabled people to count the grouped dots. The Dot Counting Test can be scored by dividing the total time for counting all of the grouped-dot cards by the total time for counting all of the ungrouped-dot cards. This ratio is then multiplied by the total number of errors plus 0.5 (the 0.5 is added so that a subject with no errors is not automatically given a score of zero). Errors are computed as the absolute differences between the examinee’s responses and the actual numbers of dots. For example, if a card contains nine dots and the examinee responds “seven” or “eleven”, the error score for that card is two. These absolute values are totaled over all twelve cards.
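Rey’s first check was that counting times should rise with the number of ungrouped dots. Once the cards are sorted, this amounts to looking for out-of-order pairs. The following minimal Python sketch assumes the times are recorded per card and keyed by dot count. This data layout and the function name are hypothetical and are not part of Rey’s materials.

```python
def out_of_order_cards(counting_times: dict[int, float]) -> list[tuple[int, int]]:
    """Find card pairs that break the expected 'more dots, more time' pattern.

    counting_times maps the number of (ungrouped) dots on a card to the
    seconds the subject took to count it. Returns (fewer, more) dot-count
    pairs where the card with fewer dots took longer to count.
    """
    ordered = sorted(counting_times)  # arrange cards from fewest to most dots
    violations = []
    for i, fewer in enumerate(ordered):
        for more in ordered[i + 1:]:
            if counting_times[fewer] > counting_times[more]:
                violations.append((fewer, more))
    return violations
```

Pairwise checking flags every inversion, not only adjacent ones. An evaluator might still allow small timing differences before treating a pair as suspicious, because real timings are noisy.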

Score = [Time (grouped) / Time (ungrouped)] x (total errors + 0.5)

Frederick suggests the best cutting score with this instrument is eight, with scores greater than eight indicating malingering, but using that cutting score the test’s sensitivity was only 11.8%, as already mentioned above.          
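The scoring rule and Frederick’s cutting score can be written as two short functions. This Python sketch assumes that times are totals in seconds and that there is one response for each of the twelve cards. The function names are hypothetical.

```python
def dct_score(grouped_times: list[float], ungrouped_times: list[float],
              responses: list[int], actual_counts: list[int]) -> float:
    """Score = (total grouped time / total ungrouped time) x (total errors + 0.5)."""
    if len(responses) != len(actual_counts):
        raise ValueError("need one response per card")
    # Errors are absolute differences between reported and actual dot counts.
    total_errors = sum(abs(r - a) for r, a in zip(responses, actual_counts))
    time_ratio = sum(grouped_times) / sum(ungrouped_times)
    return time_ratio * (total_errors + 0.5)


CUTTING_SCORE = 8  # Frederick: scores greater than 8 indicate malingering


def dct_suggests_malingering(score: float) -> bool:
    return score > CUTTING_SCORE
```

Consider a subject whose grouped cards take 12 seconds in total, whose ungrouped cards take 24 seconds, and who makes two errors in all. His score is 0.5 x 2.5 = 1.25, which is well under the cutting score.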

 Results of one study suggest that merely tabulating the total sum of incorrect counts on the Dot Counting Test produces a score which is an even more powerful discriminator between malingerers and non-malingerers.[40] This is an encouraging development, since the total number of mistakes is easy to tabulate and requires no timing of the subject.

Symptom Validity Testing

Floor effect tests for malingering rely on normative data to determine which items even severely disturbed or brain-damaged individuals usually answer correctly.  In symptom validity testing principles of probability are used to determine which scores are so poor that they are below what would be expected by chance alone.  Symptom validity testing is a forced-choice method, usually with only two choices, in which the subject is told to answer all questions, guessing on those on which he is not sure of the answer.  Malingerers are faced with the problem of how many wrong answers to indicate on items on which they know the correct answer.  Oftentimes on individually administered symptom validity tests the examiner will tell the subject each time whether his answer was correct or incorrect.  Malingering subjects who have just been informed they have answered an item correctly may feel they must miss more items to appear sufficiently impaired.

Consider a 100-item, two-choice forced-choice test. On average, a person guessing without even reading the questions would get about half the items right. Those scoring substantially lower are effectively showing that, at least on some of the items, they knew the right answer but deliberately picked the wrong one. For example, suppose someone got only ten items right out of 100. This outcome would be extremely unlikely under chance, or random, responding, and such a score would be clear evidence of malingering.
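The improbability of the ten-out-of-100 example can be checked with the exact binomial distribution. This minimal Python sketch assumes that each item is an independent 50/50 guess. Under that assumption the probability of ten or fewer correct is on the order of 10^-17.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p): the chance of scoring k or fewer
    correct on an n-item forced-choice test by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


# Ten right out of 100 two-choice items, guessing with p = 0.5.
print(f"{prob_at_most(10, 100):.1e}")
```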

Binomial probability theory tells us that, on a 100-item two-choice test answered at random, the distribution of scores approaches a normal curve with a mean of 50 items correct. Using the .05 probability level as a cut-off, z scores can be calculated. The standard deviation is 5 (the square root of 100 x .5 x .5), so the 95% band of 50 ± 1.96 x 5 runs from about 40.2 to 59.8, and the likely range of scores generated by random responding is therefore 41 to 59. Scores between 41 and 59 are probable among individuals who “guess” at every answer or do not even read the items. Scores of 60 or more probably indicate at least some effort to answer the items correctly. Conversely, scores of 40 or less probably indicate a deliberate attempt to miss the items. Indeed, in terms of the ability to distinguish correct from incorrect answers, a score of 10 would be equivalent to a score of 90: both show considerable knowledge of which answers are correct and which are incorrect. Those who are severely disturbed or brain-damaged would be very unlikely to score significantly below chance. Indeed, they would usually know at least some of the items and thus score slightly above chance.
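The 41-to-59 range follows from the normal approximation just described. This Python sketch assumes a two-choice test with a guessing probability of .5 and z = 1.96. It keeps the whole-number scores that lie strictly inside the band. The function name is hypothetical.

```python
import math


def chance_range(n_items: int, z: float = 1.96) -> tuple[int, int]:
    """Whole-number scores inside mean +/- z * SD for random guessing on an
    n-item two-choice test (normal approximation to the binomial)."""
    mean = n_items * 0.5
    sd = math.sqrt(n_items * 0.5 * 0.5)
    low, high = mean - z * sd, mean + z * sd
    return math.ceil(low), math.floor(high)


print(chance_range(100))  # (41, 59) for a 100-item test
```

Other published cut-offs may round the band edges differently. An exact binomial tail calculation is an alternative when the number of items is small.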

Symptom validity testing was originally developed by Pankratz, who used it to determine whether a claimed disability actually existed. For example, to test a claimed hearing deficit he would administer one hundred trials. On each trial he asked the subject whether or not a tone had been played, and the subject was instructed to guess if he was not sure. After each trial the subject was told whether his response had been correct or incorrect. Scores at or near 50% were consistent with genuine deafness. Those scoring significantly above chance were demonstrating at least some ability to hear, as were those scoring significantly below chance.[41] If a patient claimed to have no feeling in one hand, Pankratz would cover the hand with a drape and then touch it either on the front or the back, each time asking the patient, “Did I touch the front or the back?” After each answer the subject was told whether his answer had been correct. Those who gave the incorrect answer much more often than chance would predict were thought to be showing symptoms that were not genuine.[42]

Frederick has used a symptom validity method to test claimed amnesia for an event, or a claimed inability to remember what the subject has since been told about what allegedly transpired. He makes up a list of two-choice forced-choice items about the event, such as “Do the police say you were driving a car or a truck?” and “Was the truck they say you were driving a Ford or a Chevrolet?” Feedback is given after each question as to whether the answer was correct. Subjects whose percentage incorrect is significantly higher than chance are actually showing true knowledge of the facts of the case.

We can see that symptom validity testing is actually a further adaptation of the floor-effect principle. Both floor-effect and symptom validity testing strategies use the principle that some malingerers will score lower than even severely disturbed or brain-damaged patients. Some malingerers cannot distinguish between scoring in the disturbed or brain-damaged range and scoring in the almost impossibly low range, in which the subject gets many more answers wrong than right when confronted with two choices. As already mentioned above, these appear to be the unsophisticated, and thus more easily discovered, malingerers. More sophisticated malingerers may be aware that they should get about half of the items on a two-choice forced-choice test right if they want to show no ability. They may also know that on most tests they should get considerably more than half right in order to achieve a believable score in the brain-damaged range. Indeed, most symptom validity tests turn out to be floor-effect tests after all, since the most effective cut-off score is usually much higher than a chance level of performance (i.e., a “floor”).

Test of Memory Malingering

An example of this type of symptom validity test is the Test of Memory Malingering, or TOMM, a test of visual memory. This test is available through Multi-Health Systems for $95.00, and scoring is simple and can be done quickly by the examiner. This easily portable test is individually administered by the examiner but is not time-intensive. There are two “study” phases, two “test” phases, and an optional delayed retention trial. The subject is shown fifty line drawings at the rate of one every three seconds, so this initial “study” phase takes only two and one half minutes. Then, during the “test” phase, each of the same fifty items is presented paired with a new drawing the subject has not previously seen. The subject is asked to indicate which of the two drawings he recognizes from the study phase, and he is required to guess if he says he does not know. The initial study phase and test phase will usually take from five to ten minutes.

The TOMM uses two decision rules. The first is that scoring below chance on any trial indicates the possibility of malingering. The binomial distribution shows that, for 50 two-choice forced-choice items, the 95% confidence interval for chance (random) performance ranges from 18 to 32 correct out of 50. Scores below 18 are unlikely to occur by chance, so any score less than 18 (on any trial) is suggestive of malingering.

On the other hand, even severely disturbed or brain-damaged individuals usually score much higher than chance, particularly if given more than one learning trial.  Thus, on the TOMM, the second decision rule is that any score less than 45 out of 50 on Trial two or the Retention Trial is indicative of the possibility of malingering.
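The two decision rules can be expressed as a small Python function. This sketch is hypothetical and implements the rules only as described in this article. It is not Multi-Health Systems’ scoring procedure. As the manual notes, the 45 cut-off is meant as a guideline rather than a rigid rule.

```python
from typing import Optional

BELOW_CHANCE_LIMIT = 18       # any trial score below 18/50 is below chance
TRIAL2_RETENTION_CUTOFF = 45  # guideline: below 45/50 on Trial 2 or Retention


def tomm_flags(trial1: int, trial2: int,
               retention: Optional[int] = None) -> list[str]:
    """Return the decision rules (as described in the text) that a subject's
    scores trigger. Each score is the number correct out of 50."""
    scores = {"Trial 1": trial1, "Trial 2": trial2}
    if retention is not None:
        scores["Retention"] = retention
    flags = []
    # Rule 1: below-chance performance on any trial.
    for name, score in scores.items():
        if not 0 <= score <= 50:
            raise ValueError(f"{name} score must be between 0 and 50")
        if score < BELOW_CHANCE_LIMIT:
            flags.append(f"{name}: below chance ({score}/50)")
    # Rule 2: floor-effect cut-off on Trial 2 and the Retention Trial.
    for name in ("Trial 2", "Retention"):
        if name in scores and scores[name] < TRIAL2_RETENTION_CUTOFF:
            flags.append(f"{name}: below {TRIAL2_RETENTION_CUTOFF} ({scores[name]}/50)")
    return flags
```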

At the end of the first study phase and test phase the non-malingering subject will usually get forty-five or more out of 50 right. A person scoring between 19 and 45 is in the questionable range, and the study and test phases are administered again. Almost all brain-damaged subjects will score 45 or higher the second time around, so those who score less than 45 are suspected of malingering. The manual states, “Rather than using the score of 45 as a rigid cut-off, it should be used as a guideline, with the likelihood of malingering increasing as the score deviates further from the normative baseline for each specific diagnostic group.”

Only a few malingerers, presumably the least sophisticated, score less than 18.  Indeed, far more possible malingerers are detected using the floor-effect decision rule that any score less than 45 on the Trial two or Retention trial is indicative of possible malingering. 

Data presented in the test manual for the TOMM suggest that performance on the TOMM is not sensitive to the effects of age or years of education. Furthermore, subjects perceived the TOMM to be more difficult than it actually was. Two validation studies with small N’s are reported. The first is a simulation study with 27 simulated malingerers and 22 controls. The second compares 17 traumatic brain injury patients “not-at-risk” for malingering with eleven traumatic brain injury patients “at risk” for malingering, eleven cognitively intact controls, and twelve neuropsychologically impaired controls. Sensitivity and specificity information was provided only for the simulation study. All individuals in the control group achieved a score of 49 or greater (100% specificity), while 93% of the simulators scored less than 49 (93% sensitivity).[43]

Easy versus Difficult

We have already seen some examples of the easy vs. difficult detection strategy. As we discussed, the original idea behind the Rey Word Recognition Test was that recognition is easier than recall, so those whose recognition scores are poorer than their recall scores are suspected of malingering. The Rey Dot Counting Test uses the principle that grouped dots should be easier (and thus faster) to count than ungrouped dots, so longer counting times for grouped dots than for ungrouped dots are suggestive of malingering. Other examples would be a faster time on Trails B than on Trails A, or more Digits Backward than Digits Forward. The principle is that, for non-malingering individuals, easier items should be completed faster or with a higher percentage correct than more difficult items. On just about any test with normative data some items will be missed more frequently than others. The Easy vs. Difficult detection strategy compares the number right and/or response times on the easy items with the number right and/or response times on the difficult items.
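A generic version of this comparison can be sketched in Python. The function and its inputs (per-item correctness and response times for easy and difficult items) are hypothetical. Real instruments rely on normative cut-offs rather than the simple “better on difficult than on easy” test shown here.

```python
from statistics import mean


def easy_vs_difficult(easy_correct: list[bool], difficult_correct: list[bool],
                      easy_times: list[float],
                      difficult_times: list[float]) -> list[str]:
    """Flag improbable patterns: being more accurate, or responding faster,
    on difficult items than on easy ones."""
    flags = []
    easy_acc = sum(easy_correct) / len(easy_correct)
    hard_acc = sum(difficult_correct) / len(difficult_correct)
    if hard_acc > easy_acc:
        flags.append(f"more accurate on difficult items "
                     f"({hard_acc:.0%} vs {easy_acc:.0%})")
    if mean(difficult_times) < mean(easy_times):
        flags.append("faster on difficult items than on easy items")
    return flags
```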

Victoria Symptom Validity Test

The Victoria Symptom Validity Test (VSVT) is an example of a symptom validity test that also uses the floor-effect and easy vs. difficult strategies. The VSVT is a computerized test available from PAR; it is relatively expensive, at $429.00! The purchase provides for unlimited uses, but the disk can be installed on only one computer, so the test is not easily portable: the computer must always be taken along in order to use it. The advantage, however, is that after an introduction most subjects can take the test themselves without supervision.

Like the TOMM, the VSVT is a visual recognition task. On the VSVT the test phase for each item occurs immediately after the study phase for that item. Instead of the line drawings used on the TOMM, the VSVT uses five-digit numbers.

The subject is shown a five-digit number for five seconds, and then shown two five-digit numbers, one of which is the previously viewed number.  The subject is asked to identify which of the two choices matches the previously viewed five-digit number.

The VSVT has three “blocks” of 16 items, eight of which are “easy” and eight of which are “difficult”, for a maximum score of 48 correct. On “easy” items all five digits of the distractor differ from the original stimulus, whereas on “difficult” items only one digit of the distractor differs from the stimulus. The computer program records Total Correct, Easy Items Correct, Difficult Items Correct, Easy Item Response Latency, Difficult Item Response Latency, and Right-Left Preference. Administration requires just 15 to 20 minutes, and the computer does the scoring.

Subjects should get approximately 24 of the 48 items right just by chance. Scores from 18 to 29 are considered “questionable” because they fall in the range of random responding, which occurs infrequently even in brain-damaged populations. Scores of 17 or less are considered indicative of malingering, as scores that poor would occur less than one time in 20 even with random responding. Scores of eight or less correct out of the 24 “difficult” items are also below the level associated with chance responding, and thus are also suggestive of malingering, as is correctly answering only one of the eight difficult items on any Block. In this sense, the VSVT is a symptom validity test. Responses are also analyzed for the number correct among “easy” and “difficult” items, as well as the time the subject takes to respond to each item.
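The arithmetic behind these bands is the binomial distribution: with two choices per item, a pure guesser's number correct follows Binomial(48, 0.5). The minimal Python sketch below, assuming 48 items and a 50% chance of guessing each one correctly, computes the probability of a score at or below a given value and applies the total-score bands quoted above. It is not the VSVT program's own scoring, which also considers block-level, difficult-item, and latency data; the function names are illustrative.

```python
from math import comb


def prob_at_or_below(score, n_items, p_chance=0.5):
    """Chance of scoring `score` or fewer correct by guessing on every item."""
    return sum(comb(n_items, k) * p_chance ** k * (1 - p_chance) ** (n_items - k)
               for k in range(score + 1))


def vsvt_total_band(total_correct):
    """Three-way label for a 48-item total, using the bands described in the text."""
    if total_correct <= 17:
        return "invalid"        # well below chance
    if total_correct <= 29:
        return "questionable"   # within the range of random responding
    return "valid"              # clearly above chance


for score in (12, 17, 24, 35):
    print(score, vsvt_total_band(score), round(prob_at_or_below(score, 48), 3))
```

For a total of 17 the computed probability comes out well under 0.05, in line with the “one time in 20” reasoning above.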

Subjects have to remember Block One stimuli for five seconds, Block Two stimuli for ten seconds, and Block Three stimuli for 15 seconds. The longer delays do not make the items more difficult but are meant to make it appear that the Blocks are getting harder. Differences between Blocks are also analyzed: if performance becomes progressively worse as the Blocks progress, malingering is suspected.

When reporting its interpretations of VSVT results, the computer program does not use dichotomous classification rules dividing subjects into valid and invalid profiles. Rather, it classifies scores that are clearly above chance as “valid”, those in the range of random responding as “questionable”, and those clearly below chance as “invalid”. This three-category system precludes calculating sensitivity and specificity. Validation groups for the VSVT included control college students (N = 95), feigning college students (N = 43), compensation-seeking head-injury patients (N = 32), and non-compensation-seeking head-injury patients (N = 32). All of the controls and non-compensation-seeking patients were correctly classified as “valid”, since all performed above chance levels. Within the feigning group, 18.6% were classified as “valid”, 51.2% as “questionable”, and 30.2% as “invalid”. The compensation-seeking group was classified as 85.4% “valid”, 11.2% “questionable”, and 3.4% “invalid”. It appears that if a dichotomous cutoff were used, the VSVT would have sensitivity and specificity roughly comparable to those reported for the TOMM.
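To make that comparison concrete, the short Python sketch below dichotomizes the reported percentages. Whether a “questionable” result should count as a positive is our assumption, not a rule from the VSVT materials, so both versions are shown: counting “questionable” as positive gives a sensitivity of about 81% in the feigning group, while counting only “invalid” gives about 30%. Either way, the controls and non-compensation-seeking patients yield 100% specificity.

```python
def flagged_percentage(valid_pct, questionable_pct, invalid_pct,
                       count_questionable=True):
    """Percent of a group flagged under a two-way (valid vs. not valid) rule."""
    total = valid_pct + questionable_pct + invalid_pct
    # Guard against typos in the input: the three bands should cover the group.
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"percentages sum to {total}, not 100")
    return invalid_pct + (questionable_pct if count_questionable else 0.0)


# Feigning college students: 18.6% valid, 51.2% questionable, 30.2% invalid.
print(flagged_percentage(18.6, 51.2, 30.2))         # sensitivity, questionable counted
print(flagged_percentage(18.6, 51.2, 30.2, False))  # sensitivity, invalid only
# Controls and non-compensation-seeking patients: all "valid", so none flagged.
print(100 - flagged_percentage(100.0, 0.0, 0.0))    # specificity
```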

Performance Curve Analysis

Clearly the concept of easy vs. difficult does not have to be dichotomous.  Instead, items can be ordered on a continuum from very easy to very difficult.  An individual’s “performance curve” can then be plotted, with the percentage correct on one axis, and the difficulty of the items on the other.  On a two-alternative forced choice test a person who is not malingering will get most or all of the easier items on such a test right, but as the items become harder, his percentage correct will gradually decrease to approximately 50% correct, or a chance level of responding.

Most tests of intellectual functioning or school achievement order their items on each sub-test from the easiest to the most difficult.  This is convenient for the examiner, who can establish a “floor” and a “ceiling”, and thus does not have to administer all of the items on the sub-test in order to come up with a score.  However, these tests suggest an obvious strategy to malingerers, which is to get the easy items right until a point is reached at which the malingerer thinks a brain-damaged person would begin to miss items, after which the malingerer’s strategy would be to respond randomly to the rest of the items on that sub-test.

A better strategy in devising tests for malingering is to present the items in random order of difficulty, making it harder for malingerers to know which items are the easier ones and which are the more difficult ones. We have already seen a type of performance curve analysis using random order of item difficulty in Rey’s original method for the Dot Counting Test. Remember that Rey originally used only ungrouped dots, with the number of dots per card varying from four to ten, for a total of seven cards. The cards were presented in random order, and the time taken to count the dots on each card was recorded. Rey would then order the cards from fewest to most dots and plot a “performance curve”, checking whether counting time increased progressively, in a near-linear fashion, with the number of dots on the card. Results that were not near-linear were considered suggestive of malingering, as in Figure 11.

Figure 11
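Rey’s check was visual. As one way to quantify it, the Python sketch below fits a least-squares line of counting time against number of dots and reports the slope and R². Using R², and the 0.8 threshold in particular, is our assumption for illustration and is not Rey’s criterion; the counting times are invented example data.

```python
from statistics import mean


def fit_counting_times(dots, seconds):
    """Least-squares line of counting time on dots per card; returns (slope, r_squared)."""
    mx, my = mean(dots), mean(seconds)
    sxx = sum((x - mx) ** 2 for x in dots)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dots, seconds))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dots, seconds))
    ss_tot = sum((y - my) ** 2 for y in seconds)
    # If every card took the same time there is no progression to explain.
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r_squared


def looks_progressive(dots, seconds, min_r_squared=0.8):
    """Hypothetical rule: times rise with dot count and fit a line reasonably well."""
    slope, r_squared = fit_counting_times(dots, seconds)
    return slope > 0 and r_squared >= min_r_squared


# Cards are shown in random order; each time stays paired with its card's dot count.
dots = [7, 4, 10, 5, 9, 6, 8]
honest = [3.6, 2.1, 5.0, 2.6, 4.5, 3.0, 4.1]
erratic = [2.5, 5.0, 2.4, 2.2, 6.0, 4.8, 3.0]
print(looks_progressive(dots, honest), looks_progressive(dots, erratic))
```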



Validity Indicator Profile

Frederick’s Validity Indicator Profile (VIP) uses performance curve analysis as well as floor-effect and symptom validity testing strategies to detect malingering of cognitive symptoms. The VIP is probably the most sophisticated test of malingering of cognitive symptoms. It is available through NCS, and the only methods provided for scoring the test are to send it to NCS, to e-mail it to NCS, or to score it on one’s own computer on a pay-per-use basis. It costs about $16 to score one VIP. The test can simply be handed to the subject, who then fills it out himself, although, as with all symptom validity tests, it is important to make sure that all of the questions have been answered. More capable subjects usually take much longer to complete the test than less capable subjects.

One advantage of the VIP is that it consists of two sub-tests, Verbal and Non-Verbal.[44] Some malingerers may adopt the strategy of malingering only on certain types of tests, in an attempt to feign a particular type of disability. Thus, on the VIP a subject could, for example, malinger on the Verbal sub-test but not on the Non-Verbal sub-test.

The Verbal sub-test consists of 78 items, each made up of a stimulus word followed by two choices, only one of which is a synonym of the stimulus word. In the sample item the stimulus word is “carpet”, and the two choices are “rug” and “shoe”; we immediately recognize that the test is a two-alternative forced-choice test that lends itself to a symptom validity testing strategy for detecting malingering.

Items range from very simple to very difficult, but (as discussed above) they are presented to the subject in random order of difficulty. The very hardest items are deliberately “tricky”, in the sense that the correct answer is a word few people would know, while the incorrect answer appears close to the right answer, although not exactly right. For example, in item 39, a “virginal” is actually an archaic type of harpsichord; most people would not know that obscure fact and would choose “honest”, an approximate synonym for the more usual sense of the word “virginal”. An honest responder is likely to be “tricked” by these misleading items, and thus to get more than half of them wrong. Malingerers are likely to choose what they think is the wrong answer, which actually turns out to be the right answer, and therefore malingerers get a higher than chance percentage of these items right!

Remember that for analysis purposes Frederick re-orders the items by difficulty, from the easiest to the hardest. Plotted on the vertical axis in Figure 12 is the running average of the proportion correct, averaged over the previous ten responses; values range from 1.0 (100% correct) to 0 (none correct). The horizontal axis shows item difficulty. Figure 12 represents the curve generated by a non-malingering subject. Notice that a non-malingering subject’s curve will start at or close to 1.0, as non-malingering subjects, even severely brain-damaged ones, will get almost all of the first ten items correct. Frederick calls the point on the vertical axis where the performance curve starts the “Point of Entry”. Non-malingering subjects almost always get at least 8 of the easiest 10 items correct, so a point of entry of 0.7 or less is suggestive of malingering. In effect, Frederick is using the point-of-entry statistic as a floor-effect strategy for detecting malingering.
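The re-ordering, running average, and point-of-entry check can be sketched in Python as below. This is not Frederick’s scoring program: item difficulties are assumed to be supplied as numbers where larger means harder (for example, derived from a normative sample), the running average is a simple ten-item window, and the response data are invented. The 0.7 threshold is the one quoted in the text.

```python
def order_by_difficulty(correct, difficulty):
    """Re-order responses from easiest to hardest item (larger difficulty = harder)."""
    ranked = sorted(zip(difficulty, correct), key=lambda pair: pair[0])
    return [answer for _, answer in ranked]


def running_average(correct, window=10):
    """Proportion correct over each block of `window` consecutive items."""
    if len(correct) < window:
        raise ValueError("need at least `window` responses")
    return [sum(correct[i:i + window]) / window
            for i in range(len(correct) - window + 1)]


def point_of_entry(correct, difficulty, window=10):
    """Start of the performance curve: accuracy on the `window` easiest items."""
    return running_average(order_by_difficulty(correct, difficulty), window)[0]


# Invented example: 20 items whose difficulty is simply their index.
difficulty = list(range(20))
honest = [1] * 12 + [1, 0, 1, 0, 0, 1, 0, 1]
malingerer = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] + [1, 0] * 5
for label, answers in (("honest", honest), ("malingerer", malingerer)):
    poe = point_of_entry(answers, difficulty)
    print(label, poe, "suggestive of malingering" if poe <= 0.7 else "")
```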

An honest, or non-malingering, subject’s curve will start at 0.8, 0.9, or 1.0.  The honest subject’s curve will continue with a high percentage correct until the subject reaches the level of difficulty where he no longer knows all of the right answers.  Gradually his proportion correct will drop to around .5, or the random level of responding in this two-alternative forced choice test.  Because of those “tricky” most difficult items, an honest responder will have a “tail” pointing downward for the last few items, since an honest responder will be tricked into missing a fair number of those items.

As illustrated in Figure 13, a person who is not even reading the items will average about 50% correct and will show a relatively flat graph, starting and ending at about .5 regardless of the level of difficulty (Line B). An unsophisticated malingerer will get all or almost all of the easiest items wrong, so his point of entry will be 0 or 0.1. His proportion correct will continue to be far below the level expected by chance until he reaches the level of difficulty at which he does not know the answers. At that point the curve will rise to approximately 0.5, the chance level of responding. On the most difficult “tricky” items the malingerer will, paradoxically, get a higher percentage correct than the honest responder, because the malingerer is “tricked” into thinking he is choosing the incorrect answer when he is actually choosing the correct one. Thus, the malingerer’s “tail” will go up (Figure 13, Line C). Notice that an estimate of a malingerer’s true ability can be calculated, since until his curve approaches 50% the malingerer is demonstrating ability by deliberately choosing the wrong answer.

Frederick divides his performance curves into “sectors”. Sector 1 consists of the easiest questions, usually the first, relatively flat part of the curve for both malingerers and honest responders. Sector 2 starts where the curve begins to move toward 0.5 and represents the level of difficulty at which the subject begins to have trouble knowing the correct answer. Sector 3 starts when the curve reaches 0.5 and represents the items for which the subject is “guessing”. The “tail” is the final sector. The difficulty levels at which the sectors are delineated vary with the ability of the subject, with more capable subjects showing much longer Sector 1 distances than less capable subjects.

Negative slopes (downward curves) in Sector 2 are characteristic of honest responders, whereas positive slopes (upward curves) are characteristic of malingering responders, and flat slopes are characteristic of random responders. Persons with longer Sector 1 distances are showing more “ability” than those with shorter Sector 1 distances.
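Assuming the Sector 2 segment of a running-average curve has already been identified (locating the sector boundaries is not shown), its direction can be summarized with a least-squares slope, as in the Python sketch below. The tolerance used to call a slope “flat” is a hypothetical value chosen for illustration, not one published for the VIP.

```python
def slope(values):
    """Least-squares slope of a curve segment against item position."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mx = (n - 1) / 2
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    return sum((x - mx) * (y - my) for x, y in enumerate(values)) / sxx


def describe_sector2(values, flat_tolerance=0.005):
    """Label the Sector 2 trend; flat_tolerance is a hypothetical cutoff."""
    s = slope(values)
    if s < -flat_tolerance:
        return "declining: typical of honest responding"
    if s > flat_tolerance:
        return "rising: typical of malingering"
    return "flat: typical of random responding"


print(describe_sector2([0.9, 0.8, 0.7, 0.6, 0.5]))
print(describe_sector2([0.1, 0.2, 0.3, 0.4, 0.5]))
print(describe_sector2([0.5, 0.52, 0.48, 0.5]))
```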

Frederick’s Non-verbal subtest for the VIP consists of 100 items derived from the Test of Non-Verbal Intelligence (TONI). The TONI consists of a series of increasingly difficult matrices (similar to Raven’s Progressive Matrices), which subjects solve by selecting the correct answer from among four or six alternatives, depending on the item. For the VIP Non-verbal subtest the items were modified to have only two choices (one correct, the other a distractor) to allow use of the symptom validity testing strategy. Presentation order was modified so that item difficulty is randomized for the subject, just as on the VIP Verbal subtest. As mentioned above, this makes it much harder for the malingerer to know which items to miss, precluding the strategy of answering questions correctly up to a certain level of difficulty and then responding randomly. The VIP Non-verbal subtest does not have the “tail” of “tricky” items at the highest level of difficulty that the VIP Verbal subtest has, but otherwise the performance curve analysis is carried out exactly as with the Verbal sub-test.

As mentioned earlier in this paper, Frederick advocates a four-fold classification scheme for performance on cognitive testing. This scheme combines effort and motivation to generate four response classifications: compliant, careless, irrelevant, and malingering. The computer program that analyzes the VIP accordingly sorts performance curves into those four categories. Interpretation seems clear when a profile is labeled “compliant” (honest) or “malingering”, but becomes more problematic in the numerous cases labeled “careless” or “irrelevant”. However, research supports the formulation of motivation and effort as different constructs.[45] For example, malingerers are not the same as individuals distracted by chronic pain.

Consistency of Item Endorsement

Besides the performance curve analysis discussed above, Frederick analyzes three different measures of “consistency” on the VIP, with the aim of identifying inconsistent, careless, or random respondents. The “Consistency Ratio” is an index of the extent to which an individual answers paired items of comparable difficulty correctly, with item difficulty established by the normative sample. The “Norm Conformity Index” measures response consistency in comparison to the average response pattern of the normative group. The “Individual Consistency Index” examines response consistency with respect to the individual’s own average response pattern. The detection strategy here is that subjects with invalid response styles will have trouble producing the consistent patterns shown by genuine patients with valid response styles. Inconsistent responders may be malingering, or may be what Frederick calls “careless” responders: those who, for at least part of the test, are not working up to their true ability.

The Reitan-Wolfson Retest Consistency Index

One approach using the detection strategy that malingerers will be more inconsistent than non-malingerers is the Retest Consistency Index on the Halstead-Reitan Battery. Very frequently those claiming brain damage take the battery more than once, perhaps once for the defense and once for the plaintiff. Reitan and Wolfson hypothesized that malingerers would not show consistent scores on retesting, whereas valid responders would show more test-retest consistency. Using a differential prevalence design, they compared WAIS-R subtest scores from an initial and a subsequent testing among head-injured subjects in litigation and those not in litigation. Subjects involved in litigation were much less consistent between the two testings.[46] The authors subsequently developed the Retest Consistency Index, a measure which correctly classified 90% of the litigants and 95% of the non-litigants.[47]
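The underlying idea, that honest responders’ scores move less between testings, can be illustrated with a simple total of absolute subtest changes. This Python sketch is a stand-in for illustration and is not the Reitan-Wolfson Retest Consistency Index, whose exact computation is not described here; the function name and the scaled scores are invented.

```python
def retest_change(first, second):
    """Hypothetical measure: sum of absolute score changes on subtests given twice.

    first, second: dicts mapping subtest name -> scaled score.
    """
    shared = sorted(first.keys() & second.keys())
    if not shared:
        raise ValueError("no subtests in common")
    return sum(abs(first[name] - second[name]) for name in shared)


# Invented scaled scores for illustration only.
time1 = {"Vocabulary": 9, "Arithmetic": 8, "Block Design": 10, "Digit Span": 7}
stable = {"Vocabulary": 10, "Arithmetic": 8, "Block Design": 9, "Digit Span": 7}
erratic = {"Vocabulary": 5, "Arithmetic": 11, "Block Design": 6, "Digit Span": 3}
print(retest_change(time1, stable), retest_change(time1, erratic))
```

A larger total means less test-retest consistency; any cutoff for calling a total “inconsistent” would have to come from validation data such as Reitan and Wolfson’s.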

Detecting Malingering of Psychopathology

When the MMPI-2 was introduced over a decade ago, its authors included some new measures of response consistency that are in many ways similar to the VIP consistency measures just discussed. On the MMPI-2, VRIN (Variable Response Inconsistency) consists of 67 pairs of items with similar or opposite content. The idea is that valid responders will answer most of those pairs consistently, whereas random responders will answer many of them inconsistently.

There is disagreement as to the optimal VRIN cutting score for distinguishing consistent from inconsistent patterns of item endorsement; cutting scores from 10 to 14 have been suggested.[48] If the raw score on VRIN is seven or lower, there is a high probability the patient has endorsed the items consistently. If the raw score is eight to fifteen, it is not clear whether the patient has endorsed the items consistently or inconsistently, and an analysis of the traditional “validity” indices is recommended. A VRIN score of 16 or higher indicates a high probability that the patient has endorsed the items inconsistently.[49] Notice that, unlike the traditional L, F, and K scales, VRIN is not affected by the presence of psychopathology.
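The counting logic behind a VRIN-style score can be sketched in Python as follows. The item numbers and scored combinations in the example are hypothetical placeholders, not the actual MMPI-2 VRIN pairs, which are listed in the test manual; the interpretive bands are the ones quoted above.

```python
def inconsistent_pairs(responses, pairs):
    """Count item pairs answered in a combination scored as inconsistent.

    responses: dict item -> True/False answer.
    pairs: list of (item_a, item_b, scored_combos), where scored_combos is a set
           of (answer_a, answer_b) tuples that count as inconsistent for that pair.
    """
    return sum((responses[a], responses[b]) in combos for a, b, combos in pairs)


def vrin_band(raw_score):
    """Interpretive bands quoted in the text (7 or lower / 8 to 15 / 16 or higher)."""
    if raw_score <= 7:
        return "probably consistent"
    if raw_score <= 15:
        return "unclear: review other validity indices"
    return "probably inconsistent"


# Hypothetical items and pairs, for illustration only.
answers = {1: True, 2: False, 3: True, 4: True}
pairs = [
    (1, 2, {(True, False), (False, True)}),  # similar content: answers should match
    (3, 4, {(True, False), (False, True)}),
    (3, 1, {(True, True)}),                  # opposite content: both "true" conflicts
]
score = inconsistent_pairs(answers, pairs)
print(score, vrin_band(score))
```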

TRIN (True Response Inconsistency) consists of 23 pairs of items and is very similar to VRIN, except that the scored response is either “true” to both items or “false” to both items in each pair. TRIN measures a tendency to answer “true” indiscriminately or to answer “false” indiscriminately. There is little published research on TRIN. Scores that are very low (five or less) or very high (13 or more) probably reflect inconsistent item endorsement.
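A TRIN-style score can be sketched the same way. The Python below assumes the scoring rule commonly described for the MMPI-2 (number of true-true pairs minus number of false-false pairs, plus a constant of 9); treat that rule as an assumption to be checked against the test manual. The item pairs are hypothetical, and the bands are the ones quoted above.

```python
def trin_raw(responses, true_true_pairs, false_false_pairs, constant=9):
    """TRIN-style raw score: true-true pairs minus false-false pairs, plus a constant.

    responses: dict item -> True/False answer.
    The default constant of 9 reflects the MMPI-2 rule as commonly described
    (an assumption here; check the manual). The item pairs are hypothetical.
    """
    tt = sum(responses[a] and responses[b] for a, b in true_true_pairs)
    ff = sum(not responses[a] and not responses[b] for a, b in false_false_pairs)
    return tt - ff + constant


def trin_band(raw_score):
    """Bands quoted in the text: 5 or less, or 13 or more, suggest inconsistency."""
    if raw_score <= 5:
        return "very low: possible indiscriminate 'false' responding"
    if raw_score >= 13:
        return "very high: possible indiscriminate 'true' responding"
    return "no clear true/false bias"


# Hypothetical pairs and an examinee who answers "true" to everything.
answers = {item: True for item in range(1, 11)}
tt_pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
ff_pairs = [(9, 10)]
score = trin_raw(answers, tt_pairs, ff_pairs)
print(score, trin_band(score))
```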

Measures of response consistency indicate only whether the subject has endorsed the items consistently, and not necessarily whether the answers have been accurate.  A sophisticated malingerer may be able to malinger consistently.

Original MMPI Validity Measures

Hathaway and McKinley,[50] the original developers of the MMPI, were ahead of their time when they developed “validity scales” for the instrument. The MMPI was the first test of psychopathology to use the assessment of response sets as a method of weighting scores (K-corrections). For years the MMPI validity scales were typically the only way psychologists approached the issue of malingering.

Hathaway and McKinley originally intended the F scale to identify recording and scoring errors, responders who could not read or comprehend the test items, and individuals who did not cooperate sufficiently with the testing procedure. Thus, the F scale was originally intended to be a measure of response inconsistency. However, it was discovered early on that F was also sensitive to intentional attempts to portray oneself in an overly negative manner, a test-taking approach that has been termed “faking bad”.[51]

Infrequency Scales

Psychologists are all very familiar with the validity scales on the MMPI. The F scale is the traditional “fake bad” index of malingering on the MMPI, since its items detect unusual or atypical patterns of endorsement. Psychologists are taught that there are three reasons why an individual might receive a high F score: inconsistent item endorsement, the presence of actual pathology, or malingering. According to Rogers, clinicians are probably safe to conclude that a raw score greater than 26 (T score > 110) on the F scale does not reflect actual psychopathology, but it could reflect either an inconsistent pattern of item endorsement or malingering. The optimal cutting score on the F scale for identifying malingerers within clinical samples has ranged from 17 to 30. According to one study, a cutting score of 17 would classify nearly 20% of patients as malingering, while a cutting score of 28 would classify only a little over 5% as malingering.[52]

The F scale was originally constructed by selecting those MMPI items that were endorsed infrequently, that is, by fewer than 10% of the normative sample. The F scale has heterogeneous content encompassing bizarre sensations and symptoms as well as atypical beliefs and unlikely self-descriptions. Many psychologists are not aware that there is substantial item overlap between F and other scales (37 items, or 57.8%), particularly scale 8 (15 items, or 23.4%).[53] The MMPI-2 F scale deleted four items and modified 12 items from the original F scale, leaving it largely intact.[54] Because of the strong item overlap between the F scale and other scales, the F scale is poorly designed to differentiate malingering from true psychopathology.
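The selection rule is simple enough to sketch directly: keep the items whose normative endorsement rate falls below a threshold, then check how many of them also appear on other scales. In the Python below the endorsement rates are invented and the function names are illustrative. The 10% threshold matches the F-scale rule just described, and the overlap function reproduces the kind of percentage quoted above (37 shared items out of the original F scale’s 64 is 57.8%).

```python
def infrequency_items(endorsement_rates, threshold=0.10):
    """Items endorsed by fewer than `threshold` of a normative sample."""
    return sorted(item for item, rate in endorsement_rates.items() if rate < threshold)


def overlap(scale_items, other_items):
    """Number and percentage of a scale's items that also appear on another scale."""
    scale = set(scale_items)
    shared = scale & set(other_items)
    return len(shared), 100.0 * len(shared) / len(scale)


rates = {1: 0.04, 2: 0.35, 3: 0.08, 4: 0.12, 5: 0.02}  # invented endorsement rates
print(infrequency_items(rates))        # [1, 3, 5] under the 10% rule
print(infrequency_items(rates, 0.20))  # [1, 3, 4, 5] under a looser 20% rule
print(overlap(range(64), range(37)))   # 37 of 64 items shared
```

Passing `threshold=0.20` mimics the looser inclusion rule used for the MMPI-A F scale, discussed below, and shows why it admits more items that ordinary respondents sometimes endorse.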

A better approach from the point of view of detecting malingering might have been to select those items for the F scale that had the least overlap with other scales.  An even better approach might have been to administer the test to some known malingerers, and to have used items that best differentiated malingerers from those with true psychopathology as the basis for a “malingering” scale.  Unfortunately there is little published research on the MMPI-2 using known malingerers, although there are a number of simulation studies.

Figure 14 – Clinical Assessment of Malingering and Deception[55]

The Fb scale was developed by Butcher et al. to test whether patients were responding in an invalid manner on the latter portion of the MMPI-2. Fb consists of 40 items, beginning at item 281, that are mostly keyed in the “true” direction. Fb shares no items with F but was developed using a similar strategy (items endorsed by fewer than 10% of non-clinical samples). Like the F scale, Fb has substantial overlap with the clinical scales (27.5%), with most of the overlap occurring with scale 8 (9 items, or 22.5%).

The Fp scale[56] is a set of 27 items derived largely from F and Fb. The scale was constructed using 706 Veterans Administration psychiatric patients. Using inpatient psychiatric samples is important, because in clinical practice the task is typically to differentiate malingerers from bona fide patients rather than from normals. The aim was to identify items answered infrequently by both inpatients and the MMPI-2 normative sample, so as to develop a new MMPI-2 infrequency scale in which the confounding of malingering and psychopathology would be minimized. Thus, the Fp scale is an attempt to deal with some of the problems with F and Fb outlined above. However, since it was not possible to add new items to the existing MMPI-2, only existing items were used, so the Fp scale still overlaps with the clinical scales (29.6%), particularly with scale 8 (five items, or 18.5%). Using a simulation design in which college students completed the MMPI-2 twice, once under standard conditions and a second time under instructions to fake psychopathology, some incremental validity was established for the Fp scale.[57]

In 1994 Rogers et al. conducted a meta-analysis of 15 MMPI-2 malingering studies, all of which employed simulation designs, examining the efficiency of ten fake-bad scales/indexes. Eight of these studies also employed psychiatric comparison samples. The fake-bad indicator with the greatest effect size was F, followed by F-K.

As mentioned above, cutting scores for F ranged widely across the studies, from 17 to 30. Recognizing that this variability complicates clinical practice, the authors concluded that the best practice is to suspect malingering when F is greater than 23 (raw score) or greater than 81 (T-score). The authors agreed with the above analysis of problems with F and Fb, stating that the meta-analysis “strongly suggests substantial overlap among the criterion groups”, which “mitigates against accuracy in categorical classification.”[58]
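As a small sketch of how these indicators are computed, the Python below encodes the F-K index (Gough’s dissimulation index: F raw score minus K raw score) and the F-scale rule quoted above. The function names are illustrative. Because a raw F above the cutoff can also reflect severe psychopathology or inconsistent responding, the rule flags a protocol for closer review rather than establishing malingering.

```python
def f_minus_k(f_raw, k_raw):
    """Gough's dissimulation index: F raw score minus K raw score."""
    return f_raw - k_raw


def f_scale_suspect(f_raw=None, f_t=None):
    """Rule quoted in the text: suspect malingering if F raw > 23 or F T-score > 81."""
    return (f_raw is not None and f_raw > 23) or (f_t is not None and f_t > 81)


# Invented protocol: F raw 30, K raw 8.
print(f_minus_k(30, 8), f_scale_suspect(f_raw=30))
```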

Problems of contamination of F items with psychopathology are even worse on the MMPI-A, used for adolescents aged 14 to 18. On the MMPI-A, items were selected for inclusion on the F scale if they had an endorsement frequency of less than 20% in the normative sample, so in adolescent populations the F scale is even more problematic. Adolescents tend to report more unusual symptoms on the MMPI than adults do and produce rather high elevations on F, so it is curious that an endorsement frequency of 20% was chosen as the inclusion criterion for Scale F of the MMPI-A. Using a T-score cutoff of 90, the MMPI-A F scale has been shown to have poor predictive power in identifying adolescent malingerers. However, when a T-score cutoff of greater than 81 was used, more acceptable results were achieved. In another study, by Stein et al.,[59] better accuracy was obtained using the same cutoff score (see Figure 15).

Figure 15

Accuracy of MMPI-A Scales and Indices for Detecting Malingering in Adolescents[60]

Scale or index          Cutoff   PPP    NPP
F-minus-K (a)           >10      0.91   0.59
F-minus-K (a)           >20      0.83   0.91
Scale F T-score (a)     >81      0.66   0.91
Scale F raw score (b)   >23      0.94   0.89

PPP = positive predictive power; NPP = negative predictive power.

More recent studies of malingering with the MMPI-2 have examined the effects of instructing subjects to malinger specific mental disorders, such as schizophrenia or depression, as opposed to being told merely to fake bad.  These studies may be slightly better than the previous designs, but still have all of the problems associated with simulation designs, as described above.  For example, 40 college students were asked to fake either depression or schizophrenia with an incentive of $100 offered to the most convincing simulator on the MMPI-2.  Their scores were compared with honestly responding patients with schizophrenia, honestly responding patients with depression, and a non-clinical comparison sample.[61]

As in the studies discussed above, F had the largest effect size in distinguishing between depressed patients and those feigning depression, and in distinguishing between patients with schizophrenia and those feigning schizophrenia. All of the validity scales and indicators examined were relatively more effective at detecting feigned schizophrenia, suggesting that feigned depression is more difficult to detect than feigned schizophrenia.

In a more recent study, 23 mental health professionals with expertise and experience in assessing and treating major depression were asked to complete the MMPI-2 as if they were suffering from major depression.[62] Their protocols were compared to a sample of patients diagnosed with major depression. F and Fb had the largest effect sizes in distinguishing between the two groups, with Fp doing more poorly.

We must conclude that the best measure of malingering on the MMPI-2 and MMPI-A is the F scale, but that high scores on F may also be achieved because of significant psychopathology.  Similar problems exist with the MCMI-II, MCMI-III, and MACI validity indicators.  The Debasement Scale, the primary fake-bad indicator, is made up entirely of items that are drawn from the clinical scales, so it shares many of the same problems associated with the F scale on the MMPI tests, discussed above.  At present there is little research on malingering using the MCMI tests or the MACI.[63]

Miller Forensic Assessment of Symptoms Test (M-FAST)

A recently published test for detecting malingering of psychopathology, the M-FAST, was developed specifically to detect malingering and was validated using malingerers. This approach should produce an instrument superior to the MMPI or MCMI tests in detecting malingerers, since the MMPI and MCMI tests were not constructed or validated with detection of malingering as the first priority.

The M-FAST, a structured interview, is a 25-item screening instrument that can be administered in five to ten minutes. It is easily portable, costs only $105, and is available through PAR. The M-FAST was developed and validated through a series of studies using either clinical or non-clinical samples. Individuals in the clinical samples were part of studies that employed known-group designs, whereas individuals in the non-clinical samples participated in simulation design studies.

Figure 16 – M-FAST Scales[64]

M-FAST Scale (number of items): Description

Reported vs. Observed (RO) (3 items): Assesses inconsistency between self-reported symptoms and observed behavior.

Extreme Symptomatology (ES) (7 items): Assesses the examinee's experience of symptoms that have atypical severity and pervasiveness.

Rare Combinations (RC) (7 items): Assesses the endorsement of unusual symptom combinations.

Unusual Hallucinations (UH) (5 items): Assesses the examinee's experience of unusual and uncommon psychotic symptoms.

Unusual Symptom Course (USC) (1 item): Assesses report of a sudden onset or cessation of mental illness.

Negative Image (NI) (1 item): Assesses whether an examinee has an overly negative self-image.

Suggestibility (S) (1 item): Assesses suggestibility to experience unusual symptoms.

Although the M-FAST has only 25 items, the author divides it into seven scales, three of which are scored using only one item. As Figure 16 shows, many of the detection strategies used on the M-FAST are ones we have already discussed: extreme symptomatology, rare combinations, and unusual symptoms. Three items concern differences (if any) between the subject’s self-reported behavior and his actual observed behavior. One item assesses a report of a sudden onset or cessation of illness, another assesses whether the examinee has an overly negative self-image, and another assesses suggestibility to experience unusual symptoms.

Figure 17 – SAMPLE

Utility Rates of M-FAST Total Scores for the Non-clinical Samples

M-FAST Score   NPP    PPP    Specificity   Sensitivity
1              1.00   0.66   0.51          1.00
2              0.99   0.80   0.77          0.99
3              0.98   0.90   0.90          0.98
4              0.94   0.96   0.96          0.93
5              0.93   0.96   0.97          0.93
6              0.94   1.00   1.00          0.93
7              0.86   1.00   1.00          0.82
8              0.81   1.00   1.00          0.75
9              0.86   1.00   1.00          0.83
10             0.84   1.00   1.00          0.81
11             0.82   1.00   1.00          0.77
12             0.77   1.00   1.00          0.69
13             0.73   1.00   1.00          0.61
14             0.68   1.00   1.00          0.51
15             0.63   1.00   1.00          0.39
16             0.61   1.00   1.00          0.32
17             0.59   1.00   1.00          0.28
18             0.56   1.00   1.00          0.18
19             0.56   1.00   1.00          0.18
20             0.54   1.00   1.00          0.12
21             0.53   1.00   1.00          0.08
22             0.52   1.00   1.00          0.04
23             0.52   1.00   1.00          0.04
24             0.52   1.00   1.00          0.04
25             0.51   1.00   1.00          0.00

(N = 210; base rate of simulated malingering = 51%)

Figure 18

Utility Rates of M-FAST Total Scores for the Clinical Samples

M-FAST Score   NPP    PPP    Specificity   Sensitivity
1              1.00   0.48   0.40          1.00
2              1.00   0.52   0.49          1.00
3              0.99   0.57   0.60          0.97
4              0.96   0.56   0.72          0.93
5              0.96   0.62   0.78          0.93
6              0.97   0.68   0.83          0.93
7              0.96   0.72   0.86          0.93
8              0.91   0.73   0.89          0.79
9              0.87   0.88   0.95          0.73
10             0.84   0.95   0.98          0.67
11             0.81   1.00   1.00          0.57
12             0.78   1.00   1.00          0.47
13             0.75   1.00   1.00          0.40
14             0.83   1.00   1.00          0.63
15             0.72   1.00   1.00          0.30
16             0.71   1.00   1.00          0.23
17             0.68   1.00   1.00          0.13
18             0.67   1.00   1.00          0.10
19             0.66   1.00   1.00          0.07
20             0.66   1.00   1.00          0.03
21             0.66   1.00   1.00          0.03
22             0.66   1.00   1.00          0.03
23             0.66   1.00   1.00          0.03
24             0.66   1.00   1.00          0.03
25             0.66   1.00   1.00          0.03

(N = 86; base rate of malingering = 35%)

Although the M-FAST has seven scales, it is the total score that is used to decide whether the subject is malingering, so in a sense the M-FAST really has only one “scale”. Figures 17 and 18 are taken from the M-FAST test form and are the tables used to assess the utility rates of M-FAST total scores. The manual suggests a cut-off of six, with higher scores suggestive of malingering. As with any cut-off score, scores much lower or much higher than the cut-off are more conclusive than scores close to it.
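Every entry in Figures 17 and 18 is one of four ratios taken from a cross-tabulation of test decisions against known group membership at a given cut-off. The Python sketch below shows those definitions. It assumes that a total score at or above the cut-off counts as a positive (feigning) call; the `utility_rates` helper and the nine-case sample are invented for illustration and are not M-FAST data.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class UtilityRates:
    sensitivity: float  # share of feigners correctly called positive
    specificity: float  # share of honest responders correctly called negative
    ppp: float          # positive predictive power: share of positive calls that are feigners
    npp: float          # negative predictive power: share of negative calls that are honest


def _ratio(numerator: int, denominator: int) -> float:
    # A rate with no cases behind it (e.g. no positive calls at all) is reported as NaN.
    return numerator / denominator if denominator else float("nan")


def utility_rates(cases: Iterable[Tuple[int, bool]], cutoff: int) -> UtilityRates:
    """cases: (total_score, is_feigning) pairs. Assumption: score >= cutoff is a positive call."""
    tp = fp = tn = fn = 0
    for score, feigning in cases:
        positive = score >= cutoff
        if positive and feigning:
            tp += 1
        elif positive:
            fp += 1  # honest responder flagged as feigning
        elif feigning:
            fn += 1  # feigner missed
        else:
            tn += 1
    return UtilityRates(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppp=_ratio(tp, tp + fp),
        npp=_ratio(tn, tn + fn),
    )


# Invented example: five feigners followed by four honest responders.
sample = [(9, True), (7, True), (5, True), (12, True), (8, True),
          (2, False), (6, False), (1, False), (3, False)]
print(utility_rates(sample, cutoff=6))
# UtilityRates(sensitivity=0.8, specificity=0.75, ppp=0.8, npp=0.75)
```

Re-running it with a higher `cutoff` shows the trade-off visible down each figure: fewer honest responders are flagged, so specificity and PPP tend to rise, while more feigners are missed, so sensitivity and NPP tend to fall.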

Structured Interview of Reported Symptoms (SIRS)

The leading example of the new breed of validated tests for malingering of psychopathology is the Structured Interview of Reported Symptoms (SIRS).[65] Studies have shown that persons with antisocial personality disorder and individuals who have been coached are generally unsuccessful in feigning on this test.[66] It is a 172-item structured interview with norms and extensive reliability and validity data, which makes it a true psychological test. It is most appropriately used in assessing malingering of psychotic symptoms among adult subjects, although it can also be used to assess feigning of other types of psychopathology. Clinicians are not allowed to deviate from the questions, so the testing is standardized. The SIRS costs $209 from PAR and is easily portable. It takes forty-five minutes to an hour to administer.

Validation studies used two designs: simulation studies, which compared non-psychiatric patients told to answer truthfully with those told to fake, and known-groups studies, which compared psychiatric patients with known malingerers. The differences between groups were statistically significant. More importantly, they were large enough that the test could reliably differentiate between feigned and genuine disorders. Overall, the results of the research on this test are promising.[67]

Figure 19 – Brief Description of SIRS Scales[68]

Scale (scale designation/number of items): Description

Primary Scales

Rare Symptoms (RS/8): Consists of symptoms that occur very infrequently in bona fide patients.

Symptom Combinations (SC/10): Consists of item pairs of common psychiatric problems that rarely occur simultaneously.

Improbable and Absurd Symptoms (IA/7): Consists of symptoms having a fantastic or preposterous quality that renders them, by definition, unlikely to be true.

Blatant Symptoms (BL/15): Consists of symptoms that untrained individuals are likely to identify as obvious signs of a major mental illness.

Subtle Symptoms (SU/17): Consists of symptoms that untrained individuals are more likely to associate with everyday problems or minor maladjustment than with a major mental illness.

Selectivity of Symptoms (SEL/32): Comprises the combination of the Blatant Symptoms (BL) and Subtle Symptoms (SU) scales, and indicates the non-selective or indiscriminate endorsement of psychiatric problems.

Severity of Symptoms (SEV/32): Consists of the number of BL and SU symptoms endorsed at an "extreme" or "unbearable" severity.

Reported vs. Observed Symptoms (RO/12): Based on a comparison of the patient's self-reported symptoms with the symptoms the evaluator actually observes.

Supplementary Scales

Direct Appraisal of Honesty (DA/8): Consists of items that address the patient's willingness to be honest and self-disclosing.

Defensive Symptoms (DS/19): Consists of items that represent a variety of everyday problems, worries, and negative experiences which most individuals have experienced to some degree.

Overly Specified Symptoms (OS/7): Consists of symptoms that are described with an unrealistic degree of precision; endorsement typically indicates an implausible attempt to quantify an emotional problem.

Symptom Onset and Resolution (SO/2): Consists of items that reflect sudden, atypical changes in the course of a mental disorder.

Inconsistency of Symptoms (INC/32): Consists of items identical to those contained in the BL and SU scales, repeated as a measure of discordant self-reporting. The scale is based on the number of disparities between the initial and subsequent administrations of these items.

Figure 19 describes the Primary Scales and the Supplementary Scales on the SIRS. Most of the detection strategies used on the SIRS are ones we have discussed before; they are familiar from Rogers’ model for the classification of malingering, discussed at the beginning of this paper. Many of the scales use items similar to the M-FAST items. Unlike the M-FAST, however, every scale except Symptom Onset and Resolution has seven or more items, so the SIRS scales are true scales made up of a number of similar items.

The author of the M-FAST, Holly Miller, recommends that the SIRS be given whenever a score of more than six occurs on the M-FAST.  In this sense, the SIRS is the more complete and sophisticated test to identify malingering of psychopathology, to be used if the M-FAST screening test suggests malingering. 

Questions on the SIRS are divided into three categories: General Inquiries, Detailed Inquiries, and Repeated Inquiries. The majority of the SIRS items are General Inquiries. General Inquiries are scored zero (no), one (qualified yes or sometimes), or two (definite yes). Each General Inquiry measures only one malingering detection strategy. Nine SIRS scales receive all of their scores from General Inquiries. About 20% of the General Inquiry items consist of two parts, the second of which is asked only if the client responds affirmatively to the first. These two-part questions take the form of either divided questions or rule-out questions. An example of a divided question is item 35 on page 5 of the SIRS booklet. The threshold question is, “Do you have any unusual beliefs about automobiles?” If the subject answers “yes”, the follow-up probe is, “Do you believe they have their own religion?” Only if the subject says “yes” to the follow-up probe is the item scored in the positive direction. An example of a rule-out question is item 62 on page 7, in which the first question is, “Do you sometimes feel you are physically outside of your body?” If that question is answered “yes”, the following rule-out question is asked: “Was this only because you were taking drugs or didn’t get enough sleep?” These items are scored positively only if the follow-up question is answered “no”. Some General Inquiries combine the divided and rule-out formats, which results in a three-part question.
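The gating logic of the two-part and three-part General Inquiries can be expressed as a small function. This is a sketch under stated assumptions: it returns only whether an item is scored in the positive direction (not its 0/1/2 value), it assumes a three-part item is positive only when every follow-up meets its own condition, and the names `FollowUp` and `multi_part_item_positive` are hypothetical, not part of the SIRS materials.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class FollowUp(Enum):
    DIVIDED = "divided"    # e.g. item 35: "Do you believe they have their own religion?"
    RULE_OUT = "rule_out"  # e.g. item 62: "Was this only because you were taking drugs ...?"


def multi_part_item_positive(
    threshold_yes: bool,
    follow_ups: Sequence[Tuple[FollowUp, Optional[bool]]],
) -> bool:
    """Return True if a multi-part General Inquiry is scored in the positive direction.

    follow_ups lists each follow-up in the order asked, with the answer given
    (True = "yes", False = "no", None = not asked).
    """
    if not threshold_yes:
        return False  # follow-up questions are asked only after a "yes"
    for kind, answer in follow_ups:
        if kind is FollowUp.DIVIDED and answer is not True:
            return False  # a divided item needs a "yes" to the probe
        if kind is FollowUp.RULE_OUT and answer is not False:
            return False  # a rule-out item needs a "no" to the alternative explanation
    return True


# Three-part item (assumed structure): threshold "yes", probe "yes", rule-out "no".
print(multi_part_item_positive(True, [(FollowUp.DIVIDED, True), (FollowUp.RULE_OUT, False)]))
```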

In the Detailed Inquiries, the subject is asked about specific symptoms and whether he perceives each symptom to be a “major” problem. After completing a block of four of these items, the evaluator returns to any item endorsed as a “major” problem to ask whether that symptom is “unbearable”. If the subject endorses the item as “unbearable”, it is scored as a two. If he does not endorse an item as “unbearable” although he had previously endorsed it as a “major” problem, it is scored as a one. If it was not endorsed as a “major” problem, it is scored as a zero. Four scales on the SIRS are derived from the Detailed Inquiries.
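The two-pass procedure for one block of Detailed Inquiries can be sketched as follows. The function names are hypothetical, and `ask_unbearable` stands in for the evaluator's second-pass question. The `count_unbearable` helper assumes that a score of two corresponds to the “unbearable” ratings counted by the Severity of Symptoms (SEV) scale; it is an illustration, not the manual's scoring procedure.

```python
from typing import Callable, List, Sequence


def score_detailed_block(
    major: Sequence[bool],
    ask_unbearable: Callable[[int], bool],
) -> List[int]:
    """Score one block of four Detailed Inquiries.

    major: first-pass answers to "is this a major problem?".
    ask_unbearable: second-pass question, called only for items endorsed as
    major, with the item's index in the block; True means "unbearable".
    """
    if len(major) != 4:
        raise ValueError("Detailed Inquiries are administered in blocks of four")
    scores = []
    for index, is_major in enumerate(major):
        if not is_major:
            scores.append(0)  # not a major problem
        elif ask_unbearable(index):
            scores.append(2)  # major and unbearable
        else:
            scores.append(1)  # major but not unbearable
    return scores


def count_unbearable(scores: Sequence[int]) -> int:
    # Number of items rated at the highest severity (score of two).
    return sum(1 for s in scores if s == 2)


block = score_detailed_block([True, False, True, True], lambda i: i == 2)
print(block, count_unbearable(block))  # [1, 0, 2, 1] 1
```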

The Repeated Inquiries are identical to the Detailed Inquiries. Their purpose is to measure the consistency of self-reporting between two administrations of the same items. The scale derived from these items really measures inconsistency, and is comparable to VRIN and TRIN on the MMPI and to the three consistency ratios used on the VIP. This scale, called the Inconsistency of Symptoms scale (INC), is one of the supplementary scales on the SIRS. Studies have shown that subjects feigning a mental disorder have difficulty consistently reporting “major” and “unbearable” problems.[69] The remaining supplementary scales on the SIRS are also validity scales, since they are used chiefly to interpret response styles. The Direct Appraisal of Honesty scale (DA) asks about the honesty and completeness of the subject’s reports, e.g., “Do you sometimes like to keep doctors guessing about what is really going on with you?” The Defensive Symptoms scale (DS) asks about problems most people experience to some degree; denial of those symptoms is indicative of defensiveness, e.g., “Are you sometimes too critical of other people?” The Symptom Onset and Resolution scale (SO) detects uncharacteristic onset of symptoms, e.g., “Did your emotional problems come suddenly so that one day you were completely normal and the next day you were very troubled?” The Overly Specified Symptoms scale (OS) attempts to detect malingerers who report an unrealistic degree of precision in their symptoms, e.g., “Do you have exactly two nightmares every evening?”
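The idea behind the INC scale, counting disparities between the first and repeated administrations of the same items, can be sketched in a few lines. This assumes each item carries the 0/1/2 rating from the Detailed Inquiries and that any change in rating counts as one disparity; the SIRS manual's exact counting rule is not reproduced here, and `inconsistency_count` is a hypothetical name.

```python
from typing import Sequence


def inconsistency_count(initial: Sequence[int], repeated: Sequence[int]) -> int:
    """Count items whose repeated rating differs from the initial rating."""
    if len(initial) != len(repeated):
        raise ValueError("both administrations must cover the same items")
    return sum(1 for first, second in zip(initial, repeated) if first != second)


# Items 2 and 4 change rating between administrations.
print(inconsistency_count([2, 1, 0, 2, 1], [2, 0, 0, 1, 1]))  # 2
```

As the conclusions of this paper note, a low count shows only that the subject answered consistently, not that the answers were accurate.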

As opposed to the supplemental scales, the eight primary scales on the SIRS yielded consistent differences between feigners and honest responders in four separate validation studies; thus, the primary scales are used in the classification of subjects as honest or feigning.

Above is a copy of the front page of the interview booklet for the SIRS. Raw scores on each primary scale are classified into four categories: Definite, Probable, Indeterminate, and Honest. Under the adult decision rule for the SIRS, the person is thought to be malingering if any one scale is in the “Definite” range or if more than two scales are in the “Probable” range. Richard Rogers, the developer of the test, considers this a “conservative” cut-off in the sense that it is weighted more towards increasing specificity (not calling honest responders malingerers) than towards increasing sensitivity (calling all those who are malingering malingerers). This differs from the cut-off strategy used by Frederick, who believes sensitivity and specificity should be equally weighted. Where to set the cut-off score is a value judgment. On the one hand, some feel that calling someone a malingerer is a serious decision not to be taken lightly, since misidentifying someone as a malingerer when they are not has serious consequences. Those who adopt this position would rather let some malingerers go undetected than misidentify a valid responder as a malingerer. This is the position taken by the SIRS. Frederick, on the other hand, feels that allowing some malingerers to succeed also has serious consequences for society, and is therefore more willing to risk calling someone a malingerer who is not. That is the position taken by the VIP.
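The adult decision rule stated in this paragraph can be written as a short function. This is a minimal sketch assuming each of the eight primary scales has already been assigned one of the four categories; the raw-score ranges that define those categories come from the SIRS manual and are not reproduced here. The function name and the `probable_threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable

CATEGORIES = {"Definite", "Probable", "Indeterminate", "Honest"}


def sirs_classifies_feigning(categories: Iterable[str], probable_threshold: int = 2) -> bool:
    """Feigning if any primary scale is "Definite" or if MORE THAN
    probable_threshold scales are "Probable" (2 gives the adult rule above;
    3 gives the "1 Definite or >3 Probable" criterion in Figure 20)."""
    counts = Counter(categories)
    unknown = set(counts) - CATEGORIES
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    return counts["Definite"] >= 1 or counts["Probable"] > probable_threshold


profile = ["Probable", "Probable", "Probable", "Honest",
           "Honest", "Indeterminate", "Honest", "Honest"]
print(sirs_classifies_feigning(profile))                        # True: three Probable scales
print(sirs_classifies_feigning(profile, probable_threshold=3))  # False under the >3 criterion
```

Making the threshold a parameter reflects the value judgment described above: raising it makes a feigning classification harder to reach.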

The SIRS manual does not encourage using the SIRS with adolescents, but research done since the SIRS was published has shown that it can be useful with adolescents.

Figure 20[70]

                               Adults                  Adolescents
                               Established Criteria    Optimal Criteria
SIRS Composite Index           PPP      NPP            PPP      NPP
1 Definite or >3 Probable      0.66     1.00           0.81     0.96
1 Definite or >2 Probable      0.79     0.98           0.89     0.94

Note: PPP = positive predictive power; NPP = negative predictive power.

Composite Index identifies an individual as feigning if one SIRS scale falls in the Definite range or a given number fall in the Probable range. Established criteria are based on cutting scores from the SIRS manual (Rogers, Bagby, & Dickens, 1992).  Optimal criteria are cutting scores based on the adolescent sample from Rogers, Hinds and Sewell (1996).

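Figure 20 sets forth the PPP and NPP of the SIRS composite index for both adults and adolescents. For adults, the cut-off described above (one Definite or more than two Probable) has a relatively modest PPP (0.79) but a very good NPP (0.98). With the alternative criterion of one Definite or more than three Probable, the PPP falls to 0.66 while the NPP becomes perfect (1.00). A perfect NPP means that no feigners in the sample were missed, but the lower PPP means that a larger share of those classified as feigning were in fact honest responders. That criterion would therefore suit evaluators who never want to miss a malingerer, rather than those who want to avoid false positives.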

Figure 21[71]

                            Adults                     Adolescents
                            Established Criteria       Optimal Criteria
SIRS Scale                  Cutoff   PPP    NPP        Cutoff   PPP    NPP
Rare Symptoms               >4       0.55   0.94       >4       0.55   0.94
Symptom Combination         >6       0.40   0.98       >4       0.51   0.98
Improbable/Absurd           >5       0.36   1.00       >2       0.55   0.98
Blatant                     >10      0.60   0.98       >9       0.66   0.98
Subtle                      >15      0.64   0.98       >13      0.74   0.94
Selectivity                 >17      0.49   1.00       >13      0.77   0.94
Severity                    >9       0.68   1.00       >7       0.75   0.94
Reported versus Observed    >6       0.59   0.96       >4       0.72   0.91

Note: PPP = positive predictive power; NPP = negative predictive power.

Established criteria are based on cutting scores from the SIRS manual (Rogers, Bagby, & Dickens, 1992).

Optimal criteria are cutting scores based on the adolescent sample from Rogers, Hinds, and Sewell (1996).

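Figure 21 shows the PPP and NPP for each primary SIRS scale, along with the cut-off score for each scale. Notice that the cut-offs yield uniformly high NPP values (0.91 to 1.00) but considerably lower PPP values (0.36 to 0.77). A high NPP means that a score at or below the cut-off rarely belongs to a feigner; a lower PPP means that an elevation above the cut-off on any single scale, taken alone, will misclassify a substantial share of honest responders as feigning. Given Rogers’ concern with avoiding false positives on the SIRS, these single-scale figures are best read alongside the composite decision rule rather than in isolation.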

Rogers et al. (1995)[72] performed an analysis to determine whether SIRS and MMPI-A data could enhance predictive capacity. For the SIRS alone, the NPP was .98 and the PPP was .78, for an overall classification rate of 87.8%. The MMPI-A validity scales alone produced an NPP of .85 and a PPP of .87, with an overall classification rate of 85.8%. Although these classification rates appear comparable, the base rate for malingering in clinical or forensic settings is likely to be much lower than the 50% rate used in this within-subjects design. As the authors calculate, using the estimated prevalence of malingering among adult forensic populations of 15.7%,[73] the MMPI-A had an overall classification accuracy of 85.3%, compared to 94.9% for the SIRS.
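Overall classification accuracy is a base-rate-weighted average of sensitivity and specificity, which is why the same test can look better or worse once the 50% design rate is replaced by a realistic one. The sketch below illustrates that relationship with invented figures (95% sensitivity, 80% specificity); it does not attempt to reproduce the Rogers et al. numbers, and `overall_accuracy` is a hypothetical helper.

```python
def overall_accuracy(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Proportion of all cases classified correctly at a given base rate of malingering."""
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    return base_rate * sensitivity + (1.0 - base_rate) * specificity


# Invented test: high sensitivity, modest specificity.
sens, spec = 0.95, 0.80
for rate in (0.50, 0.157):
    acc = overall_accuracy(sens, spec, rate)
    # Simply calling everyone honest is correct for the (1 - base rate) share of cases.
    print(f"base rate {rate:.3f}: test {acc:.3f}, call-everyone-honest {1 - rate:.3f}")
```

With these invented figures, accuracy falls from 87.5% at a 50% base rate to about 82.4% at 15.7%, below the 84.3% obtained by calling everyone honest: a test validated on balanced samples can be less accurate overall than simply predicting the base rate.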

Factitious Disorder

At the beginning of this paper we discussed the differences between malingering and factitious disorder. Malingering is the conscious fabrication or exaggeration of physical and/or psychological symptoms. These symptoms are produced in pursuit of a goal that is easily recognizable from an understanding of the individual’s circumstances rather than his or her individual psychology. Factitious disorder is the intentional production or feigning of physical or psychological symptoms when there is a psychological need to assume the sick role, as evidenced by the absence of external incentives for the behavior, such as economic gain, better care, etc. The critical issue in the malingering vs. factitious disorder distinction is the motivational basis for the feigning. However, the distinction is not always clear, since easily recognizable external incentives do not preclude psychological needs, and psychological needs do not necessarily occur in the absence of external incentives. The definition of factitious disorder is confusing, since it assumes that feigning is voluntary but is also in the service of unconscious motivation. Rogers et al. conducted a study comparing malingerers, factitious disorder patients, and bona fide inpatients, and found no significant differences in overall elevation of SIRS scale scores between malingerers and factitious disorder patients.[74]

Presumably, the same inability to distinguish malingering from factitious disorder exists for all of the malingering tests we have discussed. Thus, after a finding that invalid responding is occurring, evaluators need to interview the subject in detail regarding his or her motivation. Rogers[75] suggests the sets of questions in Figure 22 for distinguishing between factitious disorder and malingering.

Figure 22

General Malingering

  1. What would be the best outcome you could hope for from this evaluation?
  2. What is at stake (e.g., job, psychiatric treatment, money, benefits)?
  3. Do you feel you have to prove yourself? … Why?
  4. What will happen if you are disappointed?
  5. Are others trying to prevent you from getting what you deserve?

Factitious

  1. Do you admire doctors and what they can do for you?
  2. Are doctors the only people who really understand and care about you?
  3. Do others think that you are too dependent on mental health professionals for your feelings of well-being?
  4. Are you sometimes concerned that mental health professionals are “too busy” to give you the time you truly need?
  5. Are you sometimes forced to “play up” your symptoms to get the attention that you deserve?

Summary: Tests Used to Detect Malingering of Cognitive Impairment

The Rey tests do not have demonstrated accuracy rates high enough to meet Heilbrun’s guidelines for test selection in forensic evaluations. The TOMM and the VSVT are promising, but they do not have any methods of response style evaluation. The VIP is probably the most sophisticated and best-validated test to detect malingering of cognitive impairment, and it does employ methods of response style evaluation. Its cutting scores are selected to balance sensitivity and specificity.

Summary: Tests Used to Detect Malingering of Psychopathology

The MMPI was the first test of psychopathology to measure “response sets”, and thus was the first test for detecting psychopathology that arguably met Heilbrun’s criteria. However, although research shows that F is the best single scale or index on the MMPI-2 for distinguishing between true psychopathology and malingering, there is significant contamination (genuine psychopathology also elevates F), such that a high rate of false positives will inevitably occur.

The M-FAST and the SIRS represent major advances over the multi-scale inventories in detecting malingering. The M-FAST is appropriate as a less time-intensive test, but it really has only one “scale”, and thus has limited ability to assess response set. The SIRS is time-intensive, but it has enough items for a number of true scales, as well as scales measuring response set. The SIRS uses cutting scores that are selected to optimize specificity at the expense of sensitivity.

Conclusions

Remember that Rogers’ model for the classification of malingering required not only that the individual report unusual symptom patterns but also some form of corroboration, such as collateral interviews with family members showing evidence of relatively good adjustment in contrast to self-described gross impairment. Almost all of the test authors suggest that malingering should never be diagnosed solely on the basis of any test. It is always important to analyze as much collateral information as possible, and to compare that information with the findings from the testing.

Evaluators need to estimate the base rate of malingering among those they interview before it is possible to make any probability statements, based on test data, about whether a person is malingering. One commonly used approach is to assume the average base rate of malingering during forensic evaluations reported by Rogers et al. in 1994, which was 15.7%.[76] Such estimates are probably underestimates of the actual prevalence of malingering, because they do not include individuals who successfully feigned mental illness.
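To make the dependence on base rate concrete, PPP and NPP can be recomputed from a test's sensitivity and specificity for any assumed base rate using Bayes' theorem. The sketch below uses invented 90%/90% figures rather than values from any test discussed here; `predictive_values` is a hypothetical helper.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, base_rate: float) -> Tuple[float, float]:
    """Return (PPP, NPP) when a proportion `base_rate` of examinees are malingering."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    true_pos = base_rate * sensitivity
    false_neg = base_rate * (1.0 - sensitivity)
    true_neg = (1.0 - base_rate) * specificity
    false_pos = (1.0 - base_rate) * (1.0 - specificity)
    # NaN if the test never (or always) calls anyone positive.
    ppp = true_pos / (true_pos + false_pos) if true_pos + false_pos else float("nan")
    npp = true_neg / (true_neg + false_neg) if true_neg + false_neg else float("nan")
    return ppp, npp


# Invented test with 90% sensitivity and 90% specificity.
for rate in (0.50, 0.157):
    ppp, npp = predictive_values(0.90, 0.90, rate)
    print(f"base rate {rate:.3f}: PPP {ppp:.2f}, NPP {npp:.2f}")
```

With these figures, moving from a 50% base rate to the 15.7% estimate lowers PPP from 0.90 to about 0.63 while raising NPP to about 0.98, so a positive result carries considerably less weight in a realistic forensic population than in a balanced validation sample.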

Because of the small N’s used in some of the validation studies, and because of the different definitions used for “known” and “simulated” malingerers, comparing sensitivity and specificity figures across tests may not always be appropriate. These issues are further confounded by the different approaches used in setting cutting scores, with some tests balancing sensitivity and specificity and others attempting to avoid false positives.

Measures of response consistency are very useful in detecting random response styles, and the better tests have them. Measures of response consistency indicate only whether the subject has endorsed the items consistently, not whether the answers are accurate, since a sophisticated malingerer may be able to malinger consistently.

Valid performances on some measures do not rule out malingering. Malingered performances on some measures do not rule out valid performances on others, and do not rule out genuine disability. Malingerers of the “hybrid” type may be exaggerating true symptoms. Malingering tests will likely not distinguish malingering from factitious disorder, so further questioning of patients’ motivations may be necessary after reviewing test results. For most forensic evaluators, this distinction may not be an important issue.


[1] The Random House Dictionary of the English Language (1987).  2nd Edition Random House, New York.

[2] American Psychiatric Association (1994).  Diagnostic and Statistical Manual of Mental Disorders (4th ed.).  Washington D.C.

[3] Rogers, R., Sewell, K.W., & Goldstein, A. (1994).  Explanatory models of malingering: A prototypical analysis.  Law and Human Behavior, 18, 543-552

[4] Rogers, R., Harrell, E.H., & Liff, C.D. (1993).  Feigning neuropsychological impairment: A critical review of methodological and clinical considerations.  Clinical Psychology Review, 13, 255-274.

[5] Miller, H. (2001).  Miller forensic assessment of symptoms test.  Odessa, FL: PAR.

[6] Rogers, R. (1986).  Conducting insanity evaluations, New York, NY: Van Nostrand Reinhold.

[7] Binder, L.  (1993).  Assessment of malingering after mild head trauma with the Portland digit recognition test.  Journal of Clinical and Experimental Neuropsychology; 15, 170-82

[8] Rogers, R. & Cruise, K. (2000).  Malingering and deception among psychopaths, in Gacono, C. (Ed.).  The clinical and forensic assessment of psychopathy: A practitioner’s guide. New York, NY: Erlbaum (pp. 269-84).

[9] Rohling, M., Binder, L., & Langhinrichsen-Rohling, J. (1995).  Money matters: A meta-analytic review of the association between financial compensation and the experience and treatment of chronic pain. Health Psychology, 14:10, 537-47.  One author cites estimates for the incidence of malingered psychological problems ranging from one percent to fifty percent: Resnick, P. (1988).  Malingering of post-traumatic disorders, in Rogers, R. (ed.) Clinical assessment of malingering and deception.  New York, NY: Guilford Press; cited in Binder, L. (1990).  Malingering following minor head trauma.  The Clinical Neuropsychologist, 14:1, 25-36.

[10] Palmer, B., Boone, K.B., Allman, L. & Castro, D. (1995).  Co-occurrence of brain injury and cognitive deficit exaggeration.  The Clinical Neuropsychologist, 9, 68-73.

[11] Rogers, R. (Ed.) (1988).  Clinical assessment of malingering and deception, New York, NY: Guilford Press. 

[12] Rogers, R. (1997).  Clinical assessment of malingering and deception 2nd Ed.  New York, NY: Guilford Press (pp. 385).

[13] Rogers, R., Gillis, J. & Bagby, R. (1990).  The SIRS as a measure of malingering: A validation study.  Behavioral Sciences and the Law, 8, 85-92.

[14] Rogers, R. (1990).  Development of a new classificatory model for malingering.  Bulletin of the American Academy of Psychiatry and Law, 18, 323-333.

[15] See generally, McCann, J. (1998).  Malingering and deception in adolescents.  Washington, D.C.:  APA (pp. 28-32).

[16] Rogers, R. (1990).  Development of a new classificatory model for malingering.  Bulletin of the American Academy of Psychiatry and Law, 18, 323-33.

[17] Ziskin, J. (1984).  Malingering of psychological disorders.  Behavioral Sciences and the Law, 12, 39-50.

[18] Faust, D., Hart, K.  & Guilmette, T.C. (1988).  Pediatric malingering: The capacity of children to fake believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 56, 578-582.

[19] Faust, D., Hart, K., Guilmette, T. & Arkes, H. (1988).  Neuropsychologists’ capacity to detect adolescent malingering.  Professional Psychology: Research and Practice, 19, 508-515.

[20] Bernard, L.C. (1990). Prospects for faking believable memory deficits on neuropsychological tests, and the use of incentives in simulation research.  Journal of Clinical and Experimental Neuropsychology, 12, 715-28.  However, there is support for the idea that comparing the Attention/Concentration Index to the General Memory Index on the WMS-R is a useful way to detect malingerers when the Attention/Concentration Index is suspiciously low compared to the General Memory Index.  Mittenberg, W., Azrin, R., Millsaps, C., & Heilbronner, R. (1993).  Identification of malingered head injury on the Wechsler Memory Scale-Revised.  Psychological Assessment, 5:1, 24-40.

[21] Trueblood, W. & Binder, L. (1997).  Psychologists’ accuracy in identifying neuropsychological test protocols of clinical malingerers.  Archives of Clinical Neuropsychology, 12:1, 12-77.

[22] Heilbrun, K. (1992).  The role of psychological testing in forensic assessment.  Law and Human Behavior, 16, 257-72.

[23] Rogers, R. (1990).  Models of feigned mental illness.  Professional Psychology; 21, 182-88.

[24] Rogers, R. (1984).  Towards an empirical model of malingering and deception.  Behavioral Sciences and the Law, 2, 93-112.

[25] See Goebel, R. (1983).  Detection of Faking on the Halstead-Reitan Neuropsychological Test Battery.  Journal of Clinical Psychology, 39, 731-42.

[26] E.g. Rogers, R., Harrell, E. and Liff, C. (1993).  Feigning neuropsychological impairment: A critical review of methodological and clinical considerations.  Clinical Psychology Review, 13, 258.

[27] See Gouvier, W.D., Hayes, J.S., & Smiroldo, B.B. (1998).  The significance of base-rates, test sensitivity, test specificity, and subjects knowledge of symptoms in assessing TBI sequelae and malingering; in Reynolds, C.R. (ed.) Detection of malingering during head injury litigation.  New York, NY: Plenum Press.

[28] From McCann, J. (1998).  Malingering and deception in adolescents.  Washington, D.C.: APA.

[29]   See, McCann, J.T. (1998).  Malingering and deception in adolescents: Assessing credibility in clinical and forensic settings.  Washington, D.C.: APA (pp. 144), for further explanation of the derivation of these equations.

[30] Faust, D. & Nurcombe, B. (1989). Improving the accuracy of clinical judgment.  Psychiatry, 52, 197-208.

[31] For base rates below 50%, relying on the test will increase overall diagnostic accuracy only when the base rate exceeds the combined rate of false positives and false negatives (each expressed as a proportion of all cases examined).  Using Dr. Jones’ approach to DID diagnosis, .001 is not > .01 + 0.  Therefore, he ought not to rely on the diagnostic utility of his test in circumstances in which maximizing overall diagnostic accuracy is the most important issue.  For base rates above 50%, the test will increase diagnostic accuracy when 1 - base rate > false positives + false negatives.

[32] Lezak, M. (1983).  Neuropsychological Assessment.  Oxford University Press, New York; 618

[33] From Rogers, R., Harrell, E. and Liff, C. (1993).  Feigning neuropsychological impairment: A critical review of methodological and clinical considerations.  Clinical Psychology Review, 13, 255-74.

[34] Suhr, J., et al. (1997).  Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use.  Journal of Clinical and Experimental Neuropsychology, 19, 500-14.

[35] Van Gorp, W., et al. (1999).  How well do standard neuropsychological tests identify malingering?  A preliminary analysis.  Journal of Clinical and Experimental Neuropsychology, 21:2, 245-50.

[36] Gothard, S., et al. (1995).  Detection of malingering in competency to stand trial evaluations.  Law and Human Behavior, 19:5, 493-505.

[37] See for example Wiggins, E. & Brandt, J.  (1988). The detection of simulated amnesia.  Law and Human Behavior; 12, 57-78, which sets forth The Wiggins and Brandt Personal History Interview.

a The malingering criteria suggested by Rawling and Brooks (1990) were found by Milanovich, Axelrod, and Millis (1996) to have poor specificity (i.e., numerous individuals from clinical populations were misclassified).  Further research prior to clinical use has been suggested by these authors.

[38] Frederick, R. (2000).  Introduction to the Development and interpretation of the VIP test.  Minneapolis, MN: NCS Assessment (pp. 10).

[39] Binks, G., Gouvier, W., & Waters, W. (1997).  Malingering detection with the dot counting test.  Archives of Clinical Neuropsychology, 12:1, 41-46.  No suggested cut-off score is offered in the article.

[40] Pankratz, L., et al. (1975).  A forced choice technique to evaluate deafness in a hysterical or malingering patient.  Journal of Consulting and Clinical Psychology, 43, 421-22.

[41] Pankratz, L., et al. (1979).  Symptom validity testing and symptom retraining: Procedures for the assessment and treatment of functional sensory deficits. Journal of Consulting and Clinical Psychology, 47, 409-410.

[42] Tombaugh, T. (1996).  Test of memory malingering. North Tonawanda, NY: Multi-Health Systems, Inc.

[43] Frederick, R. & Crosby, R. (2000).  Development and validation of the validity indicator profile.  Law and Human Behavior, 24:1.

[44] Frederick, R., Crosby, R., & Wynkoop, T. (2000).  Performance curve classification of invalid responding on the validity indicator profile.  Archives of Clinical Neuropsychology, 15:4, 281-300.

[45] Reitan, R. & Wolfson, D. (1995).  Consistency of responses on retesting among head-injured subjects in litigation versus head-injured subjects not in litigation.  Applied Neuropsychology, 2, 67-71.

[46] Reitan, R. & Wolfson, D. (1997).  Consistency of neuropsychological test scores on head-injured subjects involved in litigation compared with head-injured subjects not involved in litigation: Development of the Retest Consistency Index.  The Clinical Neuropsychologist, 11:1, 69-76.

[47] Greene, R. (1997).  Assessment of malingering and defensiveness by multiscale personality inventories, in Rogers, R. (ed.), Clinical assessment of malingering and deception, 2nd Ed.  New York, NY: Guilford Press.

[48] Ibid.

[49] Hathaway, S. & McKinley, R. (1943).  The Minnesota multiphasic personality inventory.  Minneapolis, MN: University of Minnesota Press.

[50] Meehl, P. & Hathaway, S. (1946).  The K factor as a suppressor variable in the MMPI.  Journal of Applied Psychology, 30, 525-64.

[51] Rogers, R., et al. (1994).  A meta-analysis of malingering on the MMPI-2.  Assessment, 1:3, 227.

[52] Ibid.

[53] Butcher, J., et al. (1989). Manual for the administration and scoring of the MMPI-2.  Minneapolis, MN: University of Minnesota Press.

[54] From Greene, R. (1997).  Assessment of malingering and defensiveness by multiscale personality inventories.  In Rogers, R.  Clinical assessment of malingering and deception (2nd Ed.). New York, NY: Guilford Press (pp. 172).

[55] Arbisi, P. and Ben-Porath, Y. (1995).  An MMPI-2 infrequent response scale for use with psychopathological populations: The infrequency-psychopathology scale, Fp.  Psychological Assessment, 7:4, 424-31.

[56] Ibid.

[57] Rogers, R., et al. (1994).  A meta-analysis of malingering on the MMPI-2.  Assessment, 1:3, 227-37.

[58] Stein, L., Graham, J. & Williams, C. (1995).  Detecting fake-bad MMPI-A profiles.  Journal of Personality Assessment, 65, 415-27.

[59] From McCann, J. (1998).  Malingering and deception in adolescents. Washington, D.C.: APA.

[60] Bagby, R., et al. (1997).  Detecting feigned depression and schizophrenia on the MMPI-2.  Journal of Personality Assessment, 68:3, 650-64.

[61] Bagby, R., et al. (2000).  Can the MMPI-2 validity scale detect depression feigned by experts?  Assessment, 7:1, 55-62.

[62] McCann, J. (1998).  Malingering and deception in adolescents.  Washington, D.C.: APA (p. 100).

[63] From Miller, H. (2001).  Miller Forensic Assessment of Symptoms Test.  Odessa, FL: PAR.

[64] Rogers, R. (1992).  Professional manual for the Structured Interview of Reported Symptoms.  Odessa, FL: PAR.

[65] Ibid.

[66] Rogers, R. (Ed.). (1997).  Clinical assessment of malingering and deception (2nd Ed.).  New York, NY: Guilford Press.

[67] Rogers, R., et al. (1992).  Structured Interview of Reported Symptoms.  Odessa, FL: PAR.

[68] Rogers, R., et al. (1992).  Structured Interview of Reported Symptoms.  Odessa, FL: PAR (p. 9).

[69] Figures 20 and 21 are adapted from McCann, J. (1998).  Malingering and deception in adolescents.  Washington, D.C.: APA (pp. 60-61).

[70] From McCann, J. (1998).  Malingering and deception in adolescents.  Washington, D.C.: APA.

[71] Rogers, R., Hinds, J. & Sewell, K. (1995).  Feigning psychopathology among adolescent offenders: Validation of the SIRS, MMPI-A, and SIMS.  Paper presented at the 103rd meeting of the American Psychological Association, New York City.  Cited in Shaw, L. & Bagby, R. (1997).  Children and deception.  In Rogers, R. (Ed.), Clinical assessment of malingering and deception (2nd Ed.).  New York, NY: Guilford Press.

[72] Rogers, R., Sewell, K. & Goldstein, A. (1994).  Explanatory models of malingering: A prototypical analysis.  Law and Human Behavior, 18, 543-552.

[73] Rogers, R., et al. (1992).  Structured Interview of Reported Symptoms.  Odessa, FL: PAR (p. 26).

[74] Ibid.

[75] Rogers, R., Sewell, K. & Goldstein, A. (1994).  Explanatory models of malingering: A prototypical analysis.  Law and Human Behavior, 18, 543-552.