• The Rorschach Inkblot Method – Science or Pseudoscience?

    Rorschach image
    What do you see here?

    As mentioned in my previous post, this is the second in a seven-part series examining science and pseudoscience in the field of personality and psychopathology assessment. My goal is to expose the less than stellar underbelly of some of the most commonly used and well-known measures, particularly those that fall under the grouping of “projective” assessments. The first measure I will turn a careful and critical eye to is one of the most easily recognized psychological measures, the Rorschach Inkblot Method. Be warned, this is a long read.


    To understand the strength of beliefs for and against the use of our first test, consider that the Rorschach (also called the Rorschach Inkblot Method [RIM]) has been described as being “the most cherished and the most reviled of all psychological assessment tools” (Hunsley & Bailey, 1999, p. 267). It is frequently listed as one of the most commonly used psychological measures by clinical and school psychologists (Archer & Newsom, 2000; Hojnoski et al., 2006) and is frequently taught in clinical psychology doctoral programs (Belter & Piotrowski, 2001), although anecdotal evidence suggests a decline across the past decade. The Rorschach also holds a grip on the public imagination, as evidenced by the use of similar inkblots in media from comic books (“Watchmen” by Alan Moore and Dave Gibbons) to music videos (“Crazy” by Gnarls Barkley).

    Hermann Rorschach
    (photo from 1910)

    Hermann Rorschach’s development of the test that would bear his name is an interesting story. He was apparently intrigued as a youth (as was much of Germany) by a popular parlor game called Klecksographie (roughly “Blotto” in English), where one would drip ink onto a piece of paper, fold it in half, and then compete to give the most numerous or interesting answers (Exner, 2003). Psychological research using inkblots had been conducted by a number of researchers in the early part of the 20th century, but had primarily confined itself to the areas of visual perception and memory processes, although Alfred Binet researched their use in measuring intelligence (Zubin et al., 1965). Rorschach, however, was either unaware of or ignored these lines of research when, in 1918, he created his blots and developed their usage. He did, however, appear to be inspired by work conducted by a medical student in Zurich, who was unable to show success in distinguishing psychotic patients from non-patients using responses to inkblots (Gurvitz, 1951).

    Rorschach’s inkblots were not what one would have seen in a game of Blotto, however. He appears to have painstakingly constructed them using ink and watercolors, rather than relying purely on chance or random drips and patterns (Exner, 2002; Morgenthaler, 1954). Based on his only major work (he died at age 37, only nine months after its publication), Rorschach was particularly concerned with two factors in a person’s response to the blots: movement and color (Rorschach, 1921/1964). He does not appear to have been influenced by Freudian theories in constructing the inkblots or their interpretation, and instead had his own theory that the perception of movement and color would give insight into personality. In particular, he thought movement responses were related to introversion, while color responses were related to extraversion (“extratension” in his terminology).

    The idea that perception of movement and introversion were related appears to be based in part on muscle movement and dream research by a philosopher in the 1800s named John Mourly Vold (Ellenberger, 1993). Rorschach took Mourly Vold’s idea that inhibition of movement during sleep would cause more dream imagery involving movement and applied it to the responses generated by his inkblots. In other words, his theory was that introverts should see more moving images in the blots, because they are psychologically inhibited. Rorschach also outlined a theory that the perception and use of color in descriptions of the inkblots was related to affect and extraversion. In particular, those who gave more color responses were thought to be more extraverted and likely to show high levels of emotion. Unlike with his ideas about movement, however, his theory about color seems to have been pulled from common vernacular (“black moods” for example) and personal opinion rather than any research or previous theories (Rapaport, Gill, & Schafer, 1946). Rorschach also seemed particularly interested in the balance of introversion and extraversion, called “Experience Balance” in English (abbreviated EB). The ratio of movement (M) to color responses, he believed, would reveal a person’s “basic experience and orientation toward reality” (Wood, Nezworski, Lilienfeld, & Garb, 2003).

    Rorschach’s reasons for focusing on color and movement, therefore, need to be examined to see if they are actually supported by a preponderance of scientific evidence. A review of the literature shows that the answer is, for the most part, “no.” EB, for example, has not consistently been demonstrated to be related to introversion or extraversion (see Holtzman, 1950 or Wysocki, 1956 for disconfirming evidence; Allen, Richer, & Plotnick, 1964 for confirming), and Color responses have not been consistently related to any particular diagnosis such as depression (for a review see Stevens, Edwards, Hunter & Bridgman, 1993). It should be noted, however, that some of Rorschach’s hypotheses do have some consistent support. For example, that a more intelligent person would provide higher numbers of M responses has been supported to a moderate degree (see Frank, 1979 for a review), as have some indicators of psychotic disorders (see Dawes, 1994; Lilienfeld, Wood, & Garb, 2001).

    So, was Rorschach right? The answer is “mostly not” with the occasional “yes.” While his major hypotheses have not been shown to be correct, some minor ones have support. What does this mean for the test as a whole, then? Should it all be thrown out? These inconsistencies and concerns led to numerous within-group conflicts during the 1930s and beyond, as different groups of researchers and clinicians developed further types of scores, or refined the meaning of certain scores (see Exner, 1969 for a review of major systems of interpretation). It was during these conflicts that some began to use the Rorschach as a more psychoanalytically oriented test, interpreting responses to blots as if they were dreams (content approach) rather than relying on a more formal structural approach (e.g., following Rorschach’s methods).

    Furthermore, well-conducted research in the 1950s showed that the Rorschach was not more useful (and was in fact slightly less useful) than a more objective measure of personality, the MMPI, and appeared to highly overpathologize normal individuals (e.g., Little & Shneidman, 1959). Further research showed that it added little to nothing in the way of incremental validity if one already had access to biographical information and a person’s history (see Garb, 1998 for a review). By the beginning of the 1960s, most research-oriented and scientifically based psychologists thought the Rorschach was not a useful instrument (see critiques by Cronbach, 1949; Jensen, 1958).

    Exner's CS
    The last edition of the CS prior to Exner’s death in 2006

    Such criticism and lack of scientific support led directly to a number of reform attempts for the Rorschach. The most complete one, and the one that likely saved the Rorschach from being consigned to the graveyard of psychological tests, was John Exner’s Comprehensive System (CS; 1974, 1993). The CS included reviews of the literature, norms, and administration guidelines – all things that were lacking at the time. Exner also led extensive research into reliability and validity of the traditional scores, while at the same time developing new ones. Exner has been described as having “almost single-handedly rescued the Rorschach and brought it back to life” (American Psychological Association Board of Professional Affairs, 1998, p. 392). All the while, though, findings by researchers other than Exner or his associates began to appear, with results often in sharp contrast to those reported in the CS’s manual. In fact, the vast majority of the supportive studies cited in the latest CS manual (Exner, 1993) are unpublished studies conducted by Exner and his research team at Rorschach Workshops (Wood et al., 2003).

    As research on the CS conducted by those without ties to Exner and the Rorschach Workshops began to accumulate in the 1980s and 1990s, numerous concerns identical to those raised by research in the 1950s and 1960s resurfaced: overpathologizing, low diagnostic accuracy outside of psychotic disorders, and a lack of relationship to objective measures of psychopathology and personality (for a review see Hunsley & Bailey, 2001; Lilienfeld, Wood, & Garb, 2000). Even the norms of the CS were found to be seriously different from the results of other studies (e.g., Shaffer, Erdberg, & Haroian, 1999; Wood et al., 2001). Flaws were even found within Exner’s own norms: nearly a third of his normative sample turned out not to exist as independent records; by his own report, 221 of the 700 normative subjects were actually duplicate records (Exner, 2001). Of special note, the majority of supportive studies for the Rorschach have recently been published in the Journal of Personality Assessment, a well-respected journal that publishes large amounts of high quality research. It also happens to be the official journal of the Society for Personality Assessment, which originated as the Rorschach Institute, and is almost exclusively staffed by editors who are very strong proponents of the Rorschach’s use.

    What, then, can be said about the usage of the Rorschach in clinical settings? Interestingly, both opponents (Wood, Lilienfeld, Garb, & Nezworski, 2000) and proponents (Weiner, 1999) conclude that it should not be used diagnostically. To wit, “Rorschach data are of little use in determining the particular symptoms a person is manifesting….Accordingly, the nature of these symptoms is better determined from observing or asking directly about them than by speculating about their presence” (Weiner & Greene, 2008, p. 396). Clinically, there are some CS scores that are related to intelligence and psychotic disorders, just as Rorschach’s original system found almost 90 years ago (Wood, Nezworski, & Garb, 2003). But in terms of relationship to currently used diagnostic categories, there is no solid scientific evidence that using the Rorschach under the CS can accurately and consistently assist with the diagnosis of mental disorders in general (outside of psychotic disorders), or any specific category of anxiety, mood, or eating disorders, to name a few (Wood et al., 2000). There is, however, a non-CS scale – the Elizur Anxiety scale – that relates to real-world anxious behaviors (Aronow & Reznikoff, 1976; Goldfried, Stricker, & Weiner, 1971), although not to specific disorders. Unfortunately, it is best regarded as a research instrument, given the lack of standardized norms or methods of administration (Wood, Nezworski, & Garb, 2003).

    In summary, then, the Rorschach began life in 1921 as a theoretically shaky test that lacked empirical support for the majority of psychopathology (psychotic disorders being the exception). Despite almost 90 years of research on and use of it, and various iterations of scoring and administration criteria, the preponderance of evidence today indicates that it has changed little over the years. There is no reasonable, empirically supported reason to use the Rorschach as a tool to assist in the diagnosis of any mental disorder.

    Verdict – Pseudoscience for the vast majority of claimed uses


    Next time, I will be writing on the Thematic Apperception Test. Then I will move on to projective drawings, sentence completion tasks, and the (non-projective but very widely used) Myers-Briggs Type Indicator. I will conclude the series with a look at scientifically reliable and valid measures of personality.

    (For a full list of the works I’ve cited above, feel free to email me)

    • Great article. Whenever I hear about Rorschach ink blots, it brings to mind this episode of the show The Big Bang Theory:


      Rorschach ink blots are so vague, and as you show, unproven. They’re useless….

      • gps

        I’ve not seen that clip before, pretty funny stuff!

    • Amy

      As somebody in a Clinical Psychology PhD program I can say from experience that evidence-based programs don’t use projective tests, ever. It’s a very harmful statement for the field of psychology to imply that most of us blindly use these assessments that lack validity.

      • You are correct, Amy. Unfortunately, programs training practitioners in EBP are still in a minority. I’m a clinical psychologist from a highly EBP graduate school and pre-doc internship who is now in charge of a program that focuses on training master’s level mental health practitioners in EBP. As such, I interface with many practitioners and other schools, doctoral and master’s level, and can say with no hesitation that EBP is only seen in a small percentage of the practicing population. I know numerous licensed psychologists who regularly use projective measures, which is why this post is needed – to show the public that these are NOT useful measures.

        I’m pretty sure that no harm is coming to the field of psychology by me pointing out that this measure (and others) are, for the most part, useless. If anything, it could only make people more aware of EBP (which a number of my other posts focus on).

    • Martine

      Hey! I would really like to have the full list of the works you cited!
      What is your email?? 🙂

      • I’ll send it to your email, Martine 🙂

        • Martine

          Thank you! 🙂 Writing an assignment in my Psychology and science class at UiO, decided to write about the Rorschach test’s scientific status and the projective paradox. Doing a literature search at the moment and saw this, and thought maybe I could be inspired by some of your sources! 🙂

        • Jenny

          I loved your article! Please may I have a copy of your reference list for your Rorschach and TAT articles? 🙂

    • Grace Jung

      Please, please delete the picture that is taken from the psychological test. Once you have taken a closer look at one of the pictures, you can never do the real test, because the real test could no longer be evaluated properly. (Sorry for my English, I am not a native speaker.)

      • Grace, these images are actually all in the public domain now. In addition, as I review above, the Rorschach has failed the test for decades as a reasonable measure of psychological symptoms. People shouldn’t be using it…period.

        Finally, the Wikipedia article has all the images – http://en.wikipedia.org/wiki/Rorschach_test

        • Grace Jung

          Yes, I found the pictures on other sites too – all over the web – as you said, and that is really sad. That way we have to add “have you looked at these cards before?” to the exclusion criteria that you ask everybody about before doing the test.
          So that I understand you properly – you think that the Rorschach should not be used anymore? Is that right or did I misunderstand something?

          • Like I outline in the above post, there’s not any compelling evidence showing that the Rorschach should be used in diagnostic work.

            • Grace Jung

              You were too fast for me. I wanted to edit that one…
              Now I notice that I should have read the whole article and the other posts before “jumping into the water”. Now I am here. I am afraid my time (writing here) is wasted, but… it can at least finally be another opinion. I hope that you will answer adequately and not just rudely and inappropriately. But that is up to you.
              As I said, I am not a native speaker and I hope that no misunderstandings will occur. (…and my spell check just stopped
              The Rorschach, as a projective test, is able to obtain information without the possibility that the proband answers in a ‘socially desirable’ way.

              The reliability testing showed that… it was ~r=.7 or .8, I have to look that up. …which means very, very reliable.
              There was a study where they did the Rorschach several times with the same persons. They told the probands to answer normally in the 1st test; in the 2nd so that they make a good impression; in the 3rd a ‘bad’ impression. In the end the results stayed nearly the same. There was no relevant difference; they weren’t able to fake the result.

              Objectivity… not so good, because that test has a high complexity. You also should not use the American “cards” which say: if the proband saw the ‘figure between’(?) (German “Zwischenfigur”) as “a ghost”, it has to be scored as “Detailzwischenfigur; gute Form; Mensch” (roughly: white-space detail; good form; human).
              (I have to admit that that is not the best example, but there are forms where you can discuss.)
              You should not do that.
              Because you have to score the answers in their individual context. Rorschach accomplished the masterpiece of giving us the possibility to objectify a great range of answers – with the ‘issue’ that we have to involve the context.

              About validity: I also have to look it up again.
              Projective tests are often more open in the answers you are “allowed” to interpret. A questionnaire has borders, especially if you give it to the proband on a computer.

              You will always have “weak” items in a psychological test.
              “Do you have suicidal thoughts?” is always, from a test-construction standpoint, a weak item, but you would and should never delete it from a depression scale.
              You will find such things in nearly every test.
              My last point for today: it’s always a little like the difference between going through a questionnaire in an interview and an ‘open’ interview.
              You will always get multifaceted answers and a picture that is closer to the word “complete” than if you only used the questionnaire.