Understanding Real-World Visual Expertise in the Multidimensional Space

This is my ongoing thesis project. The slide on the right is the one I used during the 5th Annual Vanderbilt Three Minute Thesis (3MT) Competition. See my talk here, and read the talk transcript below. See the code I use for my online experiments here (similarity ratings task), here (bird expertise test), and here (bird identification task).


The Quest for Visual Expertise

Given an X-ray image, Ms. Red, an expert radiologist, can diagnose Alzheimer’s at a single glance. In contrast, Mr. Green, an aspiring medical student, still has to guess. This ability that Ms. Red has, the exceptional ability to make judgments from images, is called visual expertise. Beyond radiology, visual experts like Ms. Red play important roles in many domains of our society, such as airport baggage screening and forensic fingerprint identification. My research goal is to understand such visual expertise so that researchers can design more effective training systems that help aspiring students like Mr. Green develop their own visual expertise. Specifically, I study what experts and novices see and how that relates to their performance. To answer these questions about visual expertise in general, I use bird identification as a test case: unlike radiology and forensics, bird identification is a common and accessible example of visual expertise.

First, I study what bird experts and novices see in their mental space. In our mind’s eye, we see things as points located in a multidimensional space. For example, everyone has a unique mental space for puppies, with dimensions such as a puppy’s size, cuteness, and intelligence. In your puppy space, you might see your puppy as lower than your neighbor’s on the dimension of size, but much higher on the dimension of cuteness. Your neighbor, of course, would have a different puppy space. Your personal puppy space shapes how similar different puppies look to you, so the similarities you perceive give me a good way to identify the dimensions of your puppy space. Using this logic, I model and measure experts’ and novices’ bird spaces by gathering their ratings of the similarity among different birds. This answers the question of “what they see.” I then relate “what they see” to “how they perform,” in this case bird identification performance. This relationship has significant practical implications. For example, if there are dimensions in experts’ mental space that are critical for efficient diagnosis from X-ray images, and novices like Mr. Green are not aware of them, we can highlight those dimensions in training to make it more efficient.
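A common way to recover a mental space from similarity ratings is multidimensional scaling (MDS). The sketch below implements classical (Torgerson) MDS in Python with NumPy to illustrate the general idea; it is not necessarily the model used in this project. The helper `ratings_to_dissimilarity`, the linear conversion from ratings to dissimilarities, and the toy coordinates are all hypothetical choices made for illustration.

```python
import numpy as np


def ratings_to_dissimilarity(ratings: np.ndarray, max_rating: float) -> np.ndarray:
    """Hypothetical helper: convert similarity ratings (higher = more similar)
    into dissimilarities using an assumed linear flip, max_rating - rating."""
    d = max_rating - np.asarray(ratings, dtype=float)
    np.fill_diagonal(d, 0.0)   # an item is never dissimilar to itself
    return (d + d.T) / 2.0     # average the (i, j) and (j, i) ratings


def classical_mds(dissim: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: place items as points in n_dims dimensions
    so that their Euclidean distances approximate the dissimilarities."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered matrix of squared dissimilarities: B = -1/2 * J D^2 J
    b = -0.5 * centering @ (dissim ** 2) @ centering
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean noise in real ratings) are clipped to 0
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Toy check with made-up data: four hypothetical birds on two known dimensions
true_coords = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
true_dist = np.linalg.norm(true_coords[:, None, :] - true_coords[None, :, :], axis=-1)
coords = classical_mds(true_dist, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
# Distances are recovered exactly; the axes themselves may be rotated or flipped
print(np.allclose(true_dist, recovered))
```

Because MDS recovers a configuration only up to rotation, reflection, and translation, the recovered axes still have to be interpreted (for example, as size or color dimensions) before they can be compared between experts and novices.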

Modeling the Dynamics of Visual Object Categorization (with Dr. Jeff Annis & Dr. Thomas Palmeri)

Below is an abstract of this project.


Novices are faster and more accurate at verifying category membership at an intermediate level of abstraction, the so-called basic or entry level (e.g., “bird”), than at a superordinate (e.g., “animal”) or subordinate level (e.g., “Blue Jay”). One explanation for the relative speed of basic-level categorization is that categorization at this intermediate level is a prerequisite for superordinate and subordinate categorizations: you need to know that it is a bird before you can tell whether it is an animal or a Blue Jay. An alternative explanation is that basic-level categorizations are fast because the basic level is more differentiated and informative, not because it happens first. We evaluated these two hypotheses by fitting the well-known drift-diffusion model of perceptual decision making to accuracy and response time data from a large sample of online participants, including both novices and individuals with varying levels of birding expertise. We mapped the two hypotheses onto differences in process parameters within the diffusion model: variability in non-decisional, perceptual processing time across category levels would support the former hypothesis, whereas variability in drift rate across category levels would support the latter. We applied the diffusion model within a Bayesian hierarchical framework, which provides a powerful account of individual differences in the model parameters across conditions. Behaviorally, we replicated the basic-level advantage for novices in our online experiments. Theoretically, we found that variability in categorization speed across levels of categorization was well captured by variability in the drift rate across levels, without any changes in the non-decisional, perceptual processing time across levels.
Our results help to unravel the psychological processes that give rise to the behavioral pattern in speeded categorization and inform the understanding of individual differences in visual object categorization.
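To make the logic of the two hypotheses concrete, here is a minimal simulation sketch of a basic drift-diffusion process in Python with NumPy. It is not the Bayesian hierarchical model used in the study; the function `simulate_ddm`, the Euler time step, the noise scale of 1, and all parameter values are illustrative assumptions. It shows why the hypotheses are distinguishable: a longer non-decision time shifts every response time by the same constant and leaves accuracy unchanged, whereas a lower drift rate both slows responses and lowers accuracy.

```python
import numpy as np


def simulate_ddm(drift, boundary, non_decision, n_trials=2000, dt=0.001,
                 noise=1.0, max_time=5.0, seed=0):
    """Euler simulation of a drift-diffusion process starting midway between
    a lower boundary at 0 and an upper ("correct") boundary at `boundary`.

    Returns (choice, rt): choice is 1 for the upper boundary, 0 for the lower,
    and -1 if no boundary was reached by max_time (rt is NaN in that case).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, boundary / 2.0)     # unbiased starting point
    choice = np.full(n_trials, -1)
    decision_time = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)
    t = 0.0
    while active.any() and t < max_time:
        t += dt
        # Evidence accumulates at rate `drift` plus Gaussian noise
        x[active] += drift * dt + noise * np.sqrt(dt) * rng.standard_normal(active.sum())
        hit_upper = active & (x >= boundary)
        hit_lower = active & (x <= 0.0)
        choice[hit_upper] = 1
        choice[hit_lower] = 0
        done = hit_upper | hit_lower
        decision_time[done] = t
        active &= ~done
    # Non-decision time (perceptual encoding + motor response) is simply added
    return choice, decision_time + non_decision


# Hypothetical conditions: (label, drift rate, non-decision time in seconds)
conditions = [
    ("baseline", 2.0, 0.3),
    ("longer non-decision time", 2.0, 0.4),
    ("lower drift rate", 1.0, 0.3),
]
for label, drift, non_decision in conditions:
    choice, rt = simulate_ddm(drift, boundary=1.5, non_decision=non_decision, seed=1)
    print(f"{label}: accuracy = {np.mean(choice == 1):.3f}, "
          f"mean RT = {np.nanmean(rt):.3f} s")
```

Fitting real data reverses this direction: instead of simulating from chosen parameters, one estimates drift rate and non-decision time per category level from observed accuracy and response times, which is where a hierarchical Bayesian framework helps pool information across participants.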

The perception of a face can be greater than the sum of its parts (with Dr. Thomas Palmeri)

Below is an abstract of this project.


Holistic processing is often used as a construct to characterize face recognition. An important recent study by Gold, Mundy, and Tjan (2012) quantified holistic processing by computing a facial-feature integration index derived from an ideal observer model. This index was mathematically defined as the ratio of the squared psychophysical contrast sensitivity for recognizing a whole face to the sum of the squared contrast sensitivities for the individual face parts (left eye, right eye, nose, and mouth). They observed that this index was not significantly different from 1, leading to the provocative conclusion that the perception of a face is no more than the sum of its parts. What may not be obvious to all readers of this work is that these conclusions were based on a set of faces that shared essentially the same configuration of face parts. We tested whether the facial-feature integration index would also equal 1 when faces have a range of configurations mirroring the variability in real-world faces, using the same experimental procedure and calculating the same integration index as Gold et al. When tested on faces with the same configuration, we observed an integration index similar to what Gold et al. reported. But when tested on faces with variable configurations, we observed an integration index significantly greater than 1. Combining our results with those of Gold et al. further clarifies the theoretical construct of holistic processing in face recognition and what it means for the whole to be greater than the sum of its parts.
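The integration index described above reduces to simple arithmetic once sensitivities are measured. The sketch below, in Python with NumPy, assumes contrast sensitivity is taken as the reciprocal of the threshold contrast; the helper names and the threshold values are hypothetical and invented purely for illustration, not data from either study. It computes only the point value of the index, not the statistical test of whether it differs from 1.

```python
import numpy as np


def contrast_sensitivity(threshold_contrast):
    """Assumed definition: sensitivity is the reciprocal of threshold contrast."""
    return 1.0 / np.asarray(threshold_contrast, dtype=float)


def integration_index(s_whole, s_parts):
    """Facial-feature integration index:
    s_whole**2 / (s_left_eye**2 + s_right_eye**2 + s_nose**2 + s_mouth**2).
    A value of 1 means whole-face sensitivity is exactly what optimally
    combining the parts predicts; values above 1 mean the whole is more
    than the sum of its parts."""
    s_parts = np.asarray(s_parts, dtype=float)
    return float(s_whole ** 2 / np.sum(s_parts ** 2))


# Made-up thresholds purely for illustration (not data from either study)
part_thresholds = [0.20, 0.20, 0.25, 0.25]   # left eye, right eye, nose, mouth
whole_threshold = 0.12
s_parts = contrast_sensitivity(part_thresholds)
s_whole = contrast_sensitivity(whole_threshold)
print(round(integration_index(s_whole, s_parts), 3))
```

Squaring the sensitivities reflects the ideal observer logic: for independent sources of information, squared sensitivities add, so an observer who merely combines the parts optimally should produce an index of 1.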