Your Brain on Blogs
An exploration of the brain, mind, and related phenomena by
the Neuroscience Graduate Forum at the University of Southern California
(By Jong Woo Nam, NGP 2019 Cohort)
I've occasionally struggled with heightened anxiety, and this past year, COVID-19 and all the uncertainty it brought caused my anxiety to return. Being a faithful neuroscientist, I sought solutions in what I had already learned, specifically the lessons from Prof. Mara Mather's lecture on the therapeutic effects of meditation during our core course. I had found her research on the benefits of rhythmic breathing and its connection to the default-mode network particularly interesting, and I was in the right emotional state to experiment and validate her findings for myself.
After some brief research, I discovered that USC offered its own platform for meditation: the Mindful USC app. (There were plenty of other meditation apps available, but USC's seemed to be backed by more scientific research.) So I downloaded the app and selected a program best suited to my disturbed psychological state. The app presented a list of questions for me to answer, and based on my responses it recommended something called the grounding practice.
Briefly, the grounding practice helps you tame your anxiety by focusing not on your thoughts but on your senses: for several minutes, the program asks you to name colors you see, sounds you hear, textures your skin is touching, and so on. Though I have not researched the exact science behind the program's design, it was very effective in battling my anxiety. Those interested should explore the scientific literature themselves, since this is not really the topic I want to cover in this blog post.
How drawing evolved from a therapeutic hobby to a scientific inquiry on the visual percept.
As my anxiety came under control, I eventually moved away from repeating the same meditation program, but carried the important lesson forward: focusing on your senses reduces your anxiety. I settled on drawing instead, which was the perfect hobby for me because 1) I had an iPad (plus Procreate, the app I used), 2) it required me to focus on my vision for an extended period of time (the "grounding on your senses" part), and 3) it was entertaining.
The first thing I drew was my family.
My attempt at reproducing a photo of my sister taken on a cruise ship. Trust me, I've gotten better at drawing since.
This is my attempt at copying a photo of my sister (taken on a cruise ship, if that was not obvious!). I remember sending a version of my drawing to my sister, only to be scolded because it did not look like her. So the version you are looking at is a heavily edited version of my first attempt. I practiced drawing a couple more photos of people, and realized that it was MUCH HARDER to draw someone's face with just the pencil tool than, say, the buildings you see behind her.
Similarly, it was much more difficult to draw someone’s face from memory, while drawing an object from memory was relatively easy. To demonstrate how easy the latter is, I made a scribble on a new page, and tried converting the scribble to a whale.
A scribble transformed to a whale.
A friend took pity on the gigantic whale eating so little, so I drew in a couple more fish to fill it up.
It was rather entertaining to see how many of my friends liked my whale drawing (which took only a couple of seconds) more than my hours-long efforts to draw a person with the face intact.
With more practice, I eventually improved at drawing objects, but still struggled with drawing faces. So the vision scientist in me had to ask: what causes this discrepancy? Why is drawing an object so much easier than drawing a face?
The Computation-Recognition Complexity Paradox: the harder to draw, the easier to compute.
Well, for a start, our brain designates two separate areas for the two tasks: the Fusiform Face Area (FFA) for face recognition and the Lateral Occipital Complex (LOC) for object recognition. Those who have taken a neuroscience course in high school or college may have already thought of these two areas while reading the previous section. However, merely assigning the two capacities to two separate brain regions is an easy yet incomplete answer, because it does not truly reveal why these visual stimuli are treated separately.
A better answer surfaces when we look at attempts to computationally replicate the brain's performance on face and object recognition. Recreating human-level face recognition has been successful (Yue et al., 2012; Margalit et al., 2016), while we still struggle to create a human-level object recognition algorithm. In other words, while faces are harder to draw, recognizing them is a much easier computational task.
Let me briefly explain the Gabor-jet model for faces to demonstrate the simplicity of the computation. A Gabor filter is a 2-D sinusoid windowed by a Gaussian, and filtering an image with a bank of them amounts to a localized 2-D Fourier analysis (a wavelet transform, to be exact); a "jet" is the set of filter responses at one image position. For readers more familiar with biology than with the math, you can think of the Gabor filters as the set of oriented receptive fields you see in V1. Using a set of Gabor filters, one can represent a face as its responses to oriented patterns of varying size and frequency at a set of positions (the red dots). It was found that the Euclidean distance between these representations of two faces accurately predicts how similarly humans will score them.
Using Gabor-Jet representation, one can simply take the Euclidean distance to predict how similar humans will score two faces (Xiaomin Yue et al. 2012).
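To make the model concrete, here is a minimal NumPy sketch of the idea. This is not the authors' implementation: the filter sizes, orientations, and sampling grid are simplified assumptions chosen for illustration, and a full Gabor jet would also include multiple spatial frequencies and phases per position.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """A single 2-D Gabor filter: a sinusoid windowed by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)        # rotate coordinates
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # Gaussian window
    carrier = np.cos(2 * np.pi * x_rot / wavelength)     # oriented sinusoid
    return envelope * carrier

def gabor_jet(image, points,
              sizes=(9, 17),
              thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Stack filter responses at a grid of points into one feature vector."""
    responses = []
    for (r, c) in points:
        for size in sizes:
            for theta in thetas:
                k = gabor_kernel(size, wavelength=size / 2,
                                 theta=theta, sigma=size / 4)
                half = size // 2
                patch = image[r - half:r + half + 1, c - half:c + half + 1]
                responses.append(np.sum(patch * k))  # response at this point
    return np.array(responses)

def face_dissimilarity(img_a, img_b, points):
    """Euclidean distance between the two jet vectors: the model's
    prediction of how dissimilar humans will judge the two faces."""
    return np.linalg.norm(gabor_jet(img_a, points) - gabor_jet(img_b, points))
```

Note how brittle this representation is by construction: shifting the nose by a few pixels changes the patches under several sampling points, so the jet vector, and hence the predicted dissimilarity, moves immediately. That is the sense in which a face must be drawn with near-pixel precision to be recognized as the same person.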
You can imagine how sensitive such a representation would be to the location and size of each component of a face. It is also known that face perception is configural, meaning one identifies a face as uniquely different when a single component (e.g., the nose) changes while the others (e.g., eyes, mouth, ears) stay the same. For an inexperienced drawer like me, it is thus very difficult to draw a face precisely enough to be uniquely identified as a particular person.
On the other hand, even with the recent advent of deep learning, engineers still struggle to build a human-level object recognition algorithm. Why is it so much harder? Objects can take a large variety of forms in the wild, while faces follow a set of rules. (Think, for example, of how a chair could look versus how a face looks: chairs take a huge variety of shapes, while faces are much more similar-looking stimuli.) A good object recognition algorithm must therefore know how to abstract out what defines an object, so that it can correctly map an extensive range of stimuli onto one label.
To put it back in the perspective of our original question, drawing an object is much easier because your brain does the hard computation of abstracting the representation for you! That abstraction is what affords me a high degree of artistic freedom (or sub-computer-level precision in drawing), letting me easily transform a scribbled line into a whale.
Art History: the fight between closer-to-retina and closer-to-LOC representations.
Throughout the history of art, artists have secretly been ancient cognitive neuroscientists experimenting with our visual percept. Ernst Gombrich, in his book "The Story of Art", summarizes that history as the fight between the Egyptian and the Greek ways of representing the world.
Let’s look at an Egyptian painting of a person.
An Egyptian painting of two people (unknown artist). A person is almost always drawn following the same viewpoint format.
Egyptians thought that an object's beauty is best captured at its ideal viewing angle. A person, for example, is best captured when each part of the body is drawn at its own ideal angle. So in Egyptian paintings you almost always see the face viewed from the side, the torso facing front, and the legs and arms viewed from the side.
An Egyptian painting of a lake (unknown artist). You can see how the painting combines several viewpoints: lake seen from above, and trees seen from the side.
You can observe the same in the above painting of a lake. The trees are always represented facing sideways, while the lake is viewed from above. Such a representation is not what we observe in the real world, but it perhaps captures best the abstraction performed automatically by our Lateral Occipital Complex (LOC).
It is therefore perhaps not surprising that written languages overwhelmingly use line-based characters: the LOC is sensitive to the contours (lines) of a visual stimulus and rather insensitive to its textures and colors. Written characters, being among the most abstracted forms of representing the world, should thus be attuned to the characteristics of the brain's abstracting center for visual stimuli. Had the LOC been more texture-sensitive, our written languages might have looked more like QR codes.
Meanwhile, Greek art represents the real world by capturing what is displayed on our retina. Let's look at the sculpture "Laocoon and His Sons". This piece captures the moment when Poseidon intervenes in the Trojan War by sending serpents to kill Laocoon, the man trying to expose the scheme behind the Trojan Horse (go Trojans!).
“Laocoon and His Sons”, displayed in the Museo Pio Clementino of the Vatican Museums in Rome, Italy (unknown artist, Hellenistic Period: 323 BCE – 31 CE). You can see how dynamic the depiction of the human bodies is in comparison to the Egyptian pieces presented earlier.
In comparison to the Egyptian paintings shown previously, Greek art almost always depicts a human in their most dynamic pose. In addition, the details in the art are closer to reality. It is rather amazing how the sculptor was able to capture the texture of cloth in stone in this example.
The remaining history of art has been a constant battle between these two distinct ways of representation. It is interesting to map the history of art onto the visual pathway in the brain, specifically whether a given style of representation lies closer to the retina or to the LOC. During the Medieval period, because the story and the abstracted message mattered more, representations lay closer to the LOC. During the Renaissance, the Greek way of representation became more prevalent. Then, with the advent of photography, which most closely represents reality as-is, artists from the Impressionists onward, such as Claude Monet, Vincent van Gogh, and Pablo Picasso, emerged with new and diverse forms of representation. Artists have been, and always will be, on the frontier of exploring what forms of representation our visual percept allows.
The current state of object recognition algorithms: the quest of engineering a robust visual perception and why neuroscience is important.
What is the current status of object recognition algorithms? What are the challenges in developing a robust visual percept? How should robustness be defined for visual systems even?
One could argue that our visual percept is robust, but our visual system is not without faults. We have all seen optical illusions where we perceive something stationary to be moving, and so on. A recent and poignant example of an optical illusion that took the world by storm a few years ago is this picture.
Dress photograph from the creator Cecilia Bleasdale (Dressgate, 2016).
Remember the heated debate about the color of this dress? People were divided on whether they saw a blue dress with black lace or a white dress with gold lace. This is a great demonstration of our visual percept failing. The reason we have taken for granted that our visual system is robust is that our visual percept has been consistent across our species, rarely creating disagreement about what each of us sees.
Nonetheless, consistency with human vision should be the standard an object recognition algorithm strives to achieve. Imagine being an engineer building an algorithm for a self-driving car. You run into a bug where the algorithm exhibits a weird driving behavior. It would be extremely difficult to solve the problem if the algorithm's visual percept were vastly different from ours: does it behave weirdly because it sees something you do not see, or is its decision-making process broken? When teaching a human driver, such errors are easy to correct, since you can reliably trust that the student sees what you see. Characterizing the human visual percept is therefore a critical foundation for creating a reliable vision algorithm, and this is why (both cognitive and systems) neuroscience matters.
How similar is the current state-of-the-art object recognition system to our object recognition capability? Take a look at the example below.
The cue-conflict experiment (Geirhos et al., 2019). When asked to classify an image with elephant-skin texture with a drawing of a cat, most artificial neural networks classified it as an elephant, showing they are heavily texture-dependent.
(C) is an image whose texture comes from an elephant (A), while the underlying edges draw the cat from (B). Cue-conflict images like (C) test whether a vision system relies more heavily on texture or on edges. While our recognition system (the LOC) relies on the contours (edges) of a visual stimulus, and humans almost always labeled (C) as a cat, the deep networks classified (C) as an elephant, signifying how texture-biased they are (Geirhos et al., 2019).
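Geirhos et al. (2019) summarize such experiments with a "shape bias" score: among cue-conflict trials where the system picks either the shape class or the texture class, the fraction decided by shape. Here is a small illustrative sketch of that metric; the trial records are made-up data, not results from the paper.

```python
def shape_bias(trials):
    """Fraction of shape-or-texture decisions that follow the shape cue,
    in the spirit of Geirhos et al. (2019). Trials where the prediction
    matches neither cue are excluded from the denominator."""
    shape_hits = sum(1 for t in trials if t["prediction"] == t["shape"])
    texture_hits = sum(1 for t in trials if t["prediction"] == t["texture"])
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")

# Hypothetical cue-conflict trials for a texture-biased network.
trials = [
    {"shape": "cat",  "texture": "elephant", "prediction": "elephant"},
    {"shape": "car",  "texture": "clock",    "prediction": "clock"},
    {"shape": "dog",  "texture": "bottle",   "prediction": "dog"},
    {"shape": "bird", "texture": "knife",    "prediction": "boat"},  # neither cue
]
print(shape_bias(trials))  # 1 shape decision out of 3 decided trials
```

A shape bias near 1.0 would mean human-like, contour-driven recognition; in the paper, standard ImageNet-trained networks score far below humans on this measure.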
This is only a simple demonstration, but it raises an alarming question: how can you trust a system's decision when you know that what it sees is different from what you see? Though there is currently excitement bordering on hype around deep learning and its capabilities, the vision problem remains unsolved, let alone properly defined.
Conclusion
So what is the take-away message? I am not sure, really. This blog post was meant to serve as a written summary of the chain-of-thoughts I had during the past year, since I had a ton of time to do nothing but think.
I hope, though, that I was able to develop in all of you a neuroscientist's eye for appreciating an art piece. In fact, the observant attitude of the grounding practice will take you on many exciting journeys, exploring frontiers of neuroscience that you don't really see in your textbook or under an fMRI. I would argue that many parts of society, and of the culture surrounding you, are an externalization of the homogeneity in our brain structures. Think about this for a second: "culture", or anything "cultivated" by many, has to be the fingerprint of what resonates among our individual brains. So next time you visit the Museum of Contemporary Art in Los Angeles (or any art museum or gallery near you), consider taking the opportunity to further explore and define the commonalities of our visual percept.
References
Yue, X., Biederman, I., Mangini, M. C., Malsburg, C. V., & Amir, O. (2012). Predicting the psychophysical similarity of faces and non-face complex shapes by image-based measures. Vision Research, 55, 41-46. doi:10.1016/j.visres.2011.12.012
Margalit, E., Biederman, I., Herald, S. B., Yue, X., & von der Malsburg, C. (2016). An applet for the Gabor scaling of the differences between complex stimuli. Attention, Perception, & Psychophysics. 78(8), 2298-2306. doi:10.3758/s13414-016-1191-7.
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR 2019.
Dressgate (2016). The dress (viral phenomenon). Available at: https://en.wikipedia.org/wiki/The_dress_(viral_phenomenon).
Jong is a second-year NGP student working in Dr. Bartlett Mel's lab. His current research involves quantifying a neural network's shape recognition capabilities. Connect with him on his LinkedIn page!