Our memory is only revealed through our actions, many of which rely on deciding whether we recognize something or someone, that is, whether we have experienced something similar in the past to what we are experiencing now. From that simple recognition decision flow many subsequent choices about, for example, what actions may be appropriate to perform in context or whether to seek additional information. If you are wandering an unfamiliar city but recognize having been at a specific intersection before, you can use that information to help find your destination. Of course, you may also fail to recognize that intersection even if you had been there before, or you might falsely recognize an intersection that is similar to, but not identical to, one you've experienced before. Understanding how recognition operates---whether correctly or incorrectly---is thus critical to understanding how memory is involved in all the kinds of decisions we have to make from moment to moment.
The problem of recognition illustrates how any inferences we can draw regarding the structure and contents of memory are necessarily indirect. Learning about memory from any particular decision requires taking into account what an individual is attending to in their environment as well as the context and goals they are attempting to satisfy. The importance of distinguishing between retrieval and decision has long been acknowledged in theories of memory, notably approaches based on the theory of signal detection. According to such theories, the outcome of retrieval is a quantity known as "memory strength" which is then compared against different criteria in order to make a recognition decision. This kind of theory is, however, limited in that it does not tell us anything about how or why a particular value of "memory strength" arises in a given situation, making it impossible to draw deeper inferences about the structure and content of memory. We want to know why someone might fail to recognize an intersection they've been to before or why they might falsely recognize one they've never been to.
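To make the signal-detection framing concrete before turning to the dynamic alternative, here is a minimal sketch, with arbitrary and purely illustrative parameters, of how a single criterion applied to a noisy "memory strength" yields both correct and false recognition:

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal signal-detection illustration (arbitrary, made-up parameters):
# "memory strength" for studied ("old") items is drawn from a distribution
# shifted above that for unstudied ("new") items, and a single criterion
# turns each strength into an old/new decision.
n = 10_000
old_strength = rng.normal(loc=1.0, scale=1.0, size=n)   # studied items
new_strength = rng.normal(loc=0.0, scale=1.0, size=n)   # unstudied items
criterion = 0.5

hit_rate = np.mean(old_strength > criterion)             # correct recognition
false_alarm_rate = np.mean(new_strength > criterion)     # false recognition
print(f"hits = {hit_rate:.2f}, false alarms = {false_alarm_rate:.2f}")
```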
My dynamic approach to recognition memory (Cox & Shiffrin, 2012, 2017, revised) addresses these issues by explicitly modeling the processes by which features of the environment are attended and perceived, how these features are used to probe memory as they are accumulated, and how the result of this probe is used to guide recognition decisions. According to this model, previous experiences are stored in memory in the form of a "memory trace" consisting of features of both the content of the experience (e.g., aspects of the face and voice of someone you conversed with along with the semantic content of what they said) and the context in which you experienced it (e.g., aspects of the timeframe and location of the conversation along with your mood and internal state at the time). When you experience a new event, features of that event gradually accumulate in working memory. At any given time, these features are compared against those in the memory traces, such that traces containing matching features tend to gain activation over time while those containing mismatching features tend to lose activation. To make a recognition decision, you keep track of the average change in activation across memory traces. If this accumulated change increases enough to cross an upper threshold, you decide you recognize having experienced a similar event before. If instead it decreases enough to cross a lower threshold, you decide you haven't experienced something like that before, at least not in your present context.
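As a rough illustration of this accumulation process, the toy sketch below (with made-up binary features and arbitrary parameters, not the published model's equations) shows how matching and mismatching features push average trace activation toward an upper "old" or lower "new" threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def recognize(probe, traces, upper=0.1, lower=-0.1,
              match_gain=0.05, mismatch_loss=0.05):
    """Toy sketch of a dynamic recognition decision (not the published model's
    equations): probe features are attended one at a time, each memory trace
    gains activation for a matching feature and loses activation for a
    mismatching one, and the average activation across traces is tracked until
    it crosses the upper ("old") or lower ("new") threshold."""
    n_traces, n_features = traces.shape
    activation = np.zeros(n_traces)
    order = rng.permutation(n_features)       # order in which features are attended
    for step, f in enumerate(order, start=1):
        match = traces[:, f] == probe[f]
        activation += np.where(match, match_gain, -mismatch_loss)
        evidence = activation.mean()          # average change across all traces
        if evidence >= upper:
            return "old", step
        if evidence <= lower:
            return "new", step
    return ("old" if evidence > 0 else "new"), n_features

# Hypothetical example: 20 stored traces of 50 binary features each;
# re-presenting a studied item should tend to terminate at the "old" threshold.
traces = rng.integers(0, 2, size=(20, 50))
print(recognize(traces[3].copy(), traces))
```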
The model outlined above makes precise quantitative predictions regarding not only the likelihood that an event will or will not be recognized---including both correct recognition and false recognition---but also the time required for someone to make that recognition decision. Because the model explicitly accounts for the processes involved in accumulating features from the environment, it is able to explain how similarity and/or distinctiveness along different dimensions affects the speed and accuracy of recognition decisions. False recognition, for example, can occur if a new event happens to share features with ones you have previously experienced, something which might be exacerbated if you fail to pay attention to distinguishing features or try to act too quickly by setting your recognition thresholds too low. Because this is a dynamic model, it explains why allowing more time to make a recognition decision counteracts this tendency for false recognition---with less time pressure, you can set your thresholds higher and give yourself the opportunity to attend to distinctive features that might mark an event as new rather than old. Finally, the model also explains how priming influences the timing and accuracy of recognition decisions by virtue of features "leaking" from the prime into working memory, thereby changing the initial activation levels of different memory traces and affecting their subsequent activation trajectories which, in turn, are the basis for recognition decisions.
According to the dynamic approach, the speed and accuracy with which we recognize an item or event depend on the dynamics by which features of that item/event are attended and perceived. This approach represents an alternative way of understanding certain phenomena that have previously required "dual-process" theories to explain. One such area involves the formation and retrieval of associations between items/events. As described above, memory for individual items/events can be assessed using a recognition task that requires participants to distinguish items/events they experienced in a specific context from items/events they did not experience in that context. Memory for associations between items/events is typically assessed using an "associative recognition" paradigm. In this paradigm, participants experience items/events in pairs. Afterwards, they are asked to distinguish pairs of items/events that they experienced at the same time from pairs that they experienced at different times, albeit still within the same overarching context. For example, say you meet two couples at a party---Alex and Blair (couple 1) and Chris and Dakota (couple 2). Associative recognition would entail recognizing that you had met Alex and Blair together and not Alex and Dakota.
Dual process theories propose that items/events are represented and retrieved in a fundamentally different way from how you remember and recognize associations between items/events. With Rich Shiffrin, I proposed instead that associations take the form of extra "associative features" that are present in the memory traces of items/events that were experienced at the same time. As such, associations were represented in the same basic form as other kinds of information in memory and could be retrieved using the same dynamic process of feature accumulation. Cox & Shiffrin (2017) provided a proof of concept that this approach could explain existing data regarding the speed-accuracy tradeoff in associative recognition. However, additional work was needed to place this theory on stronger footing.
With Amy Criss (Cox & Criss, 2017), I developed a novel experimental paradigm that allowed us to use response times to test for qualitative distinctions between my proposal and a dual-process theory. The logic of the methodology came from "Systems Factorial Technology" (Townsend & Nozawa, 1995), a set of measures from psychophysics that use qualitative properties of response time distributions to distinguish between serial and parallel processes as well as between processes with limited and unlimited capacity (among many other factors, as outlined in our paper). The results from this study showed that item and associative recognition decisions rely on processes that run in parallel with one another, invalidating dual-process models in which an association can only be retrieved after retrieving information about its component items. In addition, we showed that these parallel retrieval processes interacted with one another to give a "boost" to pairs of items that had been experienced together---memory for such a pair was "greater than the sum of its parts". This second finding invalidated several dual-process theories that assumed, incorrectly, that item and associative information are retrieved by processes that operate independently of one another.
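For readers less familiar with Systems Factorial Technology, the central diagnostic can be sketched in a few lines. The survivor interaction contrast below follows the definition in Townsend & Nozawa (1995); the simulated response times and parameters are purely illustrative and are not data from our study:

```python
import numpy as np

def survivor(rts, t_grid):
    """Empirical survivor function S(t) = P(RT > t)."""
    rts = np.asarray(rts)
    return np.array([(rts > t).mean() for t in t_grid])

def sic(rt_LL, rt_LH, rt_HL, rt_HH, t_grid):
    """Survivor interaction contrast (Townsend & Nozawa, 1995):
    SIC(t) = S_LL(t) - S_LH(t) - S_HL(t) + S_HH(t),
    where L/H index low/high salience of each of two factors. Its shape over t
    helps diagnose serial vs. parallel architectures and the stopping rule."""
    return (survivor(rt_LL, t_grid) - survivor(rt_LH, t_grid)
            - survivor(rt_HL, t_grid) + survivor(rt_HH, t_grid))

# Hypothetical use with simulated response times (seconds) from the four
# factorial conditions; real analyses would use each participant's data.
rng = np.random.default_rng(2)
t_grid = np.linspace(0.2, 2.0, 200)
rt_LL = rng.gamma(4.0, 0.12, 500)   # both factors low salience (slowest)
rt_LH = rng.gamma(4.0, 0.10, 500)
rt_HL = rng.gamma(4.0, 0.10, 500)
rt_HH = rng.gamma(4.0, 0.08, 500)   # both factors high salience (fastest)
print(sic(rt_LL, rt_LH, rt_HL, rt_HH, t_grid)[:5])
```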
Armed with this more fine-grained view of the nature of the processes involved in item and associative recognition, Amy and I extended the original dynamic model to explain the dynamics by which associative features arise from the parallel encoding of features of individual items (Cox & Criss, 2020). According to this model, "associative features" represent conjunctions of item features and are shared between memory traces formed by experiencing multiple items in the same context. Because associative features are built from item features, this explains the facilitatory interaction we found in our 2017 experiment. Our theory of the formation of associative features also explained a phenomenon first documented by Barbara Dosher in 1984 that had remained unexplained until now: when you experience pairs of items that are similar to one another, you are more likely to correctly recognize them as a pair and less likely to falsely recognize them if they later appear in different pairs. In a new experiment, we replicated this phenomenon using words, images, and abstract forms as items, with similarity defined in terms of both perceptual and semantic features. A dual-process approach could not explain these results because it assumes independence between items and associations. Instead, our dynamic approach naturally accounts for the intimate relationship between these two types of information and how they interact during retrieval and decision making.
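One simple way to picture associative features as conjunctions of item features, offered only as a toy illustration and not the model's actual representation, is to form all pairwise conjunctions of two items' binary features, so that pairs built from similar items end up sharing many of their associative features:

```python
import numpy as np

rng = np.random.default_rng(5)

def conjunction_features(item_a, item_b):
    """Toy illustration (not the model's actual representation): form
    'associative features' as all pairwise conjunctions of two items'
    binary features."""
    return np.outer(item_a, item_b).ravel()

a = rng.integers(0, 2, 16)
b = rng.integers(0, 2, 16)
a_similar = a.copy()
a_similar[:2] = 1 - a_similar[:2]        # an item similar to a (2 features flipped)
c = rng.integers(0, 2, 16)               # an unrelated item

pair_ab = conjunction_features(a, b)
pair_similar = conjunction_features(a_similar, b)    # pair of similar items
pair_unrelated = conjunction_features(c, b)          # pair of dissimilar items

# Pairs built from similar items share more associative (conjunction) features.
print((pair_ab & pair_similar).sum(), (pair_ab & pair_unrelated).sum())
```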
At this point, the dynamic approach to recognition memory had been shown to explain the speed and accuracy of recognition of individual items/events (Cox & Shiffrin, 2012, 2017) and the speed and accuracy of recognition of associations (Cox & Criss, 2020). However, it remained to be seen whether the same theory could explain the speed and accuracy of individual participants who performed both tasks. If so, this would show that memory for individual items/events and memory for their associations are stored and retrieved using the same set of processes and representations. This is a tall order for any theory, not least because there is considerable variability in how different individuals engage in each task.
To put the joint theory to the test, I fit the trial-by-trial performance of over 450 individual participants who engaged in both item and associative recognition tasks using data from a "mega study" I had previously published (Cox, Hemmer, Aue, & Criss, 2018). The model passed with flying colors (Cox, revised), accounting for over 80% of the variance in both speed and accuracy of individual performance in both tasks. This successful theory, which I term Dynamic Retrieval of Events and Associations from Memory (DREAM), represents an important advance toward an integrative and cumulative theory of memory. These results illustrate how the speed and accuracy of decisions about both individual items and about how they are associated are ultimately the result of the dynamics by which features of those items are perceived and then conjoined to form associative features.
Two threads of work we are currently undertaking in the lab leverage the success of the DREAM theory to understand the ways by which people form memories of repeated events as well as how people recognize multiple events at a time.
I am currently working with Sam Wang, one of the talented undergraduate students in my lab, to understand the strategies by which people recognize multiple items at a time. For example, imagine meeting several people at an event in the morning. At lunch, you sit at a table with two people. How do you decide whether you met those people earlier that morning? Do you make separate decisions for each person, or a single holistic decision on the basis of their overall familiarity? If you make separate decisions, do you make them sequentially or in parallel? Sam and I worked together to develop a study using lists of words that mirrored this scenario. During the Spring and Fall of 2023, we collected data from a total of over 300 participants. We fit the accuracy and response times of each participant with nine different models representing different strategies by which people could make and combine recognition decisions for multiple items, using DREAM to model those individual decisions. When participants only had to decide whether they had seen at least one item already, they did so predominantly by making separate decisions for each item in parallel, responding if either decision resulted in positive recognition. On the other hand, when participants had to decide whether they had seen all the items already, many did so by averaging the familiarity of each item and giving a positive response if and when that average familiarity was high enough. That said, there were substantial individual differences in recognition strategies, and we are now beginning to explore why different people may adopt these different strategies.
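The two combination rules described above can be caricatured in a few lines; the familiarity values and criterion below are made up, and in the fitted models each individual item decision is generated by DREAM rather than assumed:

```python
import numpy as np

def or_rule_parallel(familiarities, criterion=0.5):
    """'Have I seen at least one of these?' as separate parallel decisions:
    respond 'yes' if ANY item's familiarity exceeds the criterion."""
    return bool(np.any(np.asarray(familiarities) > criterion))

def averaging_rule(familiarities, criterion=0.5):
    """'Have I seen all of these?' as a single pooled decision:
    respond 'yes' if the AVERAGE familiarity exceeds the criterion."""
    return bool(np.mean(familiarities) > criterion)

# Hypothetical probe of two items: one studied (high familiarity), one not.
familiarities = [0.9, 0.2]
print(or_rule_parallel(familiarities))   # True: at least one item looks old
print(averaging_rule(familiarities))     # False: the set as a whole does not
```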
Meanwhile, with another outstanding undergraduate, Priya Samaroo, I am working to use the dynamics of memory retrieval to reveal the ways in which people remember repeated events. In the experiment we developed, participants view a set of objects, each of which has a different color. Some of these objects are repeated; the job of the participants is to remember and later report the color of each object. Over Spring and Fall of 2023, we collected data from over 400 participants in different versions of this task. We found that people were, as one might expect, faster and more accurate at recalling the colors of repeated objects. What was especially intriguing, however, was that this memory boost was larger than would be expected if repetition simply gave you an extra chance to successfully recall the color---people had formed integrated memories of the two times they encountered the object, such that the resulting memory was more than the sum of its parts. While this could be a good thing for helping to learn from repetition, it also opens the door for false or misleading memories to be formed; if the repetition is not exact, forming an integrated memory could distort the information in the original memory. Priya and I are currently working on a new experiment to study when and how these kinds of false memories may be formed.
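The "extra chance" baseline in that comparison is the standard independence prediction; for example, under a purely illustrative single-encounter recall probability:

```python
# Independence baseline for the benefit of repetition: if each encounter gives
# an independent chance p of later recalling the color, two encounters predict
# recall with probability 1 - (1 - p)**2. (Illustrative value of p only.)
p_single = 0.60
p_two_independent = 1 - (1 - p_single) ** 2   # = 0.84
print(p_two_independent)
```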
The thread of research that resulted in the development of the DREAM model was focused on the dynamics of the processes involved in storing and retrieving information from memory, but the information itself was assumed to be about a static item or pair. In many naturalistic contexts, memory is called upon to deal with information that is inherently dynamic, arriving over time---music and language being quintessential examples. It is therefore critical to understand how the dynamics of memory encoding and retrieval are engaged when dealing with information that is itself dynamic.
To better understand how the order of events is represented in and retrieved from memory, I have been working with Gordon Logan on a series of experiments and models that pertain to the recall and recognition of items that are embedded in a sequence. Our initial experiments in this vein used a task we called an "episodic flanker" task (Logan et al., 2021), by analogy to the attentional "flanker" task developed by Eriksen and Eriksen. In this task, participants see a short sequence of items like letters, either arranged spatially (left to right) or temporally (one after the other). Participants are then shown an item along with a position cue (e.g., a marker at a spatial location) and are asked to determine whether that item had occurred in that position in the sequence they just experienced. In some conditions, we also present "episodic flankers"---items in other positions which may or may not match those that occurred in those positions during the earlier sequence. By analyzing the speed and accuracy of decisions in the episodic flanker task, we can thus examine the degree to which memory for an item's position within a sequence depends on its relationship to other items in that same sequence. We found that, just like in a perceptual flanker task, people's memory for items and their positions was significantly impacted by the context in which those items were presented. This parallels what we found in the case of memory for items and associations---when items are encountered repeatedly in the same configuration, memory for that configuration is greater than the sum of its parts.
These results run counter to a set of theories that explain memory for serial order in terms of direct associations between items and "position codes" that do not depend on other nearby items. We have argued that position coding is not a universal theory of serial order memory, but is instead a strategy that participants can adopt for specific tasks (Logan & Cox, 2023). However, particularly in light of the relationship between items and associations explored above, it seems likely that a position coding strategy is secondary to a strategy based on associating items with their surrounding context in sequence. This theoretical viewpoint is bolstered by the fact that there are a number of simple cognitive mechanisms that could construct position codes on the basis of associations between items and their context (Logan & Cox, 2021). Furthermore, we have recently shown in a sequence of 12 experiments (Logan et al., revised) that a critical signature of position coding---a tendency to misremember an item from the ith position in one sequence as having occurred in position i of the next sequence---is entirely absent in our episodic flanker paradigm. Thus, the way in which sequences are represented in and retrieved from memory likely depends on strategies that convert an initial item-context association into a different form.
As noted above, a naturalistic setting in which serial order is critical is that of music. This is true both for those listening to music and for those reading and performing music. I am currently working with two graduate students to study how the basic cognitive mechanisms described above manifest in the context of music, and, in turn, how music can be used as a tool for studying those mechanisms.
Since Fall 2021, graduate student Nate Gillespie and I have been working to develop a set of novel auditory stimuli with two purposes: First, we want to understand how the basic mechanisms of perception, attention, and memory described above are deployed when dealing with stimuli from this novel domain. Second, we want to be able to combine these stimuli, either in sequence or simultaneously, to construct simple artificial sequences that mimic the structure of more naturalistic stimuli like language or music. By mimicking this structure within an unfamiliar setting, we are able to study the mechanisms by which people perceive and remember these complex structures without any interference from experiences outside the lab.
Nate constructed a set of artificial "timbre" stimuli using additive synthesis (sketched below)---the superimposition of sound waves with different frequencies at different amplitudes. We have now conducted three sets of experiments using these stimuli, involving hundreds of participants from a variety of backgrounds. Across all these studies, we have found that participants perceive these stimuli in terms of three basic psychological dimensions---one dimension pertains to the "roughness" of the sounds, one to their "brightness", and the third defies a simple verbal description. Differences in the ability to perceive and remember combinations of these sounds depend on the degree to which people attend to these different dimensions. We used a model of recognition similar to DREAM to show that perceived similarity along these three dimensions explains individual differences in the speed and accuracy with which people recognize individual sounds. These novel stimuli thus demonstrate the tight web of interactions between perception, attention, memory, and decision making. We are currently conducting experiments to study memory for more complex configurations of these sounds, configurations that begin to approach the complex structures present in linguistic or musical passages. In this way, these stimuli can serve as a test-bed for understanding the mechanisms by which perception and memory of individual items act as the foundation for perceiving and understanding the meaningful materials that shape our daily existence.
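The construction of these stimuli is straightforward to sketch. The partial frequencies and amplitude patterns below are illustrative stand-ins rather than the parameters of the stimuli Nate actually built:

```python
import numpy as np

def additive_timbre(freqs_hz, amps, duration_s=1.0, sr=44_100):
    """Additive synthesis: superimpose sine waves at the given frequencies
    and relative amplitudes, then normalize to avoid clipping."""
    t = np.linspace(0.0, duration_s, int(sr * duration_s), endpoint=False)
    wave = sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs_hz, amps))
    return wave / np.max(np.abs(wave))

# Hypothetical example: a 220 Hz fundamental with a handful of partials whose
# amplitude pattern (steep vs. shallow rolloff) changes the perceived timbre.
partials = [220, 440, 660, 880, 1100]
bright = additive_timbre(partials, amps=[1.0, 0.8, 0.6, 0.5, 0.4])
dull = additive_timbre(partials, amps=[1.0, 0.3, 0.1, 0.05, 0.02])
print(bright.shape, dull.shape)   # one second of audio samples each
```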
Recently, I began working with graduate student Caroline Rafizadeh and Heather Sheridan to understand perception and memory not for the sounds of music, but for how it is written. Western musical notation is visually complex and represents meaningful configurations of sounds arrayed both in sequence (like a melody) and simultaneously (like a chord). As in my work using the "episodic flanker" task, we are currently developing a study that uses eye tracking to understand how people encode and retrieve information about the position of a short segment of music within a larger context. Unlike the episodic flanker work, the context in this experiment is meaningful---at least to people who can read music. Therefore, by comparing the performance of novice and expert music readers, this experiment provides a way to directly assess how meaning interacts with more basic perceptual and attentional processes to support memory for material that is extended in time.
Another new strand of work, begun recently with graduate student Pierce Johnson and Ron Friedman, looks at how the structure of a sequence supports not just memory, but the development of preferences. Theories of musical preference have, for a long time, assumed that our aesthetic responses are related to our ability to make predictions and form expectations about what will happen next, with the expectations being either violated or confirmed. However, it remains an open question whether people prefer passages of music that engender expectations---predictive passages---or whether they tend to prefer passages that conform to their expectations---predicted passages. We worked together during Fall 2023 to develop an experiment in which people are exposed to a long stream of artificial musical passages, some of which always occur in pairs with one strictly following the other. Thus, within such pairs, the first item is predictive while the second item is predicted. In addition to assessing memory for these pairings, we are interested in which of those two types of passages people come to prefer more. This line of work thus extends my work from the cognitive domain toward understanding how the cognitive processes involved in perceiving and remembering music give rise to meaningful aesthetic experiences.
One of the main challenges in the study of cognition is understanding how complex neural systems realize the representations and processes involved in attention, perception, retrieval, and decision making. A dynamic approach lends itself to building a bridge between these levels of description because both cognitive and neural processes are extended in time. Dynamic computational cognitive models make it possible to understand the ebb and flow of activity between neurons in terms of which representations are being processed at what time and how those processes interact. A dynamic approach thus goes beyond understanding how neurons do what they do to tell us why they do what they do. Establishing this connection between "how" and "why" is crucial in applications as well, both for understanding how cognition is affected by disease or injury and for designing systems that interpret neural activity, such as neural prosthetics.
Recently, we worked with a network of collaborators to bridge the gap between neural dynamics ("how") and cognitive dynamics ("why") in the domain of visual attention and decision making (Cox et al., 2022). We focused on "visual search" in which you are looking for a known "target" object in a scene that also contains "distractors" which can be more or less confusable with the target; the main behavioral outcome in visual search is response time—how long it takes to make a saccadic eye movement to the target in the scene. Our participants were rhesus macaque monkeys, for whom it is possible to record activity from individual neurons while they are engaged in visual search. These neurons were located in the Frontal Eye Fields (FEF), a region containing neurons known to be involved in selecting important visual stimuli and directing eye movements. As a result, we were able to jointly observe the dynamics of behavior (the timing of eye movements) along with the dynamics of neurons involved in producing that behavior.
These unique data made it possible to develop SCRI, short for Salience by Competitive and Recurrent Interaction, a model that jointly explains the selection of target objects as well as the dynamic spiking activity of the FEF neurons that enact that selection process and direct eye movements to the selected target. According to SCRI, visually responsive neurons in FEF represent the relative salience of objects in their receptive fields in terms of their "firing rate", that is, how often they generate "spikes" or "action potentials". SCRI explains the moment-by-moment changes in firing rates as neurons localize objects and ascertain whether they could be targets or not. SCRI explains how firing rate dynamics are affected by factors like the number of distractors and the similarity between targets and distractors, as well as how these differences in neural dynamics manifest in behavior via the excitation that movement-related neurons receive from visually responsive neurons. In this way, SCRI accounts for both neural and behavioral dynamics in terms of the representations and processes embodied by FEF neurons.
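To give a flavor of competitive, recurrent dynamics of this general kind, here is a generic leaky, competing-accumulator sketch; it is not the actual SCRI equations or parameters, only an illustration of how mutual inhibition lets activity come to reflect relative salience:

```python
import numpy as np

def competitive_salience(drive, n_steps=200, dt=0.005,
                         leak=2.0, inhibition=1.0, noise_sd=0.05, seed=0):
    """Generic leaky, competitive accumulation sketch (NOT the actual SCRI
    equations): each unit is excited by the evidence for its own object and
    inhibited by the activity of the other units, so over time activity
    comes to reflect the relative salience of the objects."""
    rng = np.random.default_rng(seed)
    drive = np.asarray(drive, dtype=float)
    r = np.zeros_like(drive)                      # unit activity (~ firing rate)
    history = np.empty((n_steps, drive.size))
    for step in range(n_steps):
        lateral = inhibition * (r.sum() - r)      # inhibition from competitors
        dr = drive - leak * r - lateral + rng.normal(0, noise_sd, r.size)
        r = np.clip(r + dt * dr, 0, None)         # rates cannot go negative
        history[step] = r
    return history

# Hypothetical search display: one target-like object among three distractors.
trajectories = competitive_salience(drive=[1.0, 0.4, 0.4, 0.4])
print(trajectories[-1])                           # target unit ends up highest
```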
While SCRI can account for the dynamics of individual neurons as they process visual information to form a representation of salience, recordings of individual neurons are difficult to obtain in human participants. Instead, neural dynamics in humans are often measured using EEG. However, the EEG signal is measured using electrodes on the scalp, each of which reflects the aggregate activity of many thousands or millions of individual neurons. To better understand whether the same basic neural computations underlie human visual cognition, it is therefore important to establish the relationship between the activity of individual neurons explained by SCRI and the broader EEG signal in which that activity is embedded. That way, just as SCRI was able to bridge the gap between neural and cognitive dynamics in monkeys, it may also be able to bridge that gap in humans. I am currently working with Giwon Bahg to develop simple models that relate SCRI's neural activity to EEG. Preliminary results (in preparation) have found that the dynamics of SCRI can account for a considerable degree of variance (about 90%) in the EEG signal, especially the ways in which EEG signals evolve dynamically depending on whether a visual search target is present or not. This result goes a long way toward promoting an integrative and cumulative understanding of both the neural and cognitive mechanisms involved in visual attention.
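One simple form such a model-to-EEG mapping could take, offered as an illustrative assumption rather than a description of the models in preparation, is a linear regression of an EEG time course on the time courses of the model's units:

```python
import numpy as np

# Schematic of one simple way to relate model dynamics to EEG (an illustrative
# assumption): regress a measured EEG time course on the simulated time
# courses of the model's units; here both are synthetic stand-ins.
rng = np.random.default_rng(4)
T, n_units = 500, 4
model_activity = rng.random((T, n_units))                    # simulated unit dynamics
true_weights = np.array([1.5, -0.5, 0.8, 0.2])
eeg = model_activity @ true_weights + rng.normal(0, 0.1, T)  # synthetic EEG channel

X = np.column_stack([model_activity, np.ones(T)])            # add an intercept
coef, *_ = np.linalg.lstsq(X, eeg, rcond=None)
pred = X @ coef
r_squared = 1 - np.var(eeg - pred) / np.var(eeg)
print(round(r_squared, 3))                                   # variance explained
```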
As part of a large interdisciplinary grant funded by NOAA, I am working with fellow Albany faculty Heather Sheridan and Jeanette Sutton (in CEHC) on a project that aims to use the same principles of visual attention codified in the SCRI model to help improve the next generation of emergency alert displays. These displays, which are primarily disseminated on television although they can also be sent to smartphones and car dashboard displays, are critical for providing timely alert and warning information about weather emergencies. Our project seeks to identify best practices for the visual layout of these displays, as well as what kinds of text and graphics are most effective at encouraging people to make good decisions in response to warning information (e.g., to seek shelter or avoid certain areas). We have conducted a qualitative study of the ways that different television networks display these warnings and are currently designing an eye tracking study that will examine, for the first time, how people distribute their attention over these emergency alert displays and how efficiently they are able to search for the information they need to make good decisions.
Psychology must continually grapple with fundamental scientific issues related to inference and research design. Because the objects of study in psychology---like processes of attention, perception, retrieval, and decision making---are only revealed indirectly via behavior and/or neural activity, it is crucial to be explicit about how cognitive processes and representations are related to what we can observe. One of the powers of formal computational/mathematical models is that they make these relationships explicit. This explicitness makes it possible to use formal models to draw robust inferences about the mind from data and to design experiments that can clearly distinguish between different hypotheses.
One of our main research interests is thus in developing methods and frameworks that help other researchers tap into the power of formal modeling for their own ends. We shared, as standalone software (Cox & Criss, 2019), the response time analysis methods we had used to establish the parallel, interactive nature of item and associative retrieval (Cox & Criss, 2017). These methods are quite general and can help researchers use response time data to make similar inferences in other domains. Similarly, we worked with Michael Kalish (Cox & Kalish, 2019) to develop and share software for a novel Bayesian version of State Trace Analysis, a method that uses qualitative features of data to determine whether they could have been produced by a single underlying process. Such identifications are critical for establishing whether a dual-process explanation is required for a set of data or, as in clinical settings, for determining whether two or more groups differ from one another in a qualitative rather than merely quantitative manner. Finally, my student Nate Gillespie and I have recently been developing methods for relating qualitative and quantitative data in the study of memory and perception, using tools from natural language processing to characterize how people describe their perceptual experiences and decision strategies in a way that can be related to model-based measures (Gillespie & Cox, submitted).
In addition to their immediate practical uses, formal models have an important role to play in enabling scientific progress more generally (Cox & Shiffrin, in press; Cox et al., in press). Beyond enabling statistical inference, formal models forge precise connections between theoretical causal mechanisms and observables. In this latter role, that of a "causal" model, a model does not merely describe data, it describes the processes that generate data. Unfortunately, the criteria for assessing how well a model performs a causal role are not always aligned with the criteria for how well a model describes a given dataset. Confusing these different criteria is, in part, behind the recent crisis of confidence in many fields of science, particularly psychology (e.g., Starns et al., 2019). As statistical methods evolve to try to bring statistical practice better into line with scientific practice, it is critical to keep these different types of modeling in mind, particularly given that the ultimate goal of science is understanding causes, rather than merely describing effects (Singmann, Kellen, Cox, et al., 2022; van Doorn et al., 2023). Failure to take causal mechanisms into account when developing a model can result in inferences that fail to generalize or replicate, as has been the case with many applications of "significance testing". In the worst case, descriptive models may be used that have, at best, an obscure relationship with putative causal mechanisms, leading to conclusions that may be numerically strong but scientifically meaningless. Failures of this kind abound in many applications of "artificial intelligence" to things like educational, hiring, and policy decisions, where descriptive models may succeed in reproducing existing patterns---including patterns of bias---but fail to provide any meaningful guidance because of their disconnect from the causal mechanisms that produced those patterns in the first place.