This project investigates the conditions for integrated versus independent processing of complex auditory information, with speech and language as the primary experimental domain. Integrated processing is found with correlated cues (Parise & Ernst 2016; Lerousseau et al. 2022) and appears to serve as one method of sensory dimensionality reduction. For example, in vision, participants disregard redundant dimensions such as the hue and chroma of a color when sorting decks of colored stimulus cards, suggesting that color is processed in an integrated, holistic fashion (Garner & Felfoldy 1970). However, some dimension pairs, such as the size of a color circle and the angle of an inscribed line, slowed the sorting of the stimulus decks, suggesting that noncorrelated dimensions each require attention and that processing of these visual stimuli proceeds in an independent, additive fashion. The empirical domain is language because much is understood about speech sounds in terms of their production and perception, in addition to their neuro-cortical correlates.
The central question is the following: can language-specific properties drive a shift from independent to integrated perceptual processing? Across different languages, speech sounds may form complete or incomplete cross-classifications. We propose a methodological approach in which independent processing of multiple auditory cues is distinguished from integrated processing through the identification of distinct evoked neural responses (ERPs) elicited by speech-sound stimuli (Janssen et al. 2020; Obleser et al. 2003; see §2.1). We hypothesize that whether the multiple cues of speech sounds are processed in an integrated or in an independent fashion is determined by language-specific experience. Specifically, we predict that complete classifications lead to independent processing of speech properties, whereas incomplete classifications lead to integrated processing.
In order to test this hypothesis, we introduce an innovative cross-language neural recording design that contrasts neural responses to speech in English and in the Campidanese variety of Sardinian (see Wagner 1997 [1950]; Virdis 1978; Contini 1986; Jones 1988; Bolognesi 1998; Lai 2021 for descriptions). These speakers provide an ideal comparison population for evaluating the impact of experience on auditory integration processes, because Campidanese exhibits incomplete cross-classification of its speech-sound cues (see §2.3), meaning that we can experimentally control whether cues are correlated or not in this language. Conducting a neurophysiological research study on an understudied and threatened language in a rural region will create opportunities for significant broader impacts in terms of education, training, outreach, and language documentation and resource building.
The project proceeds in two stages. The first is to verify the linking hypothesis by which neural additivity is correlated to independent processing of phonetic features in English (see §3). The second is to examine whether Campidanese also displays neural additivity correlated to the same kinds of features, given that integrated processing of auditory cues is also possible in some domains and under some circumstances (Han et al. 2023). There are strong behavioral and theoretical reasons to expect that Campidanese speakers may perceive these same features in an integrated fashion. In sum, this project aims to contribute to our understanding of how human processing of speech sounds correlates to component elements of the acoustic signal and to neuro-cortical activity, as well as how this processing is influenced by language-specific experience.
The proposed project thus has significance for both cognitive neuroscience and language science (see Embick & Poeppel 2015). Where cognitive neuroscience is concerned, it will provide specific understanding of the neurobiological correlates of speech perception, as well as a more general understanding of additivity effects in neuro-cortical activity. As for language science, it will contribute to our understanding of how phonetic features in speech sounds are perceived, and of the influence of language-specific experience on the perception of speech cues.
In auditory and visual neuroscience, the Mismatch Negativity (MMN; Jääskeläinen et al. 2004; Näätänen et al. 2005) is a response in electro-cortical activity elicited by changes across multiple stimuli within the time span of sensory memory, detectable in both EEG and MEG. When two or more comparable stimuli are presented in the same context, aspects of the incoming stimuli are compared to traces stored in sensory memory, and divergences from expected cues elicit robust MMN responses (see Näätänen et al. 2019 for an overview). As the perceptual system builds representations of environmental stimuli, it makes predictions about which cues will be most salient. Divergences from those predictions result in a robust MMN evoked response; for example, a green square presented after a series of red squares elicits one. Thus, the MMN is a correlate for aspects of perceptual representations which diverge from a standard in a perceptible and unexpected fashion across domains; in audition this includes cues such as duration, intensity, pitch, and timbre. These cues are present in any given speech signal and may vary independently of each other. For example, Wolff and Schröger (2001) show that auditory stimuli that vary along multiple dimensions, including duration, frequency, and intensity, can be compared to each other: a change in a single dimension between stimuli is sufficient to elicit an MMN.
When partially different neural populations are involved in the processing of different but related stimuli (Allen et al. 2017, 2022), the MMNs elicited by divergent features demonstrate neural additivity (Paavilainen et al. 2001), a correlate for differences across a wide range of perceptual domains, including music (Hansen et al. 2022) and vision (Stefanics et al. 2014). When divergences along multiple feature dimensions of an auditory stimulus are relevant to perceivers, the ERPs those divergences elicit are generated by multiple populations of neurons, producing a cumulative ERP (as in Figure 1). Our hypothesis is that the additive property of ERPs can be used as a reliable way of determining whether elements of speech sounds are perceived in a holistic, integrated fashion or as independent percepts that are only parts of a whole (Han et al. 2023).
For example, Caclin et al. (2006) demonstrate that three dimensions of timbre can be independently varied above the threshold of discriminability, and each dimension elicits distinct MMN responses in partially separate neuron populations. The cumulative effect of varying multiple dimensions simultaneously is an additive effect in the ERPs themselves, demonstrating that the three dimensions of timbre are perceived in an independent fashion. Such additive effects in MMN responses have been shown to hold in the processing of frequency and location (Schröger 1995), frequency and intensity (Paavilainen et al. 2003), pitch and location (Takegata et al. 2001), and vowels and pitch (Lidji et al. 2010).
In designs where frequency and intensity co-varied redundantly, however, deviating from or reversing that relationship elicited an MMN response with apparently no contribution from the varying frequency and intensity values themselves. This suggests that the change in the direction of feature co-variation was perceived as the relevant cue in the stimuli, while the irrelevant differences in the physical features went unperceived. One possible interpretation of this result is that the different cues were not fully cross-classified: since frequency and intensity co-varied in a redundant way, they were processed in an integrated way.
Figure 1: ERPs demonstrating neural additivity. Each waveform is correlated to an individual speech sound. A difference along one dimension (blue and green) produces a waveform with greater amplitude relative to the standard stimulus. A difference along two dimensions produces a waveform with an amplitude greater than either of the single-dimension deviants.
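The additivity logic can be made concrete with a small simulation. The following sketch is purely illustrative: the waveform shapes, amplitudes, and the under-additive scaling factor are hypothetical stand-ins for recorded difference waves, not project data.

```python
import numpy as np

t = np.linspace(0, 0.5, 500)  # a 500 ms post-stimulus epoch

def mmn(amplitude, latency=0.15, width=0.04):
    """Toy MMN difference wave: a negative Gaussian deflection."""
    return -amplitude * np.exp(-((t - latency) ** 2) / (2 * width ** 2))

mmn_noise = mmn(1.0)      # deviance along the noise dimension only
mmn_period = mmn(1.2)     # deviance along the periodicity dimension only
additive_prediction = mmn_noise + mmn_period

# A hypothetical double-deviant response; an index near 1 would indicate
# independent processing, while under-additivity suggests integration.
observed_double = 0.7 * additive_prediction

index = observed_double.min() / additive_prediction.min()
print(f"additivity index: {index:.2f} (≈1 → independent, <1 → integrated)")
```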
Where linguistically significant sounds are concerned, Yu et al. (2022) found that vowels, consonants, and tones are all processed independently by speakers of Cantonese. This means that tones and vowels, for example, are perceived as independent percepts even though they co-occur simultaneously in the speech signal. This project investigates these sorts of acoustic features: we seek to determine whether the processing of these features as independent components is a universal fact of human perception or whether individual language experience plays a role in their processing as integrated or independent.
The empirical basis for the project is two different languages, American English and Campidanese Sardinian (CS). The use of Campidanese (see §2.3) is an optimal choice because this language exhibits a typologically unusual pattern of contextually conditioned sound changes that results in incomplete cross-classification in a specific context (Chabot forthcoming). As such, not only does this project address questions central to cognitive neuroscience and to theoretical language science, it also makes an important contribution to the documentation of an understudied minority language. This project will also serve as the basis for an exchange between locals and researchers in cognitive neuroscience and language science, intended to contribute to ongoing efforts to raise the status of Sardinian by demonstrating its interest to the wider world and its relevance to scientific questions that occupy researchers across domains, thereby improving the social status of members of the local community (see §6.2 for further details).
The perception of speech sounds requires the processing of multiple cues which co-occur simultaneously, such as degree of occlusion or manner, voicing, and place of articulation. Linguists typically organize speech-sound inventories according to these phonetic cues. For example, in Table 1, speech sounds are organized according to a cross-classification of their phonetic features: both p and b are bilabial stops produced with total occlusion, differing from each other in their voicing properties. On the other hand, f and v are both fricatives produced with partial occlusion, differing from each other in the same way as p and b, but also differing from that set of stops in their realization as fricatives.
Figure 2: Spectrograms of stops and fricatives, showing the cross-classification of noise and periodicity
Manner and voicing have distinct auditory cues, broadband aperiodic noise and periodicity respectively. Voiceless fricatives such as f are characterized by broadband noise during articulation. In a similar way, v is characterized by both noise and periodicity, the latter of which is the acoustic correlate of voicing (see Figure 2). The stops p and b are different from fricatives in that the brief but total occlusion of the pulmonic airstream used to produce them is characterized by a lack of noise, though b can be distinguished from p through the presence versus absence of periodicity, exactly as for the fricatives. In this way, the four sounds in Table 1 are decomposable into different configurations of noise and periodicity, where discrete elements in the signal are distinctive in cognition.
Table 1 demonstrates that, in English, the four speech sounds can be cross-classified according to the presence or absence of both noise and periodicity. This suggests that cross-classification results in processing of these speech cues in an independent fashion, where the cues do not depend on each other and may variably co-occur or not. That is, their presence or absence is the basis by which speakers distinguish between the various sounds.
However, this full cross-classification does not hold across all languages, since not all languages distinguish between the same sound categories (see Table 2). The PHOIBLE cross-linguistic inventory of language sound systems is a catalog of 3,020 speech-sound inventories, of which 2,594 (86%) contain p, 1,906 (63%) contain b, 1,329 (44%) contain f, and only 816 (27%) contain v. For example, the Acehnese language, spoken in Indonesia, has p, b, and f, but not v. This means that, in this language, the cross-classification which holds in Table 1 is incomplete. In such cases of incomplete cross-classification, processing of acoustically different speech sounds may proceed in an integrated manner, since the discrete elements in the acoustic signal are not distinctive in cognition.
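To illustrate what complete versus incomplete cross-classification means in these terms, the following toy sketch encodes the four sounds as combinations of two binary cues and checks whether an inventory fills every cell. The feature assignments follow the noise/periodicity decomposition of Table 1; the inventories are examples only.

```python
from itertools import product

# Feature assignments following the noise/periodicity decomposition in Table 1.
FEATURES = {
    "p": ("-noise", "-periodic"),
    "b": ("-noise", "+periodic"),
    "f": ("+noise", "-periodic"),
    "v": ("+noise", "+periodic"),
}

def is_fully_cross_classified(inventory):
    """True when every combination of the two cues is attested."""
    attested = {FEATURES[sound] for sound in inventory}
    all_cells = set(product(("-noise", "+noise"), ("-periodic", "+periodic")))
    return attested == all_cells

print(is_fully_cross_classified({"p", "b", "f", "v"}))  # English-like: True
print(is_fully_cross_classified({"p", "b", "f"}))       # Acehnese-like gap: False
```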
Sardinia is an island off the west coast of Italy, where an autochthonous Romance language is spoken. The brief description of the Campidanese variety of Sardinian (CS) presented here is based on fieldwork I conducted with Simone Pisano in and around the village of Genoni, in the province of Sud Sardegna, located on the high plateau of the Giara di Gesturi in south-central Sardinia (see map, Figure 3). Sardinian in general and Campidanese in particular is a minority language with no standardized orthography; faced with sociolinguistic and socio-economic pressures, Campidanese is spoken by fewer people every year (Mensching & Remberger 2016). Sardinian is thus a regional language that is threatened, and part of the ambition of this project is not only to contribute to its documentation and description, but also to highlight its interest to people both inside the local community and outside of its immediate context. That is, far from being a dying language with no relevance beyond the countryside, it is as valid an object of research and of scientific and cultural interest as any other language.
Because Sardinian has never been standardized, the Campidanese dialectal area is characterized by a great deal of variation, especially in the phonetic details of pronunciation. It is important to keep in mind that descriptions may differ due to this variation, and what is true of the variety spoken in one village may not be true of the variety spoken in another. The consonants of Campidanese are shown in Table 3. Certain consonants have restricted distributions (e.g. ɖ, which is only used in a few words, mostly pronouns) or occur only in recent loanwords (e.g. dz, v, ɲ; Lai 2021). Sounds in parentheses are only possible in contextually conditioned variation.
Figure 3: Dialects of Sardinian (adapted from Virdis 1978)
Campidanese is, in part, characterized by variation between voiceless stops, such as p, and voiced fricatives, such as v. Both of these classes of sounds may occur at the beginning of a word, but only the latter may appear between vowels. This can be seen in the data in (1), where any consonant may appear in the initial position (as on the left), but only voiced fricatives may appear in the intervocalic context (as on the right).
What is interesting about this change is that the number of modified dimensions varies depending on which consonant is in the initial position. Voiceless fricatives, as in (1a), change only their voicing when appearing between vowels. For voiceless stops, as in (1b), however, changing to voiced fricatives requires a change in both voicing and manner. In other words, the changes in (1a) involve only one dimension of modification, while those in (1b) involve two. This presents the possibility that, since speakers must be able to produce and perceive this contextual variation along multiple axes, they may do so in an integrated manner. Note that b does not behave like p or f in the intervocalic context, since it remains contrastive there in Campidanese; as such, it provides a kind of control in this condition.
This situation of contextually determined changes allows for a substantive hypothesis regarding how native speakers of Campidanese process these changes. Since the difference between p and f lies along the single dimension of noise, it should be possible to elicit an MMN response correlated to that difference between the two speech sounds (Table 4). Meanwhile, a change between p and v involves simultaneous changes in noise and periodicity; but since this cross-category auditory difference is not relevant in intervocalic contexts, the MMN elicited by their comparison in such contexts should not produce an additive effect, since Sardinian speakers perceive such changes in an integrated way.
Since the same is not true of English, where noise and periodicity are relevant and salient distinctions in word-initial and intervocalic contexts alike, there is no reason to assume that English speakers process these changes in an integrated way: they are predicted to exhibit additivity when changes occur along the axes of noise and periodicity simultaneously, since each is distinct and thus independently perceived.
The study proposed in this project is novel, in that we know of no experimental work on additivity in the perception of phonetic features in Campidanese and English speech sounds. However, one assumption our MMN design rests on is that the representations of different phonetic feature dimensions are supported by at least partially separable neural populations. Some prior evidence for this assumption comes from an fMRI study by Lawyer and Corina (2013), which provides hemodynamic evidence that place of articulation, voicing, and manner differences in speech-sound perception are all correlated to discrete areas of activation in the mid to posterior Superior Temporal Gyrus (STG) and Superior Temporal Sulcus (STS), suggesting the potential for additive ERPs when the same dimensions are varied in MMN designs (Figure 4).
Figure 4: Activation areas for phonetic features showing both distinct areas and areas of partial overlap (Lawyer & Corina 2013).
Scharinger et al. (2011) conducted an MEG study of cortical maps of the vowel space of Turkish speakers in an effort to understand how dimensions of acoustic cues are encoded in the brain. Turkish, notably, has a complete set of contrasts for its vowels (Table 5).
They found that cortical maps reflect spatial coding schemes in which individual acoustic cues are mapped onto cortical space relatively independently (Figure 5, left). In contrast, Manca et al. (2019) found that the cortical maps of the Italian vowel space show a much less robust rounding distinction, with the back vowels a and ɔ lying relatively closer together in the cortical map (Figure 5, right). These differences suggest that Turkish speakers process rounding as an independent cue on both the front and back dimensions, since the Turkish vowel system makes a complete cross-classification, while Italian speakers process rounding as a cue integrated with backness, since the Italian vowel system has incomplete cross-classification.
Figure 5: Left: Cortical map of the four Turkish back vowels. Round vowels are separated along the lateral gradient, with a clear separation between a and ɔ (from Scharinger et al. 2011). Right: Cortical map for the five Italian vowels, with a reduced lateral gradient for round vowels, and closer positions for a and ɔ (from Manca et al. 2019).
An MEG-MMN pilot study was conducted with speakers of American English on the sounds of interest (pa, ba, fa, va). In the pilot we tested the sounds in onset position only, to confirm that our MMN protocol demonstrates sensitivity to feature contrasts as expected, as well as to determine how MMN latency and amplitude are modulated by one versus two dimensions of change. We confirmed that electro-cortical techniques are sensitive to the characteristic auditory properties of these sounds, over an acceptable time course. The experiment employed an oddball detection paradigm using a standard syllable, pa, and three deviant syllables differing in periodicity (ba), noise (fa), or both periodicity and noise (va). In this way, a standard speech sound p is compared to deviants which differ along one dimension each (f and b) and a deviant v, which differs along both dimensions simultaneously.
To record ERPs, we placed participants in a 160-channel MEG system (KIT) and exposed them to prerecorded, normalized recordings of an adult native speaker of English pronouncing each of the syllables. In a single session we exposed participants to between four and seven repetitions of the standard stimulus, pa, with onset times staggered between 350 and 1000 ms. After another random onset interval, a single random deviant stimulus from the set ba, fa, va was then used to evoke an MMN. Each participant was exposed to approximately 2,400 standard stimuli and approximately 150 of each deviant stimulus, or about 45 minutes in the MEG machine. Since this task does not require conscious attention, participants were given the option of watching a movie play silently during data collection.
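The trial structure of the pilot can be sketched as follows. The generator mirrors the counts and timing described above (runs of four to seven standards, 350-1000 ms onset jitter, roughly 150 of each deviant); the function name and random seed are incidental.

```python
import random

STANDARD = "pa"
DEVIANTS = ["ba", "fa", "va"]

def make_sequence(n_per_deviant=150, seed=1):
    """Return (syllable, onset_jitter_ms) trials: runs of standards, then a deviant."""
    rng = random.Random(seed)
    deviant_pool = DEVIANTS * n_per_deviant
    rng.shuffle(deviant_pool)
    trials = []
    for deviant in deviant_pool:
        for _ in range(rng.randint(4, 7)):                # standard run
            trials.append((STANDARD, rng.uniform(350, 1000)))
        trials.append((deviant, rng.uniform(350, 1000)))  # jittered deviant onset
    return trials

trials = make_sequence()
n_standards = sum(syll == STANDARD for syll, _ in trials)
print(f"{len(trials)} trials, {n_standards} standards")   # ≈2,400 standards
```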
In this way, we tested the relative MMN produced by speech sounds which differed from a standard through a potentially integrated process of perception. Our results show that in normal perception, noise and periodicity are phonetic features available to speakers for making fine-grained distinctions between the building blocks of sounds. Topographic maps produced from our pilot data (see Figure 6) show different latency responses between p, b, f, and v, meaning that in each condition MEG can be used to distinguish between noise and periodicity. Further, we confirmed a post-stimulus reaction to the stimuli in the expected time course for phonological activity, and we localized that activity to a subset of channels corresponding to known language-processing centers in the posterior portion of the left temporal lobe. This is an important preliminary step for the proposed project, demonstrating the validity of electro-cortical techniques for investigating our question.
Figure 6: Topographic maps of MMN responses, top to bottom: pa, ba, fa, va. Latency differences are clearest between 200 and 350 ms (black box), but differ in their detailed timing and topography across conditions. The differences in the responses suggest that changes in frication noise and periodicity are processed independently in English, and that these differences can be detected using MEG.
This study proposes an MMN protocol to test the perception of speech sounds, to determine whether additivity can be used to measure the extent to which speakers perceive features in the acoustic signal in an independent or integrated way, and whether this is shaped by language-specific experience. We establish two conditions, a speech condition and a non-speech condition. In the speech condition, participants are exposed to a series of varying standard stimuli and a constant deviant in two contexts: a “word initial” position and an “intervocalic” position (see Table 6 for examples).
Stimuli are presented in counterbalanced blocks. In the non-speech condition, the same stimuli are spectrally rotated (Blesser 1972). Spectral rotation inverts the spectral information around a center frequency in the auditory signal while preserving spectrotemporal complexity (see Table 7). The result is a set of sounds which contain the same amount of auditory information as speech sounds, but which listeners perceive as non-speech (Green et al. 2013; Obleser et al. 2007).
Figure 7: Stimuli presentation in the initial context
In order to test this, our protocol exposes speakers to sequences of standard syllables followed by a deviant syllable which is held constant. Since the sounds used in these stimuli are present in both Campidanese and English, the same stimuli can be used across blocks. Participants will be exposed to these stimuli in two contexts: an “initial” context and an “intervocalic” context. Participants will wear an EEG cap while the EEG system makes continuous recordings of their electro-cortical reactions to the varying stimuli during trials (see Figure 7). Stimuli consist of four different syllables: pa, ba, fa, va. Each of these stimulus types consists of some configuration of noise and periodicity, or the absence of these cues. Several instantiations of each syllable type are recorded in order to introduce nonsignificant variation and ensure that participants process each stimulus as a category type.
In the initial context, speakers are presented with a random number, between 4 and 8, of the same standard stimulus syllable, either pa, fa, or ba. After this standard is established, participants are exposed to a deviant va in a block design. This means that participants are continuously comparing a standard to a deviant along either one dimension (either noise ba → va or periodicity fa → va) or both dimensions simultaneously (pa → va).
In the intervocalic context, stimulus presentation proceeds much as it does for the initial context. The only difference is that the syllable stimuli are preceded by a brief vowel, a. This means that stimuli are presented in an intervocalic context; instead of hearing a plain consonant, participants hear a consonant in the context of two vowels (e.g. apa, afa, aba, ava). In English, this is not expected to change participants’ processing of the speech sounds, because all four consonants are distinctive in both contexts. In Campidanese, however, this situation does not hold: as shown in (1), speakers do not produce p or f in this context. Instead, p and f are both produced as v, resulting in a situation of incomplete cross-classification (see Table 7). Our prediction is that Campidanese speakers will not process the dimensional divergences across stimuli in an independent way, for precisely this reason. Thus, we expect no additive ERP in this condition.
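Constructing the intervocalic stimuli amounts to prepending a recorded vowel to each syllable recording. A minimal sketch, assuming normalized WAV files with matching sampling rates; the file names and the soundfile library are assumptions of the sketch, not fixed project choices.

```python
import numpy as np
import soundfile as sf  # assumed I/O library; any WAV reader would do

def make_intervocalic(vowel_path, syllable_path, out_path, gap_ms=0):
    """Concatenate a vowel and a CV syllable into an aCa-type stimulus."""
    vowel, fs_v = sf.read(vowel_path)
    syllable, fs_s = sf.read(syllable_path)
    assert fs_v == fs_s, "stimuli must share one sampling rate"
    gap = np.zeros(int(fs_v * gap_ms / 1000.0))  # optional silent gap
    sf.write(out_path, np.concatenate([vowel, gap, syllable]), fs_v)

for cv in ("pa", "fa", "ba", "va"):
    make_intervocalic("a.wav", f"{cv}.wav", f"a{cv}.wav")
```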
An important aspect of our project concerns establishing a baseline for measuring MMN in audition, before speech-sound categories are relevant to perception. This is done by using the speech stimuli from the natural-language protocol, but rotating their spectral information (Blesser 1972; see Figure 8) such that they are perceived by listeners as non-speech (Green et al. 2013; Obleser et al. 2007). This means that the same protocol can be used to test speakers’ perception of the same differences in a quantitative sense, but in auditory objects that differ qualitatively from speech sounds.
Figure 8: Left: spectrally rotated speech; Right: same, unaltered speech (from Marklund et al. 2018: 33).
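A minimal sketch of spectral rotation in the spirit of Blesser (1972): band-limit the signal, ring-modulate with a carrier at the band edge so that each frequency f is mirrored to band_edge − f, and filter off the upper image. The band edge and file names are assumptions, not the project's stimulus parameters.

```python
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

def spectrally_rotate(x, fs, band_edge=4000.0):
    """Invert the spectrum of x within [0, band_edge] Hz."""
    sos = butter(8, band_edge, btype="low", fs=fs, output="sos")
    x_band = sosfiltfilt(sos, x)                         # restrict to the band
    carrier = np.cos(2 * np.pi * band_edge * np.arange(len(x_band)) / fs)
    mirrored = x_band * carrier                          # f maps to band_edge - f
    return sosfiltfilt(sos, mirrored)                    # remove the upper image

speech, fs = sf.read("pa.wav")
sf.write("pa_rotated.wav", spectrally_rotate(speech, fs), fs)
```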
The purpose of this non-speech condition is to determine whether additivity holds across speech and non-speech conditions (see Marklund et al. 2018). Our hypothesis is that additivity holds when linguistic categories can be meaningfully compared based on independent components in a complete cross-classification. This is true at higher levels of perception, as well as at the earliest stages of auditory processing in speech and non-speech (see Di Dona et al. 2022). When there is a gap in a cross-classificatory design, whether in the speech condition or the non-speech condition, there will be integrated processing and no additivity.
For the proposed experimental project on Campidanese, approximately 36 adult native speakers will be recruited from among the speakers who participated in a fieldwork project conducted by myself and Simone Pisano in mid-February of 2020. This number of participants allows for attrition due to EEG recording problems or other potential issues that may arise during the fieldwork. Informed consent will be obtained from all participants, who will be screened to ensure right-handedness and the absence of any history of speech or neurological pathologies. Participants are tested in all four conditions: speech/non-speech and initial/intervocalic contexts in a counterbalanced block design. Each participant will be exposed to approximately 2,500 trials, lasting approximately 50 minutes. For the project on American English, the same number of adult native speakers will be recruited locally at the University of Maryland, and the experimental procedure will proceed exactly as for the Campidanese speakers.
EEG data will be acquired with a portable BrainVision V-Amp system (see similar methods our group has used for EEG data collection in the field in Lau et al. 2023). The EEG signal will be analyzed using the MNE-Python toolbox. Epochs will then be extracted and average ERP components calculated for each of the four conditions. In our pilot results, we have had success removing artifacts and noise, such as eye movements, heartbeats, and line noise, using the Picard algorithm for Independent Component Analysis (ICA) in MNE-Python. While we have already arranged for suitable workspace provided by the municipality, we expect that ICA will be useful in the field given the nature of electrophysiological work done outside of laboratory environments. Pairwise comparisons will be conducted between conditions in the 100-450 ms time window, within the expected time course identified for the processing of speech sounds (see Sahin et al. 2009). Differences will be measured in terms of both latency and amplitude between conditions. In addition, in a second trip to Sardinia, a subset of participants will be re-tested to assess the stability and reproducibility of the electro-cortical correlates recorded.
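The analysis path described above could look roughly as follows in MNE-Python. This is a hedged sketch: the file name, event codes, component count, and filter settings are placeholders rather than the project's final parameters.

```python
import mne
from mne.preprocessing import ICA

# Load a raw BrainVision recording (hypothetical file name).
raw = mne.io.read_raw_brainvision("sub01_campidanese.vhdr", preload=True)
raw.filter(l_freq=0.1, h_freq=40.0)           # band-pass typical for ERP work

# Artifact removal with Picard ICA, as in the pilot.
ica = ICA(n_components=20, method="picard", random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]                          # e.g. blink/heartbeat components
ica.apply(raw)

# Epoch around stimulus triggers and average per condition.
events, event_id = mne.events_from_annotations(raw)
epochs = mne.Epochs(raw, events, event_id=event_id, tmin=-0.1, tmax=0.45,
                    baseline=(None, 0), preload=True)
evokeds = {cond: epochs[cond].average() for cond in event_id}
```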
In our common-deviant protocol, we test for additivity in two contexts (speech and non-speech), two languages (English and Campidanese), and two conditions (word initial and intervocalic). We predict independent, additive processing in almost all conditions, but integrated processing in the Campidanese intervocalic position, where the cross-classification of sounds is suspended. In the speech context (see Figure 9), we expect noise and periodicity to be perceived independently in both conditions for English speakers, and in the word-initial condition for speakers of Campidanese. Where independent perception is predicted, we predict an additive effect measurable from the elicitation of MMN responses. We predict integrated perception in the intervocalic condition for Campidanese speakers, since this language is characterized by a contextually conditioned change that modifies speech sounds along the dimensions of noise and periodicity simultaneously, as shown in (1).
Figure 9: Integrated and independent perception of phonetic features in speech sounds. Green and blue arrows indicate a change along a single dimension, either noise (green) or periodicity (blue). The double arrow indicates a change along both dimensions. In cases where we predict an additive effect, the arrowhead is in red. In the intervocalic context for Campidanese speakers, we expect no additive change since cross-classification is incomplete in this context, where b and v are possible, but p and f are not.
The anticipated results of the experiments are:
In the non-speech condition, English and Campidanese speakers will be able to distinguish between noise and periodicity in the spectrally rotated stimuli, meaning there should be a complete set of comparisons in the 2x2 design. Thus, we predict independent processing of the cues in all non-speech conditions for all speakers.
In the speech condition, for English speakers we expect to find independent processing of the four speech stimuli in both the word-initial and intervocalic positions. That is, there will be an additive effect in their MMN responses to the stimuli, since each of the phonetic dimensions concerned is linguistically distinctive in both conditions.
In the speech condition, for Campidanese speakers we expect to find independent processing only in the word-initial position, where speakers need to be able to differentiate between all four speech sounds to meaningfully distinguish between words. Conversely, we expect to find integrated processing in the intervocalic position, since the differences between these sounds are eliminated in this position. Specifically, we expect to find attenuated MMN responses for the p vs v and the f vs v comparisons, with the b vs v case serving as the control condition, since the distinction between b and v is maintained in intervocalic position. We thus expect to find no additive correlate for the phonetic dimensions concerned, since the salience of these cues is drastically reduced in this position (see the analysis sketch below).
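The planned additivity comparison can be previewed with simulated numbers. In this sketch, per-subject mean MMN amplitudes in the analysis window are drawn from hypothetical distributions; the actual test would use the recorded ERPs, but the logic of comparing the double-deviant response to the sum of the single-deviant responses is the same.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 36
# Hypothetical mean amplitudes (µV) in the 100-450 ms window, per comparison.
mmn_ba_va = rng.normal(-1.0, 0.3, n_subjects)   # periodicity-only deviance
mmn_fa_va = rng.normal(-1.2, 0.3, n_subjects)   # noise-only deviance
mmn_pa_va = rng.normal(-1.6, 0.4, n_subjects)   # double deviance

# Independent processing predicts mmn_pa_va ≈ mmn_ba_va + mmn_fa_va;
# integrated processing predicts an attenuated (under-additive) response.
t, p = stats.ttest_rel(mmn_pa_va, mmn_ba_va + mmn_fa_va)
print(f"under-additivity: t({n_subjects - 1}) = {t:.2f}, p = {p:.3f}")
```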
In short, we expect language-specific experience to play a critical role in whether phonetic cues are perceived in an independent or an integrated fashion. In languages in which a given set of cues is needed to make a meaningful difference between speech sounds, there will be independent processing of those cues. However, in languages which do not rely on those cues to make or maintain meaningful differences between speech sounds, there will be integrated perception of those speech sounds, even when they present important physical differences.
One of the principal questions in the cognitive neuroscience of language is what kinds of neural correlates for linguistic processing can be identified. This project thus seeks to investigate aspects of speech processing that are fundamental to larger questions in cognitive neuroscience. Since speech sounds can be decomposed, their perception in terms of discrete but simultaneously occurring phonetic features is a function of basic human cognition. Neural correlates for cognition are valuable as a means of confirming behavioral evidence and for their explanatory value. The successful identification of neural correlates of phonological activity can be used as a baseline for further comparison: as a metric against which contentious alternations which do not seem to be phonetically motivated can be compared. The techniques used should provide a way of determining the role of phonetic naturalness in phonological patterns, thus serving to delimit the empirical remit of phonology. The proposed experiment makes a novel and valuable contribution to understanding the neural correlates of dynamic phonological processes, and thus to a more general understanding of language and cognition.
This project also contributes something tangible which speaks directly to the human experience: the documentation and valorization of an understudied, regional minority language. Campidanese Sardinian (Wagner 1997 [1950]; Virdis 1978; Contini 1986; Jones 1988; Bolognesi 1998; Lai 2021) is a minority language with no standardized orthography, associated with a rural, old-fashioned lifestyle that is stigmatized nationally and even locally. Faced with sociolinguistic and socio-economic pressures, Campidanese is spoken by fewer people every year (Mensching & Remberger 2016). Given its association with a rural way of life, and without a written standard, Sardinian is spoken by a mostly rural population with a relatively low level of education compared to the national average.
Sardinian is thus a minority language that is threatened, and our project is a concrete example of documentation, with the ambition of collecting and making publicly available a corpus of Campidanese conversations discussing local history, traditions, customs, folktales, rhymes, and so forth. To do this, we propose building a navigable website of the corpus data, with transcriptions and complete audio files. This website will be translated into English, Italian, and an acceptable written form of Sardinian; when complete, the website material will be given to the University of Cagliari. The purpose is not just to make the data available, but also to demonstrate that Sardinian is on a par with English and Italian in its suitability for discussing topics of current scientific interest.
Part of the project involves hiring language consultants from the local community to help with the transcription and translation work involved in building the website, prioritizing consultants with a personal interest in Sardinian. Sardinia has two universities, one at Cagliari and one at Sassari. Our project aims to provide opportunities for students interested in language documentation, language science, and cognitive neuroscience to collaborate with an international team, thereby providing local training in techniques and expertise that might not otherwise have been available. Our intention is to help train students in theoretical and practical matters in these domains of science such that they may continue investigation locally on their own after the project has been completed.
Economically, the per capita GDP of Sardinia was about 28.74% below the national average in 2021. Only 11.9% of Sardinians hold a college degree, compared to the Italian national average of 13.54% (while the level of university education in Italy is low compared to European standards, this average is pulled down by a “North-South divide”: in the more affluent northern region of Lombardia, for example, the average is 14.2% across all age groups). This means that educational opportunities are more restricted in Sardinia relative to Italy and to Europe. As a consequence, opportunities to improve local conditions and standards through education are also relatively restricted in Sardinia.
This reliance on local collaboration is intended to provide opportunities for students to participate in constructive, active, and essential ways in an international scholarly project in which they would not otherwise have taken part. The use of Sardinian as a contemporary language spoken by people living today is meant to be complementary to the historical approach that typically characterizes university treatment of Sardinian in Italy. By hiring students, it is also hoped that this project will serve to build a connection between researchers at the University of Maryland and the local universities.
In the United States, we plan to use aspects of this project in courses that are a long-standing part of our curriculum concerning language documentation, theoretical linguistics, and the neuro-cognition of language. This project will produce a great deal of linguistic and neuro-physiological data, which will provide students at our institutions in the US with the opportunity to apply skills and techniques they study in class to real-world data collected in the field. On the socio-linguistic side, contact with Sardinian can demonstrate to US students who may have been directly confronted with language discrimination that the notion of an ideal standard is subjective: since there is no standard form of Sardinian, the notion of an inherently better or correct form of the language is nonsensical. Thus, the project is relevant to our students in several important ways.
More locally, the project facilitates increased engagement between the researchers and the local community, beyond the usual practice of extracting data and removing those data from the community. In addition to the website purpose-built for hosting the corpus material, the linguistic data will be archived and preserved at the Biblioteca comunale Edmondo de Amicis in Genoni, a local multimedia library and cultural center that hosts cultural events of local interest and, importantly, provides public access to computers, films, books, music, and other kinds of recordings for members of the community.
My collaborator Simone Pisano and I have already worked with the Biblioteca, and we wish to continue this collaboration by handing over the language data to the archivists at the Biblioteca, so that they can be preserved and made locally available to the entire community. Once the project is complete, the team intends to organize a small event for the dissemination of the research project, involving local collaborators and a public demonstration of the kinds of questions asked by the project, as well as any relevant results. This public seminar will be held in the public space at the Biblioteca and conducted in Sardinian, further demonstrating the use of Sardinian in discussing topics of current scientific interest and reinforcing its social status locally. The intent is that the community benefit from the data collected from them, as this can help to dispel negative connotations stemming from the rural and antiquated characterization of Sardinian. Our ultimate goal is to enable outreach to a broader public by communicating the utility of language science and neuroscience in understanding cognition and human nature more generally.
Blesser, Barry. 1972. Speech Perception Under Conditions of Spectral Transformation: I. Phonetic Characteristics. Journal of Speech and Hearing Research 15. 5-41.
Bolognesi, Roberto. 1998. The phonology of Campidanian Sardinian. Dordrecht: HIL.
Caclin, Anne, Elvira Brattico, Mari Tervaniemi, Risto Näätänen, Dominique Morlet, Marie-Hélène Giard & Stephen McAdams. 2006. Separate Neural Processing of Timbre Dimensions in Auditory Sensory Memory. Journal of Cognitive Neuroscience 18(12). 1959-1972.
Chabot. Forthcoming. Prosodic strength in Campidanese Sardinian as Substance-Free Phonology. Phonology.
Contini, Michel. 1986. Les phénomènes de sandhi dans le domaine sarde. In Henning Andersen (ed.), Sandhi phenomena in the languages of Europe, 519–550. Berlin, New York, & Amsterdam: De Gruyter Mouton.
Embick, David & David Poeppel. 2015. Towards a computational(ist) neurobiology of language: Correlational, integrated and explanatory neurolinguistics. Language, Cognition and Neuroscience 30(4). 357–366.
Garner, W. R. & Gary L. Felfoldy. 1970. Integrality of stimulus dimensions in various types of information processing. Cognitive Psychology 1(3). 225-241.
Green, Tim, Stuart Rosen, Andrew Faulkner & Ruth Paterson. 2013. Adaptation to spectrally-rotated speech. Journal of the Acoustical Society of America 134(2). 1369-1377.
Han, Zhili, Hao Zhu, Yunyun Shen & Xing Tian. 2023. Segregation and integration of sensory features by flexible temporal characteristics of independent neural representations. Cerebral Cortex 2023. 1-12.
Hansen, Niels Chr., Andreas Højlund, Cecilie Møller, Marcus Pearce & Peter Vuust. 2022. Musicians show more integrated neural processing of contextually relevant acoustic features. Frontiers in Neuroscience 18. 1-18.
Jääskeläinen, Iiro P., Jyrki Ahveninen, Giorgio Bonmassar, Anders M. Dale, Risto J. Ilmoniemi, Sari Levänen, Fa-Hsuan Lin, Patrick May, Jennifer Melcher, Steven Stufflebeam, Hannu Tiitinen & John W. Belliveau. 2004. Human posterior auditory cortex gates novel sounds to consciousness. Proceedings of the National Academy of Sciences USA 101. 6809-6814.
Janssen, Niels, Maartje van der Meij, Pedro Javier López-Pérez & Horacio A. Barber. 2020. Exploring the temporal dynamics of speech production with EEG and group ICA. Scientific Reports 10. 1-14.
Jones, Michael. 1988. Sardinian. In Martin Harris & Nigel Vincent (eds.), Romance languages, 314–350. London & New York: Routledge.
Lai, Rosangela. 2021. Sardinian. In Christoph Gabriel, Randall Gess & Trudel Meisenburg (eds.), Manual of Romance phonetics and phonology, 597–627. Berlin & Boston: De Gruyter Mouton.
Lawyer, Laurel & David Corina. 2013. An investigation of place and voice features using fMRI-adaptation. Journal of Neurolinguistics 27(1). 18-30.
Lerousseau, Jacques Pesnot, Cesare V. Parise, Marc O. Ernst & Virginie van Wassenhove. 2022. Multisensory correlation computations in the human brain identified by a time-resolved encoding model. Nature Communications 13. 1-12.
Lidji, Pascale, Pierre Jolicœur, Régine Kolinsky, Patricia Moreau, John F. Connolly & Isabelle Peretz. 2010. Early integration of vowel and pitch processing: A mismatch negativity study. Clinical Neurophysiology 121(4). 533-541.
Manca, Anna Dora, Francesco Di Russo, Francesco Sigona & Mirko Grimaldi. 2019. Electrophysiological evidence of phonemotopic representations of vowels in the primary and secondary auditory cortex. Cortex 121. 385-398.
Marklund, Ellen, Francisco Lacerda & Iris-Corinna Schwarz. 2018. Using rotated speech to approximate the acoustic mismatch negativity response to speech. Brain and Language 176. 28-35.
Mensching, Guido & Eva-Maria Remberger. 2016. Sardinian. In Adam Ledgeway & Martin Maiden (eds.), The Oxford guide to the Romance languages, 270–291. Oxford: Oxford University Press.
Näätänen, Risto, Thomas Jacobsen & István Winkler. 2005. Memory-based or afferent processes in mismatch negativity (MMN): A review of the evidence. Psychophysiology 42(1). 25-32.
Näätänen, Risto, Teija Kujala & Gregory A. Light. 2019. The Mismatch Negativity (MMN): A window to the brain. Oxford: Oxford University Press.
Obleser, Jonas, Aditi Lahiri & Carsten Eulitz. 2003. Auditory-evoked magnetic field codes place of articulation in timing and topography around 100 milliseconds post syllable onset. NeuroImage 20(3). 1839-1847.
Obleser, Jonas, Jonas Zimmermann, John Van Meter & Josef P. Rauschecker. 2007. Multiple Stages of Auditory Speech Perception Reflected in Event-Related fMRI. Cerebral Cortex 17(10). 2251-2257.
Paavilainen, Petri, Sanna Valppu & Risto Näätänen. 2001. The additivity of the auditory feature analysis in the human brain as indexed by the mismatch negativity: 1+1≈2 but 1+1+1<3. Neuroscience Letters 301. 179-182.
Paavilainen, Petri, Mikko Mikkonen, Marku Kilpeläinen, Reia Lehtinen, Miiamaaria Saarela & Lauri Tapola. 2003. Evidence for the different additivity of the temporal and frontal generators of mismatch negativity: a human auditory event-related potential study. Neuroscience Letters 2(2). 79-82.
Parise, Cesare & Marc O. Ernst. 2016. Correlation detection as a general mechanism for multisensory integration. Nature Communications. 1-9.
Scharinger, Mathias, William J. Idsardi & Samantha Poe. 2011. Neuromagnetic reflections of harmony and constraint violations in Turkish. Laboratory Phonology 2. 99-123.
Schröger, Erich. 1995. Processing of auditory deviants with changes in one versus two stimulus dimensions. Psychophysiology 32(1). 55-65.
Stefanics, Gábor, Jan Kremláček & István Czigler. 2014. Visual mismatch negativity: a predictive coding view. Frontiers in Human Neuroscience 8. 1-19.
Takegata, Rika, Minna Huotilainen, Teemu Rinne, Risto Näätänen & István Winkler. 2001. Changes in acoustic features and their conjunctions are processed by separate neuronal populations. NeuroReport 12(3). 525-529.
Virdis, Maurizio. 1978. Fonetica del dialetto sardo campidanese. Cagliari: Della Torre.
Wagner, Max Leopold. 1997 [1950]. La lingua sarda. Nuoro: Ilisso.
Wolff, Christian & Erich Schröger. 2001. Human pre-attentive auditory change-detection with single, double, and triple deviations as revealed by mismatch negativity additivity. Neuroscience Letters 311(1). 37-40.
Yu, Keke, Yuan Chen, Menglin Wang, Ruiming Wang & Li Li. 2022. Distinct but integrated processing of lexical tones, vowels, and consonants in tonal language speech perception: Evidence from mismatch negativity. Journal of Neurolinguistics 61. 101039.