Information processing in the brain is hierarchical. Speech involves at least three levels of processing: auditory (encoding the physical properties of sounds), linguistic (e.g., differentiating /d/ from /t/ in English), and domain-general pattern detection (e.g., combining instances of English /d/ and /t/ into an ad hoc supra-group). Previous work indicates that the processing of consonants is predominantly linguistic (linguistic > auditory and domain-general pattern detection), so the brains of English speakers do not mix instances of /d/ and /t/ to form an ad hoc supra-group (Phillips et al., 2000). Linguistic processing also overshadows auditory processing, as small acoustic differences between instances of /d/ have only limited influence (Rhodes et al., 2019).
My work on Chinese lexical tone, whose primary acoustic correlate is the fundamental frequency pattern over a syllable, shows that the perception of lexical tone is not predominantly linguistic (domain-general pattern detection > linguistic > auditory), so the brains of Chinese speakers can mix two distinct tone categories (/ī/ and /ì/) to form an ad hoc supra-group. Evidence for linguistic processing of lexical tone is observed only indirectly, through the disruption of auditory processing, in which brain responses are expected to correlate with acoustic properties.