Information and Communication Technologies (ICTs) present information to Blind and Low Vision Individuals (BLVIs) sequentially (e.g., via screen-readable text descriptions). These lack the affordances of spatial information foraging (e.g., graphical user interfaces provide non-sequential access to diagrammatically arranged elements) that sighted individuals obtain through visual scanning. Despite the commercial availability of non-visual alternatives (e.g., binaural audio cues, haptic and tangible elements of gaming controllers), designers of accessible ICTs underutilize them, defaulting to tags and text descriptions in most cases. The Auditory Spreadsheet Horizon (ASH) paradigm employs spatial audio to convey spatial-topologically arranged elements of a virtual “ground plane” consisting of an editable spreadsheet, as a means to translate maps and graphical user interface elements to BLVIs. This paradigm emerged through longitudinal co-design and usability testing with BLVIs to discover pain points from ICT use in everyday life. The results yielded design recommendations concerning information foraging, varying information granularity, and construction or recollection of mental models, which foreshadow future guidelines for accessible ICT design.
Train Station Spreadsheet Map Prototype
1. Spreadsheet Navigation Format
Information-dense ICT interfaces, such as maps or large inventories (e.g., shopping websites), should be represented using the spreadsheet navigation format. This allows 2D navigation along two axes rather than strictly sequential navigation, and is easily achieved via a set of keys on a keyboard, buttons or a joystick on a physical controller, directional swiping on a touchscreen, or other methods depending on the medium. Further, it provides a paradigm for retaining spatial relations non-visually by making use of the position of “cells” relative to one another, which minimizes the possibility of missing information or the user getting lost.
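The navigation format above can be sketched as a small grid model: a set of labeled “cells”, a focus position, and four directional moves that refuse to leave the grid, so the user cannot get lost off an edge. This is a minimal illustration, not the prototype’s implementation; the class name, direction names, and the “blank” fallback label are assumptions.

```python
class SpreadsheetGrid:
    """A 2D grid of labeled cells with a single movable focus cell."""

    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, rows, cols, labels=None):
        self.rows, self.cols = rows, cols
        self.labels = labels or {}      # (row, col) -> content; empty cells omitted
        self.focus = (0, 0)

    def move(self, direction):
        """Shift the focus one cell; ignore moves that would leave the grid."""
        dr, dc = self.MOVES[direction]
        r, c = self.focus[0] + dr, self.focus[1] + dc
        if 0 <= r < self.rows and 0 <= c < self.cols:
            self.focus = (r, c)
        return self.focus

    def focused_label(self):
        """Content to announce via speech for the current focus cell."""
        return self.labels.get(self.focus, "blank")


grid = SpreadsheetGrid(3, 4, {(0, 1): "Ticket counter", (2, 3): "Gate 12"})
grid.move("right")        # focus moves from (0, 0) to (0, 1)
```

Clamping at the edges (rather than wrapping) is one design choice; a wrap-around or an audible “edge” cue would be equally consistent with the recommendation.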
2. Speech Labels
Along with 2D navigation, labeling each “cell” with speech output allows non-visual awareness and retention of the content, instead of requiring a search for specific data every time it is needed. Speech labels provide affordances for conveying abstract conceptual information (Coppin, 2014) and drawing attention to possible actions; for example, “Train station Gate 12”, or “Press ‘Enter’ to teleport”. This is in contrast to concrete perceptual information (Coppin, 2014), such as the hum of a refrigerator located in the dairy aisle of a grocery store, or the trickling sound of a running faucet, each of which represents a scenario in which non-linguistic perceptual cues would be preferable to minimize cognitive load while remaining comprehensible to users within a specific context.
3. Binaural Audio Features
Binaural audio produced by specific items and features within an ICT interface allows information to be perceived from a distance and distributes cognitive load: the user listens to multiple points on a virtual ground plane around them, rather than to audio sources that compete with one another by being layered together indistinctly (i.e., sounding as though they all come from the same location). It also helps users with targeted navigation in specific directions. Participants in the prototype usability testing sessions were observed to quickly attach directional audio sources to items and features after encountering them at least once, making this a useful strategy for distinguishing between different items and features. Finally, ambient binaural audio sources were shown to be useful in item and feature identification (given a contextual relationship, such as the sound of a large crowd of people used for mall entrances, or the sound of trains pulling into and out of a station for entrances to train station platforms), as well as for continuous orientation within a constructed mental model of the interface.
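One way the distance and direction cues described above could be parameterized is sketched below: each audio source gets a stereo pan from its horizontal offset and a gain that falls off with distance from the focus cell. The linear pan/gain model and the `audible_range` cutoff are simplifying assumptions for illustration; a real binaural renderer would use HRTF-based spatialization rather than plain stereo panning.

```python
import math

def binaural_params(focus, source, audible_range=5.0):
    """Return (pan, gain) for a source cell heard from the focus cell.

    pan:  -1.0 (full left) .. 1.0 (full right)
    gain:  1.0 at the focus, falling to 0.0 at or beyond audible_range
    """
    drow = source[0] - focus[0]                  # positive = below the focus
    dcol = source[1] - focus[1]                  # positive = right of the focus
    dist = math.hypot(drow, dcol)
    if dist >= audible_range:
        return 0.0, 0.0                          # out of earshot: silent
    pan = max(-1.0, min(1.0, dcol / audible_range))
    gain = 1.0 - dist / audible_range
    return pan, gain
```

Because each source keeps its own (pan, gain) pair, two sources at different cells remain distinguishable instead of collapsing into a single indistinct layer.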
4. “Tracing”
A “tracing” feature allows non-visual detection of the full size and shape of an item that is part of an ICT interface, without requiring precise navigation to every “cell” that is part of the item. An effective “tracing” feature should convey this information non-linguistically to avoid cognitive overload. For example, when an item such as a wall in a physical map is encountered, a single linguistic label outputting “wall” in speech should identify the item, followed by a descriptive sound effect (one used specifically for this purpose, or one that would “sound like a wall” to the user) coming from each adjacent cell that represents the item, all delivered in binaural audio conveying the distance and direction of the item’s full size and shape relative to the user’s location at that moment. This avoids the alternative of repeating a linguistic description of the item (e.g., rather than a ‘click’ sound, the user would hear “wall” many times in a row, which is overwhelming).
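Finding “each adjacent cell that represents the item” is essentially a flood fill over the grid: starting from the encountered cell, collect every contiguous cell sharing its label, then play a non-linguistic cue from each position. The grid-as-dictionary representation and 4-connectivity below are illustrative assumptions.

```python
from collections import deque

def trace_item(labels, start):
    """Return every cell 4-connected to `start` that shares its label."""
    target = labels.get(start)
    if target is None:
        return []
    seen, queue, cells = {start}, deque([start]), []
    while queue:
        r, c = queue.popleft()
        cells.append((r, c))
        for step in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if step not in seen and labels.get(step) == target:
                seen.add(step)
                queue.append(step)
    return cells

# A three-cell wall: tracing from its first cell finds all three cells,
# so one cue can sound from each, instead of the word "wall" repeating.
wall = {(0, 0): "wall", (0, 1): "wall", (0, 2): "wall", (1, 0): "door"}
```

Each returned cell would then be passed to the binaural renderer so the cue’s direction and distance convey the item’s extent.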
5. “Teleportation”
Direct access to information or a feature in an ICT interface should be provided in a way that circumvents sequential menus, hierarchies and spatial relations. This is necessary for usability, though should always be paired with other information detection features that would provide feedback to establish the change in position once teleportation has occurred (e.g., ambient binaural audio would shift to represent a new position, allowing the user to maintain a mental model despite the instantaneous movement, or at least allow for comparisons to be made to connect segments of a mental model later).
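A minimal sketch of “teleportation” under these constraints: jump the focus directly to a named item, then report what is nearby so the user can re-anchor their mental model after the instantaneous move. The item names, the 2-cell report radius, and the dictionary layout are assumptions for illustration.

```python
def teleport(labels, name):
    """Return the cell of the first item whose label matches `name`, or None."""
    for cell, label in labels.items():
        if label == name:
            return cell
    return None

def nearby(labels, focus, radius=2):
    """Labels within `radius` cells (Chebyshev distance), for post-jump feedback."""
    r0, c0 = focus
    return sorted(
        label
        for (r, c), label in labels.items()
        if (r, c) != focus and max(abs(r - r0), abs(c - c0)) <= radius
    )

station_cells = {(0, 0): "Gate 5", (0, 1): "Bench", (4, 4): "Fare machine"}
```

After `teleport`, the output of `nearby` (or a shift in ambient binaural audio) supplies the feedback that establishes the new position.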
1. Hierarchical linked layers
Layers of spreadsheet interfaces should be built as hierarchical information patches (Pirolli and Card, 1999) based on working memory limits (e.g., clustering similar information to facilitate chunking whenever possible, per the working memory model [Baddeley and Hitch, 1974]). This allows users to avoid information overload from having too much content to explore at any given moment, and to efficiently perceive spatial relations between items that increase in complexity at lower levels of the hierarchy. For example, if a User Interface (UI) comprises three items, their spatial relations can be summarized concisely (e.g., two items above and to the right, one item below and to the left) while still being retained at the lower levels (e.g., the top-right item expands into a new layer containing 6 items in a 4x9 grid, and the user only encounters these items after discovering that there are two more “patches”, one to the left and one below and further to the left, allowing them to anticipate where items and features elsewhere in the UI would be found relative to their current location).
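The linked-layer idea can be sketched as a recursive structure: each patch is a small grid whose cells hold either an item or a link to a deeper patch, so no single layer presents more than a working-memory-sized handful of elements, yet positions are preserved at every level. The `Patch` class and the `[patch]` collapse marker are illustrative assumptions.

```python
class Patch:
    """One layer of the hierarchy: a small grid of items and linked sub-patches."""

    def __init__(self, name, cells):
        self.name = name
        self.cells = cells            # (row, col) -> item label or nested Patch

    def summary(self):
        """The view presented at this level: child patches stay collapsed."""
        return {
            pos: (f"[patch] {v.name}" if isinstance(v, Patch) else v)
            for pos, v in self.cells.items()
        }

    def expand(self, pos):
        """Descend into the linked patch at `pos`, if there is one."""
        v = self.cells.get(pos)
        return v if isinstance(v, Patch) else None


gates = Patch("Train gates", {(0, 0): "Gate 5", (0, 1): "Gate 12"})
station = Patch("Station", {(0, 0): gates, (1, 1): "Ticket counter"})
```

At the top level the user perceives only two elements and their relative positions; expanding the gates patch reveals its internal spatial arrangement on demand.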
2. Contextual Non-Spatial “Index”
To avoid requiring users to explore an ICT interface aimlessly in order to discover what items and features it contains, its items and features should all be provided via a hierarchical list. This list should only serve to make the user aware of the UI’s content, so it should not include spatial relations in the fashion of the hierarchical layers outlined above. Instead, the “index” should list items and features of the UI in a contextual hierarchical relationship: for example, in a map of a train station, items such as “ticket counter” or “washroom” could be found within categories such as “information” or “facilities” respectively, contextually guiding the user to something they may be explicitly seeking. The need to describe the UI’s content contextually via categorization should be balanced against limits on the number of items at any one level and on the number of levels: in general, both should fall within 7 +/- 3, reflecting known working memory limitations (Baddeley and Hitch, 1974).
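Such an index can be sketched as a nested mapping of categories to item names, with a search that returns the category path to a sought item and a check on the breadth limits the text recommends. The category names and the helper functions are illustrative assumptions, not a described implementation.

```python
def check_limits(node, max_children=10):
    """True if no level of the index exceeds max_children entries (7 + 3)."""
    if isinstance(node, dict):
        return len(node) <= max_children and all(
            check_limits(v, max_children) for v in node.values()
        )
    return len(node) <= max_children          # a leaf list of item names

def find_item(index, name, path=()):
    """Depth-first search for `name`; return its category path, or None."""
    for key, value in index.items():
        if isinstance(value, dict):
            found = find_item(value, name, path + (key,))
            if found:
                return found
        elif name in value:
            return path + (key, name)
    return None

station_index = {
    "information": ["ticket counter", "departure board"],
    "facilities": ["washroom", "benches"],
}
```

The returned path ("facilities" then "washroom") is exactly the sequence of list levels a user would traverse, with no spatial information attached.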
1. “Guided Tours”
The items and features of the ICT interface should be summarized at a high level, and this summary should be made available as soon as the user interacts with the interface. This should be a concise linguistic description: not every item or feature should be listed, only those most likely to be relevant to the user, or those that do not already have a contextual relationship with another item or feature in the description that would allow prediction of their existence and rough location. For example, “door to train gates in the top-left” foreshadows the relative location of “Train Gate 5”, so the latter is not needed in the summary.
2. “Landmarks”
To aid in orientation and navigation within this paradigm, “landmarks” should be used: individual items in the interface that can be referenced via non-visual perceptual feedback corresponding with each item’s location and distance in relation to another point. One or more “landmarks” allow users to keep track of the position of other items by comparison, fostering both construction and recollection of the spatial arrangement and relations among the interface’s contents, and supporting the plotting of a path to follow as a guide to items and features in a particular order. For example, in a spreadsheet navigation format for a map of a train station, street entrances and train gates should be taggable, with feedback on their positions obtainable at any time, to provide points of reference for where other items and features of the map, such as a fare machine or a set of benches, are located in comparison.
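On-demand landmark feedback could be computed as a coarse bearing and cell distance from the current focus to each tagged cell. The 8-way direction wording, the Chebyshev distance metric, and the feedback phrasing below are illustrative assumptions.

```python
def bearing(focus, landmark):
    """Coarse 8-way direction plus Chebyshev distance in cells."""
    drow = landmark[0] - focus[0]
    dcol = landmark[1] - focus[1]
    vert = "down" if drow > 0 else "up" if drow < 0 else ""
    horiz = "right" if dcol > 0 else "left" if dcol < 0 else ""
    direction = "-".join(p for p in (vert, horiz) if p) or "here"
    return direction, max(abs(drow), abs(dcol))

def describe_landmarks(focus, landmarks):
    """One line of feedback per tagged landmark, available on request."""
    return [
        f"{name}: {direction}, {dist} cells"
        for name, cell in landmarks.items()
        for direction, dist in [bearing(focus, cell)]
    ]
```

The same bearing could equally drive a binaural cue from the landmark’s position rather than a spoken line, per the non-linguistic recommendation below.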
3. Non-Linguistic Audio Cues
Finally, the use of non-linguistic audio cues is an important consideration for minimizing cognitive load. Similar to “tracing”, every opportunity should be taken to convey meaning through contextually relevant audio cues (e.g., clicks, footsteps, a door being shut) rather than speech. Consider experimenting with the balance between which items and features in an ICT interface produce speech labels and which produce audio cues: for features that serve specific functions difficult to describe with a non-linguistic sound, such as the name of something, or the turning on and off of a microphone in a virtual meeting, ask what sound effects would communicate this information completely enough for users to know what actions could be taken. If this is ambiguous, a speech label is likely preferable.
The spreadsheet paradigm began as a convergence between several early themes developed during the retrospective narrative inquiry (RNI; Clandinin and Connelly, 2004) interview and longitudinal co-design workshop phases (e.g., grid navigation, spatial audio feedback, contextual ambient audio encountered in lived experiences) and the observed difficulties participants had with what we predicted would be beneficial and desirable new paradigms in audio Virtual Reality (VR) and screen-readable interfaces such as online grocery websites. Knowing that shared intentionality is constrained in working and learning environments that feature ICTs (Lee, Sukhai and Coppin, 2022), in part due to a lack of spatial-topological properties, audio VR seemed ideal for addressing many of these pain points; instead, we found that Blind or Low-vision (BLV) individuals face significant challenges with orientation, navigation, and temporally compressed cognitive load:
Orientation and feedback: Without continuous audio feedback, or feedback available upon request, BLV users of VR interfaces quickly lose track of where they are, and what is around them.
Inability to “rehearse”: Lack of perceivable spatial relations and high-level summaries of content harms the ability of BLV users to “rehearse” interactions for real spaces remotely or plan in advance.
Lack of awareness of system status: Related to the lack of perceptual feedback indicated above, BLV users often have little to no information to foster awareness of a system’s status and possible actions that may be taken when using ICTs.
High cognitive load from unnecessary repetition: Information foraging (Pirolli and Card, 1999) efforts are frequently rendered inefficient and mentally taxing due to repetitions of the same information in the same sensory modality (e.g., navigating menus with screen readers that require going through the same sequences every time to reach specific items, repeated labels of features that the user is already aware of from context, etc.).
Need for navigable, searchable lists of all content: In order to forage directly, rather than through exploration of unknown structures or spaces, users need a list of everything that they could find that can be quickly scanned for relevant items.
Information overload: Sheer quantity of information (even if not repetitive) is often encountered when using ICTs, causing overload if there is no additional organizational structure or limitation of the quantity being pushed on the user at any given moment.
The pain points outlined above informed initial and iterative improvements to the VR spreadsheet prototypes produced throughout this phase, which included the following features:
Speech output of the content of each cell, triggered by moving the focus to it with the keyboard arrow keys,
Ambient spatial audio that is perceivable from a set cell distance away from the focus cell, serving as a cue to users when a prominent item is near via auditory spatial relations,
Division of the map into hierarchical patches (Pirolli and Card, 1999) that are linked, to minimize the quantity of information presented at once, while still maintaining spatial relations,
Collation of all items in all of the patches (Pirolli and Card, 1999) into an index that is divided and organized into hierarchical layers based on contextual relations, rather than spatial information granularity, allowing the user to directly move to each item if foraging for a specific item of interest,
“Jumping” (skipping over consecutive cells with the same content), e.g. adjacent blank cells, 2 or more cells in a row called “Wall”,
Non-linguistic audio cues that replace linguistic labels for feedback whenever appropriate (e.g. to indicate movement between cells), and
Relevant mapping of keyboard shortcuts for features (e.g. the ‘I’ key for the index feature).
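The “jumping” feature in the list above can be sketched as a skip over runs of identical content along a row: from the current cell, movement lands on the first cell whose content differs (or stays put at the row’s edge). The one-row list representation and the rightward-only direction are simplifications for illustration.

```python
def jump_right(row_labels, col):
    """From `col`, skip cells with identical content; return the landing column.

    row_labels: the contents of one row, '' for a blank cell.
    """
    current = row_labels[col]
    c = col + 1
    while c < len(row_labels) and row_labels[c] == current:
        c += 1
    return c if c < len(row_labels) else col      # stay put at the row's edge

row = ["Gate 5", "Wall", "Wall", "Wall", "", "", "Fare machine"]
```

From the first “Wall” cell, one jump skips the remaining wall cells; a second jump skips the blank stretch, so long uniform regions cost two keystrokes instead of five.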
This prototype and its features were the outcome of an iterative design process, building on insights from previous works on accessible eXtended and Virtual Reality (XR and VR), theories of information representation that emphasize the distinct affordances of pictorial, diagrammatic, and text-sentence formats (Larkin and Simon, 1987; Coppin, 2014; Coppin, Li and Carnevale, 2016), and studies including the audio mapping approach of Biggs et al. (2021). We followed a longitudinal co-design methodology (mirroring that of Hagen, Collin, Metcalf, Nicholas, Rahilly and Swainstrom, 2012) in which we collaborated with BLV participants to identify and minimize accessibility barriers in everyday information-dense environments (e.g., virtual meetings, learning, shopping), where they predominantly use screen reader technology to forage sequentially, or avoid the task entirely and ask for help if possible, due to the time- and attention-intensive demands of succeeding. After the iterative testing phase, design recommendations were proposed concerning two-dimensional (2D) navigation and detection of information from range, control over information granularity, and methods for fostering mental model construction and recollection, which designers could use to improve a wide range of ICT interfaces.
References (in order of appearance):