A Non-Pointer-based Mid-Air Text Correction System with Speech and Numeric Gestures
Md Ehtesham-Ul-Haque, Lay Patel, and Syed Masum Billah
Speech interaction has become increasingly popular due to its naturalness and improved recognition accuracy. However, speech-based text correction still faces major usability issues, including difficulty acquiring a target when a word occurs multiple times or contains a typographical error. Moreover, overusing speech for both commands and dictation can be error-prone, cumbersome, and demanding. To address these issues, we propose NumeroMotorCorrect, a multi-modal system that uses speech only for inputting correction arguments and non-pointer-based mid-air counting gestures for target selection and command invocation. This approach reduces the overuse of speech. We delegate target acquisition to a separate modality, counting gestures, which allows easier word selection by assigning a unique number to each word, regardless of repetition and typos. We conducted a within-subject evaluation with 12 participants, which showed that our system was 45% faster for target acquisition and 55% faster overall for text correction tasks compared to speech-only text correction on the desktop platform. We performed another between-subject evaluation with an additional 12 participants, which indicated that our technique is equally effective on desktop and VR platforms, strongly suggesting that NumeroMotorCorrect enables users to transfer their skills across computing platforms. Subjective feedback indicated that participants experienced reduced cognitive and temporal demands, more predictability, and improved usability over speech-only text correction.
The interface of the NumeroMotorCorrect system in Virtual Reality. (Left) Each word in the text is tagged with an auto-incremented number. The user selects the typographical error ‘freqwuent’ with index 24 using finger counting gestures. (Center) Three correction commands are available: Delete, Replace, and Insert, each associated with a number. The user selects the ‘Replace’ command with gesture 55, which turns on the microphone. (Right) The user speaks the correct word, ‘frequent’. The microphone closes automatically, and the spoken word replaces the selected error.
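The index-based selection described above can be illustrated with a minimal sketch. This is not the system's implementation: the function names `tag_words` and `apply_correction`, the 1-based numbering, whitespace tokenization, and the choice to insert after the selected word are all assumptions made for illustration. Gesture and speech recognition are out of scope here; the sketch takes the recognized index, command, and spoken argument as plain values.

```python
from typing import List, Optional, Tuple


def tag_words(text: str) -> List[Tuple[int, str]]:
    """Pair each whitespace-separated word with an auto-incremented number.

    Assumption: numbering starts at 1 and words are split on whitespace.
    """
    return list(enumerate(text.split(), start=1))


def apply_correction(
    text: str, index: int, command: str, argument: Optional[str] = None
) -> str:
    """Apply Delete, Replace, or Insert to the word selected by its number.

    `index` stands in for the number entered with counting gestures and
    `argument` for the word supplied by speech (both hypothetical inputs).
    """
    words = text.split()
    pos = index - 1  # convert the 1-based tag to a list position
    if not 0 <= pos < len(words):
        raise IndexError(f"no word tagged with number {index}")

    if command == "delete":
        del words[pos]
    elif command in ("replace", "insert"):
        if argument is None:
            raise ValueError(f"'{command}' needs a spoken argument")
        if command == "replace":
            words[pos] = argument
        else:
            # Assumption: the new word goes after the selected word.
            words.insert(pos + 1, argument)
    else:
        raise ValueError(f"unknown command: {command}")
    return " ".join(words)


if __name__ == "__main__":
    sentence = "make freqwuent backups of the the files"
    print(tag_words(sentence))
    # A misspelled word is selected by number, so it never has to be spoken.
    print(apply_correction(sentence, 2, "replace", "frequent"))
    # A repeated word is unambiguous because each occurrence has its own number.
    print(apply_correction(sentence, 6, "delete"))
```

Because the target is addressed by its number rather than by its spelling, the misspelled ‘freqwuent’ and the second occurrence of a repeated word are both selected without ambiguity, which is the property the abstract attributes to counting gestures.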