We conducted an in-lab user study and an in-the-wild experiment at the Frameless XR Symposium. The in-lab study followed a process of pre-survey, system experience, post-survey, and interview. The in-the-wild experiment consisted of a system experience followed by quick interviews.
Participants: 4 graduate students with ML, NLP, and vision backgrounds, and 1 parent with experience playing with small children and learning ASL.
Previous experience communicating with DHH people and learning ASL:
2 had experience communicating with DHH people
1 had learned ASL in a university class.
The post-survey asked about the system's effectiveness and the convenience of the interface.
Since we only had 5 participants, the results were divergent: the numbers of agreeing and disagreeing responses were roughly equal.
If a participant answered "disagree" to a question, we asked follow-up questions about it.
The interview included open-ended questions about the system and follow-up questions about the post-survey responses.
We found:
Participants felt upset when the system failed to detect their speech, because they assumed the system was working correctly and that they were doing something wrong.
Real parents showed autonomy through their tone, word choice, and gestures during the experiment.
Participants: 7 students, professors, and parents from various universities.
Quick interviews were conducted after participants experienced the system.
ASL recommendation strategy
In the parent-child toy-playing scenario, when the parent speaks a sentence or a paragraph, what kind of information do you want to see in the ASL video?
Phrase vs. single word (“gray elephant” vs. “elephant”)
Other words associated with the main object (“duck” -> “duck swimming”)
It is difficult to remember a sign for every word; common concepts (animals, food, colors) could be used instead
Show a single word the first time, then customize the number of words shown (one possible strategy is sketched below)
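Based on this feedback, one possible recommendation strategy is to show only the main word on first exposure, prefer a short modifier phrase afterwards, and add associated words up to a user-set limit. The sketch below is a hypothetical illustration of that idea; the function name recommend_signs, the modifier list, and the association table are assumptions, not part of the current system.

```python
# A minimal sketch of one possible recommendation strategy, based on the
# participant feedback above. All names and word lists here are hypothetical.

COMMON_MODIFIERS = {"gray", "big", "small", "yellow", "red"}   # assumed color/size words
RELATED_ACTIONS = {"duck": ["swimming"], "boat": ["sailing"]}  # assumed associations

def recommend_signs(transcript: str, object_of_interest: str,
                    first_time: bool = True, max_signs: int = 3) -> list[str]:
    """Pick which words to show as ASL videos for one utterance."""
    words = transcript.lower().split()

    # First exposure: show only the single main word, per participant feedback.
    if first_time:
        return [object_of_interest]

    signs = []
    # Prefer a short phrase ("gray elephant") over the bare noun ("elephant").
    try:
        idx = words.index(object_of_interest)
        if idx > 0 and words[idx - 1] in COMMON_MODIFIERS:
            signs.append(f"{words[idx - 1]} {object_of_interest}")
        else:
            signs.append(object_of_interest)
    except ValueError:
        signs.append(object_of_interest)

    # Add associated words ("duck" -> "duck swimming") up to the user's limit.
    for action in RELATED_ACTIONS.get(object_of_interest, []):
        if len(signs) < max_signs:
            signs.append(f"{object_of_interest} {action}")

    return signs[:max_signs]

# Example: recommend_signs("look at the gray elephant", "elephant", first_time=False)
# -> ["gray elephant"]
```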
Interface design
To sum up, we received the following feedback:
Parents cannot maintain eye contact with the child while looking at the ASL videos; the projection could be moved to follow the child
The video is too small
The video falls outside the projection area when the toy is at the edge
Technical issues
Latency between the speech and the appearance of the ASL video
Did not account for parent-child tone/pitch and word variations (duck vs. duckie, quack)
Cannot find synonyms (boat vs. ship); one possible synonym-lookup approach is sketched below
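One possible way to address the synonym issue is a WordNet lookup that maps a spoken word to the closest word for which an ASL video exists. The sketch below assumes an NLTK WordNet installation; ASL_VIDEO_LIBRARY and related_library_word are hypothetical names, not part of the current system.

```python
# A minimal sketch of synonym/related-word lookup with NLTK WordNet.
# Requires the WordNet corpus: run nltk.download("wordnet") once beforehand.
from nltk.corpus import wordnet as wn

ASL_VIDEO_LIBRARY = {"ship", "elephant", "duck"}  # hypothetical set of words with ASL videos

def related_library_word(spoken_word: str) -> str | None:
    """Map a spoken word to a library word that shares a synset or a
    direct hypernym, so 'boat' can fall back to 'ship'."""
    spoken_synsets = wn.synsets(spoken_word)
    for candidate in ASL_VIDEO_LIBRARY:
        for cand_syn in wn.synsets(candidate):
            for spoken_syn in spoken_synsets:
                same_synset = cand_syn == spoken_syn
                # A shared direct hypernym is used as a loose notion of
                # relatedness, since boat and ship do not share a synset.
                shared_parent = set(cand_syn.hypernyms()) & set(spoken_syn.hypernyms())
                if same_synset or shared_parent:
                    return candidate
    return None

# Example: related_library_word("boat") is expected to return "ship",
# since boat.n.01 and ship.n.01 share the hypernym vessel.n.02.
```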
System response time: ~800 ms
Instantaneous system response time: 300 ms
Object-of-interest detection: 75%
Context appropriateness of ASL recommendations: not yet evaluated
ASR engine word error rate (WER): 4.1% (computation sketched below)
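For reference, WER is the standard ASR metric: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal, self-contained illustration of that computation, not our evaluation script.

```python
# A minimal sketch of the standard word error rate (WER) computation.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words (substitutions + deletions + insertions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: word_error_rate("the gray elephant", "the great elephant") == 1/3
```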