The purpose of this evaluation is to assess the usability, accuracy, and effectiveness of the user interface and core functionality of ConvoSense, our real-time messaging application that translates messages between users who speak different languages. The goal is to measure how intuitive and reliable the app is for real-time multilingual communication.
Native language of the user (English, Spanish, Mandarin)
Message complexity (simple vs. complex sentence structures)
Conversational context (casual conversation vs. professional dialogue)
Device used (desktop vs. mobile)
Translation accuracy (measured via bilingual reviewer ratings)
Task completion success rate (whether users could understand and respond correctly)
Message delay (latency in message translation and delivery)
User satisfaction (post-task surveys)
Total Participants: 12-20 users, with a preference for the upper end of the range
Recruitment: A mix of multilingual and monolingual users, ideally covering native speakers of at least 3-4 different languages across the participant pool
Diversity Criteria: Age, device familiarity, and language background will be balanced to reflect a wide range of target users
Users are given specific tasks (for example, planning an event with a peer who speaks a different language)
Sessions are screen-recorded and audio-recorded
Participants verbalize their thoughts as they interact with the app
Create controlled pairs of users who speak different languages and have them perform predefined communication tasks
Measure message delay, task success, and translation accuracy
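One way to form the controlled pairs is sketched below. It is a minimal illustration, not the study's actual procedure: the function name `make_cross_language_pairs`, the `(participant_id, native_language)` tuple layout, and the greedy strategy of always drawing from the two largest remaining language groups are all assumptions made for this example.

```python
from collections import defaultdict


def make_cross_language_pairs(participants):
    """Pair participants so that the two members of each pair differ in native language.

    `participants` is a list of (participant_id, native_language) tuples
    (a hypothetical layout chosen for this sketch).
    Returns (pairs, unpaired).
    """
    # Group participant IDs by native language.
    groups = defaultdict(list)
    for pid, lang in participants:
        groups[lang].append(pid)

    pairs = []
    while True:
        # Languages that still have unpaired participants, largest group first.
        remaining = sorted(
            (lang for lang in groups if groups[lang]),
            key=lambda lang: (-len(groups[lang]), lang),
        )
        if len(remaining) < 2:
            break  # Fewer than two languages left, so no cross-language pair is possible.
        a, b = remaining[0], remaining[1]
        pairs.append(((groups[a].pop(0), a), (groups[b].pop(0), b)))

    unpaired = [(pid, lang) for lang, ids in groups.items() for pid in ids]
    return pairs, unpaired


# Placeholder roster for illustration only.
roster = [
    ("P1", "English"), ("P2", "English"),
    ("P3", "Spanish"), ("P4", "Spanish"),
    ("P5", "Mandarin"), ("P6", "Mandarin"),
]
pairs, unpaired = make_cross_language_pairs(roster)
```

Drawing from the two largest groups at each step keeps any single language from being left over with several unmatched speakers. Anyone returned in `unpaired` would need a different arrangement, such as a second session.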
Survey Items: To assess satisfaction with translation quality, ease of use, and perceived effectiveness of real-time interaction
Semi-structured interviews to gather qualitative feedback on interface layout, translation clarity, and any confusion during the interactions
Translation accuracy score (rated 1-5 by bilingual reviewers)
Time taken to complete tasks
Number of message corrections or clarifications
Latency in message delivery (measured in ms)
System Usability Scale scores and Likert-scale responses
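For reference, the standard System Usability Scale scoring rule can be sketched as follows. The function name `sus_score` and the input layout (a list of ten responses, each from 1 to 5, in questionnaire order) are assumptions made for this example. The rule itself is the usual one: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded.
            total += response - 1
        else:
            # Even-numbered items are negatively worded, so the scale is reversed.
            total += 5 - response
    return total * 2.5


# A participant who answers 3 (neutral) on every item scores 50.0.
neutral = sus_score([3] * 10)
```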
Observations from think-aloud sessions
Open-ended survey responses
Interview transcripts highlighting pain points or suggestions
Use descriptive statistics to analyze usability scores, accuracy, and latency
Categorize user feedback to identify common usability issues and patterns of confusion or satisfaction
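The two analysis steps above could be carried out along these lines. This is a minimal sketch using only Python's standard library. The helper names `summarize` and `tally_themes`, the theme labels, and the sample numbers are placeholders invented for illustration, not study data.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return basic descriptive statistics for a list of numeric measurements."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "min": ordered[0],
        "max": ordered[-1],
    }


def tally_themes(coded_comments):
    """Count how often each theme label was assigned during qualitative coding."""
    return Counter(coded_comments)


# Placeholder latency readings in milliseconds.
latency_summary = summarize([100, 200, 300])

# Placeholder theme labels, one per coded comment.
theme_counts = tally_themes(["latency", "layout", "latency", "translation clarity"])
most_common_theme = theme_counts.most_common(1)
```

The same `summarize` helper applies to usability scores, reviewer accuracy ratings, and task completion times. Assigning theme labels to comments remains a manual coding step that happens before the tally.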
Evaluating this app involves both functional and experiential factors. The real-time aspect introduces concerns around latency and live user comprehension, while the translation feature necessitates assessing both linguistic accuracy and contextual understanding. Combining task-based observation with both subjective (user satisfaction) and objective (accuracy and speed) metrics provides a well-rounded evaluation of the user experience.