Empirical Evaluation Setup
Benchmark Tasks:
A benchmark task is an action that an evaluation participant carries out so that we can observe how participants perform with the system. We defined 6 benchmark tasks for our target table. Each task required the user to navigate through the app to reach one of its interactive features; for example, one task asked the user to add an event/place to their 'wishlist.' We wanted to test these features in action because they are significant elements of our app that we believe users would benefit from.
Connection to UX Table:
As we previously mentioned, we decided on empirical evaluation because our benchmark tasks are best assessed through hands-on interaction with the prototype, so direct input from potential users across our user classes was the most appropriate source of data. We also chose to conduct the evaluation through a questionnaire for two main reasons. First, questions with Likert-scale responses give us quantitative data even for seemingly qualitative questions, which makes it straightforward to determine whether a given UX target has been met. Second, a questionnaire with built-in instructions lets us reach a wider range of evaluation participants, because all we have to do is share the questionnaire link.
Goals of Study:
The purpose of this empirical evaluation is to gain insight into our current prototype with the help of users. The evaluation therefore covers all aspects of the prototype, including its instructions, features, and design. The short-term benefit is that we get an early sense of the viewpoints and expectations of potential users. From the results we identify a list of strengths and weaknesses and adjust the current system to emphasize the strengths and address the weaknesses. The long-term benefit is that catching errors and making adjustments as early as possible reduces costs such as development time.
Method for Recruiting Participants
We began the evaluation process by recruiting participants; most were friends, family members, and other personal contacts. Although we are aware that our Benchmark Tasks involve multiple user classes, we had each participant complete all 7 Benchmark Tasks. The reason is that a single user can represent multiple user classes; for example, a business owner who advertises their business through StreetSmart may also use StreetSmart as a traveler during a vacation.
List of Participants:
Jason
Ryan
Jackson
Cindy
Peter
Zach
Evan
George
Angelica
Antonio
Quincy
Procedure for Tasks
The user clicks a link to the Google Form that hosts our questionnaire.
The user will read the description of the questionnaire and click on the link to our prototype to begin.
A team member will start the timer when the user is ready.
The team member will ask the user to complete a set number of tasks and record the time it takes them to complete each task.
The team member will also jot down observations whilst watching the user complete tasks.
The team member will stop the timer once the task has been completed.
After the team member has finished asking the user to click through the different features of the prototype, the user will return to the questionnaire.
The user will answer all questions on the questionnaire.
Apparatus:
Prototype: The prototype was designed using Figma. Each participant accessed the prototype through the link included in the Post-Interaction Questionnaire.
Location:
In-Person Evaluation: If a participant preferred conducting the evaluation in person, then we adjusted to the participant's location preference.
Remote Evaluation: If a participant preferred conducting the evaluation remotely, we scheduled an online meeting; the example video above provides a sense of how the online meetings generally went.
Data Collection Method:
For every scheduled evaluation session, as many team members as possible attended.
During each evaluation session, each team member took a role:
Instructor, who described the tasks to the evaluation participant.
Timer, who measured the amount of time the evaluation participant took to complete a task.
Counter, who measured the number of times the evaluation participant made a mistake during each task.
Observer, who observed the evaluation participant's behavior during the session.
Recorder, who recorded the online meetings.
Depending on the number of members present, some members had to take multiple roles.
After the evaluation participant completed all tasks, they submitted their responses to the Post-Interaction Questionnaire based on their personal experience with StreetSmart.
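As a minimal sketch of how the data from one session could be organized for later analysis (the field names below are illustrative assumptions; in practice our records lived on paper and in the Google Form responses):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskRecord:
    """Data the Timer, Counter, and Observer roles capture for one benchmark task."""
    task_name: str
    completion_time_s: float   # measured by the Timer role
    mistake_count: int         # counted by the Counter role
    notes: str = ""            # free-form observations from the Observer role


@dataclass
class SessionRecord:
    """One participant's full evaluation session."""
    participant: str
    tasks: List[TaskRecord] = field(default_factory=list)
    questionnaire: Dict[str, int] = field(default_factory=dict)  # question -> Likert rating


# Example usage with placeholder values:
session = SessionRecord(participant="Jason")
session.tasks.append(TaskRecord("Add place to wishlist",
                                completion_time_s=42.0,
                                mistake_count=1,
                                notes="Asked how to go back"))
session.questionnaire["Overall reaction"] = 4
```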
Observations:
Some participants were naturally faster at completing the tasks.
Most participants were able to complete tasks much faster later into the evaluation session.
As the participants were completing the tasks, some of them spontaneously pointed out flaws without being prompted.
An example is when a participant said, "How do I go back?"
The more steps a task required from the start screen, the longer participants took to complete it and the more mistakes they made.
There was a correlation between metrics, such as completion time and mistake frequency, and Post-Interaction Questionnaire ratings.
Participants who struggled with the system were less satisfied with it; a simple way to quantify this relationship is sketched below.
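The sketch below shows one way such a correlation could be computed with a Pearson coefficient; the numbers are placeholders, not our actual measurements.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Placeholder data: average task completion time (seconds) and overall
# questionnaire rating (1-5) for a handful of participants.
completion_times = [35, 50, 28, 70, 44]
overall_ratings = [5, 3, 5, 2, 4]

r = pearson(completion_times, overall_ratings)
print(f"Correlation between completion time and rating: {r:.2f}")  # expected to be negative
```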
The empirical evaluation is based directly on the UX target table. We first came up with the aspects of the app that we wanted to test; these appear in the UX goal section of the UX target table. However, we needed a way to actually measure how well our app fulfills these goals, so we devised tasks for the user to complete that demonstrate how well the app accomplishes each goal. These appear as "Measuring instruments" in the UX target table.
When creating these tasks, we made sure they could reasonably be accomplished by our users within a couple of minutes and could be completed using our prototype. This way we could use the exact metrics from the UX target table in our empirical evaluation without having to adapt them to fit the prototype. To evaluate each task, we timed how long each user took to complete it and, while timing, counted the number of errors the user made along the way. To learn more about how users felt about the app, we prepared questions for the user to answer after experiencing it; this is another measuring instrument listed in the UX target table that we could use to gauge customer satisfaction. To do this, we needed a way for users to give feedback that we could analyze quantitatively. The most practical approach was to provide a questionnaire after users had finished performing the requested tasks, with questions that let users give us numerical insight into how they felt.
To do this, we created a "Post Interaction Questionnaire" using a Google Forum. This Forum asked users questions about their experience using the app and they way users would respond would be by picking a number from either 1-5 or 1-10 with each question explaining what the numbers mean. This allowed us to generate analytical feedback on the general experience users had using our app. Our app was designed to be clear and easy to use so we made sure to ask questions on how good our users felt on these topics especially. We set minimum values for the data and goals that we wanted to achieve as shown by the "Baseline Level" and "Target Level" in the UX target table respectively. After collecting all the data from our users, we averaged the numbers we got from each metric and compared them to the base line and target levels that we set for ourselves.
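A minimal sketch of that comparison step, assuming hypothetical question labels and baseline/target values (the actual levels are the ones recorded in the UX target table):

```python
from statistics import mean

# Likert responses (1-5) collected from the Google Form, keyed by question.
# The values below are placeholders, not our actual data.
responses = {
    "Instructions were clear":          [4, 5, 3, 4, 5],
    "Design made tasks easy":           [3, 4, 4, 2, 4],
    "Overall reaction to StreetSmart":  [5, 4, 4, 3, 5],
}

# Hypothetical baseline (minimum acceptable) and target levels per question.
levels = {
    "Instructions were clear":          {"baseline": 3.0, "target": 4.5},
    "Design made tasks easy":           {"baseline": 3.0, "target": 4.0},
    "Overall reaction to StreetSmart":  {"baseline": 3.5, "target": 4.5},
}

for question, ratings in responses.items():
    avg = mean(ratings)
    lv = levels[question]
    status = ("target met" if avg >= lv["target"]
              else "baseline met" if avg >= lv["baseline"]
              else "below baseline")
    print(f"{question}: mean={avg:.2f} -> {status}")
```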
Instruments:
Google survey link
The questions & response options are described below.
Timer
To record how long it takes the user to complete benchmark tasks.
Paper & Pencil
To record observations.
Electronic Device
To access the questionnaire & StreetSmart Prototype.
Post-Interaction Questionnaire Questions:
Note: The participant answers each question by picking a value on a linear scale, from 1 to 5 for all questions except the last, which uses 1 to 10; the meanings of the endpoint values are given below each question.
Labeling and navigation icons were intuitive when completing my desired task.
1 = Distantly; 5 = Closely
Instructions were clear in describing each task.
1 = Confusing; 5 = Clear
Instructions were consistent.
1 = Not at all; 5 = Extremely
Buttons matched the tasks they performed in the application.
1 = Not at all; 5 = Extremely well
Did you feel that StreetSmart was able to assist or suggest alternatives when performing a given task?
1 = Never; 5 = Always
The design layout of StreetSmart made the process of a given task easy to comprehend and quick to complete.
1 = Not at all; 5 = Yes, completely
Did you find that selecting one option led you to your intended page in StreetSmart?
1 = Not at all; 5 = Yes, completely
How hard was it for you to learn how to use all of StreetSmart's features?
1 = Difficult; 5 = Easy
The amount of content I had to learn and remember in order to perform tasks was ______.
1 = Overwhelming; 5 = Very minimal
How inclined were you to explore all the StreetSmart features?
1 = Not at all; 5 = A lot
What are your overall reactions to the StreetSmart app?
1 = Terrible; 5 = Wonderful
How many days would you use the Journal feature on a 10-day trip?
1 = 1 Day; 10 = 10 Days