Video-1: A virtual mouse interface based on gesture detection: the index finger moves the pointer, and bringing the index and middle fingers together triggers a mouse click.
Video-2: A Python-based sign-language detection interface that uses computer vision to detect the signs for Hello, Thank You, and Yes.
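For the curious, the virtual mouse in video-1 can be sketched in a few lines, assuming MediaPipe's hand-landmark model, OpenCV for the camera feed, and pyautogui for cursor control; the actual implementation behind the video may differ, and the click threshold below is an illustrative guess.

```python
# Minimal sketch of video-1's idea: the index fingertip drives the cursor,
# and the index and middle fingertips coming together fires a click.
# Assumes: mediapipe, opencv-python, pyautogui (not the article's actual code).
import math

import cv2
import mediapipe as mp
import pyautogui

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
screen_w, screen_h = pyautogui.size()
CLICK_THRESHOLD = 0.04   # normalized fingertip distance; tune empirically

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)   # mirror the feed so movement feels natural
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        index_tip, middle_tip = lm[8], lm[12]   # MediaPipe fingertip landmark ids
        # The index finger "indicates": map its normalized position to screen pixels.
        pyautogui.moveTo(index_tip.x * screen_w, index_tip.y * screen_h)
        # Fingers coming closer "indicates" a click (a real build would debounce this).
        if math.dist((index_tip.x, index_tip.y), (middle_tip.x, middle_tip.y)) < CLICK_THRESHOLD:
            pyautogui.click()
    cv2.imshow("virtual mouse", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```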
The discipline of human-computer interaction (HCI) is a subset of the broader field of human-factors engineering, which is essentially the application of knowledge about physical and psychological human characteristics (like cognition, motivation, personality, experience, and emotions) to the design of devices, products, management systems, and tasks for human use. HCI deals mainly with computers and how humans interact with them. It emerged as a field in the early 1980s, as computers grew popular and needed to be usable by everyone, irrespective of expertise.
With the digitization of products and the advent of artificial intelligence, HCI is ubiquitous in every domain, e.g., interacting with a computer (browsing), using mobile phones and tablets, making a card payment while shopping, using the microwave, and in healthcare devices and treatments. Earlier, interaction was mainly about what people see on the screen (visual UI), but with AI it has evolved over the years. Besides visual UI, it now includes senses like touch (tactile UI) and sound (auditory UI); examples are conversing with a voice assistant in natural language and operating smartphones by touch. Researchers strive to find the optimal combination of these interactions that fits the purpose and context of the product.
According to Prof. David Joyner, Georgia Tech, HCI design has three principal areas: User Interface (UI) design, User Experience (UX) design, and Interaction Design.
During the early days of HCI, the focus was mainly on user interface design; inventions like the light pen and the mouse were primarily meant to ease interaction with things on-screen. Researchers therefore developed principles of screen design, such as the grid arrangement of on-screen content, which guides the user's eyes around the interface and helps layouts adapt to different screen sizes.
HCI and UX design have a symbiotic relationship: understanding the principles of HCI informs how we design the interface and the user experience; conversely, the user experience (and feedback) teaches us more about HCI and lets us iterate on our designs. It is essentially a closed loop of understanding HCI, designing interfaces, and studying the resulting user experience. This field is often referred to as design-based research.
Designing a product that helps people accomplish a task requires a deep understanding of human thinking and behavior. That is why human psychology has an indispensable role in HCI design: understanding human behavior, perception, and cognition helps us design better products; conversely, product feedback helps us understand psychology even better.
Thus, HCI sits at the heart of research and design, each informing the other symbiotically. Typically, research involves need-finding, prototyping, and evaluation, whereas design draws on psychological concepts like distributed cognition, mental models, and universal design.
The ultimate goal of HCI design is to make things usable, irrespective of the user's expertise. An interface is useful if it allows the user to achieve a goal or complete a task; it is usable if it allows them to do so with minimum effort. Consider a paper map versus a navigation system: a paper map is useful because it lets people see places and directions, but users have to navigate manually, marking and constantly verifying the route, which demands considerable cognitive effort. A navigation system, on the other hand, is usable because it lets users navigate with far less cognitive effort.
The basis of HCI design is focusing on users and tasks, not on tools and interfaces. The interface design must be informed by user behavior and the ease with which users can complete the task through the interface. HCI design rests on three building blocks: the task and goals, the user and the context, and the interface.
Understanding the task is the first building block of HCI design. The key idea is: users use interfaces to accomplish some task. In the absence of the task, the interface design wouldn't have arisen in the first place, and the question of a user using an interface would be moot. For example, if we want to write something (task/goal) on paper, we need a pencil or pen (interface). If the task (writing) were not present, the interface (pencil) would have little or no use. Thus, we first need to understand the users' goals and what they are trying to accomplish through the task.
Tasks can be identified and better understood by observing and talking to users, and by understanding why, in a task process, a user performs an action (however small it may be) and what operators the user is using. This helps us gain insight into the task and see the bigger picture.
The user is who we design for, and comprehending the role of the human in the overall system can reveal new aspects of HCI design. According to Prof. David Joyner, there are three possible views of the user: user as a processor, user as a predictor, and user as a participant.
Processor model: Treats the human brain as a sensory processor that senses and responds to stimuli, i.e., takes inputs and provides outputs. When designing with this model in mind, we must ensure the interface stays within known human limits: users can reach all buttons, distinguish all colors, and so on. Under this model, we assess interfaces quantitatively, measuring how quickly users complete a task or respond to a stimulus. In other words, this model focuses mainly on the behaviorism of the user.
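As a toy illustration of such quantitative assessment, the snippet below times how quickly a user reacts to a console stimulus; the trial count, delays, and prompt are invented for illustration, and a real study would instrument the actual interface.

```python
# Toy processor-model evaluation: measure reaction time to a stimulus.
import random
import time

trials = []
for _ in range(3):
    time.sleep(random.uniform(1, 3))   # random delay so the stimulus isn't predictable
    start = time.perf_counter()
    input("Stimulus! Press Enter as fast as you can: ")
    trials.append(time.perf_counter() - start)

print(f"Mean reaction time over {len(trials)} trials: {sum(trials) / len(trials):.3f} s")
```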
Predictor model: This model, on the other hand, accounts for the knowledge, experience, expectations, and entire thought process of the user, i.e., the cognitivism of the user. It helps us understand and map the input and the corresponding output of the human brain: given the input, what thought process led to the choice of that particular output while performing a task? When designing from this perspective, we must ensure that the interface fits the user's existing knowledge, and if it doesn't, that the interface teaches the user how to use it. We can evaluate such interfaces qualitatively, by probing the user's thought process or through a cognitive walkthrough of a task.
Participant model: Though we have established how behaviorism and cognitivism inform interface design, we were so far focused solely on the user and the task. It is equally important to study the collective behavior of the user and the interface in the larger environment: what is happening around the user during the interaction, what else the user is interacting with in parallel, and who or what is competing for the user's attention. This model dictates that the interface must fit its context, so we must evaluate it in a contextual setup. For example, when testing a new navigation system interface, we cannot test the user and the interface in a laboratory; the testing must be done with real drivers on real roads to understand how the product is actually used under those circumstances. The core of the participant model is the functionalism view of psychology.
Now that we have established the role of tasks and users in HCI design, the question is: how hard is it to complete a task using the interface? This is where the third component of HCI design comes in: the interface. An interface must facilitate the interaction and never become a barrier between the user and the task. We must strive to build an interface that is INVISIBLE BY DESIGN. Interaction with an interface has two stages: execution and evaluation.
Execution: This stage includes identifying the goal, determining the actions needed to achieve it, and executing those actions on the interface. For users to perform an action, they must first understand the interface, which is an inherent quality of good design. Thus, to make interfaces intuitive, we must ensure the following:
Discoverability: Functions should be easily discoverable and self-explanatory. This is a huge challenge in gesture-based interaction design. E.g., the search bar is conventionally placed at the top of the interface, so users can discover it immediately regardless of the design.
Simplicity: Users should be able to understand the design regardless of their experience, knowledge, language skills, etc. E.g., the Google homepage.
Affordances: A well-designed product tells the user, by its very design, how it is meant to be used. But affordances are subjective and depend on the user's background. E.g., a door handle's shape tells us whether it is to be pushed, pulled, or rotated.
Mapping: A good design maps real-world things onto the interface. E.g., Microsoft's recycle bin icon looks like a real trash can so that users can relate to it.
Consistency: Leveraging learning transfer and design conventions lets the user immediately know what a control does. E.g., the terms Copy, Paste, Search, etc., are used across all digital products.
Flexibility: A good design must cater to the needs of both novice and experienced users. Teaching new users while offering shortcuts to experienced users to improve productivity is the hallmark of a good design, e.g., hotkeys or accelerators like Ctrl+C and Ctrl+V for copying and pasting.
Ease and comfort: The design should be usable without fatigue, irrespective of approach, reach, manipulation, the user's body size, posture, mobility, and so on. E.g., using a mobile phone while eating, standing, or lying down.
Structure: The overall architecture of the interface must follow the user's mental models. For example, grouping similar items and separating dissimilar ones conforms to the user's mental model. The information architecture should present a logical flow of steps, and the visual hierarchy should likewise match the user's expectations. E.g., ordering food in a food-delivery app follows item selection, addition to cart, payment, and delivery, a logical flow that conforms to the user's mental model.
Sense of control: When using an interface, users must feel that they are directly manipulating it to perform a task, not the other way around. For example, to zoom into a picture on a smartphone, we pinch the image directly. This makes users feel they are engaging with the task (not with the device), reducing the gulf between the user's goals and the system: less cognitive load goes into understanding the system and more into performing the task.
Evaluation: Once an action is performed, the user must immediately be made aware of the state of the system and whether the system recognized the action. Moreover, the design should allow the user to interpret the system's state and judge whether it matches what they intended; this is called bridging the gulf of evaluation. Feedback should be easily understandable, indicating the problem and suggesting a solution.
Tolerance to human errors: A good design ensures that users can recover from accidental errors and unintended actions without much trouble. Interfaces can be made more resistant to human error either by eliminating error-prone conditions or by asking for confirmation before committing to an action, e.g., Undo and Redo options, "Save before closing" warnings, etc.
The picture above shows a screenshot of a popular interface, MS PowerPoint. The numbers indicate the principles of HCI design and their corresponding explanations. It is evident how closely the principles of HCI design have been followed in this product.
A Natural User Interface (NUI) is an intuitive interface that completely eliminates the need for mechanical devices such as a keyboard, mouse, or stylus. Users interact with these interfaces through natural means such as hand movements, touch, and voice.
Gesture recognition uses sensors to detect and understand human gestures and movements and convert them into actions. For example, mobile phones let users interact with screen elements by touch and be silenced by flipping them while ringing; in the sign-language detection interface, the computer detects the user's gestures through a camera and predicts their meaning; custom gestures can control a virtual mouse and mimic its functionality; and some modern cars let the driver adjust the music volume by moving a finger clockwise in front of the system. Gesture-based design also includes eye tracking to gauge the user's attention, hand-movement tracking, and emotion tracking (smiling, frowning, etc.).
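To make the sign-language example concrete, here is a hedged sketch of how such a detector might work: MediaPipe hand landmarks are flattened into a feature vector and fed to a pre-trained classifier. The model file gesture_model.pkl is hypothetical; the actual model and features behind video-2 are not shown here.

```python
# Hedged sketch of a camera-based sign detector: landmarks -> features -> label.
import pickle

import cv2
import mediapipe as mp
import numpy as np

with open("gesture_model.pkl", "rb") as f:   # hypothetical pre-trained classifier
    model = pickle.load(f)

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        # Flatten the 21 (x, y, z) hand landmarks into a 63-dim feature vector.
        features = np.array([[p.x, p.y, p.z] for p in lm]).flatten()
        label = model.predict([features])[0]   # e.g., "Hello", "Thank You", "Yes"
        cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("sign-language demo", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```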
Gestures can be classified into three categories: navigational gestures to move through the product, action gestures to perform an action such as scrolling, and transform gestures to allow transformations of objects.
Unlike conventional interaction design, which has well-known signifiers and affordances, gesture design is relatively new and has no such established conventions, so discoverability can be challenging. For example, in the gesture-based virtual mouse interface (video-1 above), the user needs to know the gestures before using the interface. Researchers have, however, laid down some commonly used design principles that apply to gesture-based interaction design.
WIMP stands for Windows, Icons, Menus, and Pointers, and designers often rely on such designs for standard interactions. Replacing the mouse with a finger (as in video-1) can make it uncomfortable for the user to move around such an interface to accomplish a task; in the video above, the user at times struggles with the interface and slows down while pointing to the right menu or button. Designers must therefore avoid carrying WIMP- and touch-based techniques over to gesture-based designs.
Making the user comfortable with gesture-based interfaces is the designer's main priority. Since users interact through eye, arm, and finger movements, we must ensure they don't tire while using these interfaces. Considering human ergonomics is crucial: if a gesture is too uncomfortable, the experience won't be great, and users will most likely abandon the product. Gestures that demand a lot of physical work (outside of games and exercise apps) become annoying quickly and should be avoided, and this physical effort also means we must account for session length in our design. Finally, since these designs rely heavily on computer vision and deep-learning models, designers must ensure the interface's response time is short enough for comfortable use, as in the latency check sketched below.
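A minimal sketch of such a check, where process_frame is a stand-in for whatever detection model the interface actually runs; the frame count is illustrative.

```python
# Profile per-frame latency so gesture feedback stays responsive.
import time

import cv2

def process_frame(frame):
    """Placeholder for the gesture-detection step (e.g., a hand-landmark model)."""
    return frame

cap = cv2.VideoCapture(0)
latencies = []
for _ in range(100):   # sample 100 frames
    ok, frame = cap.read()
    if not ok:
        break
    start = time.perf_counter()
    process_frame(frame)
    latencies.append((time.perf_counter() - start) * 1000)
cap.release()

if latencies:
    mean_ms = sum(latencies) / len(latencies)
    # ~100 ms is a commonly cited limit for feedback to feel instantaneous.
    print(f"mean per-frame latency: {mean_ms:.1f} ms")
```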
The choice of gestures in such interfaces is critical, as there is no conventional interface-gesture language. It is always good to borrow intuitive gestures from real life and map them onto the gesture interface. Suppose, in a hypothetical design, we could wave (a goodbye gesture) at the computer or mobile device to put it to sleep. Besides complying with the user's mental model, such a design lets users apply what they already know, with no new gestures to learn.
Even when interfaces are intuitive, designers must educate users on what is and isn't possible. This idea is borrowed from conventional design, where interfaces educate users through animations, visual cues indicating that a gesture will be recognized, and so on.
Though this type of interaction offers a wide range of possibilities for able-bodied users, there will be situations where some gestures are not possible for differently-abled users. Designers must therefore also provide either conventional controls or alternative gesture-based controls.
HCI is at the core of designing invisible interfaces, and leveraging its principles can drive better product design. Beyond traditional interaction, gesture-based UI design opens up fantastic opportunities and has the potential to take user experience and interaction with a computer to a whole new level. But the design can be challenging, depending on users' physical abilities and other human factors.
HCI Courses – NYU, Georgia Tech
Product Design – NYU, Georgia Tech
Brave NUI World: Designing Natural User Interfaces for Touch and Gesture – Daniel Wigdor and Dennis Wixon