Created by: Haris Ahmad, Anish Pokhrel, Haider Tawfik
Augmenting In-person or Remote Communication and Collaboration
Create a novel interface to augment in-person and/or remote communication and collaboration. If the system is relevant to any type of communication, conversation, collaboration, etc., then it should be fine.
You can use any type of technology for this project, for example AR, VR, mobile phones, desktop screens, or a combination of any of them.
Create a concept video based on a narrative storyboard, then implement and demonstrate the final system.
Are you a student who has a hard time hearing your instructor from the back of the classroom?
Perhaps you are travelling in another country and have a hard time understanding the local language?
Or maybe you have trouble understanding people due to a hearing impairment?
Well, worry no more, because “The Human Subtitles” is here for you!
The Human Subtitles is a device you can wear that uses Augmented Reality (AR) and machine learning so you can see what others are saying right above their heads!
The device is worn like eye-glasses and it does the following:
Provides speech-to-text transcriptions of the person talking in front of you
Translates any foreign language to your preferred language
Places transcriptions directly over the head of the person who is speaking
Overall, enhances communication when hearing others is difficult
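Placing a transcription over the speaker's head comes down to simple screen geometry: given the bounding box of a detected face, the text is anchored centered just above it. Here is a minimal sketch of that placement logic; the function name, the pixel margin, and the box format are illustrative assumptions, not our exact implementation:

```javascript
// Compute where to draw a subtitle above a detected face.
// `faceBox` is {x, y, width, height} in screen pixels (y grows downward).
// `margin` is the gap, in pixels, between the head and the text.
function subtitleAnchor(faceBox, margin = 20) {
  return {
    x: faceBox.x + faceBox.width / 2,   // horizontally centered on the face
    y: Math.max(0, faceBox.y - margin)  // just above the head, clamped to the screen edge
  };
}
```

For example, a face box at (100, 80) of size 60×60 would anchor its subtitle at (130, 60); a face near the top of the frame clamps to y = 0 instead of going off-screen.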
Our design process began with 10 concept sketches from each of the three group members (30 in total). We selected the 10 best concept sketches to display, which are all shown below.
Alongside each of the 10 concept sketches, there is a description of:
what the project or each sketch is,
how the designer came up with it, and
why the group did or did not choose it.
The idea behind this design is to use face recognition to detect your contacts, such as family, friends, and acquaintances, in group meetings or calls. Whenever you are in a group video call, the software would indicate which members you know by displaying their name and their relationship to you (e.g. friend, family member) using your phone contacts. The face detection would be done by machine learning software that uses contact photos as models (with the permission of the user). However, we decided not to move forward with this idea, as users would already know the people they are looking for; complex machine learning software is unnecessary for such a simple job. Also, the idea would only work when cameras are turned on, giving it a narrow scope.
The idea behind this design is a whiteboard on your tablet that you control by hovering over the screen, without ever having to touch it. It would work the same way as any note-taking app, such as Microsoft’s OneNote, but without needing a pencil or touch input. This allows users to take notes and draw without having to buy “pencils” for their tablets, and a user can take notes while hovering up to 30 cm over the tablet, allowing greater flexibility. However, we decided not to move forward with this idea, as there would be a high risk of accidental inputs: simply hovering over your notes could activate the note-taking mechanism, and preventing these cases would be difficult.
The idea behind this design is to use a large language model-based chatbot, such as ChatGPT, to generate ideas during a project group meeting. During a group meeting, a member would say something like “We need an idea for a VR project”, and ideas would be generated for every group member to see. With user permission, machine learning software would listen to the conversation in the meeting, pick out specific remarks relating to the project, and generate ideas or solutions. After each group call ends, the software would erase all information about the conversation, protecting privacy. However, we decided not to move forward with this idea, as it would be too difficult to implement in the short timeline we had.
In this sketch, the user is at the back of the classroom and is having trouble hearing the professor, so the user's AR glasses display what the professor is saying above the professor's head. The idea for this sketch comes from having trouble hearing what others are saying: by adding subtitles above the speaker's head, the user can see exactly who is talking and have a clear indication of what they are saying in a real-world setting. We decided to proceed with this idea, as we thought there were many ways we could expand upon it, and it seemed like something that could be very useful, especially for those who are hearing impaired.
In this sketch, the user is on a video call with another person; however, the remote user has access to a robot suit that can freely move around and has hands to interact with the world, without the user having to be physically present. This idea comes from trying to solve a limitation of video calls, which restrict users to only an image; if users could also have a physical presence while being far away, collaboration would be much easier. We decided not to move forward with this concept as it would be far too complex to implement.
In this concept sketch, two users each have a physical board and game pieces in front of them; however, they are not using the same board, as they are in different locations. Once a user moves a piece on their board, the same piece is moved automatically on the other player's board. The idea behind this concept comes from the fact that many people enjoy the physical aspect of board games but are unable to play with friends who are not in the same physical location. We decided not to move forward with this idea as it was not very unique.
In this concept sketch, two people use a text-reading application at once. This could be a book they are reading, a scripture, a newspaper, etc. One user recites the text, and, so that the others can follow along easily, the text being read is automatically highlighted on everyone else’s device, making sure everyone is literally on the same page. However, we decided not to move forward with this idea, as we felt it would not be very useful in the real world.
This idea aims to resize any screen to a desired size. This would be achieved by capturing the image on the screen and mirroring it on the user’s VR headset (which looks like AR glasses in this concept sketch). However, the headset would only show what is on the screen; everything else would be a background of the user’s choice, meaning you could place yourself in any environment while watching whatever you are watching. On top of that, the user would be able to resize the screen with a pinching gesture or something similar. We did not go forward with this idea as it would be quite complicated to implement.
A lot of people enjoy puzzles, but the activity is space-consuming. For example, people usually can’t build puzzles on planes, on buses, in line for coffee, etc. While there are puzzle apps for your phone, they are not quite the same, and the puzzle often can’t be big enough, since the piece-selection area and the puzzle display both take up a lot of screen space. This idea aims to make puzzles playable anywhere by mapping the pieces on your phone to a nearby space, so that you can build virtual puzzles anywhere (you just need a flat surface on which the puzzle is displayed). This way the puzzle can be a realistic size, and you get the feel and enjoyment of building a puzzle in real life. While this idea has a market, we decided not to go forward with it, as placing the puzzle pieces could get complicated and would be a hassle to implement given the time constraints of this assignment.
This concept sketch depicts a virtual try-on app displayed to the user through a VR headset. All they would have to do is scan the item they want to try on by showing it to their phone’s camera, and the headset would then display that article of clothing on the user as though they were actually wearing it (the app would already have a scan of the person's body, similar to the Face ID setup on most devices). While this idea would be very useful, we decided not to implement it because, once again, it would be too complicated given the resources we have at hand.
Out of the 10 design concepts, we decided to move forward with Concept #4: Human Subtitles (Haider) shown below. This is because it is a unique way to communicate, which meets the design goal.
Here is a quick, high-level stop-motion video for our idea.
Once the final design concept (Concept #4: Human Subtitles (Haider)) was chosen, the team proceeded to sketch detailed variations.
Below are a total of 10 sketch variations, each with a description of:
what the project or each sketch is,
how the designer came up with it, and
why the group did or did not choose it.
In this sketch, when a user speaks another language, such as Spanish, it is automatically detected and translated accordingly. The translated text appears above the speaker's head with a note indicating that it has been translated from another language. The idea comes from not wanting to constantly take out your phone and open Google Translate to figure out what another person is saying; instead, translation occurs instantaneously, making it easier to communicate despite a language barrier. We decided to proceed with this idea as it seemed to be a useful function.
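Attaching the "translated from" note to a subtitle is a small formatting step once translation has happened. The helper below is an illustrative sketch: the function name, the language table, and the assumption that detection returns an ISO 639-1 code are ours, not part of the actual system:

```javascript
// A few ISO 639-1 codes mapped to display names; extend as needed.
const LANGUAGE_NAMES = { es: "Spanish", fr: "French", de: "German", en: "English" };

// Wrap translated subtitle text with a note about its source language.
// If the speech was already in the viewer's language, show it unchanged.
function labelSubtitle(text, sourceLang, viewerLang = "en") {
  if (sourceLang === viewerLang) return text;
  const name = LANGUAGE_NAMES[sourceLang] || sourceLang;
  return `${text} (translated from ${name})`;
}
```

So a detected Spanish utterance would render as, e.g., "Hello, how are you? (translated from Spanish)", while speech already in the viewer's language passes through without a note.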
In this sketch, other people's emotions are predicted from their facial expressions, and the emotion the person is likely feeling is displayed above their head. The idea comes from it sometimes being hard to understand how another person is feeling; if you do know how someone is feeling, you can adjust how you speak to them accordingly, enhancing communication. We decided not to move forward with this due to how hard it is to distinguish someone's emotions from facial expression alone.
In this sketch, while someone is talking their entire speech is displayed, transcribed, and logged. Once the speaker is done talking, the user is asked whether they would like to save the transcribed text to a document, such as Microsoft Word or Google Docs. The idea for this sketch comes from wanting to save important conversations, such as what a professor said during a lecture, or a task someone asked you to do that you don’t want to forget. We decided not to move forward with this idea due to ethical concerns: having all of your words recorded is not something others would likely want to happen without their consent.
In this sketch, the user is engaging in a conversation with someone; when asked a question, the user is shown a variety of recommended responses. The idea for this sketch comes from sometimes struggling to find an appropriate response to a question promptly: with recommended responses, users can quickly use or modify one to keep the conversation flowing. We decided not to move forward with this idea, as it is difficult to distinguish when the user might need recommended responses from when they are just a nuisance taking up screen space.
Sometimes it can be a hassle to pull up your nutrition app and type in a food to find its nutritional value. This idea displays the nutritional information of a food item when the user wearing the AR glasses looks at it. This would save the user time by making the nutritional lookup process much faster, and it would also help them identify unknown food items quickly. However, we decided not to move forward with this idea, as it would be difficult to acquire nutritional values for all these different kinds of food, let alone account for serving sizes and other factors.
The idea for this design is to have a display appear next to an item, letting the user know whether or not the item is genuine. This could be useful for pawn shop owners or jewelry sellers, so they can know whether the inventory they are being offered is real. However, we decided not to move forward with this idea, since we have little to no way of verifying the legitimacy of an item with just a basic look.
This idea aims to tell the wearer of the AR glasses what materials the item being looked at is made of. As seen in the sketch, any article of clothing the user has on automatically gets bubbles next to it indicating what it is composed of. However, we decided not to move forward with this idea, since most clothes have a tag where you can already find that information, so it would not be of much use to anyone. Also, it would be hard to tell what materials a piece of clothing is made of without being close to it or touching it.
This sketch variation uses the eyeglass concept to summarize readings or long passages for you. The user puts the glasses on and looks at a long passage of text, which the device scans. Then, the device asks the user whether they want a summary, an explanation, or to save the text; these options are listed in front of the user in augmented reality. However, we decided not to move forward with this variation, as it would be too difficult for us given our limited knowledge of creating mixed reality applications.
This sketch variation tries to solve the problem of not being able to see very distant objects. The user puts on a pair of glasses and can adjust the zoom and magnification of their view, avoiding the need for large and expensive tools such as telescopes. For instance, if the user is trying to see trees on a mountain, they can adjust the zoom up to 20x magnification. However, we decided not to move forward with this variation, as large magnifications require heavy, large, and expensive lenses that cannot yet fit in a pair of glasses.
This sketch variation attempts to solve the problem of not being able to communicate over long distances or over loud background noise. Over large distances, communication is nearly impossible even with yelling. Worn as a simple necklace, this device amplifies your voice at the press of a button as you call for a friend in the far distance. However, we decided not to move forward with this variation, as it would not be practical to carry a small megaphone in everyday life. Also, megaphones that amplify your voice already exist.
For the final design, we decided to implement the following variation:
Variation #1: Translation (Haider)
Here is a rotoscope sketch of the final design.
The following are my contributions to this design project:
Concept Sketches
Concept #1: Friend Detector
Concept #2: Interactive Whiteboard
Concept #3: AI Generator
Variation Sketches
Variation #8: Summarize with AR
Variation #9: Magnifying Eye Glasses
Variation #10: Voice Enhancer
Contributions to Implementation:
Built the face detection model using Teachable Machine
Helped with UI design
Group members Haider, Haris, and Anish contributed equally to the implementation
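Teachable Machine image models return their predictions as an array of objects with a `className` and a `probability`. To decide whose head a subtitle belongs over, our model's output needs to be reduced to a single confident class. Here is a minimal sketch of that selection step; the function name and the 0.8 confidence threshold are illustrative choices, not values from our actual code:

```javascript
// Pick the most likely class from Teachable Machine-style predictions
// ([{ className, probability }, ...]), but only return a name when the
// model is confident enough; otherwise return null so no label is shown.
function topPrediction(predictions, threshold = 0.8) {
  const best = predictions.reduce((a, b) =>
    b.probability > a.probability ? b : a
  );
  return best.probability >= threshold ? best.className : null;
}
```

For example, an output where "Haris" scores 0.9 returns "Haris", while a murky frame where no class clears the threshold yields null and the UI simply draws no name.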
This video showcases the final design: The Human Subtitles
You wear it like eye-glasses and see the words of the person speaking in front of you appear above them!
🐻 View the application (view in desktop for best experience): Human Subtitles
💻 See the source code: Glitch