2nd Runner-up - Electronic Design Competition (EDC) - IEEE Sri Lanka Section
Placed in the top 5 out of 150 teams - HackX 2021
The product helps visually impaired individuals detect objects and colors and gain a basic understanding of their surroundings. This is accomplished by capturing images of the user's field of view with a camera, using image-processing techniques to detect the dominant color of the targeted area, and using machine-learning techniques to identify objects. A Raspberry Pi module was chosen as the main processing unit, and the other attachable components, such as the camera module and a control panel with buttons, were chosen with a simple design in mind to ease the use of the device. Key features of the product:
1) Use of 3D sound, which enables the user to get a perception of location
2) Scan mode, which generates a caption explaining the image captured by the camera
3) Simple functionality that allows the user to operate the product easily
4) Ergonomically considered design that makes the device easy for all users to handle
5) Operation only upon user request, which saves power and improves the durability of the product.
A. Technologies Used
• Python - Used as the code base; chosen for its ease of use, wide package support, and overall versatility.
• Raspberry Pi - A Raspberry Pi 3B is used as the processing unit. Image processing and sound generation run on the Raspberry Pi, and its TensorFlow support is another reason for this choice.
• Computer Vision - Used to process the camera feed and other visual inputs.
• Spatial Audio Generation - Once a color is identified by the software, a predefined sound corresponding to that color is played. The sound is generated so that its intensity is highest in the direction of the region containing the color, giving the user a spatial perception of where the color lies (see the panning sketch after this list).
• Text-to-Speech Generation - When the camera captures an image, the objects in it are identified by the corresponding neural network; once an object class is identified, text-to-speech generation is used to announce the class name as a sound output.
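As a concrete illustration of the spatial-audio idea above, the following minimal sketch pans a per-color tone across the stereo field according to the horizontal grid position. The color-to-frequency map and the constant-power panning law are our assumptions for illustration; the device's actual predefined sounds are not specified here.

```python
import math

import numpy as np

SAMPLE_RATE = 44100

# Hypothetical color-to-tone map; the device uses its own predefined sounds.
COLOR_TONES_HZ = {"red": 440.0, "green": 550.0, "blue": 660.0}

def color_tone_stereo(color: str, column: int, duration_s: float = 0.4) -> np.ndarray:
    """Return a stereo tone for `color`, panned by grid column (0=left, 1=center, 2=right)."""
    t = np.linspace(0.0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    mono = 0.5 * np.sin(2.0 * math.pi * COLOR_TONES_HZ[color] * t)
    # Constant-power pan: angle 0 -> hard left, pi/2 -> hard right.
    angle = (column / 2.0) * (math.pi / 2.0)
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return np.stack([mono * left_gain, mono * right_gain], axis=1)

# A tone for a red region in the leftmost column is loudest in the left channel.
samples = color_tone_stereo("red", column=0)
```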
B. Modes of Operation
There are two modes of operation.
• Normal Mode - Under normal-mode operation, the image fed by the built-in camera is partitioned into a 3x3 grid. Each segment is processed separately to identify its dominant color. Processing runs left to right along the top row and continues through the next two rows. A unique sound corresponding to each detected dominant color is then generated, carrying the horizontal location information. These sounds create the 3D sound effect that gives the user a perception of location (a sketch of this traversal follows the list).
• Scan Mode - Scan mode comprises a color-scan mode and an object-scan mode. In object-scan mode, the center area of the camera feed is analyzed to generate a caption stating the objects in the image and their properties. In color-scan mode, the dominant color of the center area of the image is conveyed to the user via the generated sound.
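The normal-mode traversal can be sketched as below, assuming each camera frame arrives as a NumPy (height x width x 3) array; the helper name grid_cells is ours, not taken from the device firmware.

```python
import numpy as np

def grid_cells(frame: np.ndarray, rows: int = 3, cols: int = 3):
    """Yield (row, col, cell) over a 3x3 partition, left to right, top to bottom."""
    h, w = frame.shape[:2]
    for r in range(rows):
        for c in range(cols):
            yield r, c, frame[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
for row, col, cell in grid_cells(frame):
    pass  # identify the cell's dominant color, then play its panned sound
```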
C. Color Identification & 3D Audio Generation
The dominant color is identified and stored for each grid cell, as described below. Color feedback for each area is conveyed to the user through 3D sounds generated by the system from the image input. These 3D sound effects are predefined, with each sound representing a different color. Visually impaired individuals often develop sharper use of their other senses, particularly hearing, because they depend on non-visual information; this is the basis of our sound-effect system. When the audio for a color is output, 3D sound is used to give a better and more user-friendly perception of the surroundings. To provide richer information, the input image is divided into a 3x3 matrix, the dominant color within each unit cell is identified, and the sound corresponding to each cell is played.
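The exact dominant-color method is not specified above; one simple approach, sketched below, matches each cell's mean RGB value against a small reference palette. The palette here is a placeholder and would in practice mirror the device's predefined color-sound map.

```python
import numpy as np

# Placeholder palette; the real set of colors comes from the color-sound map.
PALETTE = {
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "white": (255, 255, 255), "black": (0, 0, 0),
}

def dominant_color(cell: np.ndarray) -> str:
    """Name a cell's dominant color by the nearest palette entry to its mean RGB."""
    mean_rgb = cell.reshape(-1, 3).mean(axis=0)
    return min(PALETTE,
               key=lambda name: float(np.linalg.norm(mean_rgb - np.array(PALETTE[name]))))
```

Averaging is cheap enough for a Raspberry Pi 3B; clustering the pixel values (e.g., k-means) would be more robust for cells containing several colors, at extra computational cost.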
D. Object Identification
Object identification of the captured image is carried out in scan mode. The object within the center-most cell is identified using machine-learning techniques and announced to the user via audio (for example, "dog", "car"). If more than one object is present, information about all detected classes is passed on to the user. In addition, the dominant color of the center-most cell is sounded, giving the user a better idea of the object.
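As a rough sketch of this pipeline, the snippet below classifies the center cell with an off-the-shelf ImageNet classifier and speaks the top labels. MobileNetV2 and the pyttsx3 speech library stand in for illustration only; the device's actual network and audio path are not specified here.

```python
import numpy as np
import pyttsx3
import tensorflow as tf

# Stand-in classifier; the device runs its own TensorFlow model on the Pi.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

def identify_objects(center_cell: np.ndarray, top_k: int = 3) -> list:
    """Return the most likely class names for the center cell."""
    img = tf.image.resize(center_cell, (224, 224))
    img = tf.keras.applications.mobilenet_v2.preprocess_input(img[tf.newaxis, ...])
    preds = model.predict(img, verbose=0)
    decoded = tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=top_k)[0]
    return [name for _, name, _ in decoded]

def announce(labels) -> None:
    """Speak each detected class name, e.g. "dog", "car"."""
    engine = pyttsx3.init()
    for label in labels:
        engine.say(label)
    engine.runAndWait()
```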
E. Other Competitive Products
Caption-generating devices that use camera feeds are already available on the market. However, these products typically cost $1800-$6000 and are unaffordable for middle- and low-income users, whereas our product can be produced for under $400.
[Figure: Spatial sensation through 3D audio (left / middle / right)]
[Figure: Color-sound map used for the feedback system]
[Figure: Output of the color identification system for a given image]