Goal #1: The simulation should help the visually impaired navigate.
The primary goal of this project is to assist the visually impaired efficiently and with ease. With this simulation, we will be able to generate sound outputs that the wearer can use to navigate through obstacle-rich environments with relatively high accuracy. To evaluate this metric, we will conduct various tests once the design is complete. One such test is to have a user navigate through a simulated environment that we build beforehand, while we record the number of obstacles they bump into as well as the time it takes them to navigate through the entire simulation. This could be repeated over multiple trials with different users for a more complete understanding of the simulation's efficacy.
Goal #2: The simulation should enable the user to walk faster while staying safe.
The secondary goal of this project is to enable the visually impaired to navigate their environments faster so they can reach their destinations in a timely manner. This goal would be quantified by timing the average speed of someone whose vision has been impaired along an obstacle route with the aid of the device, in comparison to their speed without the device, and finally to that person's speed without a vision impairment. As mentioned earlier, we could test this by timing the user's navigational speed throughout the simulation. The faster their time, the better the simulation satisfies this specific goal.
Goal #3: Audio outputs should be intuitive to the user and easy to learn.
Naturally, there is some learning curve whenever people are introduced to new tools. Ideally, the sound generation algorithm should be easy to use and relatively simple to understand. With this in mind, we created a continuous sound wave with varying pitch in order to generate the simplest sound possible while retaining the information that we want the user to receive. To test this, we can have subjects complete repeated trials of the simulation and monitor whether they collide with fewer obstacles as the number of trials increases, which would signify improvement and mastery in interpreting the audio outputs.
This section describes the software and hardware used to compile and run the simulation.
PyCharm Community Edition
Pygame
PysineWave
Windows OS Desktop or Laptop (or device that can compile and execute Python)
External Mouse (or equivalent input device)
External Keyboard (or equivalent input device)
Stereo Audio System (likely headphones)
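For reference, the two Python libraries above are available from PyPI; assuming the standard package names, they can be installed with a single command before running the simulation:

pip install pygame pysinewave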
The first subproject focuses on building a simulated environment entirely in Python, using the Pygame library. Pygame is a free set of Python modules designed for writing video games, and it comes with a variety of functions that are extremely useful when designing one. Our simulation is not designed as a video game, but it was built with similar principles in mind. The simulation begins immediately after the program is run, and the user is dropped into a simulated hallway with numerous sets of walls. An image of the simulation is shown below in Figure 1.
Since the target audience for the project is those who are visually impaired in some way, the graphics of the floor, walls, and sky won't affect the functionality of the simulation. Thus, the graphics in the simulation are used purely for testing purposes, so that those watching the user navigate through the simulation can see how they are performing. The brick wall texture in particular was chosen so that we can clearly see where each wall segment connects to the next. If the graphics were more generic, such as a plain gray surface, it could be challenging to visually distinguish whether there are objects in the far distance.
The simulation's map can be changed very easily. In the Python code, the map is denoted as a "matrix", not in the traditional programming sense but as a two-dimensional grid of blanks and 1's. Every blank represents an open space that the user can freely walk through, and every 1 represents a brick wall. Since the map is actually two-dimensional, it is converted to a pseudo-three-dimensional space by calculating each wall's 3-D projection height and then drawing the walls with Pygame functions. An example of the code for the map can be seen in Figure 2.
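As an illustration, a map of this kind could be encoded roughly as follows. This is only a minimal sketch based on the description above; the names and layout are illustrative, not the project's exact code.

# Sketch of the map encoding described above: '1' marks a brick wall and
# '_' (a blank) marks open space the user can walk through.
MINI_MAP = [
    '111111111111',
    '1__________1',
    '1__11___11_1',
    '1__________1',
    '111111111111',
]

# Collect the (x, y) grid cells that contain walls so that collision checks
# and the 3-D projection step can look them up quickly.
WORLD_MAP = set()
for y, row in enumerate(MINI_MAP):
    for x, char in enumerate(row):
        if char == '1':
            WORLD_MAP.add((x, y))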
The simulation is easily controlled with keyboard and mouse inputs from the user. The keyboard keys “W”, “A”, “S”, and “D” control movement, with the keys representing up, left, down, and right respectively. The mouse functions as the player’s head, which can be turned left and right. Effectively, the user is able to walk and navigate in this simulated environment with these controls.
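A minimal sketch of how this control scheme might be handled in Pygame is shown below. The movement speed, mouse sensitivity, and movement-relative-to-facing behavior are assumptions made for illustration, not the project's exact code.

import math
import pygame

PLAYER_SPEED = 0.05        # assumed movement per frame, in map units
MOUSE_SENSITIVITY = 0.002  # assumed radians of turn per pixel of mouse motion

def update_player(x, y, angle):
    # W/A/S/D move the player; here movement is taken relative to the
    # direction the player is currently facing.
    keys = pygame.key.get_pressed()
    dx = math.cos(angle) * PLAYER_SPEED
    dy = math.sin(angle) * PLAYER_SPEED
    if keys[pygame.K_w]:
        x, y = x + dx, y + dy
    if keys[pygame.K_s]:
        x, y = x - dx, y - dy
    if keys[pygame.K_a]:
        x, y = x + dy, y - dx
    if keys[pygame.K_d]:
        x, y = x - dy, y + dx
    # Horizontal mouse movement turns the player's head left or right.
    rel_x, _ = pygame.mouse.get_rel()
    angle += rel_x * MOUSE_SENSITIVITY
    return x, y, angle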
Figure 1: The simulation after the user loads in.
Figure 2. A two-dimensional map, with 1's representing walls and blanks (_) representing open space.
Developing a sound algorithm is an important part of our project, so it is the second subproject that we worked on. Essentially, the sound engine outputs continuous, multi-channel sounds that vary in pitch as the user approaches a wall: the closer they get to the wall, the higher the pitch of the continuous sound rises, and vice versa.
In order for the sound engine to work, the first step was to implement a rudimentary form of raycasting. Raycasting, in programming, is the act of sending out a vector (called a ray) from a specific point in a specific direction. In the case of our simulation, a large number of rays originate from the character's center and fan out in a wide angle in front of the character. An example of this can be observed in Figure 3. The rays are ultimately what gives the user a field of view, which is everything displayed in the simulation at any given time (refer to Figure 1 for one specific field of view). In the simulation, the player cannot see these rays, as they are rendered invisible. However, two rays were selected to act as the user's simulated LiDAR sensors. LiDAR stands for "light detection and ranging"; LiDAR sensors measure the distance between the sensor and an object that they "see" via a pulsed laser. The two selected rays do exactly that, but in a simulated sense: they act as the user's LiDARs by constantly generating distance data as the user moves throughout the simulation. This process can be visualized in Figure 4. Note that there is a left LiDAR-ray and a right LiDAR-ray to simulate an actual LiDAR sensor.
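To make the idea concrete, a rudimentary raycaster of this kind can be written as a simple fixed-step ray march over the wall grid. This sketch reuses the WORLD_MAP set from the earlier map example and is only illustrative; the step size, maximum depth, and field-of-view half-angle are assumptions rather than the project's actual values.

import math

MAX_DEPTH = 20.0  # assumed farthest distance a ray may travel, in map units
STEP = 0.02       # assumed marching step along the ray

def cast_ray(px, py, ray_angle, world_map):
    # Step along the ray until it enters a wall cell, then return the
    # distance travelled; this is the simulated LiDAR reading.
    sin_a, cos_a = math.sin(ray_angle), math.cos(ray_angle)
    depth = 0.0
    while depth < MAX_DEPTH:
        depth += STEP
        cell = (int(px + cos_a * depth), int(py + sin_a * depth))
        if cell in world_map:
            break
    return depth

HALF_FOV = math.pi / 6  # assumed half-angle of the fan of rays

def lidar_distances(px, py, player_angle, world_map):
    # The two "LiDAR" rays are simply the rays near the left and right
    # edges of the fan; every other ray is used only for rendering.
    left = cast_ray(px, py, player_angle - HALF_FOV, world_map)
    right = cast_ray(px, py, player_angle + HALF_FOV, world_map)
    return left, right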
The distance data that the two rays generate is fed into the sound engine itself. As soon as the simulation starts, the sound engine uses PysineWave to play an extremely low frequency sound. The sound is continuous and can be heard in both ears, but its low frequency makes it subtle enough that it does not risk overloading the user's auditory senses. The sound engine then sorts the distance data from the left and right LiDAR-rays into specific "bins", or categories of distance values. After the distances are sorted, the sound engine uses PysineWave to alter the frequency of the sound based on the distance bin it falls into. Depending on the bin and on which LiDAR-ray is sending the data, the user hears the sound's frequency change in either their left or their right ear. For example, if an object is very close on the user's left side, they will hear the left-hand sound gradually grow in frequency as they approach the object.
The choice of the sound cues is based entirely on intuition; higher frequencies were chosen for when an object gets very close to the user because a higher frequency noise is more likely to be noticed, and it signifies a warning more than a lower frequency noise does. The distance bins are also not on a linear scale but rather a logarithmic one, which means the bins are not all equal in range. This choice was made because the user does not necessarily care about objects that are extremely far away, but they do want more information about the objects closer to them. Thus, the bins responsible for smaller distance values are also smaller in range, so that the sound engine can adjust the frequency more precisely where it matters most to the user.
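A minimal sketch of this distance-to-pitch mapping is shown below. The bin edges and pitch values are illustrative assumptions, and the calls assume PysineWave's documented SineWave interface (play and set_pitch); routing each tone to a single ear is omitted here.

from pysinewave import SineWave

# Logarithmically spaced bin edges, in map units: the bins closest to the
# user are the narrowest, so nearby obstacles get the finest feedback.
BIN_EDGES = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
# One pitch per bin, in semitones, highest for the closest bin; the final
# entry is the barely-noticeable low tone used when nothing is nearby.
BIN_PITCHES = [24, 19, 14, 9, 4, 0, -12]

def distance_to_pitch(distance):
    # Return the pitch of the first bin whose upper edge exceeds the distance.
    for edge, pitch in zip(BIN_EDGES, BIN_PITCHES):
        if distance < edge:
            return pitch
    return BIN_PITCHES[-1]

# One continuous tone per ear; pitch_per_second makes pitch changes gradual
# rather than abrupt.
left_tone = SineWave(pitch=-12, pitch_per_second=10)
right_tone = SineWave(pitch=-12, pitch_per_second=10)
left_tone.play()
right_tone.play()

def update_tones(left_distance, right_distance):
    # Called every frame with the two simulated LiDAR readings.
    left_tone.set_pitch(distance_to_pitch(left_distance))
    right_tone.set_pitch(distance_to_pitch(right_distance))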
Figure 3: Raycasting in a 2D perspective. The blue dot is the user’s position and the orange lines are the rays.
Figure 4: Raycasting in a 2D perspective, with two rays selected as the simulated LiDARs. These LiDAR rays are the black rays, while the normal rays are the yellow rays.
The first test we will conduct evaluates the intuitiveness of the sound engine. For the simulation, we aimed to make the sound outputs as intuitive as possible so as to reduce the learning curve required to navigate with them. To do this, we will ask the user to determine the closest obstacle to them as soon as they load into a map designed specifically for this test. The test conductor will select a random location in the simulation and record the time required for the user to correctly guess where the closest obstacle is. A wrong guess simply lets the timer keep running, and a longer time means the audio cue is less intuitive for determining where the obstacle is. The test conductor will also record the total number of guesses needed until the correct obstacle is pointed out. Each user will be asked to participate in three trials.
Each trial will have a corresponding control trial, and the results will then be averaged. In the control, the same process is repeated, but the user has no sound to guide them toward the nearest obstacle, which essentially makes a correct answer entirely chance-based.
For our simulation to be worthwhile, it is vital for users to quickly and accurately determine which of the obstacles present is closest to them. We would expect that users, with the assistance of the continuous sound engine, can predict with confidence where the closest obstacle is. We would also expect the user to be much less accurate and take much longer to figure out which obstacle is closest in the absence of such a sound engine. Figure 5 displays the average number of attempts it took each user to find the closest obstacle over three different trials. Without the sound engine, the user is essentially guessing where the closest obstacle is, which results in a very high number of attempts before landing on the correct answer. With the continuous sound engine in place, however, users needed only 1.5 attempts on average to determine the closest obstacle.
Similar results can be seen in the time it took for users to determine the closest obstacle in Figure 6. Without the assistance of continuous sound, users took much longer to determine where the closest obstacle was, whereas the use of the sound engine shortened that time by over 50% on average.
Figure 5. Average number of attempts it took for users to determine the closest obstacle in the Audio Intuitiveness Test. We can see that in the control trials, the users took many more attempts to determine the obstacle due to the absence of a sound engine. Despite the small sample size, we can see that the sound engine greatly decreased the number of attempts needed.
Figure 6. Average time per attempt for the Audio Intuitiveness Test. We can see that the control trials took much longer on average than the normal trials.
The second test we will conduct is the simulated hallway trial, which will be used to evaluate the sound engine and simulation as a whole. Users will attempt to navigate through a series of three different "hallway" maps that will be made beforehand. The number of obstacles they bump into, as well as the time it takes them to navigate through the simulated hallway, will be recorded. With these two parameters, it will be possible to evaluate how well the simulation itself is working. However, results in this section can also be strongly affected by factors such as personal skill, experience with the simulation, and possibly some luck.
In order to show that the sound generation provided by the simulation has a positive effect on the navigational abilities of a visually impaired individual, we will compare the results to a control. In the control trial, the user walks through the hallway with the sound generation algorithm disabled. To simulate the sensation of hitting an obstacle, a high frequency sound is generated only when the user bumps into a wall, so they receive no warning that they are approaching one.
The purpose of these trials is to examine more holistically whether the sound engine improves the overall speed of the user as they attempt to navigate through a randomized obstacle course. Our expectation is that the user will be able to traverse the landscape more quickly and safely when aided by the audio output from the sound engine. Figure 7 shows the average number of collisions, recorded whenever the user made contact with any obstacle or wall, over the course of multiple trials, first without the sound engine (the control) and then with it. Without the sound engine, the user bumps into obstacles very frequently; more specifically, we noticed a tendency to use the walls for reference by walking next to or against them. The introduction of the sound engine reduced this collision count to 0 for every trial we conducted.
Additionally, Figure 8 shows that the average time per trial decreased by around 15 seconds compared to the control, which is a desirable outcome. This can be attributed to the fact that the high-pitched sound associated with an object directly in front of the user is unpleasant, which warns the user of obstacles very promptly.
Figure 7. Average number of collisions per trial in the Simulated Hallway Test. The only tangible statistic is that without the sound engine, the user collided with around 10 obstacles on average. To reiterate, the sound engine reduced that figure to 0 across every trial we conducted.
Figure 8. Average time per attempt for the Simulated Hallway Test. It can be noted that the average time with the use of the sound engine was considerably lower than the control trials.
Based on the results, we can conclude that the sound engine generates audio that is intuitive enough for users to determine where the closest obstacle is at any given moment with a high degree of accuracy and speed. Ideally, it would take only one attempt on average to determine where the closest obstacle is, but ambiguity typically arose when two or more obstacles were at similar distances from the user. This caused the corresponding distances to fall into the same or adjacent bins, so the output audio sounded the same. The effect is exaggerated when the closest obstacles are not very close to the user, because the logarithmic binning makes the farther bins wider, so those distances are even more likely to be sorted into the same bin.
For the average time of each trial, the control trials were longer on average, but one thing to note is that users guessed much more frequently when guessing at random. This results in more guesses per second compared to the trials in which users had the sound engine. Despite this, the control trials still required more time before users correctly identified the closest obstacle. This shows that the sound engine improved not only the time needed but also the quality of each guess.
For this test (and the other test), we had a very small sample size of two users, with three trials each. A larger sample size would make our results more conclusive; however, this test does show a trend that we find promising.