With the first year project under my belt, I knew what to expect from a university project. I knew the difficulty would increase, but with last year's project having strengthened my programming, manufacturing and teamwork skills, I believed I could create a functional product in second year. The challenge for second year's project was to create a program that incorporates an audio or visual signal. The program should use signal processing to achieve a new and useful function that can be run on a Raspberry Pi 3, with a webcam being optional. Given that this year's challenge was focused on software, there was sadly no physical product, even though the hardware aspect of any project is my favourite part. However, this would allow me to grow my programming skills more than any other project to date, which will be invaluable moving forward. To complete this project, each team was given a Raspberry Pi kit and a Logitech webcam as listed above. In the lab, a mouse and keyboard were provided, as none were included in the Raspberry Pi kit. With the components gathered, the team now had to identify a project.
Given the complexity of signal processing, the project was going to be a challenge no matter the approach. However, while difficult, the project presented a worthwhile challenge given the relevance signal processing has in modern-day life. Audio and video processing are used in a wide range of applications, such as medical alert systems, facial recognition technology, and autonomous robotics, to name a few. Therefore, what can be learnt from taking part in this project is significant for real-world software engineering. To figure out what form the project would take, the team started by brainstorming potential ideas that addressed a real-world issue a signal processing based product could solve. There were several topics up for discussion regarding how the project would take form. Would this software process an audio signal or a visual image? What would be a realistic project goal that is also achievable, while being up to a high standard in terms of function and design?
The main decision to be made when creating this project was whether the team should focus on visual or audio signal processing. The team agreed that a visual signal offered a larger range of project topics than an audio signal. Even so, the idea of audio processing always seemed an interesting one. Knowing this, a decision was made regarding the project topic: the project would contain both a visual and an audio element instead of picking one or the other. While this would no doubt be a challenge, the team strove to pick a project that fit this new criterion while being realistically achievable, serving a useful purpose, and displaying a high level of understanding of the technology being used.
After much research and discussion, the team had come up with numerous ideas that could both fulfil the project requirements and relate to our collective areas of interest. It was eventually decided that the project would be a multi-factor authentication system. This solution was a culmination of the ideas the team wanted to incorporate into the project. The multi-factor authentication system would include both a visual and an audio element of authentication.
But why base the project on an authentication system? With the ever-growing presence of the internet in modern-day life, the amount of personal data available on the web increases each year. It is up to authentication systems to keep this data in the hands of its owner and out of the grasp of hackers who want to exploit it. It is apparent from recent international cyber attacks and information leaks that cyber crime is ever present and ever growing. Given that the best form of protection is prevention, it is vitally important that competent, user-friendly multi-factor authentication systems are available to everyone.
With the purpose of the project decided upon, competent audio and visual authentication methods had to be chosen. The project would have to incorporate both methods independently to increase security.
For the visual authentication method, the team chose to create a QR scanner that could read a QR code from a live video feed. QR codes are very popular in modern society, being used to advertise products and services, or simply to hold and spread information. Given their popularity and their simple point-and-scan mechanic, it would not be hard for the public to understand and accept them as a method of security. A QR scanner can also be implemented with code at the team's current skill level. Overall, the QR scanner is an ideal example of visual authentication: stimulating for the programmer, achievable in a relatively short period of time, and adaptable into a user's day-to-day life.
For the audio aspect of the multi-factor authentication system, voice recognition was chosen, to be used in tandem with the QR scanner. This was the obvious choice, given the heightened security of biometric systems, of which voice recognition is a part. For this method, the user will be prompted to say their password. Once said, the program will analyse this audio sample and compare it to the user's pre-recorded sample. If the samples satisfy the required confidence level, access will be granted. It is understood that voice recognition will likely prove to be the project's most challenging aspect; however, it is necessary for a high-security multi-factor authentication system, and an educational challenge, so it must be included. Given its need for advanced signal processing, voice recognition will be the main method of security for the project, and thus the team's main focus moving forward.
With both audio and visual security aspects chosen, the authentication system will function in the following way. Each user will be given a unique QR code, which will be scanned by our custom QR scanner. Once the QR code is accepted, the user will be prompted to say their password into the microphone to verify their identity. If both authentication attempts are successful, the user will be allowed to access the system.
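The two-step flow above can be sketched as a short Python function. The helper names, and the idea of injecting the two checks as callables, are illustrative, not taken from the team's actual code:

```python
# Sketch of the two-factor flow: QR scan first, then voice check.
# scan_qr and verify_voice are injected so the flow can be exercised
# without hardware; both names are hypothetical.

def authenticate(scan_qr, verify_voice):
    """Grant access only if the QR scan and the voice check both pass.

    scan_qr() -> user id string, or None if no valid code was read.
    verify_voice(user_id) -> True if the spoken password matches that user.
    """
    user_id = scan_qr()
    if user_id is None:
        return False          # first factor failed: no valid QR code
    if not verify_voice(user_id):
        return False          # second factor failed: voice did not match
    return True               # both factors passed
```

Keeping the two checks independent means a failure at either stage denies access, which is the property the team wanted from using both methods.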
To start the project, the team chose to build the QR scanner first, as it would be less difficult to program than the voice recognition system, and the skills learnt along the way would help with the more difficult voice recognition software. Once the QR scanner was working correctly, the team would shift focus to voice recognition. Given the complexity of voice recognition, the team knew it would take the bulk of the available lab time, so it was key to complete the QR scanner quickly. Below are the major advancements made in the implementation of the QR scanner.
• Within the first week, the team had downloaded all the necessary files and researched online sample code for connecting the webcam to the Raspberry Pi and displaying its live video stream on the monitor. The team used the OpenCV library.
• Next, we had to develop the software to recognise a QR code shown to the webcam. The team's first instinct was to isolate the colours in the images received from the webcam. With help from code sourced from programming forums, we developed code that distinguishes and isolates different ranges of colours, displaying only the elements within a specific colour range.
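The colour-isolation step above can be sketched in a few lines. In practice OpenCV's cv2.inRange does this job on real frames; the NumPy version below shows the same masking logic on a plain array, and the pixel values and thresholds are made up for illustration:

```python
import numpy as np

# Keep only pixels whose every channel lies inside [lo, hi]; zero the rest.
# This mirrors what cv2.inRange + a mask does on a real webcam frame.

def isolate_range(image, lo, hi):
    mask = np.all((image >= lo) & (image <= hi), axis=-1)  # per-pixel test
    out = np.zeros_like(image)                             # black background
    out[mask] = image[mask]                                # keep in-range pixels
    return out
```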
• After weeks of development, the team finally made a major breakthrough in reading a QR code. It was discovered that our initial idea, while useful for learning how to develop Python code on the Raspberry Pi, was not necessary. The team developed a QR reader (with help from online sources) that works by feeding a static QR code image into the program; the code is decoded into a text file, which the software then reads, allowing the URL to be extracted.
• The team developed this code further. Using the webcam, the user could now press a key such as 'q' to take a picture. Once taken, the picture is fed into the software and read by the code, producing the same output as the step above.
• Next, the software was developed to open the URL in a browser. This works by extracting the URL as a substring of the decoded text and opening it in the Raspberry Pi's default browser. The URL is extracted from any given QR code as detailed in the previous points.
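A minimal sketch of this step, using the standard-library webbrowser module. The report only says a substring of the decoded data was used, so the exact extraction rule below (and the extract_url helper name) is an assumption:

```python
import webbrowser

def extract_url(decoded_text):
    """Return the first http(s) URL found in the decoded QR text, or None."""
    for token in decoded_text.split():
        if token.startswith("http://") or token.startswith("https://"):
            return token
    return None

def open_in_browser(decoded_text):
    """Open the decoded URL in the system's default browser (the Pi's)."""
    url = extract_url(decoded_text)
    if url is not None:
        webbrowser.open(url)
        return True
    return False
```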
• Putting all the developed code together, the team got the webcam to recognise a QR code presented to it and, if it contains a resolvable URL, navigate to that URL. Every 500 milliseconds the code takes a picture through the webcam and feeds it into the software. If a QR code is recognised in the image, the code extracts the URL and opens it in the Pi's browser; if not, the process repeats until a QR code is recognised. Thus, a QR code can be scanned and read from a live video stream through the webcam. Below are the resources that helped us develop the QR scanner.
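The 500 ms polling loop described above can be sketched as follows. Frame capture and QR decoding are passed in as callables so the control flow can be shown without a webcam; on the Pi these would wrap the OpenCV capture and the decoding code from the earlier steps:

```python
import time

def poll_for_qr(capture_frame, decode_qr, interval_s=0.5, max_attempts=None):
    """Capture a frame every interval_s seconds until decode_qr finds a URL.

    Returns the decoded URL, or None if max_attempts is exhausted first.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        frame = capture_frame()
        url = decode_qr(frame)        # None when no QR code is in the frame
        if url is not None:
            return url
        attempts += 1
        time.sleep(interval_s)        # 500 ms between captures by default
    return None
```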
Although most of the scanner's functionality came from Python libraries, this does not mean we learnt nothing from its implementation. The team learned how to use Python, which also helped us become comfortable with the Pi environment. This knowledge was invaluable when programming the voice recognition software. While the QR scanner was entirely necessary for the multi-factor authentication system, voice recognition was chosen as the main method of authentication, and by extension the team's focus, mainly given voice recognition's dependency on complex signal processing.
After the initial success of the QR scanner, the team began to implement the audio processing aspect of the project.
The group's first instinct was to incorporate one of the speech recognition APIs supported by the Python SpeechRecognition library (which uses PyAudio for microphone capture), a choice later found to be an incorrect approach. Without this knowledge, however, the team began research into different speech recognition APIs. The library supports APIs from several reputable sources, including IBM's Speech to Text API and Microsoft's voice recognition API built for Bing. After attempting to implement some of these APIs, the team found success with the Google Cloud Speech API.
The Google Cloud Speech API performs speech recognition with three different implementations: synchronous recognition, where an audio file less than a minute in duration is sent to the API and speech recognition is conducted on the file as a whole; asynchronous recognition, where an audio file up to 480 minutes is sent to the API and speech recognition is conducted at set intervals; and streaming recognition, where the API is given a live stream of audio and provides real-time speech recognition as the audio is being captured. As the goal was to identify a short spoken password, the team chose to implement synchronous recognition using this API.
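A rough sketch of the synchronous path, assuming the third-party SpeechRecognition wrapper library and valid Google Cloud credentials on the Pi; verify_transcript is a hypothetical helper, as the report does not detail how transcripts were checked against the stored password:

```python
def transcribe(wav_path):
    """Send a short (<1 min) WAV file for synchronous recognition.

    Assumes the SpeechRecognition library is installed and Google Cloud
    credentials are configured; network access is required.
    """
    import speech_recognition as sr
    recogniser = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recogniser.record(source)      # read the whole file at once
    return recogniser.recognize_google_cloud(audio)

def verify_transcript(transcript, password):
    """Case- and whitespace-insensitive comparison of spoken vs. stored password."""
    return transcript.strip().lower() == password.strip().lower()
```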
After the first presentation, the team began research into audio recognition without using external APIs. The first objective the team tackled was isolating the required speech clip from a wav file. This objective was successfully completed by iterating through the audio sample, calculating the average sound intensity over a period of several milliseconds, and identifying the start and end points of the speech clip from an increase and subsequent decrease in average sound intensity.
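The clip-isolation step described above can be sketched as follows: slide a window over the samples, compute the average intensity (RMS) per window, and take the span of windows whose intensity exceeds a threshold. The window size and threshold are illustrative values, not the team's actual parameters:

```python
import numpy as np

def find_speech_span(samples, window=160, threshold=0.05):
    """Return (start, end) sample indices of the loud region, or None.

    samples: 1-D float array; window: samples per intensity window
    (160 samples is 10 ms at a 16 kHz sample rate).
    """
    n = len(samples) // window
    rms = np.array([
        np.sqrt(np.mean(samples[i * window:(i + 1) * window] ** 2))
        for i in range(n)
    ])
    loud = np.nonzero(rms > threshold)[0]     # windows above the threshold
    if loud.size == 0:
        return None                           # no speech detected
    return loud[0] * window, (loud[-1] + 1) * window
```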
Once this was completed, it was a simple change to isolate the individual syllables in the spoken password. From this point, the team had planned to conduct Fourier transforms on each syllable, find the inner product of each syllable's spectrum against a series of tones of different frequencies, then compare the resulting values to a set of syllable templates to determine which syllables, and hence which password, were being spoken. Unfortunately, the team had dedicated too much time to the previous API-based techniques, and there was not enough time both to conduct the necessary research and to implement the new audio recognition technique.
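The planned (and ultimately unfinished) technique might have looked something like the sketch below: take the magnitude spectrum of each syllable via an FFT, then score it against stored templates with a normalised inner product and pick the best match. The templates here are synthetic tones; a real system would use pre-recorded syllable spectra of equal length:

```python
import numpy as np

def spectrum(samples):
    """Unit-norm magnitude spectrum of a syllable (real-input FFT)."""
    mag = np.abs(np.fft.rfft(samples))
    return mag / (np.linalg.norm(mag) + 1e-12)

def best_match(samples, templates):
    """Return the name of the template whose spectrum best matches samples.

    templates: dict of name -> sample array; all arrays (and the input)
    must be the same length so the spectra can be compared directly.
    """
    s = spectrum(samples)
    scores = {name: float(np.dot(s, spectrum(t)))
              for name, t in templates.items()}
    return max(scores, key=scores.get)
```

With unit-norm spectra the inner product behaves like a cosine similarity, so amplitude differences between the recording and the template do not affect the match.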
No project is without its problems, and this multi-factor authentication project is no different. Given that Python was a completely new programming language to the whole team, it took a while to learn the correct syntax and formatting when programming the Raspberry Pi. When coding the QR scanner there was an initial problem connecting the Pi to the webcam, though displaying the live stream did not prove overly difficult. After this, however, the code got increasingly more difficult, and the programming of the voice recognition system was the most difficult aspect of the project.
Originally the team planned on accessing a web browser and entering data into text fields using Python, but due to the obstacles faced in its implementation this functionality was abandoned. While it was possible to open a web browser from a Python script, the web browsers installed on the Raspberry Pi, i.e. Chromium and Firefox, caused great difficulty. The team attempted to use webdrivers to access the text fields in the browsers, but a compatible webdriver for Chromium could not be found, and Firefox proved unacceptably slow to operate.
While implementing the Google Cloud Speech API was done with little difficulty, the team's attempt to compare audio samples by other means could not be completed, largely because of the time spent on the other implementations of the project. The team completed research into audio recognition techniques and made some advances in code, such as isolating the individual syllables in an audio sample and generating and displaying a Fourier transform of a sample, but could not finish this task by the end of the module.
Every major advancement in the code took hours of research into online tutorials and involved multiple alterations in different iterations to achieve the desired results. However, these problems helped the team to grow, and communication was key to solve most of these problems. While it was difficult and at times impossible to take every member’s thoughts and opinions on board, compromise was critical, not only to the stability of the team, but also in the development of the software.
The team's task was as follows: to create a multi-factor authentication system incorporating both a QR scanner and voice recognition software, using a Raspberry Pi and a webcam. The scanner would have to recognise and extract the information contained in a generated QR code given to each user. Once this code has been accepted by the scanner, the user is prompted to say their codeword. If the speaker's voice is recognised as belonging to the owner of the scanned code, the user is authenticated and allowed into the system.
An incorrect approach was taken towards the task at hand, due to a lack of research into voice recognition in the early stages of the project. Using the Python language required extra time and skill, and some problems were encountered while using libraries. Code written for different versions of Python was combined, which caused many problems, so another approach or library had to be used. This happened when using the SciPy library, a library for scientific computing, to calculate the Fast Fourier Transform of a signal, which caused delays while trying to fix it.

The QR scanner was built at the beginning of the project's development and can efficiently scan a QR code and extract the information stored in it. Although the team built a fully functioning QR scanner, the objective of the project was not followed: no signal processing method was developed for the scanner, as the libraries used performed the processing themselves. The finished QR scanner took six weeks to build, and only then did the building of the voice recognition begin. What the team learnt is that good planning and continuous reference to the objective of the project are needed to build a successful signal processing system. While we got the QR scanner to work, the voice recognition was troublesome; believing we could build speaker recognition within the allocated time frame was ambitious and borderline naive, but we wanted to push ourselves, and we believed speaker recognition was the best way to do this.
Taking into account all the reports and vlogs (given that each report was sent from a different computer, I am only in possession of the first video log), I received a result of 55% overall. Having improved on last year's result with a more difficult project, I am quite content. Looking back, the voice recognition was most likely out of our depth. While this section was heavily influenced by the team's report, many parts of the report were cut out or altered. This was to allow the viewer to get an understanding of the project, whilst displaying how I learnt and grew by partaking in it, without being swamped with background information. Given that there was no physical product, media is limited. The full unaltered report is linked below.