Voice recognition technology works by matching a person's voice pattern to stored voice pattern data in order to identify them. There are two types of voice recognition: speaker verification and speaker identification. Speaker verification is, essentially, authentication: the user claims to be a particular person, and, to verify that claim, the computer compares the user's voiceprint, a digital representation of their voice, to the voiceprint stored for that person. Speaker verification systems are generally text-dependent, i.e., the user must say a predetermined phrase or read a phrase after being prompted. Speaker verification is usually used in conjunction with other types of authentication, which makes it well suited to high-security use cases, such as bank accounts. Speaker verification systems are robust and are very difficult to trick using a recording. Speaker identification, by contrast, determines who an unknown speaker is: the speaker's voiceprint is taken and compared to a database of known voiceprints. Speaker identification systems are text-independent and can use sources that are either live or recorded. These systems are used more frequently by law enforcement, for suspect identification and for monitoring inmates in prisons (Markowitz).
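The difference between the two modes can be illustrated with a small sketch. It assumes, purely for illustration, that a voiceprint is a short list of numbers and that two voiceprints are compared with cosine similarity against a fixed threshold; the names verify, identify, and enrolled, the threshold value, and the sample data are all hypothetical and do not come from any of the cited systems.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(claimed_id, sample, enrolled, threshold=0.95):
    """Speaker verification: one comparison against the claimed identity."""
    stored = enrolled.get(claimed_id)
    if stored is None:
        return False
    return cosine_similarity(sample, stored) >= threshold


def identify(sample, enrolled, threshold=0.95):
    """Speaker identification: search every enrolled voiceprint for the best match."""
    best_id, best_score = None, threshold
    for speaker_id, stored in enrolled.items():
        score = cosine_similarity(sample, stored)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id  # None when no enrolled voiceprint is close enough


# Hypothetical enrolled voiceprints (made-up numbers, not real voice data).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
sample = [0.88, 0.12, 0.31]  # a new recording that resembles "alice"

print(verify("alice", sample, enrolled))
print(verify("bob", sample, enrolled))
print(identify(sample, enrolled))
```

In this sketch, verification answers a yes-or-no question about a single claimed identity, while identification has to search the whole database, which is one reason identification scales into a surveillance tool in a way that verification does not.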
Voice recognition technologies use unique aspects of a person's voice to generate a set of features, called a voiceprint, which is then matched against stored features to identify or authenticate the person (Markowitz). A person's voice is determined by several anatomical characteristics, such as "size and shape of the mouth, throat, nose, and teeth, which are called the articulators, and the size, shape, and tension of the vocal cords" (Myers). The movements that a person's muscles in the lips, tongue, and jaw make during vocalization also uniquely influence the person's voice (Myers).
To generate a voiceprint, a recording of a person's voice is taken and characteristics such as pitch, volume, and pronunciation of certain vowel and consonant sounds are identified using filters. Voiceprints can be compared in a text-dependent or a text-independent manner. Text-dependent systems require that the person say the same phrase that was originally recorded, which makes matching of voiceprints easier and faster. Matching voiceprints in a text-independent manner requires a larger sample, from which patterns are extracted and compared to patterns in the voiceprint (Myers).
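As a rough illustration of what extracting characteristics from a recording can mean, the sketch below computes two of the features named above, pitch and volume, from a synthetic tone. It is a simplification under stated assumptions: pitch is estimated with a basic autocorrelation search rather than the filters the sources describe, pronunciation features are omitted entirely, and the function names, sample rate, and test tone are hypothetical. Real systems use far richer feature sets.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def rms_volume(samples):
    """Root-mean-square amplitude, a simple stand-in for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate=SAMPLE_RATE, min_hz=50, max_hz=400):
    """Estimate the fundamental frequency by finding the lag at which the
    signal correlates most strongly with a shifted copy of itself."""
    min_lag = sample_rate // max_hz
    max_lag = sample_rate // min_hz
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


def make_voiceprint(samples):
    """Bundle the extracted features into a toy voiceprint."""
    return {
        "pitch_hz": estimate_pitch(samples),
        "volume_rms": rms_volume(samples),
    }


# Synthetic stand-in for a recording: 0.1 s of a 200 Hz tone at amplitude 0.5.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        for n in range(800)]

voiceprint = make_voiceprint(tone)
print(voiceprint)
```

For this test tone the estimated pitch should come out at about 200 Hz. A text-dependent system could compare such features phrase against phrase, whereas a text-independent system would need to aggregate them over a longer sample before comparing.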
Speaker verification is primarily used for authentication by banks and for unlocking devices. In 2012, the National Australia Bank introduced voice biometrics in its call centers (Bender). Shortly afterward, Barclays began testing voice biometrics in its call centers among its wealthy customers, and, in 2014, the program was expanded to include all Barclays customers (Robertson, Rob Davies). Also in 2014, Google announced that its newest mobile operating system would allow users to unlock their phones with their voiceprints (Hachman).
Speaker identification is a more complicated and invasive technology. It is mostly used today by law enforcement. A company called SpeechPro announced in 2012 that it had invented a storage and search system for voiceprints, and that it was working with the United States federal government as well as other governments around the world to integrate these systems into current law enforcement practices (Gallagher). In September 2014, the FBI announced that its Next Generation Identification (NGI) system was fully operational, and, while the system does not yet contain voiceprints at the same scale as other biometric information such as face prints, it has the capability to do so (Lennard).
From a utilitarian perspective, voice biometrics does an excellent job of upholding its goal of maintaining users' confidentiality and security. Speaker verification systems are very difficult to "hack" using recordings, and can even be tuned to note when a user is under duress, making it very hard for a malicious actor to bypass a speaker verification system (Markowitz). Speaker verification systems can also be used to prevent fraud in customer support centers, making it difficult for hackers to gain access to data using social engineering attacks (Litan). Because these protections benefit the large number of people whose data is at stake, voice biometrics serves the utilitarian goal of maintaining data security well.
Voice biometrics also does an excellent job of identifying a user uniquely. Voiceprints are unique, so, as with fingerprints, law enforcement agencies can use them to identify suspects in criminal cases (Myers). This unique identification is also good from a utilitarian perspective, because it allows law enforcement agencies to identify and prosecute criminals, preventing future crimes and reducing overall crime (Myers).
From a deontological perspective, voice recognition technology is dangerous. People's voices are public; people speak all the time, in public as well as private spaces. While voice biometrics can help with data security, there is also a risk that it could be used to track a person, using publicly placed recording devices and speaker identification systems (Rengamani). According to deontology, a person has a right to privacy, and voice biometrics makes it very easy for a malicious actor to violate that right (Alterman). Furthermore, a person's voice cannot be changed, so once a person's voiceprint is recorded by a malicious actor, the person can be tracked for the rest of their life (Alterman). This is dangerous with regard to free speech: if anything a person says can be traced back to them, along with any sensitive information associated with their voiceprint, they might never feel able to take part in protests or speak freely about their opinions. The nightmare vision of voice biometrics is a 1984-like world where authorities hear every conversation and identify every voice. Therefore, from a deontological perspective, voice biometrics does a poor job of maintaining people's privacy and enabling free speech.
Voice biometrics should be used for speaker verification, but not for speaker identification. Speaker verification is beneficial because it is one of the easiest and most secure ways to protect people's data, especially when coupled with other forms of authentication. Banks and companies with access to sensitive information such as credit card numbers should be required to use speaker verification in their customer support centers to protect against social engineering attacks. Speaker identification, on the other hand, is a slippery slope and therefore should not be used. It can be used for good when law enforcement agencies use it to identify criminals, but there is a thin line between identifying criminals and listening for criminal activity, which is a violation of privacy. People's speech should not be recorded by the government in public, and that speech should not be traced back to individual identities.