CAPTCHAs

What is a CAPTCHA?

CAPTCHA is an acronym for "Completely Automated Public Turing Test To Tell Computers and Humans Apart".

CAPTCHAs enable computer programs to determine whether a person or a robot is requesting access to a service. Usually, but not always, it is desirable to block robots from accessing a program. CAPTCHAs are good at protecting systems from automated spam, voting, website registration, and brute-force password attacks.

Automated information or device discovery relies on robot access. If you want your website to be crawled by search engines, you will have to let in the robots.

Note: If Google's Director of Engineering Ray Kurzweil is correct about a coming AI singularity, at some time in the future computer programs may be giving humans CAPTCHAs.

A popular type of CAPTCHA displays alphanumeric characters that have been randomly distorted. Such a CAPTCHA is relatively easy for a person to read but difficult for a computer to decipher.

Here is an example from Wikipedia of an easy-to-decipher visual CAPTCHA.

Wikipedia CAPTCHA: smwm
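On the server side, the bookkeeping behind a text CAPTCHA like this one is simple: generate a random string, remember it, render it as a distorted image, and compare the user's answer. The following is a minimal Python sketch of that bookkeeping only, under stated assumptions. The class name CaptchaStore, the character set, the five-character length, and the two-minute expiry are all hypothetical choices, and the image-distortion step, which is where the real difficulty lies, is deliberately omitted.

```python
import secrets
import string
import time

# Hypothetical character set; real CAPTCHA services choose their own.
ALPHABET = string.ascii_lowercase + string.digits


class CaptchaStore:
    """Issues random text challenges and checks answers (distorted rendering omitted)."""

    def __init__(self, length=5, ttl_seconds=120):
        self.length = length
        self.ttl = ttl_seconds
        self.pending = {}  # challenge id -> (expected answer, time issued)

    def new_challenge(self, now=None):
        now = time.time() if now is None else now
        answer = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        challenge_id = secrets.token_urlsafe(16)
        self.pending[challenge_id] = (answer, now)
        # Only challenge_id is sent to the browser as text; the answer would be
        # rendered as a distorted image by a separate (omitted) step.
        return challenge_id, answer

    def verify(self, challenge_id, response, now=None):
        now = time.time() if now is None else now
        entry = self.pending.pop(challenge_id, None)  # single use: removed on first try
        if entry is None:
            return False
        answer, issued_at = entry
        if now - issued_at > self.ttl:
            return False  # challenge expired
        # Case-insensitive, constant-time comparison of the typed answer.
        return secrets.compare_digest(response.strip().lower().encode(), answer.encode())
```

Because each challenge is single-use and short-lived, a bot cannot replay an answer that was solved once. The sketch itself does nothing to make the image hard for a program to read.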

Problems With Visual CAPTCHAs

People who are blind, visually impaired, or have other disabilities might not be able to decipher visual CAPTCHAs.
(Audio CAPTCHAs can help solve this problem, but they may not work for people who are both visually and hearing impaired, or for people with intellectual disabilities.)

Computer programs will eventually be able to decipher any type of CAPTCHA.

The Turing Test

The Turing test is designed to determine whether a computer has general, human-level cognitive ability. Despite what its acronym implies, a CAPTCHA is not a "Turing Test To Tell Computers and Humans Apart".

If a program can break a CAPTCHA, that does not mean it has "passed the Turing test". I would give CAPTCHAs a more precise name, such as "Automatic Robot Blocker" (ARB).

Are CAPTCHAs Needed?

Most people consider CAPTCHAs an annoyance. Solving a meaningless puzzle by deciphering horribly distorted, non-segmented letters is both irritating and a waste of valuable time.

The natural question is: are there better, more user-friendly alternatives?

Here's an obvious alternative:

From any given IP address, limit the number of website registrations, login attempts, app rankings, restaurant ratings, etc. to five or some lower threshold per day. This should solve most CAPTCHA use cases. Comment spam might still require a CAPTCHA.
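The per-IP limit proposed above is essentially a fixed-window rate limiter keyed by IP address and action. Here is a minimal in-memory Python sketch. It assumes a single server process, whereas a real deployment would keep the counters in a shared store with expiry. DailyIpLimiter and the action names are hypothetical.

```python
import time
from collections import defaultdict

DAILY_LIMIT = 5  # the threshold suggested in the text


class DailyIpLimiter:
    """Fixed-window limiter: at most `limit` actions per (IP, action) per UTC day."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        # (ip, action, day index) -> count. Old days are never evicted here;
        # a production store would expire these keys.
        self.counts = defaultdict(int)

    def allow(self, ip, action, now=None):
        now = time.time() if now is None else now
        day = int(now // 86400)  # days since the Unix epoch, in UTC
        key = (ip, action, day)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


limiter = DailyIpLimiter()
print([limiter.allow("203.0.113.7", "register", now=1000) for _ in range(6)])
# prints [True, True, True, True, True, False]
```

This design has a known trade-off. Many users can share one IP address behind NAT or a corporate proxy, and an attacker can spread requests across many addresses. The threshold therefore balances wrongly blocking legitimate users against letting automated traffic through.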

Time delays on requested actions or services could also reduce the need for CAPTCHAs in most use cases.
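One common form of such a time delay is progressive back-off. After each failed attempt (for example, a wrong password), the client must wait longer before trying again. This makes brute-force guessing slow without bothering a human who succeeds. The following minimal Python sketch assumes a doubling schedule, a one-second base delay, and a five-minute cap, none of which come from the text. The "client" key could be an IP address or an account name, and the class name is hypothetical.

```python
class ProgressiveDelay:
    """Per-client throttle: each consecutive failure doubles the required wait."""

    def __init__(self, base_delay=1.0, max_delay=300.0):
        self.base_delay = base_delay  # seconds to wait after the first failure
        self.max_delay = max_delay    # cap so legitimate users are never locked out for long
        self.failures = {}            # client -> consecutive failure count
        self.next_allowed = {}        # client -> earliest time of the next attempt

    def can_attempt(self, client, now):
        return now >= self.next_allowed.get(client, 0.0)

    def record(self, client, success, now):
        """Record an attempt's outcome; return the delay (seconds) now imposed."""
        if success:
            self.failures.pop(client, None)
            self.next_allowed.pop(client, None)
            return 0.0
        n = self.failures.get(client, 0) + 1
        self.failures[client] = n
        delay = min(self.base_delay * 2 ** (n - 1), self.max_delay)
        self.next_allowed[client] = now + delay
        return delay


throttle = ProgressiveDelay()
# Delay schedule for five consecutive failures.
print([throttle.record("203.0.113.7", False, now=0.0) for _ in range(5)])
# prints [1.0, 2.0, 4.0, 8.0, 16.0]
```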

Access Filters, Identity Management, and CAPTCHAs

CAPTCHAs are designed to prevent robots from accessing services intended for humans. Blocking robots can be seen as a subset of 1) access filtering, and 2) identity management (IdM).

CAPTCHAs are not designed to ascertain a user's identity. A CAPTCHA is a specific type of access filter designed to block robots from accessing computer systems. CAPTCHAs are often used as part of identity management to prevent computer programs from filling out forms containing identity information. An effective IdM system, such as one that uses biometrics, can eliminate the need for CAPTCHAs.

Access Filters

My definition: Software that can automatically grant or restrict a user's access to services, such as membership in an affinity group. It does not require identity verification, though access filters can be used in conjunction with IdM systems.

Age restrictions: Users select or enter their age.

Capabilities: Users fill out forms with details about their expertise. For example, a music lovers' group can ask about certain types of music before granting a user entrance.

Verifiable:

Investment groups: For 10 consecutive business days, users must enter their best guess for the next day's close of the Dow Jones Industrial Average. If their guesses are close enough, according to pre-set mathematical thresholds, they can join the group.
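Once the actual closes are known, the investment-group filter can be checked automatically. The text does not specify the "pre-set mathematical thresholds", so this Python sketch assumes a hypothetical rule: every one of the 10 guesses must be within 1% of that day's actual close. The function name, the tolerance, and the closing values in the usage lines are illustrative only.

```python
def passes_forecast_filter(guesses, actual_closes, max_rel_error=0.01, required_days=10):
    """Return True if each daily guess is within max_rel_error of the actual close.

    guesses[i] and actual_closes[i] refer to the same business day; the caller is
    responsible for aligning them (market calendars are out of scope here).
    """
    if len(guesses) != required_days or len(actual_closes) != required_days:
        return False  # the applicant must submit a guess for every day
    return all(abs(g - a) <= max_rel_error * abs(a) for g, a in zip(guesses, actual_closes))


closes = [34000.0] * 10  # made-up closing values for illustration
print(passes_forecast_filter([34100.0] * 10, closes))             # True: within 1% every day
print(passes_forecast_filter([34100.0] * 9 + [35000.0], closes))  # False: last guess off by ~2.9%
```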

Identity Management Systems

Software that can automatically authenticate the identity of a user in order to grant or restrict access to services such as an email account.
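Unlike an access filter, an identity management system ties access to a specific account. Below is a minimal Python sketch of one common authentication step: a salted password check using PBKDF2 from the standard library. A password is used here only because it is the simplest factor to sketch; a system using biometrics, as mentioned earlier, would replace this step. The function names, the in-memory users dictionary, and the iteration count are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed PBKDF2 work factor; tune to current guidance and hardware


def hash_password(password, salt=None):
    """Return (salt, derived key) for storage, using a fresh random salt per user."""
    salt = secrets.token_bytes(16) if salt is None else salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def authenticate(users, username, password):
    """Check a login attempt against stored (salt, key) records."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_key = record
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_key)  # constant-time comparison


users = {"alice": hash_password("correct horse battery staple")}
print(authenticate(users, "alice", "correct horse battery staple"))  # True
print(authenticate(users, "alice", "wrong guess"))                   # False
```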

CAPTCHAs

Software that can automatically determine whether an agent requesting access to a system is a human or a computer. It does not require identity verification.

Breaking CAPTCHAs

This section discusses some issues regarding breaking visual and audio CAPTCHAs.

Visual CAPTCHAs

Audio CAPTCHAs

Breaking audio CAPTCHAs that present users with a signal mixed with noise can probably be treated as a problem in blind signal/source separation (BSS). Wikipedia lists several techniques for implementing BSS, including these:

http://en.wikipedia.org/wiki/Blind_signal_separation

Principal components analysis
Singular value decomposition
Independent component analysis
Dependent component analysis
Non-negative matrix factorization
Low-complexity coding and decoding
Stationary subspace analysis
Common spatial pattern

Principal Components Analysis

R. Mutihac and Marc M. Van Hulle, "Comparison of Principal Component Analysis and Independent Component Analysis for Blind Source Separation," Romanian Reports in Physics, Vol. 56, No. 1, pp. 20-32, 2004.

"The performance of PCA singular value decomposition-based and stationary linear ICA in blind separation of artificially generated data out of linear mixtures was critically evaluated and compared. All our results outlined the superiority of ICA relative to PCA in faithfully retrieval of the original independent source components."

"ICA has emerged as a useful extension of PCA and developed in context with blind source separation (BSS) and digital signal processing (DSP)."
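The comparison quoted above starts from artificially generated sources that are mixed linearly. Here is a minimal NumPy sketch of that setup with PCA as the separator. It assumes a sine wave and a square wave as two hypothetical sources and an arbitrary 2-by-2 mixing matrix. PCA finds orthogonal directions of maximum variance, so its outputs are uncorrelated. Uncorrelated is a weaker property than independent, however, because PCA uses only second-order statistics, and this is why ICA can recover the original sources more faithfully.

```python
import numpy as np


def pca(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components.

    Returns the component scores and the variance captured by each component.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # largest-variance directions first
    return Xc @ eigvecs[:, order], eigvals[order]


# Two hypothetical sources, mixed linearly (rows are time samples).
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
A = np.array([[1.0, 1.0], [0.5, 2.0]])      # arbitrary mixing matrix
X = S @ A.T

Z, variances = pca(X, 2)
print(np.round(np.cov(Z, rowvar=False), 6))  # off-diagonal entries ~0: decorrelated
```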

Singular Value Decomposition

This book chapter examines Principal Component Analysis (PCA) using Singular Value Decomposition (SVD), as well as Independent Component Analysis (ICA).

http://www.mit.edu/~gari/teaching/6.222j/ICASVDnotes.pdf
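PCA can be computed without forming the covariance matrix. For the centered data matrix X_c = U S V^T, the rows of V^T are the principal directions and the component variances are s_i^2 / (n - 1). Below is a minimal NumPy sketch based on that standard identity, with a check against the covariance eigenvalues. The function name and the random test data are hypothetical.

```python
import numpy as np


def pca_svd(X, k):
    """PCA of X (n_samples, n_features) via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values in descending order
    scores = U[:, :k] * s[:k]                  # equals Xc @ Vt[:k].T
    variances = s[:k] ** 2 / (X.shape[0] - 1)  # same normalization as np.cov
    return scores, variances, Vt[:k]


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.3, 0.2, 0.5]])
scores, variances, components = pca_svd(X, 3)
eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(variances, eig))  # True: SVD and covariance eigendecomposition agree
```

Working on the data matrix directly is usually preferred numerically, because forming the covariance matrix squares its condition number.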

Independent Component Analysis

http://en.wikipedia.org/wiki/Independent_component_analysis

"Independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents supposing the mutual statistical independence of the non-Gaussian source signals. It is a special case of blind source separation."

http://research.ics.aalto.fi/ica/book/intro.pdf

"Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multidimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent, and nongaussian."
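The "statistically independent and nongaussian" criterion can be made concrete with a minimal NumPy sketch of one-unit FastICA with deflation, which is Hyvarinen's fixed-point algorithm using the tanh nonlinearity. The sketch is applied to a hypothetical sine-plus-square-wave mixture. It is an illustration under simplifying assumptions (no noise, as many sources as mixtures, a fixed iteration cap), not a production implementation, and the function name is hypothetical. ICA inherently returns the recovered components in arbitrary order, sign, and scale.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA on X (n_samples, n_features); returns estimated sources."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten: rotate onto principal axes and scale to unit variance (a PCA step).
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = (Xc @ E) / np.sqrt(d)
    m = Z.shape[1]
    W = np.zeros((n_components, m))
    for p in range(n_components):
        w = rng.normal(size=m)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = Z @ w
            g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
            # Fixed-point update: w <- E[z g(w.z)] - E[g'(w.z)] w
            w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            # Deflation: stay orthogonal to the components already found.
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return Z @ W.T


t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # hypothetical sources
X = S @ np.array([[1.0, 1.0], [0.5, 2.0]]).T      # arbitrary mixing matrix
S_est = fast_ica(X, 2)
corr = np.corrcoef(S.T, S_est.T)[:2, 2:]          # sources vs. estimates
print(np.round(np.abs(corr), 2))  # each row should have one entry near 1
```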

Dependent Component Analysis

http://ojs.academypublisher.com/index.php/jcp/article/view/553

"Dependent Component Analysis (DCA) as an extension of Independent Component Analysis(ICA) for Blind Source Separation(BSS) has more applications than ICA and received more and more attentions during the last several years in the study of signal processing and neural networks."

Non-negative matrix factorization

http://en.wikipedia.org/wiki/Non-negative_matrix_factorization#Non-stationary_Speech_Denoising

"Speech denoising has been a long lasting problem in audio processing community. There exist lots of algorithms for denoising if the noise is stationary. For example, Wiener filter is suitable for additive Gaussian noise. However, if the noise is non-stationary, the classical denoising algorithms usually have poor performance because the statistical information of the non-stationary noise is difficult to estimate."
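NMF-based denoising typically factors a nonnegative magnitude spectrogram V (frequency bins by time frames) as V ~ WH. The columns of W act as spectral templates, and the rows of H are their activations over time. Templates learned separately for speech and for noise can then be used to separate the two. The sketch below shows only the factorization step, using the Lee-Seung multiplicative updates for the squared Euclidean (Frobenius) objective. A random nonnegative matrix stands in for a spectrogram. Computing a real spectrogram, training the speech and noise templates, and resynthesizing audio are all omitted, and the function name is hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor nonnegative V (m, n) as W @ H with W (m, rank) and H (rank, n) nonnegative.

    Lee-Seung multiplicative updates for min ||V - WH||_F^2; eps avoids division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update templates
    return W, H


rng = np.random.default_rng(42)
V = rng.random((64, 2)) @ rng.random((2, 100))  # stand-in "spectrogram" of exact rank 2
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

The multiplicative form keeps every entry of W and H nonnegative without an explicit projection step, because each update multiplies by a ratio of nonnegative quantities.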
