Research Areas
This page describes my research interests.
1. Character Recognition
Character recognition techniques associate a symbolic identity with the image of a character. Character recognition is commonly referred to as optical character recognition (OCR), as it deals with the recognition of optically processed characters. The modern version of OCR appeared in the middle of the 1940s with the development of digital computers. OCR machines have been commercially available since the middle of the 1950s. Today OCR systems are available both as hardware devices and software packages, and a few thousand systems are sold every week.
In a typical OCR system, input characters are digitized by an optical scanner. Each character is then located and segmented, and the resulting character image is fed into a preprocessor for noise reduction and normalization. Certain characteristics are then extracted from the character for classification. Feature extraction is critical, and many different techniques exist, each with its own strengths and weaknesses. After classification, the identified characters are grouped to reconstruct the original symbol strings, and context may then be applied to detect and correct errors.
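The stages described above (normalization, feature extraction, classification) can be illustrated with a minimal sketch. It is not taken from the cited source: the zoning features, the nearest-template classifier, and all function names and sample images are assumptions chosen only to keep the example small.

```python
from typing import Dict, List

Image = List[List[int]]  # binary character image: 1 = ink, 0 = background


def crop_to_ink(img: Image) -> Image:
    """Normalization: crop the image to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        return img  # blank image, nothing to crop
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def zone_features(img: Image, zones: int = 2) -> List[float]:
    """Feature extraction: ink density in each cell of a zones x zones grid."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def classify(img: Image, templates: Dict[str, List[float]]) -> str:
    """Classification: label of the nearest template (squared Euclidean distance)."""
    feats = zone_features(crop_to_ink(img))
    return min(
        templates,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, templates[label])),
    )


# Hypothetical training images for two character classes.
L_IMG = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
SEVEN_IMG = [[1, 1, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
TEMPLATES = {
    "L": zone_features(crop_to_ink(L_IMG)),
    "7": zone_features(crop_to_ink(SEVEN_IMG)),
}

# A degraded "L" with a blank margin and one missing pixel.
NOISY_L = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(classify(NOISY_L, TEMPLATES))
```

Real systems use far richer features and classifiers; the point here is only the order of the stages.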
Optical character recognition has many different practical applications. The main areas where OCR has been of importance are text entry (office automation), data entry (banking environment) and process automation (mail sorting).
The present state of the art in OCR has moved from primitive schemes for limited character sets to the application of more sophisticated techniques for omnifont and handprint recognition. The main problems in OCR usually lie in the segmentation of degraded symbols which are joined or fragmented. Generally, the accuracy of an OCR system is directly dependent upon the quality of the input document. Three figures are used in ratings of OCR systems: correct classification rate, rejection rate and error rate. Performance should be rated from the system's error rate, as these errors go undetected by the system and must be manually located for correction.
In spite of the great number of algorithms that have been developed for character recognition, the problem is not yet solved satisfactorily, especially in cases where there are no strict limitations on the handwriting or the quality of print. Up to now, no recognition algorithm can compete with humans in quality. However, as the OCR machine is able to read much faster, it is still attractive.
In the future the area of recognition of constrained print is expected to decrease, since such material may be exchanged electronically or printed in a more computer-readable form, for instance as barcodes. Emphasis will then be on the recognition of unconstrained writing, like omnifont and handwriting. This is a challenge which requires improved recognition techniques. The potential for OCR algorithms seems to lie in the combination of different methods and the use of techniques that are able to utilize context to a much larger extent than current methodologies.
The applications for future OCR systems lie in the recognition of documents where control over the production process is impossible. This may be material where the recipient is cut off from an electronic version and has no control of the production process, or older material that could not be generated electronically at production time. This means that future OCR systems intended for reading printed text must be omnifont.
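The three rating figures can be computed as in the following sketch. The function name and the convention of representing a rejected character as `None` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def ocr_rates(results: List[Tuple[str, Optional[str]]]) -> Tuple[float, float, float]:
    """Return (correct classification rate, rejection rate, error rate).

    Each result is (true_char, predicted_char). predicted_char is None when
    the system rejects the character instead of guessing.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no results to rate")
    rejected = sum(1 for _, pred in results if pred is None)
    correct = sum(1 for truth, pred in results if pred == truth)
    errors = total - rejected - correct  # wrong guesses the system did not flag
    return correct / total, rejected / total, errors / total


# Two correct characters, one rejection, one undetected substitution ("d" read as "o").
sample = [("a", "a"), ("b", "b"), ("c", None), ("d", "o")]
print(ocr_rates(sample))  # (0.5, 0.25, 0.25)
```

Rejections are flagged by the system and are cheap to review, whereas the errors counted in the third figure have to be found by hand.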
Another important area for OCR is the recognition of manually produced documents.
Within postal applications, for instance, OCR must focus on reading addresses on mail produced by people without access to computer technology. Already, it is not unusual for companies and other senders with access to computer technology to mark mail with barcodes. The relative importance of handwritten text recognition is therefore expected to increase.
[Source]: Line Eikvil, "Optical Character Recognition", available at: "citeseer.ist.psu.edu/142042.html".
2. Speech Recognition
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as command and control, data entry, and document preparation. Research in this area has attracted a great deal of attention over the past five decades, during which several technologies have been applied and efforts have been made to raise performance to marketplace standards so that users can benefit in a variety of ways. Of the key technologies applied during this long research period, the combination of the hidden Markov model (HMM) and the stochastic language model has produced high performance.
To convert speech to on-screen text or a computer command, a computer has to go through several complex steps. When we speak, we create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand. To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals. The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into different bands of frequency (frequency is the number of sound-wave cycles per second, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. The sound may also have to be temporally aligned. In addition to these tasks, speech endpoint detection is necessary in order to extract valid speech data from the spoken signal. Together these tasks are called preprocessing of the speech signal. The next tasks are feature extraction and recognition. A significant amount of research has already been done in these areas, and work continues using a variety of approaches.
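Two of the preprocessing steps named above, volume normalization and endpoint detection, can be sketched as follows. This is a simplified illustration under stated assumptions: peak normalization and a fixed short-time energy threshold are only one of several possible approaches, and the function names, frame length, and threshold are hypothetical.

```python
from typing import List, Optional, Tuple


def normalize_peak(samples: List[float], target: float = 1.0) -> List[float]:
    """Scale the signal so its largest absolute sample equals `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    return [s * target / peak for s in samples]


def detect_endpoints(
    samples: List[float], frame_len: int = 4, threshold: float = 0.1
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the speech region, or None.

    A frame counts as speech when its mean energy exceeds `threshold`.
    `end` is exclusive, so samples[start:end] is the detected speech.
    """
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            active.append(start)
    if not active:
        return None
    return active[0], min(active[-1] + frame_len, len(samples))


# Synthetic signal: silence, a short burst, silence.
signal = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] * 2 + [0.0] * 4
normalized = normalize_peak(signal)
print(detect_endpoints(normalized))  # (8, 16)
```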
The area of automatic speech recognition (ASR) is classified into isolated speech recognition (ISR) and continuous speech recognition (CSR). An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. For isolated-word recognition, the assumption is that the speech to be recognized comprises a single word or phrase, which is recognized as a complete entity with no explicit knowledge of or regard for its phonetic content. Hence, for a vocabulary of V words (or phrases), the recognition algorithm consists of matching the measured sequence of spectral vectors of the unknown spoken input against each of the spectral patterns for the V words and selecting the pattern whose accumulated time-aligned spectral distance is smallest as the recognized word. The notion of isolated speech recognition can be extended to connected speech recognition if we consider a small vocabulary and solve the co-articulation problem that arises between words.
In continuous speech recognition, continuously uttered sentences are recognized. The standard approach to continuous speech recognition is to assume a simple probabilistic model of speech production whereby a specified word sequence, W, produces an acoustic observation sequence; the decoder then chooses the word string with the maximum a posteriori probability given the observations. In continuous speech recognition it is very important to use sophisticated linguistic knowledge.
The most appropriate units for enabling recognition success depend on the type of recognition and on the size of the vocabulary. Various units of reference templates or models, from phonemes to words, have been studied. When words are used as units, word recognition can be expected to be highly accurate; however, it requires more memory and more computation. Using phonemes as units does not greatly increase memory requirements or computation.
Some speech recognition systems require speaker enrollment: a user must provide samples of his or her speech before using them. Other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced as a sequence of words, language models or artificial grammars are used to restrict the combination of words.
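The isolated-word matching described above, choosing the vocabulary pattern with the smallest accumulated time-aligned distance, is commonly realized with dynamic time warping (DTW). The sketch below assumes DTW with Euclidean frame distance; the text does not name a specific alignment method, and the function names and toy one-dimensional "spectral vectors" are illustrative only.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(seq_a: List[Vector], seq_b: List[Vector]) -> float:
    """Accumulated distance along the best time alignment of two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(unknown: List[Vector], templates: Dict[str, List[Vector]]) -> str:
    """Return the vocabulary word whose template is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(unknown, templates[word]))


# Toy vocabulary of V = 2 words with one-dimensional feature vectors.
templates = {
    "yes": [[1.0], [2.0], [3.0]],
    "no": [[3.0], [2.0], [1.0]],
}
# The same pattern as "yes", spoken more slowly.
unknown = [[1.0], [1.0], [2.0], [3.0], [3.0]]
print(recognize(unknown, templates))  # yes
```

The time alignment is what lets the slower utterance match its template exactly even though the two sequences have different lengths.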
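As an illustration of how an artificial grammar restricts word combinations, the sketch below filters recognition hypotheses against a table of allowed word successors. The grammar, its vocabulary, and the function names are all hypothetical, and practical systems typically use probabilistic language models rather than a hard accept/reject rule.

```python
from typing import Dict, List, Set

# Hypothetical command grammar: each word maps to the words allowed to follow it.
GRAMMAR: Dict[str, Set[str]] = {
    "<start>": {"open", "close"},
    "open": {"file", "window"},
    "close": {"file", "window"},
    "file": {"<end>"},
    "window": {"<end>"},
}


def is_allowed(words: List[str], grammar: Dict[str, Set[str]] = GRAMMAR) -> bool:
    """True when every consecutive word pair is permitted by the grammar."""
    path = ["<start>", *words, "<end>"]
    return all(nxt in grammar.get(cur, set()) for cur, nxt in zip(path, path[1:]))


def filter_hypotheses(hypotheses: List[List[str]]) -> List[List[str]]:
    """Keep only the word sequences that the grammar accepts."""
    return [h for h in hypotheses if is_allowed(h)]


candidates = [["open", "file"], ["file", "open"], ["close", "window"], ["open"]]
print(filter_hypotheses(candidates))  # [['open', 'file'], ['close', 'window']]
```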
3. Document Image Analysis
Document analysis, or more precisely, document image analysis, is the process that performs the overall interpretation of document images. This process is the answer to the question, "How is everything that is known about language, document formatting, image processing and character recognition combined in order to deal with a particular application?" Thus document analysis is concerned with the global issues involved in recognition of written language in images. It adds to OCR a superstructure that establishes the organization of the document and applies outside knowledge in interpreting it.
The process of determining document structure may be viewed as guided by a model, explicit or implicit, of the class of documents of interest. The model describes the physical appearance and the relationships between the entities that make up the document. OCR is often at the final level of this process, i.e., it provides a final encoding of the symbols contained in a logical entity such as a paragraph or table, once the latter has been isolated by other stages. However, it is important to realize that OCR can also participate in determining document layout. For example, as part of the process of extracting a newspaper article, the system may have to recognize the character string "continued on page 5" at the bottom of a page image in order to locate the entire text.
In practice, then, a document analysis system performs the basic tasks of image segmentation, layout understanding, symbol recognition and application of contextual rules in an integrated manner. Current work in this area can be summarized under four main classes of applications:
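The newspaper example can be sketched as a small pattern match over recognized text. This assumes the OCR stage has already produced a text string for the page region; the pattern and the function name are illustrative and not part of any particular system.

```python
import re
from typing import Optional

# Matches strings such as "continued on page 5", ignoring case and extra spaces.
CONTINUATION = re.compile(r"continued\s+on\s+page\s+(\d+)", re.IGNORECASE)


def continuation_page(ocr_text: str) -> Optional[int]:
    """Return the page number an article continues on, or None if no marker is found."""
    match = CONTINUATION.search(ocr_text)
    return int(match.group(1)) if match else None


print(continuation_page("...the council voted on Tuesday. (Continued on page 5)"))  # 5
print(continuation_page("The article ends here."))  # None
```

A layout component could use the returned page number to decide where to look for the rest of the article.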
Text Documents
Forms
Postal Addresses and Check Reading
Line Drawings
[Source]: Richard G. Casey, IBM Almaden Research Center, San Jose, California, USA, available at: "http://cslu.cse.ogi.edu/HLTsurvey/ch2node4.html".