A. Document analysis
A reading system (OCR) requires segmenting text regions from non-textual ones and arranging them in the correct reading order. The quality of the layout analysis therefore determines the quality, and often the feasibility, of the whole document processing pipeline (see Figure 1).
Figure 1. Document image understanding/modeling process
I. Document layout analysis / page segmentation
In computer vision, document layout analysis is the process of identifying and categorizing the regions of interest in a document image. This process involves locating zones in the document and classifying them as text, tables, images, lines, etc. (see Figure 2).
Figure 2. Example of document layout analysis
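The zone-classification and reading-order ideas above can be sketched in a few lines. This is a minimal illustration, not any particular system: the zones are assumed to be already detected as bounding boxes, and the aspect-ratio thresholds are made-up heuristics.

```python
# Minimal sketch: classify detected zones with crude geometric heuristics and
# sort them into top-to-bottom, left-to-right reading order.
# The thresholds and the zone list are illustrative assumptions only.

def classify_zone(x, y, w, h):
    """Very rough zone typing by aspect ratio (assumed heuristics)."""
    aspect = w / h
    if aspect > 20:          # extremely long, thin region -> ruling line
        return "line"
    if aspect > 4:           # wide, short region -> likely a text block
        return "text"
    return "image"           # roughly square blocks treated as figures here

def reading_order(zones):
    """Sort zones top-to-bottom, then left-to-right (single-column assumption)."""
    return sorted(zones, key=lambda z: (z[1], z[0]))

zones = [(300, 40, 200, 180),   # (x, y, width, height)
         (20, 40, 250, 30),
         (20, 300, 480, 25)]

for x, y, w, h in reading_order(zones):
    print((x, y), classify_zone(x, y, w, h))
```

Real systems replace these heuristics with learned classifiers, but the two sub-problems (zone typing and reading order) are the same.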
Due to the variety of document layouts, document layout analysis remains a challenging problem for many document reading and understanding systems.
II. Table detection / decomposition
Table detection in document images is still a challenging problem due to the variety of table structures and the complexity of document layouts. Table identification is an important part of document layout analysis, which is itself one of the significant challenges in OCR and machine understanding. Nevertheless, most page segmentation algorithms pay little attention to this problem, even though tables are very common in documents.
Figure 3. Example of table detection & table decomposition
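One classic decomposition heuristic is the projection profile: in a binarized table crop, column boundaries appear where the per-column ink count drops to zero. The sketch below illustrates this on a toy binary grid (an assumption standing in for a real binarized image).

```python
# Sketch of a classic table-decomposition heuristic: column boundaries are
# found where the vertical projection profile (ink pixels per column) is
# empty. The toy grid stands in for a binarized table region.

grid = [
    "11  11  11",
    "11  11  11",
    "11  11  11",
]

def column_profile(rows):
    """Count 'ink' pixels ('1') in each column of the binary grid."""
    return [sum(r[c] == "1" for r in rows) for c in range(len(rows[0]))]

def column_spans(profile):
    """Group consecutive non-empty columns into (start, end) spans."""
    spans, start = [], None
    for c, v in enumerate(profile):
        if v and start is None:
            start = c
        elif not v and start is not None:
            spans.append((start, c - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans

print(column_spans(column_profile(grid)))   # → [(0, 1), (4, 5), (8, 9)]
```

The same profile taken row-wise yields the row separators; together they give a first cell grid, which rule-based and learned methods then refine for spanning cells and borderless tables.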
III. Information extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images/audio/video/documents, can also be seen as information extraction (Wikipedia).
By gathering detailed structured data from texts, information extraction enables (Ontotext):
The automation of tasks such as smart content classification, integrated search, management, and delivery;
Data-driven activities such as mining for patterns and trends, uncovering hidden relationships, etc.
Figure 4. Example of fixed-form (ID) and free-form (invoice) information extraction
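For fixed-form documents, a rule-based extractor is often enough: each field is captured by a pattern anchored on known labels. The sketch below shows this idea on a toy invoice; the field names, patterns, and text are illustrative assumptions, not a general-purpose extractor.

```python
# Minimal rule-based IE sketch: pull structured fields out of semi-structured
# text with regular expressions. Field names and patterns are toy assumptions.
import re

text = """Invoice No: INV-2041
Date: 2021-03-15
Total: 125.50 USD"""

patterns = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "date":       r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total":      r"Total:\s*([\d.]+)",
}

# Keep only the fields whose pattern actually matched.
record = {field: m.group(1)
          for field, rx in patterns.items()
          if (m := re.search(rx, text))}
print(record)   # {'invoice_no': 'INV-2041', 'date': '2021-03-15', 'total': '125.50'}
```

Free-form documents such as arbitrary invoices break these fixed anchors, which is why they typically require NLP or learned layout-aware models instead.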
IV. OCR
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo), or from subtitle text superimposed on an image (for example, from a television broadcast) (Wikipedia).
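At its core, classical OCR compares an isolated glyph image against known character shapes and picks the closest match. The toy sketch below does this with tiny hand-made bitmaps and Hamming distance; the 3x3 "font" is an assumption, and real engines use far richer features plus language models.

```python
# Toy OCR core: recognize a glyph bitmap by nearest template (Hamming
# distance). The 3x3 templates are toy assumptions for illustration only.

TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
}

def hamming(a, b):
    """Number of differing pixels between two equal-size bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))

def recognize(glyph):
    """Return the template character closest to the input glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))

noisy_L = ("100",
           "110",   # one noisy pixel
           "111")
print(recognize(noisy_L))   # → 'L'
```

Nearest-template matching degrades quickly with fonts, handwriting, and noise, which is why modern OCR relies on learned sequence models rather than fixed templates.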
B. Facial expression / emotion recognition (TBD)
Facial expression is one of the means of non-verbal communication and accounts for a significant proportion of human interaction. Based on cross-cultural studies, it can be represented as discrete states (such as anger, disgust, fear, and happiness). Human emotions are sometimes mixed in specific temporal and spatial contexts. However, setting aside the compound emotions humans can intentionally produce, the primary-emotion model remains widespread due to its intuitive definition. Like most other FER methods, our approach focuses on recognizing the six facial emotional expressions proposed by Ekman, plus the neutral state, on static images, ignoring temporal relationships.
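The final stage of such a static-image FER pipeline is typically a softmax over seven classes: Ekman's six basic emotions plus neutral. The sketch below shows only this classification step; the logits are made-up stand-ins for the output of some face-feature network.

```python
# Sketch of the last stage of a static-image FER pipeline: a softmax over
# Ekman's six basic emotions plus neutral. The logits are made-up values.
import math

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "sadness", "surprise", "neutral"]

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [0.2, -1.0, 0.1, 2.5, 0.0, 0.3, 1.1]   # assumed network output
probs = softmax(logits)
prediction = EMOTIONS[probs.index(max(probs))]
print(prediction)   # → 'happiness'
```

Treating the classes as mutually exclusive is exactly the simplification discussed above: mixed or compound emotions are ignored, and the single highest-probability primary emotion is reported.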