A. Document analysis
A reading system (OCR) requires segmenting text regions from non-textual ones and arranging them in the correct reading order. The quality of the layout analysis therefore determines the quality, and often the feasibility, of the whole document processing pipeline (see Figure 1).
Figure 1. Document image understanding/modeling process
I. Document layout analysis / page segmentation
In computer vision, document layout analysis is the process of identifying and categorizing the regions of interest in a document image. This process involves locating zones in the document and classifying them as text, tables, images, lines, etc. (see Figure 2).
Figure 2. Example of document layout analysis
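The zone-classification and reading-order ideas above can be sketched in a few lines. This is a minimal illustration, not any particular system: the zones are assumed to be already detected as bounding boxes, and the aspect-ratio thresholds are made-up heuristics.

```python
# Minimal sketch: classify detected zones with crude geometric heuristics and
# sort them into top-to-bottom, left-to-right reading order.
# The thresholds and the zone list are illustrative assumptions only.

def classify_zone(x, y, w, h):
    """Very rough zone typing by aspect ratio (assumed heuristics)."""
    aspect = w / h
    if aspect > 20:          # extremely long, thin region -> ruling line
        return "line"
    if aspect > 4:           # wide, short region -> likely a text block
        return "text"
    return "image"           # roughly square blocks treated as figures here

def reading_order(zones):
    """Sort zones top-to-bottom, then left-to-right (single-column assumption)."""
    return sorted(zones, key=lambda z: (z[1], z[0]))

zones = [(300, 40, 200, 180),   # (x, y, width, height)
         (20, 40, 250, 30),
         (20, 300, 480, 25)]

for x, y, w, h in reading_order(zones):
    print((x, y), classify_zone(x, y, w, h))
```

Real systems replace these heuristics with learned classifiers, but the two sub-problems (zone typing and reading order) are the same.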
Due to the variety of document layouts, document layout analysis remains a challenging problem for many document reading and understanding systems.
II. Table detection / decomposition
Table detection in document images is still a challenging problem due to the variety of table structures and the complexity of document layouts. Table identification is an important part of document layout analysis, which is itself one of the significant challenges in OCR and machine understanding. Nevertheless, most page segmentation algorithms pay little attention to this problem, even though tables are very common in documents.
Figure 3. Example of table detection & table decomposition
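One classic decomposition heuristic is the projection profile: in a binarized table crop, column boundaries appear where the per-column ink count drops to zero. The sketch below illustrates this on a toy binary grid (an assumption standing in for a real binarized image).

```python
# Sketch of a classic table-decomposition heuristic: column boundaries are
# found where the vertical projection profile (ink pixels per column) is
# empty. The toy grid stands in for a binarized table region.

grid = [
    "11  11  11",
    "11  11  11",
    "11  11  11",
]

def column_profile(rows):
    """Count 'ink' pixels ('1') in each column of the binary grid."""
    return [sum(r[c] == "1" for r in rows) for c in range(len(rows[0]))]

def column_spans(profile):
    """Group consecutive non-empty columns into (start, end) spans."""
    spans, start = [], None
    for c, v in enumerate(profile):
        if v and start is None:
            start = c
        elif not v and start is not None:
            spans.append((start, c - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans

print(column_spans(column_profile(grid)))   # → [(0, 1), (4, 5), (8, 9)]
```

The same profile taken row-wise yields the row separators; together they give a first cell grid, which rule-based and learned methods then refine for spanning cells and borderless tables.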
III. Information extraction
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images/audio/video/documents, can also be seen as information extraction (Wikipedia).
By gathering detailed structured data from texts, information extraction enables (Ontotext):
The automation of tasks such as smart content classification, integrated search, management, and delivery;
Data-driven activities such as mining for patterns and trends, uncovering hidden relationships, etc.
Figure 4. Example of fixed-form (ID) and free-form (invoice) information extraction
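For fixed-form documents, a rule-based extractor is often enough: each field is captured by a pattern anchored on known labels. The sketch below shows this idea on a toy invoice; the field names, patterns, and text are illustrative assumptions, not a general-purpose extractor.

```python
# Minimal rule-based IE sketch: pull structured fields out of semi-structured
# text with regular expressions. Field names and patterns are toy assumptions.
import re

text = """Invoice No: INV-2041
Date: 2021-03-15
Total: 125.50 USD"""

patterns = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "date":       r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total":      r"Total:\s*([\d.]+)",
}

# Keep only the fields whose pattern actually matched.
record = {field: m.group(1)
          for field, rx in patterns.items()
          if (m := re.search(rx, text))}
print(record)   # {'invoice_no': 'INV-2041', 'date': '2021-03-15', 'total': '125.50'}
```

Free-form documents such as arbitrary invoices break these fixed anchors, which is why they typically require NLP or learned layout-aware models instead.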
IV. OCR
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo), or from subtitle text superimposed on an image (for example, from a television broadcast) (Wikipedia).
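At its core, classical OCR compares an isolated glyph image against known character shapes and picks the closest match. The toy sketch below does this with tiny hand-made bitmaps and Hamming distance; the 3x3 "font" is an assumption, and real engines use far richer features plus language models.

```python
# Toy OCR core: recognize a glyph bitmap by nearest template (Hamming
# distance). The 3x3 templates are toy assumptions for illustration only.

TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
}

def hamming(a, b):
    """Number of differing pixels between two equal-size bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))

def recognize(glyph):
    """Return the template character closest to the input glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))

noisy_L = ("100",
           "110",   # one noisy pixel
           "111")
print(recognize(noisy_L))   # → 'L'
```

Nearest-template matching degrades quickly with fonts, handwriting, and noise, which is why modern OCR relies on learned sequence models rather than fixed templates.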
B. Facial expression / emotion recognition (TBD)
Facial expression is one of the means of non-verbal communication and accounts for a significant proportion of human interaction. Based on cross-cultural studies, it can be represented as discrete states (such as anger, disgust, fear, and happiness). Human emotions are sometimes mixed in specific temporal and spatial contexts. However, setting aside the compound emotions humans can intentionally produce, the primary-emotion model remains widespread due to its intuitive definition. Like most other FER methods, our approach focuses on recognizing the six facial emotional expressions proposed by Ekman, plus the neutral state, on static images, ignoring temporal relationships.
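The final stage of such a static-image FER pipeline is typically a softmax over seven classes: Ekman's six basic emotions plus neutral. The sketch below shows only this classification step; the logits are made-up stand-ins for the output of some face-feature network.

```python
# Sketch of the last stage of a static-image FER pipeline: a softmax over
# Ekman's six basic emotions plus neutral. The logits are made-up values.
import math

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "sadness", "surprise", "neutral"]

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [0.2, -1.0, 0.1, 2.5, 0.0, 0.3, 1.1]   # assumed network output
probs = softmax(logits)
prediction = EMOTIONS[probs.index(max(probs))]
print(prediction)   # → 'happiness'
```

Treating the classes as mutually exclusive is exactly the simplification discussed above: mixed or compound emotions are ignored, and the single highest-probability primary emotion is reported.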