Multiple Languages

Post date: Oct 23, 2008 1:25:1 AM

Managing content in multiple languages has always posed a challenge for global businesses. The need to solve this difficult problem has become even more acute with the recent introduction of regulatory changes. Operating with discrete information silos based upon language is not a viable option, as valuable intelligence that could benefit the rest of the organization is effectively trapped in a black hole. Restricting operations to just one language is also impractical.

How Do Manual Legacy Techniques Approach the Problem of Multiple Languages?

Many traditional approaches rely on parsing or semantic analysis, using rules of grammar and lexicons in an attempt to explicitly understand textual information. These methods are inherently language-specific, and their reliance on the rules of grammar renders them ineffectual with slang or grammatically incorrect constructions. They also do not scale, as much manual effort is required to teach the system each new word and every change in meaning. More generally, such a system will support only a very limited subset of languages, such as English, German and French, and adding a new and very different language, such as Chinese, can be problematic and manually intensive. These incomplete solutions fall apart when data must be compared across different languages.

How Does Autonomy ZANTAZ Resolve the Challenge of Multiple Languages?

At the core of Autonomy ZANTAZ's technology lies a language-independent, pattern-matching model that uses statistical word patterns and probabilistic modeling to understand content. Although Autonomy ZANTAZ currently supports an impressive 106 languages, it is trivial to add support for more because the technology is fundamentally language-independent. Since its approach is mathematically based and free from linguistic constraints, it need not use any language-dependent parsing or dictionaries to extract meaning. While many enterprise platforms rely on preexisting knowledge of grammar and linguistic rules, Autonomy ZANTAZ allows indexed content to dictate the model as it develops a statistical understanding of patterns that occur in the content. Hence, neither slang nor industry-specific jargon poses a barrier to processing.
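
To make the idea of dictionary-free, statistical matching more concrete, here is a minimal Python sketch. It is not Autonomy ZANTAZ's implementation, which this post does not describe; it swaps in a much simpler, well-known technique (character n-gram counts compared by cosine similarity) purely for illustration. The function names `ngram_counts` and `cosine_similarity` and the sample sentences are assumptions made up for this example.

```python
import math
from collections import Counter


def ngram_counts(text, n=3):
    """Count overlapping character n-grams.

    No tokenizer, grammar rules, or lexicon is involved, so the same
    function works on slang, jargon, or any language's script.
    """
    cleaned = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or too-short text matches nothing
    return dot / (norm_a * norm_b)


# Illustrative data only: a slang-heavy query and two candidate documents.
query = ngram_counts("gonna refi the mortgage asap")
on_topic = ngram_counts("mortgage refinancing rates dropped")
off_topic = ngram_counts("the football match ended in a draw")

# The on-topic document scores higher even though "refi" is in no dictionary.
print(cosine_similarity(query, on_topic) > cosine_similarity(query, off_topic))
```

The point of the sketch is the design choice rather than the specific scoring: because the model is built only from patterns observed in the content, nothing has to be taught to the system by hand when new vocabulary appears.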

Autonomy ZANTAZ automates processes that were once labor-intensive, such as metadata tagging and taxonomy creation. As new terms and concepts appear in the content, its technology can automatically deduce the significance of these new units of meaning, add them to relevant categories and create new categories where necessary.

Furthermore, Autonomy ZANTAZ allows for cross-lingual search and data management. For instance, a user can query a subject in English and automatically be provided with similar information in both English and another language, e.g. Spanish. Autonomy ZANTAZ also offers automatic language detection of incoming content and language identification of queries.
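
As a rough illustration of how automatic language detection can work without any grammar rules, the Python sketch below compares a text's character-trigram profile against per-language profiles. This is a generic textbook-style technique, not a description of Autonomy ZANTAZ's own method. The names `SAMPLES`, `trigram_profile` and `detect_language` are hypothetical, and the two training sentences are tiny stand-ins for the much larger corpora a real system would need.

```python
from collections import Counter

# Hypothetical, deliberately tiny training samples (assumption for this sketch).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and the cat sleeps in the house",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso "
               "y el gato duerme en la casa",
}


def trigram_profile(text):
    """Map each character trigram to its relative frequency in the text."""
    padded = " " + " ".join(text.lower().split()) + " "
    grams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    total = max(len(grams), 1)
    return {gram: count / total for gram, count in Counter(grams).items()}


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def detect_language(text):
    """Return the language whose profile overlaps most with the text."""
    probe = trigram_profile(text)
    scores = {
        lang: sum(weight * profile.get(gram, 0.0)
                  for gram, weight in probe.items())
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)


print(detect_language("the dog sleeps in the house"))
print(detect_language("el gato duerme en la casa"))
```

In a cross-lingual setting, a detector of this kind would typically run both on incoming content and on the query, so that each can be routed to the appropriate statistical model; adding a language then amounts to adding another entry to `SAMPLES`.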