What is an ontology?
An ontology is a “specification of a conceptualization.” This definition is a mouthful, but bear with me: it is actually quite useful. In general terms, an ontology is an organization of a body of knowledge or, at least, an organization of a set of terms related to a body of knowledge.
However, unlike a glossary or dictionary, which takes terms and provides definitions for them, an ontology works in the other direction: it starts with a concept. We first have to find a concept that is important to the enterprise. Having found it, we need to express it as precisely as possible, in a form that can be interpreted and used by computer systems. One difference between a dictionary or glossary and an ontology is that dictionary definitions are not really processable by computer systems. The other is that by starting with the concept and specifying it as rigorously as possible, we get a definitive meaning that is largely independent of language or terminology. That is what the definition means by a “specification of a conceptualization.” Finally, we attach terms to these concepts, because for humans to use the ontology it has to be associated with the terms we commonly use.
Why is this useful to an enterprise?
Enterprises process great amounts of information. Some of this information is structured in databases; some of it is unstructured in documents or semi-structured in content management systems. However, almost all of it is “local knowledge,” in that its meaning is agreed within a relatively small, local context. Usually, that context is an individual application, which may have been purchased or built in-house.
One of the most time- and money-consuming activities that enterprise information professionals perform is integrating information from disparate applications. This typically costs a lot of money and takes a lot of time, but not because the information is on different platforms or in different formats; those are very easy to accommodate. The expense comes from subtle semantic differences between the applications. In some cases, the differences are simple: the same thing is given different names in different systems. In many cases, however, the differences are much more subtle. The definition of a customer in one system may overlap 80 or 90% with the definition of a customer in another system, but it is the 10 or 20% where the definitions differ that causes most of the confusion, and there are many, many terms that are far harder to reconcile than “customer.” So the intent of the enterprise ontology is to provide a “lingua franca” that allows, initially, all the systems within an enterprise to talk to each other and, eventually, the enterprise to talk to its trading partners and the rest of the world.
Isn’t this just a corporate data dictionary or a consortium data standard?
The enterprise ontology does have a great deal in common with both a corporate data dictionary and a consortium data standard. The similarity is primarily in the scope of the effort: all three aim to define the shared terms that an enterprise uses. The difference is in the approach and the tools. With both a corporate data dictionary and a consortium data standard, the definitions are interpreted and used strictly by humans, primarily system designers. An enterprise ontology is expressed in such a way that tools can interpret it and make inferences on the information while the system is running.
How to build an enterprise ontology
The task of building an enterprise ontology is relatively straightforward. You will want an ontology editor. Protégé from Stanford is free and quite a good place to start. As you begin to build your ontology into a suite of systems and other products, you may want to look at commercial products such as TopBraid from TopQuadrant.
The analytical work is similar to building a conceptual enterprise data model and involves many of the same skills: the ability to form good abstractions, to elicit information from users through interviews, as well as to find informational clues through existing documentation and data. One of the interesting differences is that as the ontology is being built it can be used in connection with data profiling to see whether the information that is currently being stored in information systems does in fact comply with the rules that the ontology would suggest.
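As a rough illustration of using a draft ontology for data profiling, the sketch below checks a handful of records against rules the ontology might imply. Everything in it is an assumption made for the example: the rule wording, the field names `weight_kg` and `resale_price`, and the sample rows are invented, and a real project would read the rules from the ontology rather than hard-code them.

```python
from typing import Callable

Record = dict[str, object]

# Hypothetical rules that an ontology might imply for the concept "inventory".
RULES: dict[str, Callable[[Record], bool]] = {
    "has a positive weight": lambda r: isinstance(r.get("weight_kg"), (int, float))
    and r["weight_kg"] > 0,
    "has a resale price": lambda r: r.get("resale_price") is not None,
}


def profile(records: list[Record]) -> dict[str, list[object]]:
    """Return, for each rule, the ids of the records that violate it."""
    return {
        name: [r["id"] for r in records if not check(r)]
        for name, check in RULES.items()
    }


# Made-up rows standing in for an existing inventory table.
rows = [
    {"id": 1, "weight_kg": 2.5, "resale_price": 19.99},
    {"id": 2, "weight_kg": None, "resale_price": 5.00},
    {"id": 3, "weight_kg": 0.4, "resale_price": None},
]

violations = profile(rows)
for rule, ids in violations.items():
    print(f"{rule}: violated by records {ids}")
```

A report like this can cut both ways: a violation may point to bad data, or it may show that the rule in the draft ontology is too strict for how the enterprise actually works.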
What to look for in an enterprise ontology
Several characteristics distinguish a good or great enterprise ontology from a merely adequate one. Most of them will only be exercised later, once the ontology is in actual use, but they are important to consider while you are building it.
Expressiveness
The ontology needs to be expressive enough to describe all the distinctions that an enterprise makes. Most enterprises of any size have tens of thousands to hundreds of thousands of distinctions that they use in their information systems. Not only is each schema element in each of their databases a distinction, but so are many of the codes in their code tables, as well as the decisions that are called out either in code or in procedure manuals. The sum total of all these distinctions is the operating ontology of the enterprise, even though it is not formally expressed in any one place. The structure of the ontology, as well as the base concepts it uses, needs to be rich enough that when a new concept is uncovered it can be expressed in the ontology.
Elegance
At the same time, we need to strive for an elegant representation. It would be simple, but perhaps simplistic, to take all the distinctions in all the current systems, put them in a repository, and call the result an ontology. This misses some of the great strengths of an ontology. We want to use our ontology not only to document and describe distinctions but also to find similarities. In these days of Sarbanes-Oxley regulations, it would be incredibly helpful to know which distinctions and which parts of which schemas deal with financial commitments and “material transactions.”
Inclusion and exclusion criteria
Essentially, the ontology describes distinctions among “types.” In many cases, what we would like to know is whether a given instance is of a particular type. Let’s say it is a record in a product table, so it is of type “product.” But another system may deal with inventory, and we would like to know whether this instance is also compatible with the type that we have defined as inventory. To do this, the ontology needs a way to describe inclusion and exclusion criteria: the clues that we, or another system, would use when evaluating a particular instance to determine whether it is, in fact, of a particular type. For instance, if inventory were defined as physical goods held for resale, one inclusion criterion might be weight, because weight is an indicator of a physical good. Clearly, there would be many more, as well as criteria for excluding, but this gives you an idea.
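The inventory example above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the `TypeDefinition` class, the field names (`weight_kg`, `held_for_resale`, `is_service`) and the decision policy (no exclusion criterion may fire, and every inclusion criterion must hold) are all hypothetical choices made for the example, not part of any ontology language.

```python
from dataclasses import dataclass, field
from typing import Callable

Instance = dict[str, object]
Criterion = Callable[[Instance], bool]


@dataclass
class TypeDefinition:
    """A type described by inclusion and exclusion criteria (hypothetical)."""

    name: str
    inclusion: list[Criterion] = field(default_factory=list)
    exclusion: list[Criterion] = field(default_factory=list)

    def is_compatible(self, instance: Instance) -> bool:
        # Assumed policy: any exclusion criterion rules the instance out;
        # otherwise every inclusion criterion must hold.
        if any(test(instance) for test in self.exclusion):
            return False
        return all(test(instance) for test in self.inclusion)


def has_weight(i: Instance) -> bool:
    # Weight is treated as a clue that the instance is a physical good.
    w = i.get("weight_kg")
    return isinstance(w, (int, float)) and w > 0


def held_for_resale(i: Instance) -> bool:
    return i.get("held_for_resale") is True


def is_service(i: Instance) -> bool:
    return i.get("is_service") is True


inventory = TypeDefinition(
    name="inventory",
    inclusion=[has_weight, held_for_resale],
    exclusion=[is_service],
)

# Made-up product records from another system.
widget = {"weight_kg": 1.2, "held_for_resale": True}
consulting = {"held_for_resale": True, "is_service": True}
forklift = {"weight_kg": 3000, "held_for_resale": False}

for label, record in [("widget", widget), ("consulting", consulting), ("forklift", forklift)]:
    print(label, inventory.is_compatible(record))
```

A softer policy, such as scoring how many inclusion clues are present, may fit better when the source data is incomplete.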
Cross referencing capability
Another very important criterion is the ability to keep track of where each distinction was found; that is, which systems currently implement and use it. This is essential for producing any type of where-used information, because changing a distinction may have side effects on other systems.
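One simple way to picture where-used information is an index from each concept to the places that implement it. The sketch below assumes invented system names, table names, and concept identifiers purely for illustration.

```python
from collections import defaultdict

# concept -> set of (system, schema element) pairs; all names are hypothetical.
where_used: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)


def record_usage(concept: str, system: str, element: str) -> None:
    """Note that a schema element in a system implements a concept."""
    where_used[concept].add((system, element))


def impacted_systems(concept: str) -> list[str]:
    """List the systems that would be touched if the concept changed."""
    return sorted({system for system, _ in where_used.get(concept, set())})


record_usage("Customer", "billing", "ACCOUNTS.CUST_ID")
record_usage("Customer", "order_entry", "ORDERS.BUYER")
record_usage("Inventory", "warehouse", "STOCK.ITEM_NO")

print(impacted_systems("Customer"))
```

Before changing the definition of a concept, a query like `impacted_systems("Customer")` gives the list of systems to review.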
Inferencing
Inferencing is the ability to find or infer additional information based on the information we have. For instance, if we know that an entity is a person, we can infer that the person has a date of birth, whether we know it or not, and we can also infer that the person is less than 150 years old. While this sounds simple at this level, the power of an ontology shows when the inference chains become long and complex and we can use the inferencing engine itself to draw many of these conclusions on the fly.
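To make the idea of an inference chain concrete, here is a hand-rolled sketch that follows subclass links and collects the facts implied along the way. It is not how a real inferencing engine works internally; the class names, the single-parent hierarchy, and the property names are assumptions chosen to mirror the person example above.

```python
# Hypothetical single-parent class hierarchy: child -> parent.
SUBCLASS_OF = {
    "Employee": "Person",
    "Person": "LegalParty",
}

# Hypothetical facts implied by membership in each class.
IMPLIED = {
    "Person": {"has_date_of_birth", "age_under_150"},
    "LegalParty": {"can_enter_contract"},
}


def infer_types(asserted: str) -> list[str]:
    """Follow subclass links upward from the asserted type."""
    chain: list[str] = []
    current: str | None = asserted
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = SUBCLASS_OF.get(current)
    return chain


def infer_facts(asserted: str) -> set[str]:
    """Collect every fact implied by the asserted type and its ancestors."""
    facts: set[str] = set()
    for type_name in infer_types(asserted):
        facts |= IMPLIED.get(type_name, set())
    return facts


print(infer_types("Employee"))
print(sorted(infer_facts("Employee")))
```

Nothing about an employee's date of birth was stated directly; it follows from the chain Employee, Person. Longer chains work the same way, which is where an inferencing engine earns its keep.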
Foreign-language support
As we described earlier, the ontology is a specification of a conceptualization that we attach terms to. It doesn’t take much to add foreign-language terms as well. This gives a great deal of power to developers who wish to present the same information, and the same screens, in multiple languages, as we are really just manipulating the concepts and attaching the appropriate language at runtime.
Some of these characteristics are aided by the existence of tools or infrastructures, but many of them are produced by the skill of the ontologist.
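The foreign-language support described above amounts to keeping labels separate from concepts and choosing a label at runtime. A minimal sketch follows, assuming made-up concept identifiers and a hypothetical `label_for` helper that falls back to English.

```python
# Concept identifiers are language-neutral; labels are attached per language.
# The identifiers and the fallback policy are assumptions for this sketch.
LABELS = {
    "concept:Customer": {"en": "Customer", "fr": "Client", "de": "Kunde"},
    "concept:Invoice": {"en": "Invoice", "fr": "Facture"},
}


def label_for(concept_id: str, lang: str, fallback: str = "en") -> str:
    """Pick the label for a concept in the requested language at runtime."""
    by_lang = LABELS.get(concept_id, {})
    # Fall back to the default language, then to the raw identifier.
    return by_lang.get(lang) or by_lang.get(fallback) or concept_id


for lang in ("en", "fr", "de"):
    print(lang, label_for("concept:Customer", lang), label_for("concept:Invoice", lang))
```

The screen logic only ever refers to `concept:Customer`; which word the user sees is decided when the label is looked up.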
Summary
We believe that the enterprise ontology will become a cornerstone of many information systems in the future. It will become a primary part of the systems integration infrastructure: once one application has been translated into the ontology, we will very rapidly know what the corresponding schema and terms are and what transformations are needed to get to another application. It will become part of the corporate search strategy as search moves beyond mere keywords into actually searching for meaning. It will become part of business intelligence and data warehousing systems, where naïve users can be led to similar terms in the warehouse repository, aiding their manual search and query construction.
Many more tools and infrastructures that make use of the ontology will become available over the next few years, but the prudent information manager will not wait. He or she will recognize that there is a fair lead time to learn and implement something like this, and that any implementation will be better than none, because this particular technology promises to greatly leverage all the rest of the system technologies.