Glossaries and ontologies: Definitions of terms in data

In the wonderful talk "Design in Practice", Rich Hickey talks about how important an up-to-date glossary is in the design process.

I have a similar view, and see something like "Business Knowledge Blueprints" as an important foundation for business applications and especially for data models and (enterprise) knowledge graphs.

Today, however, I was given the idea of including formal definitions in a glossary. The Web Ontology Language (OWL) is based on description logic, which, as the name suggests, is primarily suited to describing or classifying a section of the world, that is, the entities in the Universe of Discourse.

A specialized ontology for the description of glossaries, taxonomies and folksonomies is the Simple Knowledge Organization System (SKOS), which I recommend for every new glossary.

Unique names?

Rich also talks about how important unique terms or names are in the glossary and gives the term "customer" as an example.

I have my doubts here: in my opinion, "customer" in particular is a term that resists a single clear definition. In the companies I work with, salespeople, management, the billing department and employees in direct contact with end "customers" often mean quite different things by "our customers".

And despite all efforts to get support staff, for example, to say "user", "client" or similar, in everyday language it often ends up as "our customers want ...".

I'm glad that RDF, and therefore OWL, offers something like namespaces through its prefixes. They let me model exactly this situation by distinguishing accounting:customer from support:customer.
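
As a minimal sketch, two such glossary entries could look like this in Turtle. The namespace URLs and the definition texts are made up for illustration; only the SKOS vocabulary is real.

```turtle
@prefix skos:       <http://www.w3.org/2004/02/skos/core#> .
# Hypothetical namespaces, one per department
@prefix accounting: <https://example.org/glossary/accounting/> .
@prefix support:    <https://example.org/glossary/support/> .

# "customer" as the billing department might understand it (illustrative definition)
accounting:customer
    a skos:Concept ;
    skos:prefLabel "customer"@en ;
    skos:definition "A party that has received at least one invoice."@en .

# "customer" as the support team might understand it (illustrative definition)
support:customer
    a skos:Concept ;
    skos:prefLabel "customer"@en ;
    skos:definition "A person who uses the product and contacts the help desk."@en ;
    # The two concepts are distinct, but the glossary can still record that they are related
    skos:related accounting:customer .
```

Both entries keep the everyday label "customer", while the prefix makes it explicit whose meaning is intended.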

Unique identifiers for individuals - as opposed to terms in a glossary - are extremely important in the area of ETL, but also when linking structured and unstructured data. To this end, I define URLs for each individual as a permanent ID in each of my projects. This is of course "stolen" from Linked Data principles, and yet I like to emphasize it as its own concept "PermID" to highlight the importance for uniquely linking information.
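
A small sketch of what this could look like: one permanent URL is the subject of statements from a structured source and is also referenced from a document. All URLs and the label are hypothetical; rdfs:label and dcterms:references are existing vocabulary terms, but using them here is my choice for the illustration.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical permanent ID (PermID) of one individual, filled from structured data
<https://id.example.org/organization/4711>
    rdfs:label "Example GmbH" .

# An unstructured document (hypothetical URL) that mentions the same individual
<https://docs.example.org/reports/2024-17>
    dcterms:references <https://id.example.org/organization/4711> .
```

Because both sources point to the same URL, the information can be joined without any name matching.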

Examples

Let's look at something that almost no one defines: an address. What does that actually mean in your context?

I like to turn to the publicly available ontology gist from Semantic Arts for ideas on how to define concepts. gist is minimalist, but quite precise in its definitions.

Address

If we look at the definition for physical addresses, we see that there is both a gist:StreetAddress and a gist:PostalAddress. In the description, which can be seen as a glossary entry, a gist:StreetAddress is defined as "An address that points to a fixed location in the physical world", while the definition of gist:PostalAddress is: "A set of codes that postal authorities can use to deliver physical mail".

That's quite a difference. And I think gist:StreetAddress is not ideally named: there are "addresses" in the physical world that have nothing to do with streets.

And while gist is defined as an "ontology", neither concept has a formal definition when we look at the OWL version of it :)

Obviously the authors decided that a definition for an upper level ontology would be too restrictive. But perhaps in a specific project it would be beneficial to add a formal definition that refers to actual data structures in use.
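
Purely as an illustration of such a project-specific formal definition: a project might state that a street address is any address that has geo coordinates. Every name in the ex: namespace below is hypothetical and not part of gist.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# Hypothetical project namespace
@prefix ex:  <https://example.org/project/> .

# Assumed project rule: a street address is an address with at least one geo point
ex:StreetAddress
    a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            ex:Address
            [
                a owl:Restriction ;
                owl:onProperty ex:hasGeoCoordinates ;
                owl:someValuesFrom ex:GeoPoint
            ]
        )
    ] .
```

Whether such a rule is appropriate depends entirely on the data structures actually in use.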

But this shows that OWL can be a good way to manage glossaries. You can add textual descriptions that carry more semantic meaning than "just" text. gist:StreetAddress, for example, has a skos:scopeNote, which says "This excludes addresses that are not associated with a fixed location, such as a PO Box or an FPO code.". The mere availability of scope notes may encourage users to add such notes, leading to more precise definitions.

The meaning of skos:scopeNote itself can be found in the Simple Knowledge Organization System: "skos:scopeNote provides some, possibly partial, information about the intended meaning of a concept, particularly as an indication of how the use of a concept is constrained in indexing practice. [...]" (SKOS Primer)
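
A glossary entry with a definition and a scope note might be sketched like this. The texts are the gist texts quoted above; the ex: namespace and the concept name are assumptions for a project glossary, not how gist itself declares the class.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# Hypothetical glossary namespace
@prefix ex:   <https://example.org/glossary/> .

ex:streetAddress
    a skos:Concept ;
    skos:prefLabel "street address"@en ;
    # Definition text as quoted from gist above
    skos:definition "An address that points to a fixed location in the physical world"@en ;
    # Scope note text as quoted from gist above
    skos:scopeNote "This excludes addresses that are not associated with a fixed location, such as a PO Box or an FPO code."@en .
```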

Specification

Let's look at a slightly more formally defined example, a specification.

Remember, gist is an upper level ontology, so specifications are very high level to make them usable in many different contexts.

However, gist defines two types of specifications, a product specification and a service specification. And there is a formal definition for both. Let's take a look:

The natural language definition of gist:ProductSpecification says: "To provide something that can be physically stored or digitally stored." and for gist:ServiceSpecification it says: "A description of something that can be done for a person or organization (which produces some form of action)."

Both definitions are concise, clear and describe the difference ("physical" vs. "can be done"), but if you look at the formal definition, you might be surprised at some additional details:

gist:ProductSpecification
    a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            gist:CatalogItem
            [
                a owl:Restriction ;
                owl:onProperty gist:isCategorizedBy ;
                owl:someValuesFrom gist:ProductCategory ;
            ]
        ) ;
    ] ; [...].
 

gist:ServiceSpecification
    a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            gist:CatalogItem
            [
                a owl:Restriction ;
                owl:onProperty gist:isBasisFor ;
                owl:someValuesFrom gist:Event ;
            ]
        ) ;
    ] ; [...].

Both classes are defined as a gist:CatalogItem, in other words as something that is offered for sale. This explicitly excludes purely internal specifications, which I find quite surprising.

Also, gist:ServiceSpecification clearly says that it is the basis for a gist:Event, and we could look up the definition to find that a gist:Event (not an "event") is defined as a gist:Behavior with a gist:startDateTime and a gist:endDateTime, so something that actually happens in the world.

And the gist:ProductSpecification is a gist:CatalogItem categorized by a gist:ProductCategory. Also a definition I wouldn't have guessed from the natural language text: only something categorized into a gist:ProductCategory is actually accepted as a ProductSpecification.
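
To make the consequence concrete, here is a small sketch with made-up instance data. The individuals in the ex: namespace are hypothetical, and the gist namespace URL is my assumption of the current one, so check it against the gist version you use.

```turtle
# Assumed gist namespace; verify against the gist release in use
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
# Hypothetical catalog data
@prefix ex:   <https://example.org/catalog/> .

ex:hardware
    a gist:ProductCategory .

ex:item-1001
    a gist:CatalogItem ;
    gist:isCategorizedBy ex:hardware .

# With the owl:equivalentClass axiom for gist:ProductSpecification loaded,
# an OWL reasoner can infer the following triple, although it was never asserted:
#   ex:item-1001 a gist:ProductSpecification .
```

Because the definition uses owl:equivalentClass, it works in both directions: every ProductSpecification must be a categorized catalog item, and every catalog item categorized by a ProductCategory is classified as a ProductSpecification.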

Note also the detail that a specification is in the catalog, not a product. This is pretty obvious when you think about it, but often we say, "This product belongs in our catalog/offer/....".

This is where a formal definition forces us to think and act more precisely. I also think it makes the difference between a glossary and everyday language clearer if we talk about the meaning of clientXY:customer and not just "customer".

 
