How to use knowledge graphs in UML

The U in UML is short for Unified, although UML is mainly used for software engineering. Yet UML has spread beyond software and is now commonly used to model technical systems and even more general knowledge about non-technical, societal systems. As people stood back and reflected on the nature of modelling reality, be it in UML or any other method, they became aware that the ideas go back to Plato. This is reassuring because it basically says that our modelling practices are rooted in philosophy going back many centuries. How’s that for proven technology?

Neither semantic engineers nor system architects consider themselves philosophers, but it doesn’t hurt to be aware that the art of modelling reality has ancient roots. This also raises the question of what needs to be done to rid a UML model of its “code smell”, i.e. which elements of the UML are allowed when one wishes to create a purely semantic “platonic model” of reality. This is actually a very useful exercise when modellers have a software engineering perspective and tend to ask themselves how they would implement this or that in Java, Python, C#, SQL, etc. They apply (sub)conscious constraints to the model for fear of not being able to build a working product at a later stage. This is bad because it distracts from the exercise of capturing business knowledge and locks the model to technologies and products that evolve.

In modelling-speak, a model should never depend on a platform, technology, language or product. The ultimate pitfall is for a modeller to be tasked “to build a model for a database by SAP, Oracle, Microsoft, ….”. The result is likely to work but will be locked to that particular product. Imagine the cost of changing the platform. Things get even worse when time is short and the modeller builds quick fixes dictated by the product. Short-term thinking and supplier lock-in are evil but tempting. Long-term thinking is essential but can be a hard sell because it only pays back later.

A simple approach to avoiding the pitfall of lock-in is to restrict oneself to a tiny subset of the UML class meta-model. Think classes, attributes, enumerations, generalisations and associations. This is the same toolset that Plato used, so any thought of databases is banished to a cave.
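As a rough illustration of how small that subset is, the sketch below captures it as plain Python data classes. The names (`UmlClass`, `Attribute`, `Enumeration`, `Generalisation`, `Association`, `Model`) are assumptions made for this example and only loosely mirror the UML meta-model. Note that nothing in them mentions tables, keys or any particular platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    datatype: str  # e.g. "rdfs:Literal" or an XSD type such as "xsd:date"


@dataclass
class Enumeration:
    name: str
    literals: list[str]


@dataclass
class UmlClass:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Generalisation:
    specific: str  # name of the more specific class (subclass)
    general: str   # name of the more general class (superclass)


@dataclass
class Association:
    source: str
    role: str      # role name at the target end
    target: str


@dataclass
class Model:
    classes: dict[str, UmlClass] = field(default_factory=dict)
    enumerations: dict[str, Enumeration] = field(default_factory=dict)
    generalisations: list[Generalisation] = field(default_factory=list)
    associations: list[Association] = field(default_factory=list)

    def all_attributes(self, class_name: str) -> list[Attribute]:
        """Own attributes plus those inherited through generalisations."""
        attrs = list(self.classes[class_name].attributes)
        for g in self.generalisations:
            if g.specific == class_name:
                attrs.extend(self.all_attributes(g.general))
        return attrs


# Illustrative model: an Officer is a Person and has a rank.
model = Model()
model.classes["Person"] = UmlClass("Person", [Attribute("name", "rdfs:Literal")])
model.classes["Officer"] = UmlClass("Officer")
model.classes["Rank"] = UmlClass("Rank")
model.generalisations.append(Generalisation("Officer", "Person"))
model.associations.append(Association("Officer", "rank", "Rank"))

print([a.name for a in model.all_attributes("Officer")])  # ['name']
```

The Officer declares no attributes of its own yet still ends up with `name`, purely through the generalisation.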

Classy knowledge modelling

UML class modelling is the workhorse of object-oriented software engineering. People gather the essential business entities and their relations.

Clothes maketh the man

Clothes are to a man what attributes are to a class. Think of a police officer and you see a person in a uniform, preferably dark blue. Typical attributes are a whistle, a radio, a baton and a firearm. Other statements narrow down the definition: a police officer is a kind of person who travels in a vehicle or on a bike, has a partner and chases villains.

UML modelleth the class

Early design commonly starts by eliciting the concepts and the attributes that make them what they are.

Encourage the steal

As a rule of thumb, try to reuse existing concepts before defining your own. It feels bad to find out later that others, who may know more about a subject and have thought longer and harder about it, beat you to it. A well-known Dutch saying, “beter goed gejat dan slecht uitgevonden” (better well stolen than badly invented), says that you should prefer stealing something good over poorly inventing something new. Linked Open Vocabularies encourage us to reuse existing libraries of concepts, so there’s no reason to have qualms about not designing things anew.

Models come to life when experts draw a picture for laymen to lay bare the different concepts, attributes and relations. The sketch below shows how one would go about modelling a police officer in UML whilst reusing an existing class Person. The latter is imported from the Flemish “OSLO” model, definitions, spelling mistakes and all. This hooks our extension up to an existing model so that its meaning and context are perfectly clear.

Reusing existing models and the concepts defined therein is fundamental to interoperable information design.

Accepted UML elements

The sketch shows the subset of UML class modelling elements. The generalisation expresses that an Officer is a Person. The Officer also has a rank, which is provided as an association with stereotype <<dependency>>. The Person class brings the legal cohabitation and nobility title attributes along from the OSLO model, and the Officer inherits them. The attributes’ data types are rdfs:Literal, which implies that one can attach a language tag, such as @nl, @fr or @de in the Belgian context. Other XSD data types are acceptable too.
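To see how this subset carries over to a knowledge graph, the sketch below writes the Officer part of the example out as Turtle. It assumes one common mapping (a generalisation becomes rdfs:subClassOf, an association becomes an owl:ObjectProperty with a domain and range). The `ex:` namespace, the `ex:rank` property and the labels are hypothetical, and foaf:Person merely stands in for the OSLO Person class, whose URI would be passed instead in practice.

```python
from typing import Optional

# Namespace prefixes. The "ex" namespace is a hypothetical placeholder for our own terms.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/police#",
}


def literal(text: str, lang: Optional[str] = None) -> str:
    """Render a Turtle string literal, optionally with a language tag such as 'nl'."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def officer_turtle(superclass: str) -> str:
    """Express the Officer sketch as Turtle.

    Assumed mapping: generalisation -> rdfs:subClassOf,
    association -> owl:ObjectProperty with rdfs:domain and rdfs:range.
    """
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    labels = ", ".join([
        literal("police officer", "en"),
        literal("politieagent", "nl"),
        literal("agent de police", "fr"),
    ])
    lines += [
        "",
        "ex:Officer a owl:Class ;",
        f"    rdfs:subClassOf {superclass} ;",
        f"    rdfs:label {labels} .",
        "",
        "ex:Rank a owl:Class .",
        "",
        "ex:rank a owl:ObjectProperty ;",
        "    rdfs:domain ex:Officer ;",
        "    rdfs:range ex:Rank .",
    ]
    return "\n".join(lines)


print(officer_turtle("foaf:Person"))
```

The language-tagged labels show the point made about rdfs:Literal: the same concept carries a Dutch and a French name side by side.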

This said, the modeller should wonder whether these attributes are really what’s intended when modelling a police officer. It looks as if the OSLO model cares about information that would be found in a civil registry such as a knighthood. Alternatively, the modeller may want to check if foaf:Person or schema:Person meet her needs.

This tiny example really exposes the power of mixing existing models with one’s own original work: it prompts questions that would otherwise go under the radar, or reveals attributes that add spice to one’s model. Fancy being arrested by a knighted police officer.

A rose by any other name wouldn’t smell as sweet

Reusing existing models, like citations, helps people speak the same language and reduces semantic friction. Exchanging information is a lot easier when the data pertain to well defined concepts.

Call a rose a rose and use this definition of Rosa. The approach of using a neutral language, in this case Latin, is the ancient equivalent of platform and language independence, so, again, we’re not reinventing the wheel.

Reuse existing models and stand on the shoulders of giants

Publicly available ontologies, i.e. libraries of object types including their definitions and mutual relations, abound on the web. But how does a humble software engineer who has to work with UML reuse this splendid corpus of work?

The SEMIC workgroup has given this ample thought and produced a guideline on how to build UML models that reuse a selection of existing terms as classes. These terms are tied to their ontology on the web through a URI. The URI is encoded either as a tagged value such as

http://xmlns.com/foaf/0.1/Person

or as a qualified name, i.e. a well-known prefix plus a local name, such as

foaf:Person

where foaf is short for the popular friend-of-a-friend ontology. Either way, a user can follow the link to the unequivocal meaning and the context in which foaf defines a person.
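Both encodings resolve to the same URI. A minimal sketch of that resolution follows, assuming a hand-maintained prefix table and a tagged value called `uri`; the tag name and the function names are hypothetical and not taken from the SEMIC guideline.

```python
from typing import Dict

# Well-known prefixes and their namespace URIs.
PREFIXES: Dict[str, str] = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {qname!r}")
    return PREFIXES[prefix] + local


def compact(uri: str) -> str:
    """Turn a full URI back into a prefixed name, or return it unchanged."""
    # Try longer namespaces first so the most specific one wins.
    for prefix, ns in sorted(PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(ns):
            return f"{prefix}:{uri[len(ns):]}"
    return uri


def class_uri(name: str, tagged_values: Dict[str, str]) -> str:
    """Resolve a UML class to its URI.

    A full URI in the (hypothetical) 'uri' tagged value wins; otherwise the
    class name itself is treated as a prefixed name.
    """
    if "uri" in tagged_values:
        return tagged_values["uri"]
    return expand(name)


print(class_uri("Person", {"uri": "http://xmlns.com/foaf/0.1/Person"}))
print(class_uri("foaf:Person", {}))
print(compact("http://xmlns.com/foaf/0.1/Person"))  # foaf:Person
```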

Import existing RDF

Sparx Enterprise Architect (EA) is a popular UML modelling tool. There’s adequate guidance for creating class models in EA. Importing an ontology as a package that can be integrated with our class models is a boon to developers. This page shows an Add-In that can fetch existing ontologies, either from the web or from a set of widely used ontologies such as foaf, skos, rdfs and many more.
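The Add-In’s internals aren’t described here, but the general idea of an import can be sketched. The sketch below, with hypothetical function names, builds an HTTP request that asks for Turtle through content negotiation (without sending it) and turns already-parsed triples into class stubs with a name and a definition. Parsing Turtle itself is assumed to be handled by an RDF library such as rdflib and is left out; the sample triples are abridged and illustrative.

```python
import urllib.request
from typing import Dict, Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"

Triple = Tuple[str, str, str]


def ontology_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks the server for Turtle."""
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def class_stubs(triples: Iterable[Triple]) -> Dict[str, Dict[str, str]]:
    """Collect the classes declared in a list of triples as UML class stubs."""
    triples = list(triples)
    stubs: Dict[str, Dict[str, str]] = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o in (OWL_CLASS, RDFS + "Class"):
            # Default name: the local part of the URI after the last '/' or '#'.
            local = s.rsplit("/", 1)[-1].rsplit("#", 1)[-1]
            stubs.setdefault(s, {"name": local, "definition": ""})
    for s, p, o in triples:
        if s not in stubs:
            continue
        if p == RDFS + "label":
            stubs[s]["name"] = o
        elif p in (SKOS_DEFINITION, RDFS + "comment") and not stubs[s]["definition"]:
            # The first definition or comment found is kept.
            stubs[s]["definition"] = o
    return stubs


FOAF = "http://xmlns.com/foaf/0.1/"
triples = [
    (FOAF + "Person", RDF_TYPE, RDFS + "Class"),
    (FOAF + "Person", RDF_TYPE, OWL_CLASS),
    (FOAF + "Person", RDFS + "label", "Person"),
    (FOAF + "Person", RDFS + "comment", "A person."),
    (FOAF + "knows", RDFS + "domain", FOAF + "Person"),
]
print(ontology_request(FOAF).get_header("Accept"))  # text/turtle
print(class_stubs(triples))
```

Keeping the URI as the key of each stub means the imported class never loses its link back to the ontology.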

Follow your nose

Ontologists tend to urge users to enter a model and read the specs as if going down a rabbit hole. Because an ontology is a graph, one discovers related terms, fascinating context and many more goodies along the way. The side effect of following one’s nose is that modellers and programmers alike will stumble upon information that can and should be (re)used when building systems for a particular application.
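Following one’s nose is, in effect, a graph traversal. A breadth-first walk with a depth limit is one simple way to sketch it; the function name, the depth limit and the abridged FOAF-style triples below are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def follow_your_nose(triples: List[Triple], start: str, max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first walk from `start`, returning each term reached and its distance.

    Edges are followed in both directions, so a property whose rdfs:domain is
    the current class is discovered as well as the class's own superclasses.
    """
    neighbours: Dict[str, List[str]] = {}
    for s, _p, o in triples:
        neighbours.setdefault(s, []).append(o)
        neighbours.setdefault(o, []).append(s)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        if distance[term] == max_depth:
            continue  # don't wander further than max_depth hops
        for nxt in neighbours.get(term, []):
            if nxt not in distance:
                distance[nxt] = distance[term] + 1
                queue.append(nxt)
    return distance


# Abridged, illustrative FOAF statements written as prefixed names.
triples = [
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
    ("foaf:member", "rdfs:domain", "foaf:Group"),
    ("foaf:member", "rdfs:range", "foaf:Agent"),
]
print(follow_your_nose(triples, "foaf:Person"))
# {'foaf:Person': 0, 'foaf:Agent': 1, 'foaf:knows': 1, 'foaf:member': 2}
```

Starting from Person, the walk surfaces Agent, knows and member within two hops; raising `max_depth` would also reach Group.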

Compare this to the present situation, where a database developer has to build a model from a blank sheet and, if lucky, a heap of PDF documents and, with even more luck, access to domain experts. Not being a domain expert, chances are that he won’t understand all of it and will make mistakes. Review and correction cycles will be needed that will drive the modeller, domain expert and manager nuts.

Building a model from existing ontologies, also known as knowledge graphs, makes life easier because the modeller can surf the graph and discover all there is to know. The prerequisite is, of course, that domain experts, past or present, locally grown or foreign, took the trouble to share their know-how in the form of RDF knowledge graphs.

The single source of truth

This example shows a set of classes in the “local government” namespace defined by OSLO. The text in the attached note is the definition stored with the BasicAddress class. This highlights the good modelling practice of keeping everything in one place: a developer doesn’t want to go looking for the definition of a class somewhere else.
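One way to picture this practice: the definition lives on the class element and the note merely displays it. The minimal sketch below assumes hypothetical `ModelClass` and `DiagramNote` types, a placeholder URI and a placeholder definition (not the actual OSLO text); editing the class’s definition is immediately reflected in the note.

```python
from dataclasses import dataclass


@dataclass
class ModelClass:
    name: str
    uri: str
    definition: str  # the one place where the definition is stored


@dataclass
class DiagramNote:
    """A note that points at a class instead of holding a copy of its text."""
    target: ModelClass

    def text(self) -> str:
        # Rendered on demand, so the note can never drift from the model.
        return f"{self.target.name}: {self.target.definition}"


# Hypothetical URI and placeholder definition, not the actual OSLO content.
address = ModelClass("BasicAddress", "http://example.org/oslo#BasicAddress", "Placeholder definition.")
note = DiagramNote(address)
address.definition = "Revised placeholder definition."
print(note.text())  # BasicAddress: Revised placeholder definition.
```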

A model with a view

The note in this diagram exposes the definition of the BasicAddress class. Notes can reveal to the reader the definition of classes, attributes or relations. The diagram focuses on “Person Relation”, which is of course central to local government business. There are a lot more classes in this model than shown in this diagram, so the diagram can be considered a human-friendly view of a particular concern. In this case a modeller would be interested in the relations and information that local government maintains with people under its administration.
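The idea that a diagram is a view on a larger model can be sketched as a simple filter. In the sketch below the class and role names are hypothetical, loosely modelled on the local-government example; the diagram shows a chosen subset of classes and only the associations between them, while the model itself stays whole.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Association:
    source: str
    role: str
    target: str


def focus_view(classes: List[str], associations: List[Association],
               focus: Set[str]) -> Tuple[List[str], List[Association]]:
    """Return what a focus diagram shows: the chosen classes, plus only those
    associations whose two ends are both visible. The model itself is untouched."""
    shown = [c for c in classes if c in focus]
    edges = [a for a in associations if a.source in focus and a.target in focus]
    return shown, edges


# Hypothetical class and role names loosely based on the local-government example.
classes = ["Person", "PersonRelation", "BasicAddress", "Parcel", "Geometry"]
associations = [
    Association("PersonRelation", "person", "Person"),
    Association("Person", "address", "BasicAddress"),
    Association("Parcel", "owner", "Person"),
    Association("Parcel", "outline", "Geometry"),
]
shown, edges = focus_view(classes, associations, {"Person", "PersonRelation", "BasicAddress"})
print(shown)                    # ['Person', 'PersonRelation', 'BasicAddress']
print([a.role for a in edges])  # ['person', 'address']
```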

Other views may cover something completely different; e.g. a planning-permission view could focus on addresses and geographic outlines. Such focus diagrams would hide non-essential classes whilst showing classes such as Geometry, which is defined in the locn namespace. The curious modeller can easily poke around the model and design an algorithm to find the birth date of the owner of a parcel in a given community.
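The birth-date question amounts to finding a path through the model’s associations. A minimal sketch, assuming hypothetical class, role and attribute names (`Parcel`, `owner`, `birthDate` and so on) and made-up instance data, uses breadth-first search to find the shortest chain of roles and then follows that chain through an instance.

```python
from collections import deque
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical model: class -> list of (role, target class) associations.
ASSOCIATIONS: Dict[str, List[Tuple[str, str]]] = {
    "Parcel": [("outline", "Geometry"), ("owner", "Person"), ("community", "Community")],
    "Person": [("address", "BasicAddress")],
}
# Hypothetical model: class -> attribute names.
ATTRIBUTES: Dict[str, List[str]] = {
    "Person": ["givenName", "birthDate"],
    "Community": ["name"],
}


def path_to_attribute(start: str, attribute: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of roles from `start`
    to a class that carries `attribute`; returns None if there is none."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        cls, path = queue.popleft()
        if attribute in ATTRIBUTES.get(cls, []):
            return path + [attribute]
        for role, target in ASSOCIATIONS.get(cls, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [role]))
    return None


def follow(instance: Dict[str, Any], path: List[str]) -> Any:
    """Walk a nested-dict instance along the roles found above."""
    value: Any = instance
    for step in path:
        value = value[step]
    return value


path = path_to_attribute("Parcel", "birthDate")
print(path)  # ['owner', 'birthDate']

# Made-up instance data for illustration only.
parcel = {"community": {"name": "Example"}, "owner": {"birthDate": "1970-01-01"}}
print(follow(parcel, path))  # 1970-01-01
```

Restricting the search to parcels in a given community would simply add a filter on the `community` role before following the path.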
