Complex relations in RDF

The simple subject-predicate-object data model of RDF tempts us to treat relationships and verbs as simple predicates, i.e. as properties.

However, this often does not do justice to reality.

For example, if we are collecting data on employment relationships or project contractors, a first approach might be:

@prefix :  .
@prefix demo:  .
@prefix gist:  .
@prefix schema:  .
   
:_Meier schema:worksFor :_Company .

And then we might want to say that Mr. Meier works as a team leader:

:_Meier schema:worksFor :_Company .
:_Meier schema:occupationalCategory :_Teamleader .

Super. But maybe Mr. Meier is a freelancer and works for two companies at the same time:

:_Meier schema:worksFor :_CompanyA .
:_Meier schema:worksFor :_CompanyB .
:_Meier schema:occupationalCategory :_Teamleader .
:_Meier schema:occupationalCategory :_ProjectManager .

At which company is he now team leader, and at which project manager? And what if we want to say that Meier has been working for :_CompanyA since 2012? How would that work?
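The ambiguity can be made concrete with a minimal sketch. It assumes nothing beyond the four triples above, held as plain Python tuples instead of in a real triplestore; the helper `objects` is a hypothetical name introduced only for this illustration.

```python
# Flat model: four independent triples about :_Meier,
# written as (subject, predicate, object) tuples of prefixed names.
triples = {
    (":_Meier", "schema:worksFor", ":_CompanyA"),
    (":_Meier", "schema:worksFor", ":_CompanyB"),
    (":_Meier", "schema:occupationalCategory", ":_Teamleader"),
    (":_Meier", "schema:occupationalCategory", ":_ProjectManager"),
}


def objects(subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


companies = objects(":_Meier", "schema:worksFor")
positions = objects(":_Meier", "schema:occupationalCategory")

# Nothing in the data links a position to a company,
# so every combination is equally plausible.
candidates = [(c, p) for c in companies for p in positions]
for company, position in candidates:
    print(company, position)
```

The data yields four company/position combinations, although only two of them are true, and nothing in the graph says which two.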

RDF-Star is often seen as a solution for expressing temporal validity, but as we'll see in a moment, it doesn't do as much as some people hope:

<< :_Meier schema:worksFor :_CompanyA >>
  gist:actualStartDateTime "2012-09-01T00:00:00"^^xsd:dateTime .
<< :_Meier schema:worksFor :_CompanyB >>
  gist:actualStartDateTime "2012-09-01T00:00:00"^^xsd:dateTime .
<< :_Meier schema:occupationalCategory :_Teamleader >>
  gist:actualStartDateTime "2012-09-01T00:00:00"^^xsd:dateTime .
<< :_Meier schema:occupationalCategory :_ProjectManager >>
  gist:actualStartDateTime "2012-09-01T00:00:00"^^xsd:dateTime .

RDF-Star cannot tell us here which of these four facts belong together, even though they all start at the same time...

An old problem, a well-known solution

This problem of converting verbs and adjectives into formal logic has been known for a long time.

An often-used solution was first described by Donald Davidson in The Logical Form of Action Sentences (1967). According to Davidson, all actions can be described as events, and events in his theory are objects or, in RDF jargon, entities.

This turns the property schema:worksFor into an entity of type demo:Contract. All the necessary information can then be attached to this freelance employment relationship:

:_MeiersArbeitsverhältnis_A a demo:Contract ;
  demo:client :_CompanyA ;
  demo:contractor :_Meier ;
  demo:position :_Teamleader ;
  gist:actualStartDateTime "TimeA"^^xsd:dateTime .

:_MeiersArbeitsverhältnis_B a demo:Contract ;
  demo:client :_CompanyB ;
  demo:contractor :_Meier ;
  demo:position :_ProjectManager ;
  gist:actualStartDateTime "TimeA"^^xsd:dateTime .
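To see why this resolves the ambiguity, here is a minimal sketch that again keeps the triples as plain Python tuples rather than in a real triplestore. The start dates are left out because the snippet above only uses a placeholder for them; `value` and `positions_by_company` are hypothetical helper names.

```python
# Event model: each employment relationship is a node of its own.
triples = {
    (":_MeiersArbeitsverhältnis_A", "rdf:type", "demo:Contract"),
    (":_MeiersArbeitsverhältnis_A", "demo:client", ":_CompanyA"),
    (":_MeiersArbeitsverhältnis_A", "demo:contractor", ":_Meier"),
    (":_MeiersArbeitsverhältnis_A", "demo:position", ":_Teamleader"),
    (":_MeiersArbeitsverhältnis_B", "rdf:type", "demo:Contract"),
    (":_MeiersArbeitsverhältnis_B", "demo:client", ":_CompanyB"),
    (":_MeiersArbeitsverhältnis_B", "demo:contractor", ":_Meier"),
    (":_MeiersArbeitsverhältnis_B", "demo:position", ":_ProjectManager"),
}


def value(subject, predicate):
    """Return the single object for subject/predicate, or None if there is none."""
    return next((o for s, p, o in triples if s == subject and p == predicate), None)


def positions_by_company(person):
    """Map each client company to the position the person holds there."""
    contracts = [s for s, p, o in triples if p == "demo:contractor" and o == person]
    return {value(c, "demo:client"): value(c, "demo:position") for c in contracts}


result = positions_by_company(":_Meier")
for company in sorted(result):
    print(company, result[company])
```

Because company and position hang off the same contract node, each company is paired with exactly one position, and the cross product of the flat model disappears.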

And while we're at it, let's add information about his department and the underlying contract:

:_MeiersArbeitsverhältnis_A a demo:Contract ;
  demo:client :_CompanyA ;
  demo:contractor :_Meier ;
  demo:position :_Teamleader ;
  demo:department :_DepartmentA ;
  demo:contract :_ContractA ;
  gist:actualStartDateTime "TimeA"^^xsd:dateTime .

:_MeiersArbeitsverhältnis_B a demo:Contract ;
  demo:client :_CompanyB ;
  demo:contractor :_Meier ;
  demo:position :_ProjectManager ;
  demo:department :_DepartmentB ;
  demo:contract :_ContractB ;
  gist:actualStartDateTime "TimeA"^^xsd:dateTime .

This means we have captured significantly more information. And the more information is recorded for such an employment and contractual relationship, the smaller the relative overhead in triples, i.e. the extra load on the triplestore, becomes.

For the simple statement, the difference is one triple versus three:

:_Meier schema:worksFor :_CompanyA .

# vs

:_MeiersArbeitsverhältnis_A a demo:Contract ;
  demo:client :_CompanyA ;
  demo:contractor :_Meier .

And if we were to use RDF-Star here to express the time frame, we would need an additional RDF-Star statement for each triple, which brings us to 8 versus 8 triples, and we still have not expressed that we are describing *a* contractual relationship here.

Of course, we could now add another 4 RDF-Star statements that refer to the associated contract in order to tie these triples together, but that would already bring us to 12 versus 8 triples: a clear advantage for representing a verb or a property as a "thing", as an entity with its own URL.

"Reification" is a bad term for this in the RDF context

This technique is often called "reification": turning a property into a "thing".

RDF itself, however, has a vocabulary called "reification" (rdf:Statement with rdf:subject, rdf:predicate and rdf:object) that can be used to reference a triple. RDF-Star started out more or less as syntactic sugar for this RDF reification: a simpler syntax for making a statement about a triple.

As a result, "reify" in the context of RDF is often understood to mean this rather rarely used vocabulary, which leads to misunderstandings.

This is why I avoid the word in this context and only talk about event semantics.

Query complexity

Queries against such richer data models are naturally more complex: the query itself is harder to write and to reason about, and it is more work to execute. This is one reason why we see so many simple properties defined in ontologies where an entity would perhaps have made more sense.

For example, let's search for the companies Mr. Meier works for in the simple model:

select ?company where {
   :_Meier schema:worksFor ?company
}

Wonderful, isn't it? Knowledge graphs can be that easy...

But now with the same relationship expressed as an event:

select ?company where {
   ?contract a demo:Contract ;
     demo:contractor :_Meier ;
     demo:client ?company
}

Significantly more tedious to write. The triplestore also has to evaluate three triple patterns instead of one and then join their results on ?contract to produce an answer, which is potentially significantly more time-consuming!
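What that join involves can be sketched with a naive pattern matcher over Python tuples. This is only an illustration under stated assumptions: real triplestores use indexes and query planners, `match` and `evaluate` are hypothetical helpers, and `:_ContractX`, `:_Schmidt` and `:_CompanyC` are invented distractor data.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against the value bound earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate triple patterns one after another as a nested-loop join."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


triples = [
    # simple model
    (":_Meier", "schema:worksFor", ":_CompanyA"),
    (":_Meier", "schema:worksFor", ":_CompanyB"),
    # event model
    (":_MeiersArbeitsverhältnis_A", "rdf:type", "demo:Contract"),
    (":_MeiersArbeitsverhältnis_A", "demo:contractor", ":_Meier"),
    (":_MeiersArbeitsverhältnis_A", "demo:client", ":_CompanyA"),
    (":_MeiersArbeitsverhältnis_B", "rdf:type", "demo:Contract"),
    (":_MeiersArbeitsverhältnis_B", "demo:contractor", ":_Meier"),
    (":_MeiersArbeitsverhältnis_B", "demo:client", ":_CompanyB"),
    # hypothetical distractor: somebody else's contract
    (":_ContractX", "rdf:type", "demo:Contract"),
    (":_ContractX", "demo:contractor", ":_Schmidt"),
    (":_ContractX", "demo:client", ":_CompanyC"),
]

simple_query = [(":_Meier", "schema:worksFor", "?company")]
event_query = [
    ("?contract", "rdf:type", "demo:Contract"),
    ("?contract", "demo:contractor", ":_Meier"),
    ("?contract", "demo:client", "?company"),
]

companies_simple = {b["?company"] for b in evaluate(simple_query, triples)}
companies_event = {b["?company"] for b in evaluate(event_query, triples)}
print(sorted(companies_simple))
print(sorted(companies_event))
```

Both queries return the same companies, but the event query passes over the data once per pattern and carries intermediate bindings for ?contract along the way. How much that costs in a real triplestore depends on its indexes and optimizer.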

A small benchmark is still missing here, which I hope to add in the near future.

More detail is always possible

Of course, we can also look at the properties of our demo:Contract as things: perhaps Mr. Meier's position changes during the term of a contract. Then our demo:position property is no longer sufficient, and we need to introduce a position entity of its own, say a demo:Position, with a demo:contract property and further properties that describe the position.

And so on, and so on. This can become almost as complex as you like. And overwhelm every triplestore (and every head).

Suggested procedure

In my opinion, you should check for every property whether there is an event behind it. Or, put differently: whether you could attach more information to this property.

This is the only way to get complete models.

And then you should consider whether you *really* need this information!

And then estimate what the different models mean for the triplestore in use. Here, only testing, testing, testing helps.

Test with synthetic or live data, but above all with a volume of data that is close to the target volume: at most an order of magnitude away from it, preferably no more than 50%.
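Synthetic data for such a test can be generated with a few lines of code. The following is a minimal sketch, not a benchmark: the example.org namespaces, the instance names and the helpers `generate` and `to_ntriples` are assumptions made for illustration, and the volumes should be scaled to your own target.

```python
import random

EX = "http://example.org/"          # hypothetical namespace for synthetic instances
DEMO = "http://example.org/demo#"   # hypothetical stand-in for the demo: prefix
SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def generate(n_people, n_companies, contracts_per_person, seed=42):
    """Return the same synthetic facts twice: as flat triples and as event triples."""
    rng = random.Random(seed)  # fixed seed keeps test runs comparable
    companies = [f"{EX}Company{i}" for i in range(n_companies)]
    flat, event = [], []
    for i in range(n_people):
        person = f"{EX}Person{i}"
        # sample() picks distinct companies, so no duplicate triples arise
        for j, company in enumerate(rng.sample(companies, contracts_per_person)):
            flat.append((person, f"{SCHEMA}worksFor", company))
            contract = f"{EX}Contract{i}_{j}"
            event += [
                (contract, RDF_TYPE, f"{DEMO}Contract"),
                (contract, f"{DEMO}client", company),
                (contract, f"{DEMO}contractor", person),
            ]
    return flat, event


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples for loading into a triplestore."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


flat, event = generate(n_people=1000, n_companies=50, contracts_per_person=2)
print(len(flat), len(event))
print(to_ntriples(event[:3]))
```

Loading both serializations into the triplestore under test and running the same logical query against each gives a first impression of what the richer model costs at a given volume.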

And then use this information to decide on a case-by-case basis how each property is modeled: as an OWL property or as an OWL individual, i.e. a node of its own.

More often than not, this case-by-case decision should be backed by performance measurements.


Additional resources