A recent client project has again brought to my attention two important benefits of W3C standards around semantic technologies and graph data:
- Stability, in the sense of portability of data and queries
- A choice of products between which I can move my data and queries
- And thus investment protection that proprietary graph databases do not offer
tl;dr
Why RDF? Because it protects your investment!
A concrete example: data migration to a new product after 6 years took one day of effort. Migration of the existing queries: probably less than 4% needed changes. The data model stays the same, and the APIs and queries for customers remain unchanged. The customer could choose between three products to meet the changed requirements.
Stability
The point here is that the data itself represents the value that needs to be preserved. Significant resources have likely gone into structuring, collecting, merging, validating, and maintaining this data.
To preserve this value, a stable data format is important. I want to still be able to read this data in 10 years; otherwise my investment is lost, or at least significant additional resources are needed to convert the data into whatever format is currently opportune or fashionable.
This is where a standard like RDF helps. Yes, it evolves slowly. And that is precisely why I can still access it and work with it 10 years later, with different tools. My investment is protected.
In addition:
Many products support RDF
There are now quite a few products that work with the standards around RDF.
These include various triplestores, or graph databases. They all offer different features and focus areas, which make them suitable for different applications.
Whether I need particularly fast rule processing, incremental validation of updates even for huge amounts of data, a connection to unstructured data or full-text search engines, transactions, time-based queries, location-based queries, high availability, or bindings for my preferred programming language: for almost every requirement there is an offering that covers it.
And the data will still be in a standardized format that can be processed in another application or database five years from now.
In addition, there is a rich selection of tools and methods for working with this data: user interfaces, various inference systems, visualization libraries, SQL-to-RDF adapters, and much more. The range spans open source, freeware, and high-priced enterprise-level products.
There is also plenty of training material, examples, code, and business and scientific articles, and for many problems a solution has already been formulated. Anyone who wants to get into the field will find more than enough material, both for training staff and for self-study.
If I compare this to Neo4J, for example: one vendor, one query language, one database implementation, a limited set of tools. This may seem like an advantage at first: no choice also means no learning and decision process once Neo4J has been chosen. But it also means no choice when new capabilities are required further down the road. And it means transforming the data when the new capabilities require a new database.
An example
My current customer is looking for a new triplestore for a dataset of 600 million triples that is expected to grow to 900 million.
In the current application, every time the data is updated, more than 600 queries are run to validate the data.
To test the alternatives, I exported the 600 million triples from the existing database into a standardized format (NQuads), transformed them into another standardized format (TriG) using an existing tool, and imported them into three different triplestores.
This went off without a hitch. Effort for data migration in three different products: one day.
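The conversion step itself requires nothing exotic, since both ends are standardized formats. As a minimal sketch of the idea (file names are placeholders, and at 600 million triples a streaming converter is the more realistic choice), the step looks like this with Python and rdflib:

```python
# Sketch only: convert an N-Quads export into TriG with rdflib.
# File names are placeholders; a streaming converter is preferable at this data volume.
from rdflib import Dataset

ds = Dataset()
ds.parse("export.nq", format="nquads")
ds.serialize(destination="export.trig", format="trig")
```

No product-specific export or import code is needed; any RDF toolkit that reads N-Quads and writes TriG will do.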
Then I sent the existing 600 validation queries - unchanged - to all three alternative products. All 600 validations were accepted and processed; only for 20-25 of them did the different stores claim errors in the - valid - data.
This is a more than good result: over 96% of the queries run on the new stores completely unchanged.
The investment is thus preserved: the effort to adapt these 20-odd queries is very manageable, and there was essentially no effort to convert the data.
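Replaying the existing queries against a candidate store needs only a few lines of code. The following sketch assumes a local SPARQL endpoint, a directory of query files, and that the validations are ASK queries; these details are placeholders, not specifics from the project:

```python
# Sketch only: send the existing validation queries, unchanged, to a candidate
# store's SPARQL endpoint and record which ones report problems.
# Endpoint URL, directory layout, and the ASK-query assumption are placeholders.
from pathlib import Path
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:3030/dataset/sparql")
endpoint.setReturnFormat(JSON)

failed = []
for query_file in sorted(Path("validations").glob("*.rq")):
    endpoint.setQuery(query_file.read_text())
    result = endpoint.query().convert()
    # An ASK validation returns {"boolean": true} when the data passes the check.
    if not result.get("boolean", False):
        failed.append(query_file.name)

print(f"{len(failed)} validations reported problems: {failed}")
```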
Summary
Thanks to the decision 6 years ago to use W3C standards, my customer can today - with no cost for data migration and minimal cost for adapting existing queries (<5% changed):
- increase the performance of the existing solution by an order of magnitude
- increase robustness and resilience by switching to an HA configuration
- simplify the existing architecture
- significantly reduce the scope of homegrown services
The customer's costs are limited to one-time costs for actual improvements: the queries are converted to constraints, and the architecture is simplified to work with a smaller number of self-written services.
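Assuming the constraints are expressed in SHACL - which fits the incremental-validation requirement mentioned above - such a converted check might look roughly like the following sketch; the shape, file names, and the use of pyshacl are illustrative only:

```python
# Sketch only: one way a validation query can be restated as a SHACL constraint
# and checked with pyshacl. The shape and file names are made-up examples.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix ex:  <http://example.org/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    ex:ProductShape a sh:NodeShape ;
        sh:targetClass ex:Product ;
        sh:property [
            sh:path ex:price ;
            sh:minCount 1 ;
            sh:datatype xsd:decimal
        ] .
""", format="turtle")

data = Graph().parse("data.ttl", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print("data conforms" if conforms else report_text)
```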