All examples below refer to the LDBC Semantic Publishing Benchmark (SPB) query mix (http://ldbc.eu). This benchmark is chosen because the domain is easy to understand:
The GraphDB storage component uses disk-based AVL trees to keep triples in sorted order. GraphDB keeps two such trees as indices; each contains all statements, sorted by POS or by PSO. Either index can return all triples for a fixed predicate, whether the subject and object are bound or unbound.
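To make the idea of a sorted index concrete, the following is a minimal sketch, not GraphDB's implementation. It assumes small in-memory sorted lists in place of disk-based AVL trees, and the triples and the `prefix_scan` helper are invented for illustration. It shows how a sorted POS or PSO ordering turns "fixed predicate, optionally bound object or subject" into a contiguous range scan.

```python
from bisect import bisect_left

# Toy triples (subject, predicate, object); the data is invented for illustration.
TRIPLES = [
    ("alice", "rdf:type", "foaf:Person"),
    ("bob", "rdf:type", "foaf:Person"),
    ("acme", "rdf:type", "foaf:Organization"),
    ("alice", "foaf:name", "Alice"),
]

# Two sorted copies of the same statements, one per ordering.
# Assumption: sorted Python lists stand in for the on-disk trees.
POS = sorted((p, o, s) for s, p, o in TRIPLES)
PSO = sorted((p, s, o) for s, p, o in TRIPLES)


def prefix_scan(index, prefix):
    """Yield every entry of a sorted index that starts with `prefix`."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first matching entry (if any).
    start = bisect_left(index, prefix)
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # entries are sorted, so the matching range has ended
        yield entry


# Fixed predicate, bound object: use POS.
people = list(prefix_scan(POS, ("rdf:type", "foaf:Person")))
# Fixed predicate, bound subject: use PSO.
alice_types = list(prefix_scan(PSO, ("rdf:type", "alice")))
# Fixed predicate only: either index works.
all_typed = list(prefix_scan(POS, ("rdf:type",)))
```

In this sketch the choice between POS and PSO depends only on which of subject or object is bound, since both orderings lead with the predicate.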
GraphDB uses the indexed nested loops (INL) join strategy. For example, assume that for the following query:
the optimizer has chosen to execute the ?x rdf:type foaf:Person pattern first:
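As a rough illustration of the indexed nested loops strategy, the sketch below joins the ?x rdf:type foaf:Person pattern with a second pattern, ?x foaf:name ?name. The second pattern, the sample data, and the dictionary-based indices are assumptions made for this example; they are not taken from GraphDB. The outer loop iterates over the bindings of the first pattern, and for each binding the inner step performs an index lookup with ?x already bound.

```python
from collections import defaultdict

# Invented sample data for illustration only.
TRIPLES = [
    ("alice", "rdf:type", "foaf:Person"),
    ("bob", "rdf:type", "foaf:Person"),
    ("acme", "rdf:type", "foaf:Organization"),
    ("alice", "foaf:name", "Alice"),
    ("bob", "foaf:name", "Bob"),
    ("acme", "foaf:name", "Acme Ltd"),
]


def build_indices(triples):
    """Build two hypothetical lookup tables that mimic POS and PSO access."""
    pos = defaultdict(list)  # (predicate, object) -> subjects
    pso = defaultdict(list)  # (predicate, subject) -> objects
    for s, p, o in triples:
        pos[(p, o)].append(s)
        pso[(p, s)].append(o)
    return pos, pso


def inl_join(pos, pso):
    """Indexed nested loops join of two triple patterns sharing ?x."""
    # Outer loop: every ?x matching `?x rdf:type foaf:Person`.
    for x in pos.get(("rdf:type", "foaf:Person"), []):
        # Inner step: index lookup for `?x foaf:name ?name` with ?x bound.
        for name in pso.get(("foaf:name", x), []):
            yield x, name


POS, PSO = build_indices(TRIPLES)
results = sorted(inl_join(POS, PSO))
```

The cost of this strategy is driven by the number of bindings produced by the outer pattern, which is why the choice of the first pattern matters.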
Aggregation is done in a single pass over the result set and uses HashMaps to calculate the aggregate values. The aggregation overhead is small compared to the fetch time, and it grows linearly with the size of the result set.
For example, a typical aggregation query is LDBC SPB Q7:
Execution time is ~500 ms, fetch time is ~2700 ms, and aggregation time is <100 ms.
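The single-pass, hash-based aggregation described above can be sketched as follows. This is a minimal illustration, assuming rows arrive as (group key, numeric value) pairs and that a Python dict plays the role of the HashMap; the `aggregate` function and the sample rows are hypothetical.

```python
def aggregate(rows):
    """Compute COUNT and SUM per group in one pass over `rows`.

    `rows` is an iterable of (group_key, value) pairs.
    Returns a dict mapping each group key to a (count, total) tuple.
    """
    groups = {}
    for key, value in rows:
        # One hash lookup and one update per row: linear in the number of rows.
        count, total = groups.get(key, (0, 0))
        groups[key] = (count + 1, total + value)
    return groups


# Invented sample rows for illustration.
sample = [("sport", 1), ("politics", 2), ("sport", 3)]
summary = aggregate(sample)
```

Because each row is touched once and each update is a constant-time hash operation on average, the aggregation step stays cheap relative to fetching the rows.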
To see the query explain plan, use the onto:explain pseudo-graph:
Instead of the query result, GraphDB returns an iterator over the explain plan.
The query plan is:
The optimizer chose to execute the pattern ?x rdf:type rdfs:Class first, then ?x rdfs:label ?label. The reason is that the number of classes is much smaller than the number of things that have a label (45 vs 200 in this case), so starting with the more selective pattern keeps the intermediate result small.
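The ordering decision above can be illustrated with a simple greedy heuristic: run the pattern with the smallest estimated result count first. This is only a sketch of the idea, assuming the estimates are already known; the `order_patterns` function is hypothetical and does not describe how the GraphDB optimizer actually works.

```python
def order_patterns(estimates):
    """Return patterns sorted by ascending estimated result count."""
    return sorted(estimates, key=estimates.get)


# Estimated counts taken from the example above (45 classes, 200 labels).
estimates = {
    "?x rdfs:label ?label": 200,
    "?x rdf:type rdfs:Class": 45,
}
plan = order_patterns(estimates)
```

With these estimates the sketch places ?x rdf:type rdfs:Class before ?x rdfs:label ?label, matching the order in the plan.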
The query plan (on a 5M dataset) is: