Explain Plan

Skip to end of metadata
Go to start of metadata
This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

GraphDB Documentation

Next versions

GraphDB 6.6
GraphDB 7.0
GraphDB 7.1

Previous versions

GraphDB 6.4
GraphDB 6.3
GraphDB 6.2
GraphDB 6.0 & 6.1

[OWLIM 5.4]
OWLIM 5.3
[OWLIM 5.2]
[OWLIM 5.1]
[OWLIM 5.0]
[OWLIM 4.4]
[OWLIM 4.3]
[OWLIM 4.2]
[OWLIM 4.1]
[OWLIM 4.0]

The Explain Plan is the regular feature in GraphDB versions 6.4.0, 6.4.1, 6.4.2.

LDBC SPB

All examples below refer to the LDBC Semantic Publishing Benchmark (SPB) query mix (http://ldbc.eu). This benchmark is chosen because the domain is easy to understand:

  • CreativeWorks (journalistic articles)
  • topics (entities found in the content of articles, such as people, organizations, locations)
  • mentions of topics in works

GraphDB Execution Strategy

Disk-based AVL Trees

The GraphDB storage component uses disk-based AVL Trees to keep triples in ordered fashion. GraphDB keeps two such trees as indices, which contain all statements, sorted by POS or PSO. They can return all triples for a fixed predicate with either bound or unbound subject & object.

Join strategy

GraphDB uses the indexed nested loops (INL) join strategy. E.g. assume that for the following query:

select * where {
  ?x rdf:type foaf:Person.
  ?x rdfs:label ?label.
}

the optimiser has selected to execute the ?x rdf:type foaf:Person pattern first:

  • ?x rdf:type foaf:Person causes a query to the POS index, which returns all triples with P=rdf:type, O=foaf:Person
  • Then query execution loops over this collection, binding ?x to items X
  • A nested query is made to the PSO index where P=rdfs:label, S=X

Aggregation

Aggregation is done in a single pass over the result set and uses HashMaps to calculate the aggregate values. The aggregation overhead is relatively small compared to the fetch time and is done in linear time over the collection size.

For example, a typical aggregation query is LDBC SPB Q7:

# Retrieve the N most popular topics mentioned by CreativeWorks. Limit results to some "primary content"
SELECT ?mentions (COUNT(*) as ?count)
WHERE {
  ?creativeWork cwork:mentions ?mentions .
  {
    SELECT ?creativeWork (count(*) as ?pcCount) {
      ?creativeWork bbc:primaryContentOf ?pc .
    }
    GROUP BY ?creativeWork
  }
}
GROUP BY ?mentions
ORDER BY DESC(?count)
LIMIT 10

Execution time is ~500ms, fetch time is ~2700ms, and aggregation time is <100ms.

Using Explain Plan

To see the query explain plan, use the onto:explain pseudo-graph:

PREFIX onto: <http://www.ontotext.com/>
select * from onto:explain
. . .

Instead of the query result, GraphDB returns an iterator with the explain plan result.

  • each row provides information about a single triple pattern
    • the order of triple patterns may not correspond to the original query because the order is selected by the GraphDB Query Optimiser
    • the identation in the first column shows the nesting of pattern queries (nested loops)
    • the second column shows the collection size, i.e. number of triples matching the pattern
    • the next two columns show the number of unique subjects and objects in this collection
    • the last two columns show the optimizer's judgement of "complexity" and the time it took to iterate the collection
  • the last two rows show the execution time, the fetching time, and the number of results

Example plans

Simple Query

select * from onto:explain where {
  ?x rdf:type rdfs:Class.
  ?x rdfs:label ?label.
}

The query plan is:

The optimizer selected to first execute the join ?x rdf:type rdfs:Class, then ?x rdfs:label ?label. The reason is that the number of classes is much smaller than the number of things that have a label (45 vs 200 in this case).

Query LDBC SPB Q16

SELECT DISTINCT ?thing ?tag ?category ?title FROM onto:explain
WHERE {
           ?thing rdf:type cwork:CreativeWork .
           ?thing rdf:type ?class .
           ?class rdfs:subClassOf cwork:CreativeWork .
           ?thing cwork:tag ?tag .
           ?thing ?a ?o .
           ?a rdfs:subPropertyOf cwork:tag .
           ?thing cwork:category ?category .
           ?thing cwork:title ?title .
           FILTER (CONTAINS (?title, "policy") && (?tag = ?o)) .
           OPTIONAL {
             ?thing cwork:audience ?audience .
             FILTER ( ?audience = cwork:InternationalAudience ) .
           }
} ORDER BY ?tag LIMIT 100

The query plan (on a 5M Dataset) is:

Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.