GraphDB-SE Query Behaviour

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1

This section of the user guide describes SPARQL query processing in GraphDB. The following paragraphs detail the standards supported, the implementation-specific behaviour permitted by those standards, and the extensions that add new functionality.

SPARQL compliance

GraphDB supports the following SPARQL specifications:

Note that the SPARQL 1.1 specifications listed here became W3C Recommendations in March 2013.

SPARQL 1.1 Protocol for RDF defines the means for transmitting SPARQL queries to a SPARQL query processing service, and returning the query results to the entity that requested them.

SPARQL 1.1 Query provides more powerful query constructions compared to SPARQL 1.0. It adds:

  • Aggregates;
  • Subqueries;
  • Negation;
  • Expressions in the SELECT clause;
  • Property Paths;
  • Assignment;
  • An expanded set of functions and operators.

SPARQL 1.1 Update provides a means to change the state of the database using a query-like syntax. SPARQL Update has similarities to SQL INSERT INTO, UPDATE WHERE and DELETE FROM behaviour. Full details are provided on the W3C SPARQL Update working group page, but here is a brief summary of the various types of modification operations on the RDF triples:

  • INSERT DATA {...} - inserts RDF statements;
  • DELETE DATA {...} - removes RDF statements;
  • DELETE {...} INSERT {...} WHERE {...} - for more complex modifications;
  • LOAD (SILENT) from_iri - loads an RDF document identified by from_iri;
  • LOAD (SILENT) from_iri INTO GRAPH to_iri - loads an RDF document into the local graph called to_iri;
  • CLEAR (SILENT) GRAPH iri - removes all triples from the graph identified by iri;
  • CLEAR (SILENT) DEFAULT - removes all triples from the default graph;
  • CLEAR (SILENT) NAMED - removes all triples from all named graphs;
  • CLEAR (SILENT) ALL - removes all triples from all graphs.

The following operations are used to manage graphs:

  • CREATE - creates a new graph in stores that support empty graphs;
  • DROP - removes a graph and all of its contents;
  • COPY - modifies a graph to contain a copy of another;
  • MOVE - moves all of the data from one graph into another;
  • ADD - reproduces all data from one graph into another.
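
As a minimal sketch of how one of the update operations above might be submitted over HTTP, the following Python snippet builds (but does not send) a POST request using only the standard library. The endpoint address (a Sesame-style `/repositories/<repo_id>/statements` URL on localhost), the repository name `my_repo` and the helper name `build_update_request` are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: adjust host, port and repository id to your setup.
UPDATE_ENDPOINT = (
    "http://localhost:8080/openrdf-sesame/repositories/my_repo/statements"
)

def build_update_request(update, endpoint=UPDATE_ENDPOINT):
    """Build (but do not send) a POST request carrying a SPARQL 1.1 Update."""
    # The update string travels in the form-encoded 'update' parameter.
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

insert_request = build_update_request(
    'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }'
)
clear_request = build_update_request("CLEAR SILENT GRAPH <http://example.org/g>")
```

Passing either request object to `urllib.request.urlopen` would execute the update against a running server.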

SPARQL 1.1 Federation provides extensions to the query syntax for executing distributed queries over any number of SPARQL endpoints. This feature is very powerful, and allows integration of RDF data from different sources using a single query.

For example, to discover DBpedia resources about people who have the same names as those stored in a local repository:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?dbpedia_id
WHERE {
   ?person a foaf:Person ;
           foaf:name ?name .
   SERVICE <http://dbpedia.org/sparql> {
        ?dbpedia_id a dbpedia-owl:Person ;
                    foaf:name ?name .
   }
}

The above query matches its first part against the local repository. For each person it finds, it checks the DBpedia SPARQL endpoint to see whether a person with the same name exists and, if so, returns that person's ID.

Since Sesame repositories are also SPARQL endpoints, it is possible to use the federation mechanism to do distributed querying over several repositories on a local server. For example, imagine two repositories - one repository (called 'my_concepts') with triples about concepts and a separate repository (called 'my_labels'), which contains all the label information. To retrieve the corresponding label for each concept the following query can be executed on the 'my_concepts' repository:

SELECT ?id ?label
WHERE {
    ?id a ex:Concept .
    SERVICE <http://localhost:8080/openrdf-sesame/repositories/my_labels> {
        ?id rdfs:label ?label.
    }
}

Federation must be used with caution, first of all to avoid doing excessive querying of remote (public) SPARQL endpoints, but also because it can lead to inefficient query patterns. The following example finds resources in the second SPARQL endpoint, which have a similar rdfs:label to the rdfs:label of <http://dbpedia.org/resource/Vaccination> in the first SPARQL endpoint:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?endpoint2_id {
    SERVICE <http://faraway_endpoint.org/sparql>
    {
        ?endpoint1_id rdfs:label ?l1 .
        FILTER( lang(?l1) = "en" )
    }
    SERVICE <http://remote_endpoint.com/sparql>
    {
        ?endpoint2_id rdfs:label ?l2 .
        FILTER( str(?l2) = str(?l1) )
    }
}
VALUES ?endpoint1_id
{ <http://dbpedia.org/resource/Vaccination> }

However, such a query is very inefficient, because no intermediate bindings are passed between endpoints. Instead, both sub-queries execute independently, requiring the second sub-query to return all X rdfs:label Y statements that it stores. These are then joined locally to the (likely much smaller) results of the first sub-query.
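
The cost of this evaluation strategy can be illustrated with a toy model in Python. It is only a sketch under stated assumptions: the two lists stand in for the result sets returned independently by the two SERVICE clauses, and the sample rows are invented. It is not GraphDB's actual join implementation.

```python
def local_join(endpoint1_rows, endpoint2_rows):
    """Join two independently fetched result sets on str(?l1) = str(?l2)."""
    # Collect the label texts from the (small) first result set.
    labels_from_first = {label for _id, label in endpoint1_rows}
    # Every row of the second endpoint has to be shipped and scanned locally.
    return [rid for rid, label in endpoint2_rows if label in labels_from_first]

# Invented sample data: one label from the first endpoint,
# and *all* rdfs:label statements held by the second endpoint.
endpoint1_rows = [("http://dbpedia.org/resource/Vaccination", "Vaccination")]
endpoint2_rows = [
    ("http://example.org/a", "Vaccination"),
    ("http://example.org/b", "Vaccine"),
    ("http://example.org/c", "Immunity"),
]

matches = local_join(endpoint1_rows, endpoint2_rows)
rows_transferred = len(endpoint1_rows) + len(endpoint2_rows)
```

In this model every row of the second result set is transferred even though only one contributes to the answer. With a real endpoint, the unused part would be its entire set of labels.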

SPARQL 1.1 Graph Store HTTP Protocol provides a means for updating and fetching RDF graph content from a Graph Store over HTTP in the REST style. The URL patterns for this new functionality are provided at:

  • <SESAME_URL>/repositories/<repo_id>/rdf-graphs/service (for indirectly referenced named graphs)
  • <SESAME_URL>/repositories/<repo_id>/rdf-graphs/<NAME> (for directly referenced named graphs).

The methods supported by these resources and their effects are:

  • GET fetches statements in the named graph from the repository in the requested format.
  • PUT updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.
  • DELETE deletes all data in the specified named graph in the repository.
  • POST updates data in the named graph in the repository by adding the supplied data to any existing data in the named graph. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.

Request headers:

'Accept': Relevant values for GET requests are the MIME types of supported RDF formats.
'Content-Type': Must specify the encoding of any request data that is sent to a server. Relevant values are the MIME types of supported RDF formats.

For requests on indirectly referenced named graphs, the following parameters are supported:

'graph' (optional): specifies the URI of the named graph to be accessed.
'default' (optional): specifies that the default graph is to be accessed. This parameter is expected to be present but have no value.

Each request on an indirectly referenced graph needs to specify precisely one of the above parameters.
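
The rule that exactly one of the two parameters must be supplied can be sketched as a small URL builder in Python. The function name `graph_store_url` and the example values are hypothetical; only the URL pattern and the parameter names come from the text above.

```python
from urllib.parse import quote

def graph_store_url(sesame_url, repo_id, graph=None, default=False):
    """Return the URL for an indirectly referenced graph.

    Exactly one of `graph` (a named graph URI) or `default` must be given.
    """
    if (graph is None) == (not default):
        raise ValueError("specify exactly one of 'graph' or 'default'")
    base = f"{sesame_url}/repositories/{quote(repo_id)}/rdf-graphs/service"
    if default:
        # 'default' is present but carries no value.
        return base + "?default"
    # The graph URI is percent-encoded so it can be used as a parameter value.
    return base + "?graph=" + quote(graph, safe="")
```

For example, `graph_store_url("http://localhost:8080/openrdf-sesame", "my_repo", default=True)` addresses the default graph, while passing `graph="http://example.org/g"` addresses that named graph.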

Named graphs

An RDF database can store collections of RDF statements (triples) in separate graphs identified (named) by a URI. A group of statements with a unique name is called a named graph. An RDF database has one more graph that does not have a name, and this is called the 'default graph'.

The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an 'RDF Dataset', which identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:

  • FROM uri - brings statements from the database's graph identified by 'uri' into the dataset's default graph, i.e. the statements 'lose' their graph name.
  • FROM NAMED uri - brings the statements from the database's graph identified by 'uri' into the dataset as a named graph, i.e. the statements keep their graph name.

If either FROM or FROM NAMED is used, then the database's default graph is no longer used as input for processing this query. In effect, the combination of FROM and FROM NAMED clauses exactly defines the dataset. This is inconvenient, as it precludes, for instance, executing a query over just one named graph and the default graph. However, there is a programmatic way to get around this limitation that is described below.

The default SPARQL dataset

The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e. it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset in any way they please.

GraphDB constructs the default dataset as follows:

  • The dataset's default graph contains the merge of the database's default graph AND all the database's named graphs.
  • The dataset contains all named graphs from the database.

This means that if a statement ex:x ex:y ex:z exists in the database in the graph ex:g, then the following query patterns will behave as follows:

Query                                   Bindings
SELECT * { ?s ?p ?o }                   ?s=ex:x ?p=ex:y ?o=ex:z
SELECT * { GRAPH ?g { ?s ?p ?o } }      ?s=ex:x ?p=ex:y ?o=ex:z ?g=ex:g

In other words, the triple ex:x ex:y ex:z will appear to be in both the default graph and the named graph ex:g at the same time.

There are two reasons for this behaviour:

  1. It provides an easy way to execute a triple pattern query over all stored RDF statements.
  2. It allows all named graph names to be discovered, e.g. with this query: SELECT ?g { GRAPH ?g { ?s ?p ?o } }.
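
The construction of the default dataset can be sketched as a toy model in Python. Statements are represented as (subject, predicate, object, graph) tuples, with `None` standing for the database's default graph. The representation and the function name `default_dataset` are assumptions for illustration, not GraphDB internals.

```python
DEFAULT = None  # stands for the database's unnamed (default) graph

def default_dataset(quads):
    """Model the dataset built when no FROM/FROM NAMED clause is given.

    `quads` is an iterable of (s, p, o, graph) tuples; graph is None for
    statements stored in the database's default graph.
    """
    quads = list(quads)
    # Dataset default graph: merge of the default graph and all named graphs.
    merged_default = {(s, p, o) for s, p, o, _g in quads}
    # Dataset named graphs: all named graphs from the database.
    named = {}
    for s, p, o, g in quads:
        if g is not DEFAULT:
            named.setdefault(g, set()).add((s, p, o))
    return merged_default, named

# The statement ex:x ex:y ex:z stored in graph ex:g, as in the text above.
default_graph, named_graphs = default_dataset([("ex:x", "ex:y", "ex:z", "ex:g")])
```

In this model the triple is visible both in `default_graph` and under the key `"ex:g"` in `named_graphs`, mirroring the two query results shown above.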

Managing Explicit and Implicit Statements

GraphDB maintains two flags for each statement:

  • Explicit: the statement is inserted in the database by the user: using SPARQL Update, the Sesame API or the 'imports' configuration parameter. The same explicit statement can exist in the database's default graph and in each named graph.
  • Implicit: the statement is created as a result of inference, by either an Axiom or Rule. Inferred statements are ALWAYS created in the database's default graph.

These two flags are not mutually exclusive. The sequences of operations below use the following notation:

  • For the operations, we use the names "insert/delete" for explicit, and "infer/retract" for implicit (retract means that all premises of the statement are deleted or retracted)
  • We use tuples <statement graph flags> to show the results after each operation:
    • <s G EI> means statement s in graph G having both flags Explicit and Implicit
    • <s _ EI> means statement s in the default graph having both flags Explicit and Implicit
    • <_ G _> means the statement is deleted from graph G

First consider operations on a statement s in the default graph only:

  • insert <s _ E>, infer <s _ EI>, delete <s _ I>, retract <_ _ _>
  • insert <s _ E>, infer <s _ EI>, retract <s _ E>, delete <_ _ _>
  • infer <s _ I>, insert <s _ EI>, delete <s _ I>, retract <_ _ _>
  • infer <s _ I>, insert <s _ EI>, retract <s _ E>, delete <_ _ _>
  • insert <s _ E>, insert <s _ E>, delete <_ _ _>
  • infer <s _ I>, infer <s _ I>, retract <_ _ _> (if the two inferences are from the same premises)

This does not show all possible sequences, but the principles become clear:

  • No duplicate statement can exist in the default graph
  • Delete/retract clears the appropriate flag
  • The statement is deleted only after both flags are cleared
  • Deleting an inferred statement has no effect (except to clear the E flag, if any)
  • Retracting an inserted statement has no effect (except to clear the I flag, if any)
  • Inserting the same statement twice has no effect: insert is idempotent
  • Inferring the same statement twice has no effect: infer is idempotent, and I is a flag, not a counter. However, the retraction algorithm ensures that I is cleared only after all premises of s are retracted.
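
The flag behaviour for a single statement in the default graph can be sketched as a small state model in Python. This is a toy model for illustration only: the class name `StatementFlags` is hypothetical, and `retract` here stands for the point at which all premises of the statement have been deleted or retracted.

```python
class StatementFlags:
    """Toy model of the Explicit (E) and Implicit (I) flags of one statement."""

    def __init__(self):
        self.flags = set()

    def insert(self):
        """The user adds the statement: set E (idempotent)."""
        self.flags.add("E")
        return self

    def infer(self):
        """The reasoner derives the statement: set I (a flag, not a counter)."""
        self.flags.add("I")
        return self

    def delete(self):
        """The user removes the statement: clear E only."""
        self.flags.discard("E")
        return self

    def retract(self):
        """All premises are gone: clear I only."""
        self.flags.discard("I")
        return self

    def state(self):
        """Return 'EI', 'E', 'I', or '' when the statement no longer exists."""
        return "".join(flag for flag in "EI" if flag in self.flags)

s = StatementFlags()
trace = [s.insert().state(), s.infer().state(),
         s.delete().state(), s.retract().state()]
```

The `trace` list follows the first sequence listed above: E, then EI, then I, then the empty state in which the statement is gone.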

Now consider operations on a statement s in named graph G, and inferred statement s in the default graph:

  • insert <s G E>, infer <s _ I> <s G E>, delete <s _ I>, retract <_ _ _>
  • insert <s G E>, infer <s _ I> <s G E>, retract <s G E>, delete <_ _ _>
  • infer <s _ I>, insert <s G E> <s _ I>, delete <s _ I>, retract <_ _ _>
  • infer <s _ I>, insert <s G E> <s _ I>, retract <s G E>, delete <_ _ _>
  • insert <s G E>, insert <s G E>, delete <_ _ _>
  • infer <s _ I>, infer <s _ I>, retract <_ _ _> (if the two inferences are from the same premises)

The additional principles are:

  • The same statement can exist in several graphs. In particular, as explicit in graph G and implicit in the default graph
  • Delete/retract works on the appropriate graph

In order to avoid a proliferation of duplicate statements, it is recommended not to insert inferrable statements in named graphs.

Querying Explicit and Implicit Statements

The database's default graph can contain a mixture of explicit and implicit statements. The Sesame API provides a flag called 'includeInferred', which can help in some situations. This flag is passed to several API methods, and when set to false, it causes only explicit statements to be iterated or returned. When this flag is set to true, then both explicit and implicit statements are iterated or returned.

GraphDB provides extensions that give more control over the processing of explicit and implicit statements. These extensions allow the selection of explicit statements, implicit statements, or both for query answering, and also provide a mechanism to identify which statements are explicit and which are implicit. This is achieved by using 'pseudo-graph' names in FROM and FROM NAMED clauses, which cause certain flags to be set. The details are as follows:

Clause
    Behaviour
FROM <http://www.ontotext.com/explicit>
    The dataset's default graph includes only explicit statements from the database's default graph.
FROM <http://www.ontotext.com/implicit>
    The dataset's default graph includes only inferred statements from the database's default graph.
FROM NAMED <http://www.ontotext.com/explicit>
    The dataset contains a named graph http://www.ontotext.com/explicit that contains only explicit statements from the database's default graph, i.e. quad patterns such as GRAPH ?g {?s ?p ?o} bind explicit statements from the database's default graph to a graph named http://www.ontotext.com/explicit.
FROM NAMED <http://www.ontotext.com/implicit>
    The dataset contains a named graph http://www.ontotext.com/implicit that contains only implicit statements from the database's default graph.

Note that these clauses do not affect the construction of the default dataset, in the sense that using any combination of the above will still result in a dataset containing all the named graphs from the database. All that changes is which statements appear in the dataset's default graph and whether any extra named graphs (explicit or implicit) appear.
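
As a brief illustration, the following Python sketch assembles a SELECT query that uses one of the pseudo-graph names. The helper name `select_all` and the simple `?s ?p ?o` pattern are assumptions; only the two pseudo-graph URIs come from the table above.

```python
EXPLICIT = "http://www.ontotext.com/explicit"
IMPLICIT = "http://www.ontotext.com/implicit"

def select_all(pseudo_graph=None):
    """Return a SELECT query over the dataset's default graph.

    When `pseudo_graph` is given, a FROM clause restricts the default
    graph to explicit or implicit statements only.
    """
    from_clause = f"FROM <{pseudo_graph}> " if pseudo_graph else ""
    return f"SELECT * {from_clause}WHERE {{ ?s ?p ?o }}"

explicit_only = select_all(EXPLICIT)
inferred_only = select_all(IMPLICIT)
everything = select_all()
```

Comparing the result counts of `explicit_only` and `inferred_only` against the same repository is one way to see how many statements were added by inference.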

Specifying the dataset programmatically

The Sesame API provides an interface Dataset and an implementation class DatasetImpl for defining the dataset for a query by providing the URIs of named graphs and adding them to the 'default graphs' and 'named graphs' members. This permits 'null' to be used to identify the default database graph (or 'null context' to use Sesame terminology).

This dataset can then be passed to a query or update before it is evaluated, e.g. by calling setDataset on the prepared query or update object.

Accessing internal identifiers for entities

Internally, GraphDB uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions, e.g. when interfacing to application infrastructure, when having access to these internal IDs can improve the efficiency of data structures external to GraphDB by allowing them to be indexed by an integer value rather than a full URI.

This section introduces a special GraphDB predicate and function that provide access to these internal IDs. The datatype of internal IDs is <http://www.w3.org/2001/XMLSchema#long>.

Predicate: <http://www.ontotext.com/owlim/entity#id>
Description: Maps between an entity and its internal ID
Example: Select all entities and their IDs:

PREFIX ent: <http://www.ontotext.com/owlim/entity#>
SELECT * WHERE {
    ?s ent:id ?id
} ORDER BY ?id

Function: <http://www.ontotext.com/owlim/entity#id>
Description: Returns an entity's internal ID
Example: Select all statements and order them by the internal ID of the object values:

PREFIX ent: <http://www.ontotext.com/owlim/entity#>
SELECT * WHERE {
    ?s ?p ?o .
} ORDER BY ent:id(?o)

Examples

  • Enumerate all the entities and bind the nodes to ?s and their IDs to ?id, order by ?id:
    select * where {
      ?s <http://www.ontotext.com/owlim/entity#id> ?id
    } order by ?id
    
  • Enumerate all non-literals and bind the nodes to ?s and their IDs to ?id, order by ?id:
    SELECT * WHERE {
      ?s <http://www.ontotext.com/owlim/entity#id> ?id .
      FILTER (!isLiteral(?s)) .
    } ORDER BY ?id
    
  • Find the internal IDs of subjects of statements with specific predicate and object values:
    SELECT * WHERE {
      ?s <http://test.org#Pred1> "A literal".
      ?s <http://www.ontotext.com/owlim/entity#id> ?id .
    } ORDER BY ?id
    
  • Find all statements where the object has the given internal ID by using an explicit, untyped value as the ID (the "115" used as object in the second statement pattern):
    SELECT * WHERE {
      ?s ?p ?o.
      ?o <http://www.ontotext.com/owlim/entity#id> "115" .
    }
    
  • As above, but using an xsd:long datatype for the constant within a FILTER condition:
    SELECT * WHERE {
      ?s ?p ?o.
      ?o <http://www.ontotext.com/owlim/entity#id> ?id .
      FILTER (?id="115"^^<http://www.w3.org/2001/XMLSchema#long>) .
    } ORDER BY ?o
    
  • Find the internal IDs of subject and object entities for all statements:
    SELECT * WHERE {
      ?s ?p ?o.
      ?s <http://www.ontotext.com/owlim/entity#id> ?ids.
      ?o <http://www.ontotext.com/owlim/entity#id> ?ido.
    }
    
  • Retrieve all statements where the ID of the subject is equal to "115"^^xsd:long, by providing an internal ID value within a filter expression:
    SELECT * WHERE {
      ?s ?p ?o.
      FILTER ((<http://www.ontotext.com/owlim/entity#id>(?s)) = "115"^^<http://www.w3.org/2001/XMLSchema#long>).
    }
    
  • Retrieve all statements where the string-ised ID of the subject is equal to "115", by providing an internal ID value within a filter expression:
    SELECT * WHERE {
      ?s ?p ?o.
      FILTER (str( <http://www.ontotext.com/owlim/entity#id>(?s) ) = "115").
    }
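
On the client side, the results of such queries can be turned into an external structure indexed by integer ID. The following Python sketch assumes the response is in the standard SPARQL JSON results format and contains bindings for ?s and ?id. The function name `index_by_internal_id` and the sample response are invented for illustration.

```python
import json

def index_by_internal_id(results_json):
    """Map internal ID (int) -> entity value from SPARQL JSON results.

    Expects bindings for ?s and ?id, as produced by the ent:id queries above.
    """
    bindings = json.loads(results_json)["results"]["bindings"]
    return {int(row["id"]["value"]): row["s"]["value"] for row in bindings}

# Invented sample response; real IDs depend on the repository contents.
sample = json.dumps({
    "head": {"vars": ["s", "id"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/a"},
         "id": {"type": "literal", "value": "115",
                "datatype": "http://www.w3.org/2001/XMLSchema#long"}},
    ]},
})
entities = index_by_internal_id(sample)
```

The resulting dictionary is keyed by a plain integer rather than a full URI, which is the kind of external index the introduction to this section has in mind.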
    

Sesame 'direct hierarchy' vocabulary

GraphDB supports the Sesame-specific vocabulary for determining 'direct' sub-class, sub-property and type relationships. The special predicates and their definitions are shown below (reproduced from the Sesame openrdf user guide). The three predicates are all defined using the namespace definition:

PREFIX sesame: <http://www.openrdf.org/schema/sesame#>

Predicate
    Definition
A sesame:directSubClassOf B
    Class A is a direct subclass of B if:
      1. A is a subclass of B and;
      2. A and B are not equal and;
      3. there is no class C (not equal to A or B) such that A is a subclass of C and C of B.
P sesame:directSubPropertyOf Q
    Property P is a direct subproperty of Q if:
      1. P is a subproperty of Q and;
      2. P and Q are not equal and;
      3. there is no property R (not equal to P or Q) such that P is a subproperty of R and R of Q.
I sesame:directType T
    Resource I is of direct type T if:
      1. I is of type T and;
      2. there is no class U (not equal to T) such that:
        a. U is a subclass of T and;
        b. I is of type U.
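
The three conditions for sesame:directSubClassOf can be sketched as a short Python function. It assumes the input is the full, already inferred subclass relation given as a set of (subclass, superclass) pairs. The function name and the sample classes are invented, and this is an illustration of the definition rather than of how GraphDB evaluates the predicate.

```python
def direct_subclasses(subclass_pairs):
    """Return the (A, B) pairs where A is a direct subclass of B.

    `subclass_pairs` must already be the full (inferred) subclass relation.
    """
    pairs = set(subclass_pairs)           # condition 1: A is a subclass of B
    classes = {c for pair in pairs for c in pair}
    direct = set()
    for a, b in pairs:
        if a == b:                        # condition 2: A and B are not equal
            continue
        has_intermediate = any(           # condition 3: no C between A and B
            c not in (a, b) and (a, c) in pairs and (c, b) in pairs
            for c in classes
        )
        if not has_intermediate:
            direct.add((a, b))
    return direct

# Invented sample hierarchy, including reflexive and transitive pairs.
closure = {
    ("Dog", "Mammal"), ("Mammal", "Animal"), ("Dog", "Animal"),
    ("Dog", "Dog"), ("Mammal", "Mammal"), ("Animal", "Animal"),
}
result = direct_subclasses(closure)
```

Here the pair (Dog, Animal) is dropped because Mammal lies between the two classes, and the reflexive pairs are dropped by the second condition.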

Other special query behaviour

There are several more special graph URIs used in GraphDB-SE, which are used to control query evaluation.

Clause
    Behaviour
FROM/FROM NAMED <http://www.ontotext.com/disable-sameAs>
    Switches off the enumeration of equivalence classes produced by the owl:sameAs optimisation. By default, all owl:sameAs URIs are returned by triple pattern matching. This clause reduces the results to a single representative from each owl:sameAs class. See "Not enumerating sameAs" for details.
FROM/FROM NAMED <http://www.ontotext.com/count>
    Causes the query to return a single result in which every variable binding in the projection is replaced with a plain literal holding the total number of solutions of the query. For a CONSTRUCT query whose projection contains three variables (?subject, ?predicate, ?object), the subject and the predicate are bound to <http://www.ontotext.com/> and the object holds the literal value, because a statement cannot have a literal as its subject or predicate. This clause is deprecated in favour of the COUNT aggregate of SPARQL 1.1.
FROM/FROM NAMED <http://www.ontotext.com/skip-redundant-implicit>
    Excludes an implicit statement when the same statement exists explicitly within some context (including the default one). It was originally implemented to filter out redundant rows that appear as 'duplicate' results when the context part is not taken into account.
FROM <http://www.ontotext.com/distinct>
    Using this special graph name in DESCRIBE and CONSTRUCT queries causes only distinct triples to be returned. This is useful when several resources are being described and the same triple can be returned more than once, e.g. when describing both its subject and its object. This clause is deprecated in favour of the DISTINCT clause of SPARQL 1.1.
FROM <http://www.ontotext.com/owlim/cluster/control-query>
    Identifies the query to the GraphDB-Enterprise cluster master node as needing to be routed to all worker nodes.