This section of the user guide covers SPARQL query processing in GraphDB. The following paragraphs describe the standards supported, the implementation-specific behaviour that the standards permit, and the extensions that add new functionality.
GraphDB supports the following SPARQL specifications:
Note that these specifications were working drafts when this guide was first written; all of them have since become W3C Recommendations (March 2013).
SPARQL 1.1 Protocol for RDF defines the means for transmitting SPARQL queries to a SPARQL query processing service, and returning the query results to the entity that requested them.
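As a rough illustration of the protocol, the sketch below builds a query request as an HTTP GET that carries the query text in the `query` parameter. The endpoint URL follows the usual Sesame layout `http://localhost:8080/openrdf-sesame/repositories/<repository-id>`. That layout and the repository ID `myrepo` are assumptions, so adjust them to your installation. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Sesame-style endpoint; "myrepo" is a hypothetical repository ID.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/myrepo"


def build_query_request(endpoint: str, query: str,
                        accept: str = "application/sparql-results+json") -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The protocol carries the query text in the 'query' URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # The Accept header tells the service which result format to return.
    return Request(url, headers={"Accept": accept}, method="GET")


if __name__ == "__main__":
    req = build_query_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```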
SPARQL 1.1 Query provides more powerful query constructs than SPARQL 1.0. It adds:
SPARQL 1.1 Update provides a means to change the state of the database using a query-like syntax. SPARQL Update has similarities to SQL INSERT INTO, UPDATE WHERE and DELETE FROM behaviour. Full details are provided on the W3C SPARQL Update working group page, but here is a brief summary of the various types of modification operations on the RDF triples:
The following operations are used to manage graphs:
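The sketch below shows how updates might be submitted over HTTP under the SPARQL 1.1 Protocol. It uses a POST request whose form-encoded body carries the update text in the `update` parameter. A single request can combine a data modification (INSERT DATA) with a graph management operation (CLEAR GRAPH), separated by `;`. The `/statements` path is where the Sesame HTTP API is assumed to accept SPARQL updates. The URL and the `ex:` data are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Sesame-style update endpoint; "myrepo" is a hypothetical repository ID.
UPDATE_ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/myrepo/statements"

# Two operations in one request: add a triple to graph ex:g, then empty ex:old.
UPDATE = """
PREFIX ex: <http://example.com/>
INSERT DATA { GRAPH ex:g { ex:x ex:y ex:z } } ;
CLEAR GRAPH ex:old
"""


def build_update_request(endpoint: str, update: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update-via-POST request."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```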
SPARQL 1.1 Federation provides extensions to the query syntax for executing distributed queries over any number of SPARQL endpoints. This powerful feature allows RDF data from different sources to be integrated in a single query.
For example, to discover DBpedia resources about people who have the same names as those stored in a local repository:
The above query evaluates its first part against the local repository. For each person it finds, it queries the DBpedia SPARQL endpoint for a person with the same name and, if one exists, returns its ID.
Since Sesame repositories are also SPARQL endpoints, the federation mechanism can be used for distributed querying over several repositories on a local server. For example, imagine two repositories: one (called 'my_concepts') containing triples about concepts, and a separate one (called 'my_labels') containing all the label information. To retrieve the corresponding label for each concept, the following query can be executed on the 'my_concepts' repository:
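A query of this shape could be assembled as sketched below. Each Sesame repository has its own endpoint URL, so the SERVICE clause simply names the URL of the 'my_labels' repository. The server base URL is an assumption. The choice of skos:Concept and rdfs:label to model concepts and labels is also illustrative and not taken from the original example.

```python
# Assumed base URL of the local Sesame server.
SERVER = "http://localhost:8080/openrdf-sesame"


def repository_endpoint(server: str, repository_id: str) -> str:
    """Each repository is exposed as a SPARQL endpoint under /repositories/<id>."""
    return f"{server}/repositories/{repository_id}"


def concept_label_query(labels_endpoint: str) -> str:
    """Query to run on 'my_concepts'; labels come from the SERVICE endpoint."""
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?concept ?label WHERE {{
  ?concept a skos:Concept .
  SERVICE <{labels_endpoint}> {{
    ?concept rdfs:label ?label .
  }}
}}"""


if __name__ == "__main__":
    print(concept_label_query(repository_endpoint(SERVER, "my_labels")))
```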
Federation must be used with caution: first, to avoid sending excessive queries to remote (public) SPARQL endpoints, and second, because it can lead to inefficient query patterns. The following example finds resources in the second SPARQL endpoint that have an rdfs:label similar to the rdfs:label of <http://dbpedia.org/resource/Vaccination> in the first SPARQL endpoint:
However, such a query is very inefficient, because no intermediate bindings are passed between endpoints. Instead, both sub-queries execute independently, requiring the second sub-query to return all X rdfs:label Y statements that it stores. These are then joined locally to the (likely much smaller) results of the first sub-query.
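A toy model makes the cost difference concrete. The sketch below compares two strategies on in-memory data. The first evaluates both sub-queries independently and joins the results locally, as described above. The second is a nested-loop "bind join" that sends each intermediate binding to the second endpoint, matching the behaviour described for the earlier DBpedia example. The data is made up, exact label equality stands in for "similar", and "rows transferred" is only an illustrative measure, not how GraphDB's engine accounts for cost.

```python
# Toy stand-ins for two endpoints: lists of (subject, label) pairs.
ENDPOINT_A = [("dbr:Vaccination", "Vaccination")]
ENDPOINT_B = [(f"ex:r{i}", f"label {i}") for i in range(10_000)] + [
    ("ex:vacc", "Vaccination"),
]


def independent_join(a, b, subject):
    """Run both sub-queries separately, then join their results locally."""
    left = [label for s, label in a if s == subject]   # small result
    right = list(b)                                    # every ?x rdfs:label ?y row
    matches = [x for x, label in right if label in left]
    return matches, len(left) + len(right)


def bind_join(a, b, subject):
    """For each binding from the first endpoint, ask the second for matches."""
    left = [label for s, label in a if s == subject]
    matches, transferred = [], len(left)
    for label in left:
        # The second endpoint filters on the bound label before returning rows.
        rows = [x for x, lab in b if lab == label]
        transferred += len(rows)
        matches.extend(rows)
    return matches, transferred


if __name__ == "__main__":
    print(independent_join(ENDPOINT_A, ENDPOINT_B, "dbr:Vaccination"))
    print(bind_join(ENDPOINT_A, ENDPOINT_B, "dbr:Vaccination"))
```

Both strategies return the same answer. In this model, the independent plan moves over ten thousand rows to find it, while the bind join moves two.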
SPARQL 1.1 Graph Store HTTP Protocol provides a means for updating and fetching RDF graph content from a Graph Store over HTTP in the REST style. The URL patterns for this new functionality are provided at:
The methods supported by these resources and their effects are:
'Accept': Relevant values for GET requests are the MIME types of supported RDF formats.
For requests on indirectly referenced named graphs, the following parameters are supported:
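'graph' (optional): specifies the URI of the named graph to be accessed.
'default' (optional): specifies that the default graph is to be accessed. This parameter is expected to be present but have no value.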
Each request on an indirectly referenced graph needs to specify precisely one of the above parameters.
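The "exactly one parameter" rule can be sketched as a small URL builder. The path `rdf-graphs/service` is assumed to be the Sesame URL pattern for indirectly referenced graphs. The `default` parameter (present, with no value) selects the default graph, as in the W3C Graph Store HTTP Protocol. The repository URL is hypothetical.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical repository URL.
REPO = "http://localhost:8080/openrdf-sesame/repositories/myrepo"


def graph_store_url(repo: str, graph: Optional[str] = None,
                    default: bool = False) -> str:
    """URL for an indirectly referenced graph; exactly one selector is allowed."""
    if (graph is not None) == default:
        raise ValueError("specify exactly one of 'graph' or 'default'")
    base = f"{repo}/rdf-graphs/service"
    if default:
        return base + "?default"  # parameter present, no value
    # The graph URI must be percent-encoded when used as a query parameter.
    return base + "?graph=" + quote(graph, safe="")
```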
An RDF database can store collections of RDF statements (triples) in separate graphs, each identified (named) by a URI. A group of statements with a unique name is called a named graph. An RDF database also has one graph that has no name; this is called the 'default graph'.
The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an 'RDF Dataset', which identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:
If either FROM or FROM NAMED is used, the database's default graph is no longer used as input for processing the query. In effect, the combination of FROM and FROM NAMED clauses exactly defines the dataset. This is somewhat inconvenient, because it rules out, for instance, executing a query over just one named graph plus the default graph. However, there is a programmatic way around this limitation, which is described below.
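A set-based model of this construction shows why the database's default graph drops out once FROM or FROM NAMED is present. Quads whose graph is `None` stand for the unnamed default graph, and the `ex:` names are hypothetical. In line with the SPARQL specification, a query with only FROM NAMED clauses gets an empty default graph.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]  # graph None = unnamed default graph


def dataset_from_clauses(quads: Iterable[Quad], from_graphs: Set[str],
                         from_named: Set[str]
                         ) -> Tuple[Set[Triple], Dict[str, Set[Triple]]]:
    """Build the RDF Dataset defined by FROM / FROM NAMED clauses."""
    quads = list(quads)
    # FROM: the dataset's default graph is the merge of the listed graphs.
    # The database's own default graph (graph None) is never included.
    default = {(s, p, o) for s, p, o, g in quads if g in from_graphs}
    # FROM NAMED: each listed graph becomes a named graph of the dataset.
    named = {name: {(s, p, o) for s, p, o, g in quads if g == name}
             for name in from_named}
    return default, named


QUADS = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:x", "ex:y", "ex:z", "ex:g"),
    ("ex:m", "ex:n", "ex:o", "ex:h"),
]
```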
The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e. it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset in any way they please.
GraphDB constructs the default dataset as follows:
This means that if a statement ex:x ex:y ex:z exists in the database in the graph ex:g, these query patterns behave as follows:
In other words, the triple ex:x ex:y ex:z appears to be in both the default graph and the named graph ex:g at the same time.
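The sketch below models a default dataset consistent with this behaviour. Every statement, whatever its graph, is visible in the dataset's default graph, and each of the database's named graphs is also kept under its own name. It is an illustration with hypothetical `ex:` data, not GraphDB's code.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]  # graph None = unnamed default graph


def graphdb_default_dataset(quads: Iterable[Quad]
                            ) -> Tuple[Set[Triple], Dict[str, Set[Triple]]]:
    """Dataset used when a query has no FROM / FROM NAMED clauses (modelled)."""
    default: Set[Triple] = set()
    named: Dict[str, Set[Triple]] = {}
    for s, p, o, g in quads:
        # Every statement is visible in the dataset's default graph...
        default.add((s, p, o))
        # ...and a statement in a named graph is also visible under that name.
        if g is not None:
            named.setdefault(g, set()).add((s, p, o))
    return default, named


default, named = graphdb_default_dataset([("ex:x", "ex:y", "ex:z", "ex:g")])
# { ex:x ex:y ex:z }            matches via the default graph
# GRAPH ex:g { ex:x ex:y ex:z } matches via the named graph ex:g
```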
There are several reasons for this behaviour:
GraphDB maintains two flags for each statement:
These two flags are not mutually exclusive, so a statement can be both explicit and implicit. The following sequences of operations are possible:
First consider operations on a statement s in the default graph only:
This does not show all possible sequences, but the principles become clear:
Now consider operations on a statement s in named graph G, and inferred statement s in the default graph:
The additional principles are:
In order to avoid a proliferation of duplicate statements, it is recommended not to insert inferrable statements in named graphs.
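The two-flag bookkeeping, and the duplication that the recommendation above guards against, can be modelled with a small state table. This sketch rests on several assumptions. Each record is keyed by (triple, graph), and inferred statements land in the default graph. A record disappears only when both of its flags are cleared. The class and method names are hypothetical and do not mirror GraphDB internals.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]
Key = Tuple[Triple, Optional[str]]  # (triple, graph); None = default graph


class StatementStore:
    """Hypothetical model: every stored statement carries two independent flags."""

    def __init__(self) -> None:
        self.flags: Dict[Key, Dict[str, bool]] = {}

    def _set(self, key: Key, flag: str, value: bool) -> None:
        entry = self.flags.setdefault(key, {"explicit": False, "implicit": False})
        entry[flag] = value
        if not (entry["explicit"] or entry["implicit"]):
            del self.flags[key]  # neither flag set: the statement is gone

    def insert(self, triple: Triple, graph: Optional[str] = None) -> None:
        self._set((triple, graph), "explicit", True)

    def delete(self, triple: Triple, graph: Optional[str] = None) -> None:
        if (triple, graph) in self.flags:
            self._set((triple, graph), "explicit", False)

    def infer(self, triple: Triple) -> None:
        # Inferred statements are recorded in the default graph only.
        self._set((triple, None), "implicit", True)

    def retract_inference(self, triple: Triple) -> None:
        if (triple, None) in self.flags:
            self._set((triple, None), "implicit", False)


s = ("ex:rex", "rdf:type", "ex:Animal")
store = StatementStore()
store.insert(s)          # explicit in the default graph
store.infer(s)           # same record, now also implicit
store.delete(s)          # explicit flag cleared; survives as implicit
store.insert(s, "ex:G")  # a separate record: explicit in named graph ex:G
```

Under these assumptions, the inferrable statement inserted into ex:G is stored twice: once as implicit in the default graph and once as explicit in ex:G.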
The database's default graph can contain a mixture of explicit and implicit statements. The Sesame API provides a flag called 'includeInferred' that controls which of these are visible. The flag is passed to several API methods. When it is set to false, only explicit statements are iterated or returned; when it is set to true, both explicit and implicit statements are iterated or returned.
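In the Sesame API the flag is a boolean argument, for example to RepositoryConnection.getStatements(subj, pred, obj, includeInferred, contexts...). Its effect on the results amounts to a filter, sketched below over a hypothetical flattened list of statements that each carry an "is explicit" marker.

```python
from typing import Iterable, List, Tuple

# Each row: (subject, predicate, object, is_explicit); a hypothetical flat view.
Row = Tuple[str, str, str, bool]


def get_statements(rows: Iterable[Row],
                   include_inferred: bool) -> List[Tuple[str, str, str]]:
    """Mimic the effect of an includeInferred flag on a statement iteration."""
    return [(s, p, o) for s, p, o, explicit in rows if explicit or include_inferred]


ROWS = [
    ("ex:rex", "rdf:type", "ex:Dog", True),      # asserted
    ("ex:rex", "rdf:type", "ex:Animal", False),  # inferred from ex:Dog rdfs:subClassOf ex:Animal
]
```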
GraphDB provides extensions that give more control over the processing of explicit and implicit statements. They allow explicit statements, implicit statements, or both to be selected for query answering. They also provide a mechanism for identifying which statements are explicit and which are implicit. This is achieved by using special 'pseudo-graph' names in FROM and FROM NAMED clauses, which cause certain flags to be set. The details are as follows:
Note that these clauses do not affect the construction of the default dataset: any combination of the above still yields a dataset containing all the named graphs from the database. All that changes is which statements appear in the dataset's default graph and whether any extra named graphs (explicit or implicit) appear.
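The sketch below models this behaviour. The URIs `http://www.ontotext.com/explicit` and `http://www.ontotext.com/implicit` are assumed here to be the pseudo-graph names, so check them against your GraphDB version. Pseudo-graphs in FROM only select which kinds of statement populate the default graph. Pseudo-graphs in FROM NAMED add extra named graphs, and the database's own named graphs are always kept.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed GraphDB pseudo-graph URIs; verify against your version's documentation.
EXPLICIT = "http://www.ontotext.com/explicit"
IMPLICIT = "http://www.ontotext.com/implicit"

Triple = Tuple[str, str, str]


def pseudo_graph_dataset(statements: Iterable[Tuple[Triple, bool]],
                         named_graphs: Dict[str, Set[Triple]],
                         from_: Set[str], from_named: Set[str]
                         ) -> Tuple[Set[Triple], Dict[str, Set[Triple]]]:
    """Model how pseudo-graph names in FROM / FROM NAMED select statements."""
    rows = list(statements)
    explicit = {t for t, is_explicit in rows if is_explicit}
    implicit = {t for t, is_explicit in rows if not is_explicit}
    # FROM <explicit> / FROM <implicit> choose what populates the default graph;
    # without either, both kinds are included.
    if from_ & {EXPLICIT, IMPLICIT}:
        default: Set[Triple] = set()
        if EXPLICIT in from_:
            default |= explicit
        if IMPLICIT in from_:
            default |= implicit
    else:
        default = explicit | implicit
    # The database's named graphs always stay in the dataset;
    # FROM NAMED <explicit> / <implicit> add extra named graphs.
    named = dict(named_graphs)
    for pseudo, triples in ((EXPLICIT, explicit), (IMPLICIT, implicit)):
        if pseudo in from_named:
            named[pseudo] = triples
    return default, named


STATEMENTS = [
    (("ex:rex", "rdf:type", "ex:Dog"), True),
    (("ex:rex", "rdf:type", "ex:Animal"), False),
]
NAMED = {"ex:g": {("ex:x", "ex:y", "ex:z")}}
```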
The Sesame API provides an interface Dataset and an implementation class DatasetImpl for defining the dataset for a query by providing the URIs of named graphs and adding them to the 'default graphs' and 'named graphs' members. This permits 'null' to be used to identify the default database graph (or 'null context' to use Sesame terminology).
This dataset can then be passed to queries or updates through their setDataset() method.
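The Sesame classes are Java. The sketch below is a Python analogue (`DatasetSketch` and `evaluate_default_graph` are hypothetical names, not Sesame API). It shows why allowing null among the default graphs solves the limitation noted earlier: the query's default graph can be the database's default graph merged with one chosen named graph, which FROM clauses alone cannot express. In Java, this corresponds roughly to adding null and the named graph's URI as default graphs of a DatasetImpl.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set, Tuple

Quad = Tuple[str, str, str, Optional[str]]  # graph None = the "null context"


@dataclass
class DatasetSketch:
    """Python stand-in for a Sesame Dataset: sets of graph names, None allowed."""
    default_graphs: Set[Optional[str]] = field(default_factory=set)
    named_graphs: Set[str] = field(default_factory=set)


def evaluate_default_graph(quads: Iterable[Quad], ds: DatasetSketch):
    """Merge every graph listed as a default graph, including the null context."""
    return {(s, p, o) for s, p, o, g in quads if g in ds.default_graphs}


QUADS = [
    ("ex:a", "ex:p", "ex:b", None),    # in the database's default graph
    ("ex:x", "ex:y", "ex:z", "ex:g"),  # in named graph ex:g
    ("ex:m", "ex:n", "ex:o", "ex:h"),  # in named graph ex:h (not wanted)
]

ds = DatasetSketch(default_graphs={None, "ex:g"})  # default graph plus ex:g
```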
Internally, GraphDB uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions, e.g. when interfacing to application infrastructure, when having access to these internal IDs can improve the efficiency of data structures external to GraphDB by allowing them to be indexed by an integer value rather than a full URI.
This section introduces a special GraphDB predicate and function that provide access to these internal IDs. The datatype of internal IDs is <http://www.w3.org/2001/XMLSchema#long>.
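The sketch below is a toy illustration of the idea, not GraphDB's implementation. It is a two-way map that gives each entity value an integer ID within the xsd:long range, so that external structures can be keyed by that integer. The class name and the choice to start IDs at 1 are assumptions made for this example.

```python
from typing import Dict, List

XSD_LONG_MAX = 2**63 - 1  # internal IDs are typed xsd:long


class EntityPool:
    """Toy two-way map between entity values (URIs, blank nodes, literals) and IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._values: List[str] = []

    def id_of(self, value: str) -> int:
        """Return the ID for a value, assigning the next free one if it is new."""
        if value not in self._ids:
            new_id = len(self._values) + 1  # IDs start at 1 in this sketch
            if new_id > XSD_LONG_MAX:
                raise OverflowError("ID no longer fits in xsd:long")
            self._ids[value] = new_id
            self._values.append(value)
        return self._ids[value]

    def value_of(self, entity_id: int) -> str:
        """Map an ID back to its entity value."""
        if not 1 <= entity_id <= len(self._values):
            raise KeyError(entity_id)
        return self._values[entity_id - 1]


pool = EntityPool()
uri = "http://example.com/resource/1"
uri_id = pool.id_of(uri)
# An external structure can now be keyed by a small integer, not the full URI.
external_index = {uri_id: "application data for resource 1"}
```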
GraphDB supports the Sesame-specific vocabulary for determining 'direct' sub-class, sub-property and type relationships. The special vocabulary and its definitions are shown below (reproduced from the Sesame openrdf user guide). All three predicates are defined using the namespace definition:
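The sketch below computes direct sub-class pairs from a set of rdfs:subClassOf pairs. It assumes the Sesame definition: A is a direct subclass of B if A is a subclass of B, A and B are different, and there is no third class C (different from both) such that A is a subclass of C and C is a subclass of B. Subclass tests are made against the transitive closure. The class names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (subclass, superclass)


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Repeatedly compose pairs until no new (a, d) pairs appear."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def direct_subclasses(pairs: Set[Pair]) -> Set[Pair]:
    """Keep (A, B) only if no other class C sits strictly between A and B."""
    closure = transitive_closure(pairs)
    classes = {x for pair in closure for x in pair}
    direct = set()
    for a, b in closure:
        if a == b:
            continue
        between = any(c not in (a, b) and (a, c) in closure and (c, b) in closure
                      for c in classes)
        if not between:
            direct.add((a, b))
    return direct


SUBCLASS_OF = {("ex:Dog", "ex:Mammal"), ("ex:Mammal", "ex:Animal"),
               ("ex:Dog", "ex:Animal")}
```

Here ex:Dog is a direct subclass of ex:Mammal but not of ex:Animal, because ex:Mammal lies between them.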
GraphDB-SE uses several more special graph URIs to control query evaluation.