This section of the user guide describes SPARQL query processing in OWLIM: the standards supported, the implementation-specific behaviour that those standards permit, and the extensions that add new functionality.
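OWLIM supports the following SPARQL specifications: SPARQL 1.1 Protocol for RDF, SPARQL 1.1 Query, SPARQL 1.1 Update, SPARQL 1.1 Federation and SPARQL 1.1 Graph Store HTTP Protocol.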
Note that these are working drafts and have not yet reached recommendation status with the W3C.
SPARQL 1.1 Protocol for RDF defines the means for transmitting SPARQL queries to a SPARQL query processing service and returning the query results to the entity that requested them.
SPARQL 1.1 Query provides more powerful query constructions compared to SPARQL 1.0. It adds:
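As a rough illustration of some of the constructs introduced in SPARQL 1.1 (aggregates, property paths, negation and projected expressions), consider the sketch below. The FOAF vocabulary and the shape of the data are assumptions made purely for illustration and are not taken from this guide.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# For each person, count the distinct people reachable through one or
# more foaf:knows links, skipping people that have a mailbox recorded.
SELECT ?person (COUNT(DISTINCT ?friend) AS ?reachable)
WHERE {
  ?person a foaf:Person ;
          foaf:knows+ ?friend .                    # property path (1.1)
  FILTER NOT EXISTS { ?person foaf:mbox ?mbox }    # negation (1.1)
}
GROUP BY ?person                                   # aggregation (1.1)
HAVING (COUNT(DISTINCT ?friend) > 1)
ORDER BY DESC(?reachable)
```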
SPARQL 1.1 Update provides a means to change the state of the database using a query-like syntax. SPARQL Update has similarities to SQL INSERT INTO, UPDATE WHERE and DELETE FROM behaviour. Full details are provided on the W3C SPARQL Update working group page, but here is a brief summary of the various types of modification operations on RDF triples:
The following operations are used to manage graphs:
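A minimal sketch of a single SPARQL 1.1 Update request that combines triple modification with graph management is shown here. The `ex:` namespace, the graph names and the `ex:status` property are hypothetical; only the operation keywords come from the SPARQL 1.1 Update specification.

```sparql
PREFIX ex: <http://example.org/>

# Add two triples to a named graph, then remove one of them again
INSERT DATA { GRAPH ex:g1 { ex:doc1 ex:status "draft" . ex:doc2 ex:status "draft" } } ;
DELETE DATA { GRAPH ex:g1 { ex:doc2 ex:status "draft" } } ;

# Pattern-based modification, comparable to SQL UPDATE ... WHERE
WITH ex:g1
DELETE { ?doc ex:status "draft" }
INSERT { ?doc ex:status "published" }
WHERE  { ?doc ex:status "draft" } ;

# Graph management: create a graph, copy into it, drop the source
CREATE SILENT GRAPH ex:g2 ;
COPY ex:g1 TO ex:g2 ;
DROP GRAPH ex:g1
```

The operations are separated by semicolons and are executed in order as part of one request.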
SPARQL 1.1 Federation provides extensions to the query syntax for executing distributed queries over any number of SPARQL endpoints. This feature, new in Sesame 2.6, is very powerful: it allows RDF data from different sources to be integrated using a single query.
For example, to discover DBpedia resources about people who have the same names as those stored in a local repository:
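A query of this kind might look like the following sketch. It assumes that people in the local repository are typed as `foaf:Person` and carry a `foaf:name`, and that DBpedia is reached through its public endpoint at `http://dbpedia.org/sparql`; adjust both to the actual data.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?dbpediaPerson ?name
WHERE {
  # Evaluated against the local repository
  ?person a foaf:Person ;
          foaf:name ?name .

  # Evaluated against the remote DBpedia endpoint
  SERVICE <http://dbpedia.org/sparql> {
    ?dbpediaPerson a foaf:Person ;
                   foaf:name ?name .
  }
}
```

Because the join is on the literal value of `?name`, names only match when the datatype and language tag are identical on both sides. Remote literals may carry a language tag that local ones lack.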
The above query matches the first part against the local repository and for each person it finds, it checks the DBpedia SPARQL endpoint to see if a person with the same name exists and if so returns the id.
Since Sesame repositories are also SPARQL endpoints, it is possible to use the federation mechanism to do distributed querying over several repositories on a local server. For example, imagine two repositories - one repository (called 'my_concepts') with triples about concepts and a separate repository (called 'my_labels') that contains all the label information. To retrieve the corresponding label for each concept the following query can be executed on the 'my_concepts' repository:
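A sketch of such a query follows. It assumes a Sesame server at `http://localhost:8080/openrdf-sesame`, concepts typed as `skos:Concept` and labels stored with `rdfs:label`; all three are illustrative choices rather than requirements.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label
WHERE {
  # Matched in the repository the query is sent to ('my_concepts')
  ?concept a skos:Concept .

  # Matched in the second repository on the same server
  SERVICE <http://localhost:8080/openrdf-sesame/repositories/my_labels> {
    ?concept rdfs:label ?label .
  }
}
```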
Federation must be used with caution, first of all to avoid doing excessive querying of remote (public) SPARQL endpoints, but also because it can lead to inefficient query patterns. The following example finds resources in the second SPARQL endpoint that have a similar rdfs:label to the rdfs:label of <http://dbpedia.org/resource/Vaccination> in the first SPARQL endpoint:
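The pattern in question can be sketched as below. The first endpoint is assumed to be DBpedia, the second endpoint URL is a placeholder, and 'similar' is interpreted here as a case-insensitive substring match.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?resource ?label2
WHERE {
  # First endpoint: labels of one known resource
  SERVICE <http://dbpedia.org/sparql> {
    <http://dbpedia.org/resource/Vaccination> rdfs:label ?label1 .
  }

  # Second endpoint: nothing constrains ?resource or ?label2 here
  SERVICE <http://example.org/second/sparql> {
    ?resource rdfs:label ?label2 .
  }

  # The comparison can only be applied after both results are available
  FILTER (CONTAINS(LCASE(STR(?label2)), LCASE(STR(?label1))))
}
```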
However, such a query is very inefficient, because no intermediate bindings are passed between endpoints. Instead, both sub-queries execute independently, requiring the second sub-query to return all X rdfs:label Y statements that it stores. These are then joined locally to the (likely much smaller) results of the first sub-query.
SPARQL 1.1 Graph Store HTTP Protocol provides a means for updating and fetching RDF graph content from a Graph Store over HTTP in the REST style. The URL patterns for this new functionality are provided at:
The methods supported by these resources and their effects are:
'Accept': Relevant values for GET requests are the MIME types of supported RDF formats.
For requests on indirectly referenced named graphs, the following parameters are supported:
'graph' (optional): specifies the URI of the named graph to be accessed.
Each request on an indirectly referenced graph needs to specify precisely one of the above parameters.
An RDF database can store collections of RDF statements (triples) in separate graphs, each identified (named) by a URI. A group of statements with a unique name is called a named graph. An RDF database also has one further graph that does not have a name, called the 'default graph'.
The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an 'RDF Dataset' that identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:
If either FROM or FROM NAMED is used, the database's default graph is no longer used as input for processing the query. In effect, the combination of FROM and FROM NAMED clauses exactly defines the dataset. This is somewhat inconvenient, as it precludes, for instance, executing a query over just one named graph and the default graph. However, there is a programmatic way around this limitation, which is described below.
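To make the effect of the two clauses concrete, here is a small sketch using two hypothetical graph URIs. Only statements from these two graphs can contribute to the answer, and the database's own default graph is not consulted.

```sparql
SELECT ?s ?p ?o ?g
FROM       <http://example.org/g1>   # becomes the dataset's default graph
FROM NAMED <http://example.org/g2>   # the only named graph in the dataset
WHERE {
  { ?s ?p ?o }                 # matched against statements from g1 only
  UNION
  { GRAPH ?g { ?s ?p ?o } }    # ?g can only be bound to g2
}
```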
The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e. it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset in any way they please.
OWLIM constructs the default dataset as follows:
This means that if a statement ex:x ex:y ex:z exists in the database in graph ex:g then the following query patterns will behave as follows:
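The two cases can be sketched with the following pair of queries, run separately. The `ex:` namespace is assumed to be `http://example.org/`. Given the default dataset described above, each query would be expected to return the statement ex:x ex:y ex:z.

```sparql
# No GRAPH clause: the pattern is matched against the default graph
SELECT * WHERE { ?s ?p ?o }
```

```sparql
PREFIX ex: <http://example.org/>

# Explicit GRAPH clause: the pattern is matched against named graph ex:g
SELECT * WHERE { GRAPH ex:g { ?s ?p ?o } }
```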
In other words, the triple ex:x ex:y ex:z will appear to be in both the default graph and the named graph ex:g at the same time.
There are several reasons for this behaviour:
Explicit statements: Statements that have been inserted into a database by the user, using SPARQL Update, the Sesame API or the 'imports' configuration parameter, are flagged as being 'explicit'. Explicit statements can exist in the database's default graph and named graphs.
Implicit statements: Statements that have been created as a result of inference are flagged as being implicit and are stored ONLY in the database's default graph.
Therefore, the database's default graph can contain a mixture of explicit and implicit statements. The Sesame API provides a flag called 'includeInferred' that can help in some situations. This flag is passed to several API methods and when set to false will cause only explicit statements to be iterated or returned. When this flag is set to true, then both explicit and implicit statements are iterated or returned.
OWLIM provides extensions that give more control over the processing of explicit and implicit statements. These extensions allow explicit statements, implicit statements, or both to be selected for query answering, and also provide a mechanism for identifying which statements are explicit and which are implicit. This is achieved by using 'pseudo-graph' names in FROM and FROM NAMED clauses that cause certain flags to be set. The details are as follows:
Note that these are only flags and do not affect the construction of the default dataset in the sense that using any combination of the above will still result in the dataset containing all the named graphs from the database. All that is changed is which statements appear in the dataset's default graph and whether any extra named graphs (explicit or implicit) appear.
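As a hedged illustration, the sketch below restricts the dataset's default graph to explicit statements. It assumes that the pseudo-graph for explicit statements is named `http://www.ontotext.com/explicit`, with `http://www.ontotext.com/implicit` as its counterpart for inferred statements. Check these URIs against the list for the OWLIM version in use.

```sparql
# Only explicit statements are visible in the dataset's default graph.
# The FROM clause sets a flag; it does not name a real stored graph.
SELECT ?s ?p ?o
FROM <http://www.ontotext.com/explicit>
WHERE { ?s ?p ?o }
```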
The Sesame API provides an interface, Dataset, and an implementation class, DatasetImpl, for defining the dataset for a query: the URIs of graphs are added to the 'default graphs' and 'named graphs' members. This permits 'null' to be used to identify the default database graph (or 'null context', to use Sesame terminology).
This dataset can then be passed to queries or updates, e.g.
Internally, OWLIM uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions, e.g. when interfacing to application infrastructure, when having access to these internal IDs can improve the efficiency of data structures external to OWLIM by allowing them to be indexed by an integer value rather than a full URI.
This section introduces a special OWLIM predicate and function that provide access to these internal IDs. The datatype of internal IDs is <http://www.w3.org/2001/XMLSchema#long>.
There are several more special graph URIs used in OWLIM-SE that can be used to control query evaluation.