OWLIM-SE Query Behaviour

This documentation is NOT for the latest version of GraphDB.

Latest version - GraphDB 7.1


This section of the user guide covers SPARQL query processing in OWLIM. It details the standards supported, the implementation-specific behaviour that the standards permit, and the extensions that add new functionality.

SPARQL compliance

OWLIM supports the following SPARQL specifications:

Note that at the time of writing these specifications were W3C working drafts and had not yet reached Recommendation status.

SPARQL 1.1 Protocol for RDF defines the means for transmitting SPARQL queries to a SPARQL query processing service and returning the query results to the entity that requested them.
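As an illustration of the protocol, the following minimal Python sketch uses only the standard library. It builds an HTTP GET request that carries the query in the `query` parameter and asks for SPARQL JSON results through the `Accept` header, but it does not send the request. The repository URL `http://localhost:8080/openrdf-sesame/repositories/my_repo` is a hypothetical example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol 'query via GET' request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL 1.1 Query Results JSON; choose another MIME type for other formats.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


# Hypothetical Sesame repository URL acting as a SPARQL endpoint.
endpoint = "http://localhost:8080/openrdf-sesame/repositories/my_repo"
req = build_query_request(endpoint, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```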

SPARQL 1.1 Query provides more powerful query constructs than SPARQL 1.0. It adds:

  • Aggregates
  • Subqueries
  • Negation
  • Expressions in the SELECT clause
  • Property Paths
  • Assignment
  • An expanded set of functions and operators

SPARQL 1.1 Update provides a means to change the state of the database using a query-like syntax. SPARQL Update has similarities to SQL INSERT INTO, UPDATE WHERE and DELETE FROM behaviour. Full details are provided on the W3C SPARQL Update working group page, but here is a brief summary of the various types of modification operations on RDF triples:

  • INSERT DATA {...} - for inserting RDF statements
  • DELETE DATA {...} - for removing RDF statements
  • DELETE {...} INSERT {...} WHERE {...} - for more complex modifications
  • LOAD (SILENT) from_iri - to load an RDF document identified by from_iri
  • LOAD (SILENT) from_iri INTO GRAPH to_iri - to load an RDF document into the local graph called to_iri
  • CLEAR (SILENT) GRAPH iri - remove all triples from the graph identified by iri
  • CLEAR (SILENT) DEFAULT - remove all triples from the default graph
  • CLEAR (SILENT) NAMED - remove all triples from all named graphs
  • CLEAR (SILENT) ALL - remove all triples from all graphs

The following operations are used to manage graphs:

  • CREATE - creates a new graph in stores that support empty graphs
  • DROP - removes a graph and all of its contents
  • COPY - modifies a graph to contain a copy of another
  • MOVE - moves all of the data from one graph into another
  • ADD - reproduces all data from one graph into another
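Any of these operations can be sent to a server in the same way. The sketch below builds, without sending, a SPARQL 1.1 Protocol "update via URL-encoded POST" request that carries the operation text in the `update` form parameter. It assumes the update endpoint is the repository's `.../statements` URL, as in the Sesame HTTP protocol. The server URL, repository ID and `ex:` namespace are hypothetical.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_update_request(update_endpoint: str, update: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol 'update via URL-encoded POST' request."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        update_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoint; assumed to be the Sesame repository's /statements resource.
endpoint = "http://localhost:8080/openrdf-sesame/repositories/my_repo/statements"
# A DELETE {...} INSERT {...} WHERE {...} modification over a hypothetical namespace.
update = """PREFIX ex: <http://example.org/>
DELETE { ?s ex:status "draft" }
INSERT { ?s ex:status "published" }
WHERE  { ?s ex:status "draft" }"""
req = build_update_request(endpoint, update)

# Decoding the form body recovers the original update text.
assert parse_qs(req.data.decode("utf-8"))["update"][0] == update
```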

SPARQL 1.1 Federation provides extensions to the query syntax for executing distributed queries over any number of SPARQL endpoints. This feature, new in Sesame 2.6, is very powerful and allows RDF data from different sources to be integrated using a single query.

For example, to discover DBpedia resources about people who have the same names as those stored in a local repository:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?dbpedia_id
WHERE {
   ?person a foaf:Person ;
           foaf:name ?name .
   SERVICE <http://dbpedia.org/sparql> {
       ?dbpedia_id a dbpedia-owl:Person ;
                   foaf:name ?name .
   }
}

The query matches the first part against the local repository. For each person found, it checks the DBpedia SPARQL endpoint for a person with the same name and, if one exists, returns that person's ID.
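The evaluation order just described can be modelled in a few lines of Python. In this model each local solution drives one lookup at the remote endpoint, with ?name already bound. This is a toy in-memory model, not how Sesame actually implements SERVICE, and all resources and names in it are hypothetical.

```python
# Toy triples (subject, predicate, object); all data is hypothetical.
local = [
    ("ex:alice", "rdf:type", "foaf:Person"), ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"), ("ex:bob", "foaf:name", "Bob"),
]
remote = [
    ("dbr:Alice_X", "rdf:type", "dbpedia-owl:Person"), ("dbr:Alice_X", "foaf:name", "Alice"),
    ("dbr:Carol_Y", "rdf:type", "dbpedia-owl:Person"), ("dbr:Carol_Y", "foaf:name", "Carol"),
]


def names_of_type(triples, rdf_type):
    """Return (resource, name) pairs for resources of the given type."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == rdf_type}
    return [(s, o) for s, p, o in triples if p == "foaf:name" and s in typed]


def remote_lookup(name):
    """Stand-in for one SERVICE call evaluated with ?name already bound."""
    return [s for s, n in names_of_type(remote, "dbpedia-owl:Person") if n == name]


results = []
for _person, name in names_of_type(local, "foaf:Person"):  # local part first
    for dbpedia_id in remote_lookup(name):                  # one remote probe per binding
        results.append(dbpedia_id)
print(results)
```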

Since Sesame repositories are also SPARQL endpoints, it is possible to use the federation mechanism to do distributed querying over several repositories on a local server. For example, imagine two repositories - one repository (called 'my_concepts') with triples about concepts and a separate repository (called 'my_labels') that contains all the label information. To retrieve the corresponding label for each concept the following query can be executed on the 'my_concepts' repository:

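PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# ex: is a placeholder; replace it with the namespace used in 'my_concepts'
PREFIX ex: <http://example.org/>

SELECT ?id ?label
WHERE {
    ?id a ex:Concept .
    SERVICE <http://localhost:8080/openrdf-sesame/repositories/my_labels> {
        ?id rdfs:label ?label .
    }
}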

Federation must be used with caution: first, to avoid excessive querying of remote (public) SPARQL endpoints, and second, because it can lead to inefficient query patterns. The following example finds resources in the second SPARQL endpoint whose rdfs:label has the same string value as the English rdfs:label of <http://dbpedia.org/resource/Vaccination> in the first SPARQL endpoint:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?endpoint2_id {
    SERVICE <http://faraway_endpoint.org/sparql>
    {
        ?endpoint1_id rdfs:label ?l1 .
        FILTER( lang(?l1) = "en" )
    }
    SERVICE <http://remote_endpoint.com/sparql>
    {
        ?endpoint2_id rdfs:label ?l2 .
        FILTER( str(?l2) = str(?l1) )
    }
}
BINDINGS ?endpoint1_id
{ ( <http://dbpedia.org/resource/Vaccination> ) }

However, such a query is very inefficient, because no intermediate bindings are passed between endpoints. Instead, both sub-queries execute independently, requiring the second sub-query to return all X rdfs:label Y statements that it stores. These are then joined locally to the (likely much smaller) results of the first sub-query.
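To make the cost concrete, the sketch below counts how many label statements the second endpoint would have to ship back under two strategies. The first evaluates the sub-query independently, as described above. The second probes the endpoint only with the labels already found. This is an illustrative Python model with made-up data, not a measurement of any real endpoint, and it ignores the language-tag filter.

```python
# Hypothetical rdfs:label statements held by each endpoint: (resource, label).
endpoint1 = [("dbr:Vaccination", "Vaccination"), ("dbr:Vaccine", "Vaccine")]
endpoint2 = [(f"r2:{i}", f"label {i}") for i in range(10_000)] + [("r2:vacc", "Vaccination")]


def independent(bound_id):
    """No bindings passed: endpoint2 returns every label and the join happens locally."""
    left = [l for r, l in endpoint1 if r == bound_id]
    shipped = list(endpoint2)                        # the whole label set crosses the network
    matches = [r for r, l in shipped if l in left]
    return matches, len(shipped)


def bound_join(bound_id):
    """Bindings passed: endpoint2 is asked only about labels already found."""
    left = [l for r, l in endpoint1 if r == bound_id]
    shipped = [(r, l) for r, l in endpoint2 if l in left]  # filtered remotely
    return [r for r, _ in shipped], len(shipped)


print(independent("dbr:Vaccination"))  # same answer, 10,001 rows transferred
print(bound_join("dbr:Vaccination"))   # same answer, 1 row transferred
```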

SPARQL 1.1 Graph Store HTTP Protocol provides a means for updating and fetching RDF graph content from a Graph Store over HTTP in the REST style. The URL patterns for this new functionality are provided at:

  • <SESAME_URL>/repositories/<repo_id>/rdf-graphs/service (for indirectly referenced named graphs)
  • <SESAME_URL>/repositories/<repo_id>/rdf-graphs/<NAME> (for directly referenced named graphs).

The methods supported by these resources and their effects are:

  • GET fetches statements in the named graph from the repository in the requested format.
  • PUT updates data in the named graph in the repository, replacing any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.
  • DELETE deletes all data in the specified named graph in the repository.
  • POST updates data in the named graph in the repository by adding to any existing data in the named graph with the supplied data. The data supplied with this request is expected to contain an RDF document in one of the supported RDF formats.

Request headers:

  • 'Accept': relevant values for GET requests are the MIME types of supported RDF formats.
  • 'Content-Type': must specify the encoding of any request data that is sent to the server. Relevant values are the MIME types of supported RDF formats.

For requests on indirectly referenced named graphs, the following parameters are supported:

  • 'graph' (optional): specifies the URI of the named graph to be accessed.
  • 'default' (optional): specifies that the default graph is to be accessed. This parameter is expected to be present but to have no value.

Each request on an indirectly referenced graph needs to specify precisely one of the above parameters.
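The following minimal Python sketch builds, but does not send, requests against the indirectly referenced graph resource. It uses only the standard library and enforces the rule that exactly one of `graph` or `default` is given. The server URL, repository ID and graph URI are assumptions for illustration, as is the choice of Turtle (`text/turtle`) as the RDF format.

```python
from urllib.parse import urlencode
from urllib.request import Request

RDF_MIME = "text/turtle"  # assumed RDF format; other supported MIME types work the same way


def graph_store_request(sesame_url, repo_id, method, graph=None, default=False, body=None):
    """Build (not send) a request on <SESAME_URL>/repositories/<repo_id>/rdf-graphs/service.

    Exactly one of `graph` (a named-graph URI) or `default=True` must be given.
    """
    if (graph is None) == (not default):
        raise ValueError("specify exactly one of 'graph' or 'default'")
    base = f"{sesame_url}/repositories/{repo_id}/rdf-graphs/service"
    # 'default' is present with no value; 'graph' carries the URL-encoded graph URI.
    query = "default" if default else urlencode({"graph": graph})
    headers = {}
    if method == "GET":
        headers["Accept"] = RDF_MIME
    if body is not None:  # PUT replaces the graph's contents, POST adds to them
        headers["Content-Type"] = RDF_MIME
    data = body.encode("utf-8") if body is not None else None
    return Request(f"{base}?{query}", data=data, headers=headers, method=method)


SESAME = "http://localhost:8080/openrdf-sesame"  # hypothetical server URL
get = graph_store_request(SESAME, "my_repo", "GET", graph="http://example.org/g1")
put = graph_store_request(SESAME, "my_repo", "PUT", default=True,
                          body='<http://example.org/s> <http://example.org/p> "o" .')
print(get.full_url)
# http://localhost:8080/openrdf-sesame/repositories/my_repo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fg1
```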

Named graphs

An RDF database can store collections of RDF statements (triples) in separate graphs identified (named) by a URI. A group of statements with a unique name is called a named graph. An RDF database also has one graph that does not have a name; this is called the 'default graph'.

The SPARQL query syntax provides a means to execute queries across default and named graphs using FROM and FROM NAMED clauses. These clauses are used to build an 'RDF Dataset' that identifies what statements the SPARQL query processor will use to answer a query. The dataset contains a default graph and named graphs and is constructed as follows:

  • FROM uri - brings statements from the database's graph identified by 'uri' into the dataset's default graph, i.e. the statements 'lose' their graph name
  • FROM NAMED uri - brings the statements from the database's graph identified by 'uri' into the dataset, i.e. the statements keep their graph name

If either FROM or FROM NAMED is used, the database's default graph is no longer used as input for processing the query. In effect, the combination of FROM and FROM NAMED clauses defines the dataset exactly. This is somewhat inconvenient, because it makes it impossible, for instance, to execute a query over just one named graph together with the default graph. However, there is a programmatic way around this limitation, described below.
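These rules can be modelled in Python. Quads are stored with `None` standing for the database's default graph. FROM merges the listed graphs into the dataset's default graph and drops their names, while FROM NAMED keeps the names. Because only listed graphs are included, the database's default graph drops out of the dataset. This is a simplified model with hypothetical data, not OWLIM's implementation.

```python
# Hypothetical store contents: (subject, predicate, object, graph); None = database default graph.
store = [
    ("ex:a", "ex:p", "ex:1", None),
    ("ex:b", "ex:p", "ex:2", "ex:g1"),
    ("ex:c", "ex:p", "ex:3", "ex:g2"),
]


def build_dataset(from_graphs, from_named):
    """Return (default_graph, named_graphs) defined only by FROM / FROM NAMED.

    Graphs not listed, including the database's default graph (None), are excluded.
    """
    default_graph = {(s, p, o) for s, p, o, g in store if g in from_graphs}
    named = {}
    for s, p, o, g in store:
        if g in from_named:
            named.setdefault(g, set()).add((s, p, o))
    return default_graph, named


default_graph, named = build_dataset(from_graphs={"ex:g1"}, from_named={"ex:g2"})
print(sorted(default_graph))  # [('ex:b', 'ex:p', 'ex:2')]; ex:a from the default graph is gone
print(named)                  # {'ex:g2': {('ex:c', 'ex:p', 'ex:3')}}
```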

The default SPARQL dataset

The SPARQL specification does not define what happens when no FROM or FROM NAMED clauses are present in a query, i.e. it does not define how a SPARQL processor should behave when no dataset is defined. In this situation, implementations are free to construct the default dataset in any way they please.

OWLIM constructs the default dataset as follows:

  • The dataset's default graph contains the merge of the database's default graph AND all the database's named graphs
  • The dataset contains all named graphs from the database

This means that if a statement ex:x ex:y ex:z exists in the database in graph ex:g, the following queries produce these bindings:

  • SELECT * { ?s ?p ?o } - ?s=ex:x ?p=ex:y ?o=ex:z
  • SELECT * { GRAPH ?g { ?s ?p ?o } } - ?s=ex:x ?p=ex:y ?o=ex:z ?g=ex:g

In other words, the triple ex:x ex:y ex:z will appear to be in both the default graph and the named graph ex:g at the same time.

There are two reasons for this behaviour:

  1. It provides an easy way to execute a triple pattern query over all stored RDF statements
  2. It allows all named graph names to be discovered, i.e. with this query: SELECT ?g { GRAPH ?g { ?s ?p ?o } }
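OWLIM's default dataset can be modelled in Python to show why the same triple matches both query forms. The dataset's default graph merges every stored graph, and every named graph is also available by name. This is a simplified sketch of the behaviour described above, using the hypothetical statement from the example, not OWLIM code.

```python
# One statement ex:x ex:y ex:z stored in named graph ex:g (hypothetical data).
store = [("ex:x", "ex:y", "ex:z", "ex:g")]


def owlim_default_dataset(quads):
    """Dataset used when a query has no FROM / FROM NAMED clauses.

    Default graph = merge of the database's default graph and ALL named graphs;
    named graphs = every named graph in the database (graph None is the default graph).
    """
    default_graph = {(s, p, o) for s, p, o, _g in quads}
    named = {}
    for s, p, o, g in quads:
        if g is not None:
            named.setdefault(g, set()).add((s, p, o))
    return default_graph, named


default_graph, named = owlim_default_dataset(store)

# SELECT * { ?s ?p ?o }: matched against the dataset's default graph.
plain = [dict(s=s, p=p, o=o) for s, p, o in default_graph]
# SELECT * { GRAPH ?g { ?s ?p ?o } }: matched against each named graph.
with_graph = [dict(s=s, p=p, o=o, g=g) for g, triples in named.items() for s, p, o in triples]
# SELECT ?g { GRAPH ?g { ?s ?p ?o } }: discovers every graph name (duplicate rows ignored).
graph_names = sorted(named)
print(plain, with_graph, graph_names)
```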

Managing Explicit and Implicit Statements

Explicit statements: Statements that have been inserted by the user into a database, using SPARQL Update, the Sesame API or the 'imports' configuration parameter, are flagged as being 'explicit'. Explicit statements can exist in the database's default graph and named graphs.

Implicit statements: Statements that have been created as a result of inference are flagged as being implicit and are stored ONLY in the database's default graph.

Therefore, the database's default graph can contain a mixture of explicit and implicit statements. The Sesame API provides a flag called 'includeInferred' that can help in some situations. This flag is passed to several API methods and when set to false will cause only explicit statements to be iterated or returned. When this flag is set to true, then both explicit and implicit statements are iterated or returned.

OWLIM provides extensions that give more control over the processing of explicit and implicit statements. These extensions allow explicit statements, implicit statements or both to be selected for query answering, and also provide a mechanism to identify which statements are explicit and which are implicit. This is achieved by using special 'pseudo-graph' names in FROM and FROM NAMED clauses that cause certain flags to be set. The details are as follows:

  • FROM <http://www.ontotext.com/explicit> - the dataset's default graph will include only explicit statements from the database's default graph
  • FROM <http://www.ontotext.com/implicit> - the dataset's default graph will include only inferred statements from the database's default graph
  • FROM NAMED <http://www.ontotext.com/explicit> - the dataset will contain a named graph called http://www.ontotext.com/explicit that contains only explicit statements from the database's default graph, i.e. quad patterns such as GRAPH ?g { ?s ?p ?o } will bind to explicit statements from the database's default graph with a graph name of <http://www.ontotext.com/explicit>
  • FROM NAMED <http://www.ontotext.com/implicit> - the dataset will contain a named graph called http://www.ontotext.com/implicit that contains only implicit statements from the database's default graph

Note that these are only flags and do not affect the construction of the default dataset in the sense that using any combination of the above will still result in the dataset containing all the named graphs from the database. All that is changed is which statements appear in the dataset's default graph and whether any extra named graphs (explicit or implicit) appear.
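The effect of these flags on the database's default graph can be modelled in Python, with each statement carrying an explicit/implicit flag. Only the pseudo-graph clauses are modelled here. Ordinary FROM clauses, real named graphs and the no-FROM default dataset are left out, and the statements are hypothetical.

```python
EXPLICIT = "http://www.ontotext.com/explicit"
IMPLICIT = "http://www.ontotext.com/implicit"

# Database default graph as (triple, is_explicit); hypothetical asserted + inferred statement.
default_graph = [
    (("ex:tom", "rdf:type", "ex:Cat"), True),
    (("ex:tom", "rdf:type", "ex:Animal"), False),
]


def apply_flags(from_graphs=(), from_named=()):
    """Return (dataset default graph, extra named graphs) for the pseudo-graph clauses only."""
    kinds = {EXPLICIT: True, IMPLICIT: False}
    selected = {kinds[g] for g in from_graphs if g in kinds}
    default = {t for t, explicit in default_graph if explicit in selected}
    # FROM NAMED pseudo-graphs add extra named graphs; GRAPH ?g then reveals each statement's kind.
    extra = {g: {t for t, explicit in default_graph if explicit == kinds[g]}
             for g in from_named if g in kinds}
    return default, extra


explicit_only, _ = apply_flags(from_graphs=[EXPLICIT])
_, extra = apply_flags(from_named=[EXPLICIT, IMPLICIT])
print(explicit_only)    # {('ex:tom', 'rdf:type', 'ex:Cat')}
print(extra[IMPLICIT])  # {('ex:tom', 'rdf:type', 'ex:Animal')}
```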

Specifying the dataset programmatically

The Sesame API provides an interface Dataset and an implementation class DatasetImpl for defining the dataset for a query by providing the URIs of named graphs and adding them to the 'default graphs' and 'named graphs' members. This permits 'null' to be used to identify the database's default graph (or 'null context', in Sesame terminology).

This dataset can then be passed to queries or updates.
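Being able to name the null context is useful, as a small Python model shows. This is not the Sesame API, where the equivalent is adding graph URIs, or null, to a DatasetImpl. Here `None` plays the role of the null context, so a dataset can combine the database's default graph with exactly one named graph, which FROM clauses cannot express. The data is hypothetical.

```python
# Hypothetical store: (subject, predicate, object, graph); None = default graph ("null context").
store = [
    ("ex:a", "ex:p", "ex:1", None),
    ("ex:b", "ex:p", "ex:2", "ex:g1"),
    ("ex:c", "ex:p", "ex:3", "ex:g2"),
]


def default_graph_triples(quads, default_graphs):
    """Merge the listed graphs into the dataset's default graph; None selects the null context."""
    return {(s, p, o) for s, p, o, g in quads if g in default_graphs}


# Default graph = the database's default graph plus exactly one named graph.
merged = sorted(default_graph_triples(store, {None, "ex:g1"}))
print(merged)  # [('ex:a', 'ex:p', 'ex:1'), ('ex:b', 'ex:p', 'ex:2')]
```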

Accessing internal identifiers for entities

Internally, OWLIM uses integer identifiers (IDs) to index all entities (URIs, blank nodes and literals). Statement indices are made up of these IDs and a large data structure is used to map from ID to entity value and back. There are occasions, e.g. when interfacing to application infrastructure, when having access to these internal IDs can improve the efficiency of data structures external to OWLIM by allowing them to be indexed by an integer value rather than a full URI.

This section introduces a special OWLIM predicate and function that provide access to these internal IDs. The datatype of internal IDs is <http://www.w3.org/2001/XMLSchema#long>.
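The ID-to-entity mapping described above can be sketched as a two-way dictionary in Python. This is a hypothetical, simplified model of the idea only. OWLIM's actual data structure, ID allocation scheme and persistence are not modelled, and the class name `EntityPool` is illustrative.

```python
class EntityPool:
    """Hypothetical two-way map between entity values (URIs, blank nodes, literals) and IDs."""

    def __init__(self):
        self._id_of = {}     # entity value -> integer ID
        self._value_of = []  # position ID - 1 -> entity value

    def get_or_add(self, entity):
        """Return the entity's ID, assigning the next free ID if the entity is new."""
        if entity not in self._id_of:
            self._value_of.append(entity)
            self._id_of[entity] = len(self._value_of)  # IDs start at 1 in this model
        return self._id_of[entity]

    def value(self, entity_id):
        """Map an ID back to its entity value."""
        return self._value_of[entity_id - 1]


pool = EntityPool()
# A statement is indexed as a triple of IDs rather than full URIs and literals.
triple = tuple(pool.get_or_add(x) for x in
               ("<http://example.org/s>", "<http://example.org/p>", '"A literal"'))
print(triple)         # (1, 2, 3)
print(pool.value(3))  # "A literal"
```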

Predicate: <http://www.ontotext.com/owlim/entity#id>
Description: Map between entity and internal ID
Example: Select all entities and their IDs:

    PREFIX ent: <http://www.ontotext.com/owlim/entity#>
    SELECT * WHERE {
      ?s ent:id ?id
    } ORDER BY ?id

Function: <http://www.ontotext.com/owlim/entity#id>
Description: Return an entity's internal ID
Example: Select all statements and order them by the internal ID of the object values:

    PREFIX ent: <http://www.ontotext.com/owlim/entity#>
    SELECT * WHERE {
      ?s ?p ?o .
    } ORDER BY ent:id(?o)

Examples

  • Enumerate all the entities and bind the nodes to ?s and their IDs to ?id, order by ?id:
    select * where {
      ?s <http://www.ontotext.com/owlim/entity#id> ?id
    } order by ?id
    
  • Enumerate all non-literals and bind the nodes to ?s and their IDs to ?id, order by ?id:
    SELECT * WHERE {
      ?s <http://www.ontotext.com/owlim/entity#id> ?id .
      FILTER (!isLiteral(?s)) .
    } ORDER BY ?id
    
  • Find the internal IDs of subjects of statements with specific predicate and object values:
    SELECT * WHERE {
      ?s <http://test.org#Pred1> "A literal".
      ?s <http://www.ontotext.com/owlim/entity#id> ?id .
    } ORDER BY ?id
    
  • Find all statements where the object has the given internal ID, by using an explicit, untyped value as the ID (the "115" used as the object of the second statement pattern):
    SELECT * WHERE {
      ?s ?p ?o.
      ?o <http://www.ontotext.com/owlim/entity#id> "115" .
    }
    
  • As above, but using an xsd:long datatype for the constant within a FILTER condition:
    SELECT * WHERE {
      ?s ?p ?o.
      ?o <http://www.ontotext.com/owlim/entity#id> ?id .
      FILTER (?id="115"^^<http://www.w3.org/2001/XMLSchema#long>) .
    } ORDER BY ?o
    
  • Find the internal IDs of subject and object entities for all statements:
    SELECT * WHERE {
      ?s ?p ?o.
      ?s <http://www.ontotext.com/owlim/entity#id> ?ids.
      ?o <http://www.ontotext.com/owlim/entity#id> ?ido.
    }
    
  • Retrieve all statements where the ID of the subject is equal to "115"^^xsd:long, by providing an internal ID value within a filter expression:
    SELECT * WHERE {
      ?s ?p ?o.
      FILTER ((<http://www.ontotext.com/owlim/entity#id>(?s)) = "115"^^<http://www.w3.org/2001/XMLSchema#long>).
    }
    
  • Retrieve all statements where the string form of the subject's ID is equal to "115", by providing an internal ID value within a filter expression:
    SELECT * WHERE {
      ?s ?p ?o.
      FILTER (str( <http://www.ontotext.com/owlim/entity#id>(?s) ) = "115").
    }
    

Other special query behaviour

OWLIM-SE recognises several more special graph URIs that can be used to control query evaluation:

  • FROM/FROM NAMED <http://www.ontotext.com/disable-sameAs> - switches off the enumeration of owl:sameAs equivalence classes during triple pattern matching (enumeration is the default behaviour), so that solutions differing only by an equivalent node are excluded. Its purpose is to reduce the results to those that are valid for a single representative of each equivalence class (this is a rough description, not a complete one). For example, given the statement test:Inst rdf:type test:SomeClass, where test:Inst owl:sameAs test:Inst2, by default two solutions match the pattern ?x rdf:type test:SomeClass: one for test:Inst and another for test:Inst2. Using this system graph in FROM/FROM NAMED clauses excludes such redundant solutions. BE AWARE that if the query uses filters over the textual representation of a node, this modifier may skip some valid solutions, because not all the nodes within an equivalence class will be matched against such a FILTER.
  • FROM/FROM NAMED <http://www.ontotext.com/count> - causes the query to return a single result in which every variable in the projection is bound to a plain literal holding the total number of solutions of the query, i.e. the equivalent of COUNT(*) in SQL. For a CONSTRUCT query whose projection contains three variables (?subject, ?predicate, ?object), the subject and the predicate are bound to <http://www.ontotext.com/> and the object holds the literal value, because a statement cannot have a literal as its subject or predicate.
  • FROM/FROM NAMED <http://www.ontotext.com/skip-redundant-implicit> - excludes implicit statements when an explicit one exists within a specific context (including the default graph). It was originally implemented to filter out redundant rows that appear as 'duplicates' when the context part is not taken into account.
  • FROM <http://www.ontotext.com/distinct> - in DESCRIBE and CONSTRUCT queries, causes only distinct triples to be returned. This is useful when several resources are being described and the same triple could otherwise be returned more than once, e.g. when both its subject and its object are described.
  • FROM <http://www.ontotext.com/owlim/cluster/control-query> - identifies the query to the OWLIM-Enterprise cluster master node as one that must be routed to all worker nodes.
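The difference made by disable-sameAs can be modelled in Python for the test:Inst example above. By default, a matching node is expanded to its whole owl:sameAs equivalence class. With the flag, only one node is returned. This is a hypothetical model: which node OWLIM actually picks as the representative is not modelled, and the stored node is simply used.

```python
# Hypothetical data: one rdf:type statement and one owl:sameAs equivalence class.
types = [("test:Inst", "test:SomeClass")]
same_as = {"test:Inst": {"test:Inst", "test:Inst2"}, "test:Inst2": {"test:Inst", "test:Inst2"}}


def match_type(cls, disable_same_as=False):
    """Solutions for the pattern ?x rdf:type cls, with or without sameAs enumeration."""
    solutions = []
    for node, c in types:
        if c != cls:
            continue
        if disable_same_as:
            solutions.append(node)  # a single representative only
        else:
            solutions.extend(sorted(same_as.get(node, {node})))  # whole equivalence class
    return solutions


print(match_type("test:SomeClass"))                        # ['test:Inst', 'test:Inst2']
print(match_type("test:SomeClass", disable_same_as=True))  # ['test:Inst']

# The caveat above: a FILTER on the node's text can miss a valid solution.
print([x for x in match_type("test:SomeClass", True) if x.endswith("2")])  # []
```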