{toc}
h1. Introduction
RDF Rank is an algorithm that identifies the more important or more popular entities in the repository by examining their interconnectedness. The popularity of entities can then be used to order query results, much as internet search engines do; for example, Google orders search results using PageRank [http://en.wikipedia.org/wiki/PageRank].
The RDF Rank component computes a numerical weight for every node in the RDF graph stored in the repository, including URIs, blank nodes and literals. The weights are floating-point numbers between 0 and 1 that can be interpreted as a measure of a node's relevance or popularity.
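To make the idea concrete, here is a minimal PageRank-style sketch in Python. It is not GraphDB's implementation: it assumes that each triple contributes one edge from its subject to its object, uses a damping factor of 0.85 and a fixed 20 passes (matching the default maximum iterations listed under Parameters), and rescales the scores so that the top node gets 1.0. The function {{rank_triples}} and the sample node names are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def rank_triples(triples: List[Triple], damping: float = 0.85,
                 iterations: int = 20) -> Dict[str, float]:
    """Hypothetical PageRank-style weighting of every subject and object node."""
    # URIs, blank nodes and literals all become nodes of the graph.
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for s, _, o in triples:
        out_links[s].append(o)  # assumption: one edge subject -> object per triple
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without outgoing edges (e.g. literals) share their score evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in out_links[v]:
                new_rank[target] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    # Rescale so the most popular node gets 1.0 and every weight lies in [0, 1].
    top = max(rank.values())
    return {v: r / top for v, r in rank.items()}

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
for node, weight in sorted(rank_triples(triples).items(), key=lambda kv: -kv[1]):
    print(f"{node:10} {weight:.3f}")
```

Because literals only ever appear as objects, they act as sink nodes here; spreading their score evenly (the usual PageRank treatment of dangling nodes) keeps the total score constant from one pass to the next.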
Because the values range from 0 to 1, the weights can be used to sort a result set (lexicographical order works even if the rank literals are interpreted as plain strings). Here is an example SPARQL query that uses RDF Rank to sort results by popularity:
{noformat}
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX opencyc-en: <http://sw.opencyc.org/2008/06/10/concept/en/>
SELECT * WHERE {
  ?Person a opencyc-en:Entertainer .
  ?Person rank:hasRDFRank ?rank .
}
ORDER BY DESC(?rank) LIMIT 100
{noformat}
As the example query shows, RDF Rank weights are exposed through a special system predicate. GraphDB handles triple patterns with the predicate {{[http://www.ontotext.com/owlim/RDFRank#hasRDFRank]}} specially: the object of the pattern is bound to a literal containing the RDF Rank of the subject.
To use this mechanism, the RDF ranks for the whole repository must be computed in advance. This is done by committing a series of SPARQL updates that use a special vocabulary to parameterise the weighting algorithm, followed by an update that triggers the computation itself.
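The remark above that rank literals sort correctly even as plain strings holds as long as every value is written as a plain decimal. The Python sketch below checks this, assuming (hypothetically) that ranks are serialized with four fixed decimal places; the helper names are illustrative only. It also shows that exponent notation such as {{1e-05}} would break the property.

```python
import random

def to_rank_literal(value: float) -> str:
    """Hypothetical serialization: fixed-point decimal, four digits, no exponent."""
    return f"{value:.4f}"

def string_order_matches_numeric(literals: list[str]) -> bool:
    """True if sorting the literals as plain strings gives the numeric order."""
    return sorted(literals) == sorted(literals, key=float)

random.seed(42)
weights = [random.random() for _ in range(1000)] + [0.0, 0.05, 0.5, 1.0]
fixed_point = [to_rank_literal(w) for w in weights]
print(string_order_matches_numeric(fixed_point))  # True

# Exponent notation breaks the property: "1e-05" sorts after "0.5" as a string.
print(string_order_matches_numeric(["0.5", "1e-05"]))  # False
```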
h1. Parameters
|| Parameter | *Maximum iterations* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFRank#maxIterations]}} ||
|| Description | Sets the maximum number of iterations of the algorithm over all entities in the repository. ||
|| Default | 20 ||
|| Example | PREFIX rank: <[http://www.ontotext.com/owlim/RDFRank#]> \\
INSERT DATA \{ rank:maxIterations rank:setParam "16" . \} ||
|| Parameter | *Epsilon* ||
|| Predicate | {{[http://www.ontotext.com/owlim/RDFRank#epsilon]}} ||
|| Description | Used to terminate the weighting algorithm early when the total change of all RDF Rank scores has fallen below this value. ||
|| Default | 0.01 ||
|| Example | PREFIX rank: <[http://www.ontotext.com/owlim/RDFRank#]> \\
INSERT DATA \{ rank:epsilon rank:setParam "0.05" . \} ||
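The interplay of the two parameters can be sketched as a generic iteration driver: stop as soon as the total change drops below *Epsilon*, otherwise after *Maximum iterations* passes. This is an illustration, not GraphDB's code; in particular it assumes that "total change" means the sum of absolute per-node differences, and {{run_until_converged}} and {{pagerank_step}} are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]

def run_until_converged(step: Callable[[Scores], Scores], initial: Scores,
                        max_iterations: int = 20,
                        epsilon: float = 0.01) -> Tuple[Scores, int]:
    """Apply `step` until the total change drops below epsilon or the cap is hit.

    Returns the final scores and the number of iterations actually performed.
    """
    scores = initial
    for iteration in range(1, max_iterations + 1):
        updated = step(scores)
        # Assumption: "total change" = sum of absolute per-node differences.
        total_change = sum(abs(updated[k] - scores[k]) for k in scores)
        scores = updated
        if total_change < epsilon:
            return scores, iteration  # early termination
    return scores, max_iterations

def pagerank_step(out_links: Dict[str, List[str]], damping: float = 0.85):
    """Build one PageRank-style update for a fixed graph (illustrative only)."""
    n = len(out_links)

    def step(scores: Scores) -> Scores:
        dangling = sum(scores[v] for v, targets in out_links.items() if not targets)
        new = {v: (1 - damping) / n + damping * dangling / n for v in out_links}
        for v, targets in out_links.items():
            for t in targets:
                new[t] += damping * scores[v] / len(targets)
        return new

    return step

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["c"]}
start = {v: 1 / len(graph) for v in graph}
for eps in (0.05, 0.01):
    _, used = run_until_converged(pagerank_step(graph), start,
                                  max_iterations=16, epsilon=eps)
    print(f"epsilon={eps}: stopped after {used} iteration(s)")
```

Because both runs follow the same sequence of scores, the looser value of 0.05 used in the table's example can never need more iterations than the default of 0.01.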
h1. Full computation
To trigger the computation of RDF Rank values for all resources, use the following update:
{noformat}
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:compute _:b2. }
{noformat}
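The parameter and compute updates can also be sent programmatically. The sketch below prepares the sequence described in the introduction (parameters first, then the trigger) as HTTP requests using only the Python standard library. It assumes the repository is reachable through the Sesame/RDF4J-style HTTP protocol, where SPARQL updates are POSTed to {{<server>/repositories/<id>/statements}} as the form parameter {{update}}; the server URL, the repository id and the helper names are hypothetical placeholders.

```python
import urllib.parse
import urllib.request
from typing import List

RANK_PREFIX = "PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>\n"
COMPUTE_UPDATE = RANK_PREFIX + "INSERT DATA { _:b1 rank:compute _:b2 . }"

def param_update(name: str, value: str) -> str:
    """SPARQL update that sets one RDF Rank parameter, e.g. maxIterations or epsilon."""
    return RANK_PREFIX + f'INSERT DATA {{ rank:{name} rank:setParam "{value}" . }}'

def update_request(server: str, repository: str, update: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying one SPARQL update."""
    # Assumption: Sesame/RDF4J-style endpoint that accepts the form parameter "update".
    url = f"{server.rstrip('/')}/repositories/{urllib.parse.quote(repository)}/statements"
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})

# Hypothetical server and repository; parameters are set before the trigger.
updates: List[str] = [
    param_update("maxIterations", "16"),
    param_update("epsilon", "0.05"),
    COMPUTE_UPDATE,
]
prepared = [update_request("http://localhost:7200", "my-repo", u) for u in updates]
# Sending them in order would be:  for req in prepared: urllib.request.urlopen(req)
```

Keeping each update in its own request mirrors the text: every parameter is committed separately before the final update starts the computation.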
h1. Incremental updates
The full computation of RDF Rank values for all resources can be relatively expensive. When new resources have been added to the repository since a previous full computation, you can either recompute the values for all resources (see above) or compute RDF Rank values only for the new resources (an incremental update). The following control update:
{noformat}
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:computeIncremental "true" . }
{noformat}
computes RDF Rank values for resources that do not yet have an associated value, i.e. those that have been added to the repository since the last full RDF Rank computation.
{info}
The incremental computation uses a different, lightweight algorithm that is faster but less accurate than the full ranking algorithm. As a result, ranks assigned by the full and the incremental computations will diverge slightly from each other.
{info}
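GraphDB's lightweight incremental algorithm is not described here, so the following Python sketch swaps in a deliberately simple heuristic purely to illustrate the contract stated above: resources that already have a rank keep it, and only unranked resources receive a value. The estimate (a damped mean of the ranks of already-ranked nodes linking in, falling back to the lowest known rank) and the name {{incremental_ranks}} are hypothetical. Like any shortcut of this kind, its estimates will not match a full recomputation exactly.

```python
from typing import Dict, List

def incremental_ranks(existing: Dict[str, float], out_links: Dict[str, List[str]],
                      damping: float = 0.85) -> Dict[str, float]:
    """Hypothetical lightweight pass: rank only nodes that have no value yet."""
    # Invert the edge lists once so we can find who links to each node.
    in_links: Dict[str, List[str]] = {v: [] for v in out_links}
    for src, targets in out_links.items():
        for t in targets:
            in_links[t].append(src)
    floor = min(existing.values(), default=0.0)
    result = dict(existing)  # previously computed ranks are left untouched
    for node in out_links:
        if node in existing:
            continue
        # Only ranks that existed before this pass are used as inputs.
        ranked_sources = [existing[s] for s in in_links[node] if s in existing]
        estimate = (damping * sum(ranked_sources) / len(ranked_sources)
                    if ranked_sources else floor)
        result[node] = min(1.0, estimate)
    return result

existing = {"a": 1.0, "b": 0.6, "c": 0.3}
graph = {"a": ["b", "new1"], "b": ["c"], "c": ["a"], "new1": ["new2"], "new2": []}
print(incremental_ranks(existing, graph))
```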
h1. Exporting RDF Rank values
The computed weights can be exported to an external file using an update of this form:
{noformat}
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:export "/home/user1/rdf_ranks.txt" . }
{noformat}
If the export fails, the update throws an exception and an error message is recorded in the log file.
Lastly, when using [RDF Priming|GraphDB-SE Experimental Features#RDF Priming], the RDF Rank values can be used as the initial activation values. To set this up, use the following update:
{noformat}
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:ranksAsWeights _:b2 . }
{noformat}