Intro

The ordering of autocomplete results (search terms) is very important, since users will not (or cannot) scroll past the first 50 or 100 results.
Currently they are returned in alphabetical order.

  • Strictly speaking, it is a more complex ordering function that prefers terms whose prefLabel starts with the query
  • (whereas the autocomplete returns every term where the query appears at the start of any word in any label)
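The two rules above can be sketched in Python. This is a simplified model, not the actual ordering function; the term structure (`pref_label`, `alt_labels`) and the sample data are hypothetical. Because the model only looks at labels, a term that sorts earlier alphabetically wins regardless of how much it is used.

```python
def matches(term, query):
    """True if the query is a prefix of any word in any label (case-insensitive)."""
    q = query.lower()
    labels = [term["pref_label"], *term.get("alt_labels", [])]
    return any(word.lower().startswith(q) for label in labels for word in label.split())


def current_order(terms, query):
    """Keep matching terms; prefLabel-prefix hits first, then alphabetical by prefLabel."""
    q = query.lower()
    hits = [t for t in terms if matches(t, query)]
    hits.sort(key=lambda t: (not t["pref_label"].lower().startswith(q), t["pref_label"].lower()))
    return [t["pref_label"] for t in hits]


# Hypothetical sample data: only labels, no usage information.
terms = [
    {"pref_label": "Amsterdam"},
    {"pref_label": "Amsesomewhere"},
    {"pref_label": "Old Amsterdam", "alt_labels": ["Amsterdam, old town"]},
    {"pref_label": "London"},
]
print(current_order(terms, "ams"))  # ['Amsesomewhere', 'Amsterdam', 'Old Amsterdam']
```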

Problem:

  • A small village "Amsesomewhere" will come before "Amsterdam", even if the village does not appear anywhere in the data.
  • It is hard to find the British Museum, since many other terms start with "British"

It's better to order by popularity/importance (i.e. how widely used a term is).
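Ordering by importance instead amounts to sorting on a numeric rank and cutting the list off at the page size. A minimal sketch, assuming a precomputed rank in 0..1 for each term (the values below are hypothetical) and an alphabetical tie-break:

```python
def order_by_rank(matching_terms, rank, limit=100):
    """Order matching terms by rank (highest first), ties alphabetically, keep the top `limit`.

    Terms missing from `rank` are treated as rank 0.0 (an assumption for this sketch).
    """
    ordered = sorted(matching_terms, key=lambda t: (-rank.get(t, 0.0), t.lower()))
    return ordered[:limit]


# Hypothetical ranks in 0..1, e.g. derived from how often each term is used in the data.
rank = {"Amsterdam": 0.92, "Amstelveen": 0.15, "Amsesomewhere": 0.0}
print(order_by_rank(["Amsesomewhere", "Amstelveen", "Amsterdam"], rank))
# ['Amsterdam', 'Amstelveen', 'Amsesomewhere']
```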

RDFRank

The plan is to use RDFRank, a unique OWLIM feature, which works like this:

  • Follows links and increases the score of visited nodes
  • Diminishes score increment with every iteration (damping)
  • Never decreases scores
  • Iterates until maxIterations is reached, or until the net change of an iteration drops below some epsilon
  • At the end normalizes all scores to 0..1
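To make the description concrete, here is a toy version of that kind of iteration. It is not OWLIM's implementation (its exact propagation and damping rules are not described here); the parameter names `damping`, `max_iterations` and `epsilon` and their defaults are illustrative assumptions. Each edge is a (subject, object) link; scores only ever grow, each iteration's contribution is scaled down by `damping`, and the final scores are divided by the maximum so they fall in 0..1.

```python
def rdfrank_sketch(edges, damping=0.85, max_iterations=20, epsilon=1e-6):
    """Toy score propagation over (subject, object) links; NOT OWLIM's actual algorithm."""
    nodes, out_links = set(), {}
    for s, o in edges:
        nodes.update((s, o))
        out_links.setdefault(s, []).append(o)

    score = {n: 0.0 for n in nodes}
    flow = {n: 1.0 for n in nodes}  # amount each node passes along its links this iteration
    weight = 1.0
    for _ in range(max_iterations):
        weight *= damping  # damping: every iteration contributes less than the previous one
        next_flow = {n: 0.0 for n in nodes}
        for s, targets in out_links.items():
            for o in targets:
                next_flow[o] += flow[s] / len(targets)  # follow links to the linked nodes
        net_change = 0.0
        for n in nodes:
            delta = weight * next_flow[n]  # never negative, so scores never decrease
            score[n] += delta
            net_change += delta
        flow = next_flow
        if net_change < epsilon:  # stop early once an iteration barely changes anything
            break

    top = max(score.values(), default=0.0)
    return {n: (v / top if top > 0 else 0.0) for n, v in score.items()}  # normalize to 0..1


# Hypothetical data: three objects annotated with Amsterdam; the village is never referenced.
edges = [
    ("obj1", "Amsterdam"), ("obj2", "Amsterdam"), ("obj3", "Amsterdam"),
    ("Amsterdam", "Netherlands"), ("Amsesomewhere", "Netherlands"),
]
ranks = rdfrank_sketch(edges)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

In this toy graph the never-referenced village ends with a score of 0.0, while Amsterdam, used by three objects, ends well above it, which is the ordering the autocomplete needs.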

Compute RDFRank

RS-298

  • compute RDFRank. This is an expensive operation done once as part of Repository Creation.
    PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
    INSERT DATA {[] rank:compute []}
    
  • tell OWLIM FTS to use RDFRank to boost the score (relevance) of Lucene search results. Allowed values are "no" (default), "yes" and "squared" (use the square of RDFRank).
    PREFIX luc: <http://www.ontotext.com/owlim/lucene#> 
    INSERT DATA { luc:useRDFRank luc:setParam "squared" . }
    
    • Note: the default is "no", and the current syntax uses INSERT rather than ASK
  • rebuild the Lucene index
    PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
    INSERT DATA { luc:thesIndex luc:createIndex "true" . }
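A sketch of what the three `luc:useRDFRank` settings could mean for ordering. The combination rule below (multiplying the Lucene score by the rank, or by its square) is an assumption made for illustration; OWLIM's exact formula is not given here. Because ranks lie in 0..1, squaring shrinks low ranks proportionally more, so "squared" gives importance more weight relative to text relevance.

```python
def boosted_score(lucene_score, rdf_rank, mode="no"):
    """Hypothetical model of luc:useRDFRank ("no", "yes", "squared"); the formula is assumed."""
    if mode == "no":
        return lucene_score
    if mode == "yes":
        return lucene_score * rdf_rank
    if mode == "squared":
        return lucene_score * rdf_rank ** 2
    raise ValueError(f"unknown luc:useRDFRank value: {mode!r}")


# Two hypothetical hits: A matches the text slightly better, B is slightly more important.
hits = {"A": (1.0, 0.5), "B": (0.8, 0.6)}  # term -> (lucene_score, rdf_rank)
for mode in ("no", "yes", "squared"):
    order = sorted(hits, key=lambda t: -boosted_score(*hits[t], mode=mode))
    print(mode, order)
# no ['A', 'B']
# yes ['A', 'B']
# squared ['B', 'A']
```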
    

Incremental Update of RDFRank

RS-1462
Ideally, term ranks would be recomputed when data is updated:

  • thesaurus terms are used in Data/Image Annotation
  • tags are attached to or detached from an object (see Tags Spec).
    Note: tags are also thesaurus terms, so below we talk only about "terms".

Computing the rank incrementally is hard:

  • rank:computeIncremental computes the rank only of nodes that don't have any RDFRank yet, i.e. new terms. It cannot be used to update the rank of existing terms that are attached or detached
  • rank:computeIncremental performance:
    • Mitac is concerned this may be slow and may block the system. If so, it needs to be run nightly.
    • Vlado thinks the scope of this update is small, so it shouldn't be slow
    • The performance of rank:computeIncremental (and whether it blocks the system) needs to be timed
  • If we use luc:score rather than rank:hasRDFRank (see below), then we also need to recreate the FTS index, but only for the affected terms.
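The cases above can be summarized as a small planning function. Everything here is hypothetical (function and key names included); it only encodes the rules stated in this section: computeIncremental ranks only terms that have no rank yet, existing terms whose links changed keep their old rank, and with luc:score the FTS index is refreshed for the affected terms, taken here (as an interpretation) to be the newly ranked ones.

```python
def plan_rank_maintenance(ranked_terms, new_terms, touched_terms, use_luc_score):
    """Decide what to do after an annotation/tagging update (hypothetical helper)."""
    ranked = set(ranked_terms)
    # rank:computeIncremental only ranks nodes that have no RDFRank yet.
    to_rank = sorted(set(new_terms) - ranked)
    # Existing terms that were attached/detached keep their old rank until a full recompute.
    stale = sorted(set(touched_terms) & ranked)
    # With luc:score, the FTS index must be recreated for the affected (here: newly ranked) terms.
    reindex = to_rank if use_luc_score else []
    return {"compute_incremental": to_rank, "stale_rank": stale, "reindex_fts": reindex}


print(plan_rank_maintenance(
    ranked_terms={"Amsterdam", "London"},
    new_terms={"my-new-tag"},
    touched_terms={"Amsterdam", "my-new-tag"},
    use_luc_score=False,
))
# {'compute_incremental': ['my-new-tag'], 'stale_rank': ['Amsterdam'], 'reindex_fts': []}
```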

How important is this?

  • Updating the rank is not critical, because these few new statements will have little effect on the overall rank compared to the much larger number of statements from imported data.
  • Computing the rank of new terms is probably critical, because terms without a rank won't be returned by the query that uses rank:hasRDFRank. This needs to be done when adding a tag to rs-tag.
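Why unranked terms disappear: in the Direct query both triple patterns must match, so the results behave like an inner join between the full-text hits and the terms that have a rank. A minimal Python model of that join (names and data are hypothetical):

```python
def join_with_rank(fts_hits, ranks):
    """Model of `?term luc:thesIndex "..." . ?term rank:hasRDFRank ?rank`: both must match."""
    return {term: ranks[term] for term in fts_hits if term in ranks}


# "amsterdam-trip" stands for a tag added after the last rank computation.
fts_hits = ["Amsterdam", "Amstel", "amsterdam-trip"]
ranks = {"Amsterdam": 0.92, "Amstel": 0.31}
print(join_with_rank(fts_hits, ranks))  # {'Amsterdam': 0.92, 'Amstel': 0.31}
```

The new tag matches the text query but silently drops out of the result, which is why ranking new terms matters more than refreshing the ranks of existing ones.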

Use RDFRank

There are two ways to use RDFRank:

  • Direct: use rank:hasRDFRank
  • Indirect: use luc:score, which itself is boosted by rank:hasRDFRank
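PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT * {
  ?term luc:thesIndex "lond*".
  ?term rank:hasRDFRank ?rank.  # DIRECT
  # ?term luc:score ?score.     # INDIRECT (then ORDER BY DESC(?score))
} ORDER BY DESC(?rank) LIMIT 100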

We currently use the Direct method because Indirect has the following complications:

  • doesn't work with a wildcard query (e.g. "amst*")
    OWLIM-1079
    We need to change the multiTermRewriteMethod of QueryParser:
  • requires updating the FTS index incrementally after each Incremental Update of RDFRank