
Problem & Tasks

  • Autocomplete is slow for 3-letter inputs, because we use prefix search and a short prefix has many matches
  • Kostadinov to increase the UI typing timeout, so that if the user types "London" without pausing, NO query for "Lon" will be made
  • Mitac proposes a cache; Vlado is against such complications
  • Mitac to time the Lucene query alone, without the additional thesaurus restrictions
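
The effect of the typing timeout can be sketched as below. This is only an illustration of the idea, not the actual UI code: the function name, the keystroke timings and the timeout values are all assumptions. A query fires for the text in the box only when the user pauses for at least the timeout (or stops typing).

```python
def queries_after_debounce(keystrokes, timeout_ms):
    """Return the inputs for which an autocomplete query would be sent.

    keystrokes: list of (timestamp_ms, text_in_box_after_that_keystroke),
                ordered by time (hypothetical data, for illustration only).
    A query is sent for a text only if no further keystroke arrives within
    timeout_ms of it; the last keystroke always triggers a query.
    """
    fired = []
    for i, (t, text) in enumerate(keystrokes):
        is_last = i == len(keystrokes) - 1
        if is_last or keystrokes[i + 1][0] - t >= timeout_ms:
            fired.append(text)
    return fired


# Assumed timings: the user types "London" with 120 ms between keys.
typing_london = [(0, "L"), (120, "Lo"), (240, "Lon"),
                 (360, "Lond"), (480, "Londo"), (600, "London")]

short_timeout = queries_after_debounce(typing_london, timeout_ms=100)
long_timeout = queries_after_debounce(typing_london, timeout_ms=300)
```

With the shorter (assumed) timeout every intermediate input, including "Lon", is queried; with the longer one only "London" is.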

Alternative Approaches

Approximate Instead of Prefix Search

The autocomplete at is fast and usable.
It doesn't use prefix queries (eg "asp" doesn't find "aspirin") but finds misspellings (eg "aspiri" finds "aspirin"). Vlado asked Kosyo:

  • it uses edit (Levenshtein) distance, which is a Lucene query option. It allows up to 3 misspelt chars with a Lucene "approximate" query like:
  • it ranks matches using TF-IDF ranking
  • it uses the Forest autocomplete module
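
To see why an edit-distance limit of 3 behaves as described, here is a plain Levenshtein distance in Python. It is a textbook sketch, not Lucene's implementation; the function names and the lower-casing are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def approximate_match(typed, term, max_edits=3):
    """True if the typed text is within max_edits of the indexed term."""
    return edit_distance(typed.lower(), term.lower()) <= max_edits


for typed in ("asp", "aspiri"):
    print(typed, edit_distance(typed, "aspirin"),
          approximate_match(typed, "aspirin"))
```

"asp" is 4 edits away from "aspirin", so it falls outside a limit of 3, while "aspiri" is 1 edit away and matches.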

Vlado asked Dominic whether prefix queries are necessary: should "Rem" find "Rembrandt", or is it sufficient for "Rembrand", "Rembrant" and similar misspellings to find it?

Prefix search matters less on the first word, but it is necessary on subsequent words.

  • Eg on
    1. "aspiri" shows various matches for aspirin, including "litecoat aspirin"
    2. "aspirin li" shows nothing
    3. user has to type all the way to "aspirin litecoa" to get "litecoat aspirin" again
  • If we don't do prefix search, then for "british museum"
    1. "british" will show "british museum"
    2. "british m" will show nothing
    3. user has to type all the way to "british museu" to get "british museum" again

Maybe the best approach is to combine the two: use approximate search for the first word, and prefix for the rest, eg:

britis~3 m*

will find "british museum" and also "british mint", etc.
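
A minimal sketch of how the combined query string could be built from the user's input. The function name is hypothetical, the "~3" suffix simply copies the notation of the example above (whether a given Lucene version accepts an integer edit count there is not confirmed here), and Lucene special characters in the input are not escaped.

```python
def build_autocomplete_query(user_input, max_edits=3):
    """Approximate match on the first word, prefix match on the rest.

    Assumption: whitespace-separated words; no escaping of Lucene
    special characters is attempted in this sketch.
    """
    words = user_input.split()
    if not words:
        return ""
    first = f"{words[0]}~{max_edits}"        # approximate (edit distance)
    rest = [f"{w}*" for w in words[1:]]      # prefix on later words
    return " ".join([first] + rest)


print(build_autocomplete_query("britis m"))    # britis~3 m*
print(build_autocomplete_query("aspirin li"))  # aspirin~3 li*
```

A single-word input produces only the approximate term, so the first word is never queried as a prefix.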

SOLR Instead of Embedded Lucene

The autocomplete of an AZ project by our LifeSci group uses prefix search and is very fast.
Vlado asked Dancho: it uses an external SOLR index, not the Lucene index embedded in OWLIM.


Data as of Sep 03, 2012. Timing is in ms

Query         #Results  Owlim4 Local  Owlim4 Remote  Owlim5 Local  Owlim5 Remote
oil p               27           906           1752          1113            425
rem                130          2143          10753          2721           1508
fieren               1             7            138            25             30
Vries               18            22            188            60             41
Poorter              3            10             91            17             29
Metropolitan        29            97            319           174            122
Lon               1151          2666          20075          4740           5146
London             448          1937          20606          2832           4426