There is a need for full-text search within literals in OWLIM that is automatically kept up to date with the repository data. The existing OWLIM Lucene plug-in lacks incremental building and is not kept up to date automatically; this plug-in fills these gaps and adds some useful features.
To create an index, issue the following SPARQL update query:
where <index-options> can be a combination of the following options, separated by a semicolon ';':
To get started quickly, the following examples show how to create an index and then execute searches on it.
The following query creates an index on all entities that have an rdfs:label with the @en language tag or no language tag, using an EnglishAnalyzer with snippets enabled.
Now that the index is created, you can run the following query to obtain the top 20 entries that start with "a", together with the actual literal, the snippet where the match occurs, and the Lucene score.
Benchmark testing used a LUBM-50 dataset (6,654,856 explicit statements) with default values for test memory and repository configuration. The CPU was an Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz.
A: Just like in a normal query. Consider the following example:
The above query joins the union part (with bindings for ?c and ?s) with the Lucene part on ?s, provided that the Lucene index contains things of the right classes, i.e. things of type Type1 AND Type2. There are a few noteworthy details, though:
If I union up two lucene2 queries (on different lucene2 indices), will the snippets and scoring still work?
A: The short answer is no. Lucene scores are generated per query, so you cannot execute two different Lucene queries and expect meaningful scoring when the results are joined. Consider the following example:
While the query above is valid, it is not meaningful, for the reasons mentioned earlier. The results will be incorrect, since different scoring is used for the two queries. Instead of using UNION, you should create a single index for Type1, Type2 and Predicate1 and execute just one query.
A: It is the way Lucene's FastVectorHighlighter generates snippets. For example, if the match is on an indexed property that is relatively short (such as a report title), the snippet tends to be shorter than the entire title, even though the title length is less than the requested snippet length. Lucene tends to cut the snippet off chars before the first term match. The getBestFragment method's javadoc is not very helpful in explaining why.
The drop index SPARQL update request throws an error if the index does not exist. Is there any way to ask if the index exists and if so - drop it?
This is where the luc2:list comes in handy. If you have an index named "myIndex", then you can execute the following SPARQL update and get response code 200, even when the index does not exist:
First of all, let's get the terminology right - a term represents a word of text. What you are actually trying to search for is a phrase. In addition, the Lucene2 plug-in supports the Lucene query syntax. This means that if you search for phrases such as "City of Manchester" like this (mind the quotes):
There is a caveat: to use a double quote (") inside a literal, you need to escape it or use one of the additional quoting constructs described at http://www.w3.org/TR/sparql11-query/#QSynLiterals
You will get appropriate results. The analyzers do filter stop words, but this is not an issue, because the same analyzer (the one specified when the index was created) is used for both indexing and searching. More information on how this works can be found at http://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/package-summary.html in the "Token Position Increments" section.
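The quoting caveat for phrase queries can be sketched with a small client-side helper. This is only an illustration in Python, not part of the plug-in: the function names lucene_phrase and sparql_literal are hypothetical, and the sketch assumes the query string is built in application code before being placed in a SPARQL query. It wraps the text in double quotes so that Lucene treats it as a phrase, then escapes those quotes so that the result is a valid double-quoted SPARQL literal.

```python
def lucene_phrase(text: str) -> str:
    """Wrap text in double quotes so the Lucene query parser treats it as a phrase."""
    # Escape backslashes first, then embedded double quotes (Lucene query syntax).
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sparql_literal(value: str) -> str:
    """Render value as a short double-quoted SPARQL string literal."""
    # Backslash, double quote, newline and carriage return must be escaped
    # inside a "..." literal in SPARQL.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


# Hypothetical usage: build the literal for the phrase query.
phrase = lucene_phrase("City of Manchester")
literal = sparql_literal(phrase)
print(phrase)   # the Lucene phrase query, including its quotes
print(literal)  # the same query as an escaped SPARQL literal
```

For the phrase above, the first line printed is "City of Manchester" and the second is "\"City of Manchester\"". Wrapping the phrase in single quotes or triple quotes in the SPARQL query is an equally valid alternative to escaping.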
Yes, you can. Go for it!