
h2. Overview

*Boolean* is a full-text search paradigm for searching the document content as well as the document features and structure. It lets you combine words with logical operators, such as _AND_ and _OR_, to narrow or widen your search. By looking for a given word that appears in the document content, you can retrieve a particular document or set of documents. This works like a classical search engine that locates strings of characters in your set of documents.
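To make the effect of the logical operators concrete, here is a minimal Python sketch of Boolean search over a tiny inverted index. It only illustrates the general technique and is not the product's implementation; the sample documents and the function names {{build_index}}, {{search_or}} and {{search_and}} are assumptions made up for this example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search_or(index, *terms):
    """OR widens the search: a document matches if it contains any term."""
    result = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result


def search_and(index, *terms):
    """AND narrows the search: a document matches only if it contains every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Petrol prices rise again",
    2: "Electric cars gain market share",
    3: "Hybrid cars combine a petrol engine with an electric motor",
}
index = build_index(docs)

print(sorted(search_or(index, "petrol", "electric")))   # [1, 2, 3]
print(sorted(search_and(index, "petrol", "electric")))  # [3]
```

In this sketch _OR_ is a set union and _AND_ is a set intersection, which is why the first query returns more documents than the second.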

In addition, you can search the additional information that is available for each document in your document set. This information is called metadata and consists of document features such as the source of the document, the date it was created, etc.

h2. Searching in the body of the document

The *Body content* text box is the place where you enter the keyword(s) that define your search. The *Keyword* paradigm finds these words in the body of your document set.

(/) Keep in mind that it only searches for whole words or the exact letters you have typed. For example, if you type "elect", you will get all documents in which the verb "elect" appears. However, if you type "{*}elect{*}", you will get all documents that contain words such as "{*}Elect{*}ronics", "{*}elect{*}ricals", "{*}elect{*}ed", "{*}elect{*}ricity", etc.
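The difference between matching a whole word and matching the letters inside longer words can be sketched in Python as follows. This is an illustration only, not the product's matcher: it assumes a glob-style {{*}} wildcard and case-insensitive comparison, and the helper names {{words}} and {{matches}} are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def matches(pattern, text):
    """True if any word in text matches the glob-style pattern, ignoring case."""
    pattern = pattern.lower()
    return any(fnmatchcase(word, pattern) for word in words(text))


text = "Voters elected a new board at the Electronics fair"

print(matches("elect", text))   # False: no word is exactly "elect"
print(matches("elect*", text))  # True: "elected" and "electronics" start with "elect"
```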

!_Images for reuse^task.png!
*Find all documents that mention either of two terms, for example "petrol OR electric".*

!_Images for reuse^to_do.png!
Type "petrol OR electric" in the *Body content* text box.

!_Images for reuse^KeywordSearch_body.png|border=1!

h2. Searching in the features of the document

The *Feature* box is the place where you select search criteria based on the document features, i.e. the additional information available for the documents. These features can vary depending on the domain and the set of documents. In this case, as the corpus of documents consists of international news, some of the features are: the subject and author(s) of the news article; some key entities/key phrases that are selected as characteristic of the article; the title/subtitle; the original URL where the article can be found; etc.

The *Feature value* text box is where you type the value of your document feature.

!_Images for reuse^task.png!
*Find all documents where a particular term appears in their title, for example the term "cars".*

!_Images for reuse^to_do.png!
Select TITLE as your document feature from the *Feature* box and type "cars" in the *Feature value* box.

!_Images for reuse^KeywordSearch_feature.png|border=1!

h2. Searching in the body and the features of the document

You can narrow your query even further by searching in both the document features and content.

!_Images for reuse^task.png!
*Find all documents where a term and a document feature value appear together, for example documents about "cars" that have the word _"hybrid"_ as KEYENTITY.*

!_Images for reuse^to_do.png!
Type "cars" in the *Body content* text box, choose KEYENTITIES from the *Feature* drop-down list, and type "hybrid" in the *Feature value* text box.

!_Images for reuse^KeywordSearch_body_feat.png|border=1!\\
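The combined query above can be pictured as two conditions that must both hold for a document. The following Python sketch shows that logic under stated assumptions: the record layout ({{id}}, {{body}}, {{features}}), the sample data and the helper names are hypothetical, matching is simplified to case-insensitive whole words, and none of it reflects how the index is actually stored.

```python
import re


def has_word(text, term):
    """Case-insensitive whole-word test."""
    return term.lower() in re.findall(r"\w+", text.lower())


def search(docs, body_term, feature, feature_value):
    """Return ids of documents that match both the body term and the feature value."""
    hits = []
    for doc in docs:
        in_body = has_word(doc["body"], body_term)
        in_feature = has_word(doc["features"].get(feature, ""), feature_value)
        if in_body and in_feature:
            hits.append(doc["id"])
    return hits


# Hypothetical sample records.
docs = [
    {"id": 1, "body": "Sales of cars fell last year",
     "features": {"KEYENTITIES": "hybrid engine"}},
    {"id": 2, "body": "Sales of cars fell last year",
     "features": {"KEYENTITIES": "diesel engine"}},
    {"id": 3, "body": "Trains are running late",
     "features": {"KEYENTITIES": "hybrid locomotive"}},
]

print(search(docs, "cars", "KEYENTITIES", "hybrid"))  # [1]
```

Only the first record satisfies both conditions, which is why adding a feature criterion narrows the result compared with searching the body alone.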

The retrieved results are displayed in the [*Document Query Result*|Document Query Result#_Document Query Result] screen.\\ \\

(/) Features that hold a URL behave slightly differently. As the URL is not split into its constituent tokens within the KIM internal index, you should always end the search word or phrase with an asterisk ('*').

For example, if you want to find documents whose URLs contain the word "pirelli", and there is a record in the database such as {{http://some.news.site/ft-pirelli-10-aug-2001.html}}, the correct query to find this entry is {{pirelli*}} and {color:red}NOT{color} {{pirelli}}.