- 20120406 Search Discussion
- Benchmark Queries
- FRs of Event
- Starting From Term
- Search Styles
- Other Discussions
- Essence from spec
- Brief Description
- Fundamental Relations
- Explore after Search
Notes from 20120406 Search Discussion in Sofia
Following a request by Jana, we worked out a few of the benchmark queries from [RS Docs^D6 - BM Query Testing.pdf] as FRs:
- Bronze vessels from China
Things "of type" <Vessels> and "using material" <Bronze> and "from place" <China>
- "Jorma, T" in either Acquisition Name or Producer Name
Things "from actor" <Jorma T>
- 18th Century flagons
Things "of type" <Flagon> and "from time" "1700-1799"
Check whether "from time" means "production time"
- Rembrandt works
Things "by" <Rembrandt>
- paintings with keyword "fish"
Things "of type" <Painting> "keyword" "fish"
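The benchmark queries above are all conjunctions of (FR, value) pairs. A minimal sketch of that reading, assuming a toy in-memory store; `CATALOG`, the object ids and `fr_search` are hypothetical names for illustration, not part of RS:

```python
# Hypothetical toy data: each object maps an FR name to the set of values it has.
CATALOG = {
    "obj1": {"of type": {"Vessel"}, "using material": {"Bronze"}, "from place": {"China"}},
    "obj2": {"of type": {"Painting"}, "by": {"Rembrandt"}, "keyword": {"fish"}},
    "obj3": {"of type": {"Vessel"}, "using material": {"Clay"}, "from place": {"China"}},
}

def fr_search(catalog, clauses):
    """Return ids of objects that satisfy every (relation, value) clause."""
    return sorted(
        oid
        for oid, frs in catalog.items()
        if all(value in frs.get(rel, set()) for rel, value in clauses)
    )

# "Bronze vessels from China" as three FR clauses joined by "and".
bronze_vessels = fr_search(
    CATALOG,
    [("of type", "Vessel"), ("using material", "Bronze"), ("from place", "China")],
)
```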
Martin's doc defines only FRs of Thing. In order to find more specific situations, we can consider FRs of event (being a sub-object of Thing)
- Things exhibited in China by Gallery1
Things "participated in event" "of type" <Exhibition> "at place" <China> "carried out by actor" <Gallery1>
As you can see, this is very clumsy and will be hard for the user to understand.
- Things offered at auction organized by Sotheby's
Thing "at event" "Auction" "participated" <Sotheby's>
We could abridge it at the expense of natural language flow
Instead of saying "Event" in the UI, we could use Event Type as connective in the grammar (users would always want a specific event type)
- Use event type, or subproperty linking to event?
- event type:
Thing "was at" "exhibition" "at place" <London> and "was at" "minting" "at place" <Paris>
- subproperty:
Things "exhibited" "at place" <London> and "minted at" <Paris>
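A minimal sketch of how an event-based query nests: the constraints apply to the event, and the result is the things that took part in a matching event. The `EVENTS` records, their field names and `things_at_event` are assumptions for illustration only:

```python
# Hypothetical event records; "things" lists the objects that participated.
EVENTS = {
    "ev1": {"type": "Exhibition", "place": "China", "actor": "Gallery1",
            "things": {"obj1", "obj2"}},
    "ev2": {"type": "Auction", "place": "London", "actor": "Sotheby's",
            "things": {"obj2"}},
}

def things_at_event(events, **constraints):
    """Return things that participated in at least one event matching all constraints."""
    found = set()
    for event in events.values():
        if all(event.get(key) == value for key, value in constraints.items()):
            found |= event["things"]
    return sorted(found)

# "Things exhibited in China by Gallery1"
exhibited = things_at_event(EVENTS, type="Exhibition", place="China", actor="Gallery1")
```

Using the event type as the connective, as suggested above, amounts to fixing the `type` constraint in the UI so the user never sees the word "Event".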
Google has a single box and users like to just start from the term they are searching for, so why can't we do the same?
We tried working backwards (starting from the term) and then reconstructing the query structure (splicing the term into the appropriate place in the query tree), but failed.
Below is a messy capture of that fail. But don't despair, we formulated a limited version of this, see "Thesaurus Search" below.
RS3.3 (or 3.4) will support the following search styles
- Keyword search
- Full-text search over all text of the object, including thesaurus labels in multiple languages (excluding broader terms)
- No autocompletion
- Thesaurus search
- Index all thesaurus values used in each object (including broader hierarchy)
- User starts typing, RS autocompletes across all thesauri and shows an indication of the term type in brackets
- Implicit conjunction between terms
- eg Bronze vessels from China
becomes simply: "China (place)" "bronze (material)" "vessel (object)"
- same idea as Starting From Term but without the complications of trying to figure out a query structure
- TBD: index all relations, or only FRs? Jana thinks FRs, and now Vlado tends to agree
- Complex search
FR search using some "search sentence" structure
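A minimal sketch of the Thesaurus search idea, assuming a tiny hand-made thesaurus: each object is indexed under its own terms plus their broader terms, and the selected terms are combined by implicit conjunction. The thesaurus contents and all function names are hypothetical:

```python
# Hypothetical thesaurus: term -> (term type, broader term or None).
THESAURUS = {
    "Asia": ("place", None),
    "China": ("place", "Asia"),
    "metal": ("material", None),
    "bronze": ("material", "metal"),
    "vessel": ("object", None),
    "flagon": ("object", "vessel"),
}

def with_broader(term):
    """Return the term followed by all of its broader terms."""
    chain = []
    while term is not None:
        chain.append(term)
        term = THESAURUS[term][1]
    return chain

def build_index(object_terms):
    """Map each object id to its terms expanded with the broader hierarchy."""
    return {
        oid: {t for term in terms for t in with_broader(term)}
        for oid, terms in object_terms.items()
    }

def thesaurus_search(index, terms):
    """Implicit conjunction: an object matches only if it carries every selected term."""
    wanted = set(terms)
    return sorted(oid for oid, indexed in index.items() if wanted <= indexed)

def label(term):
    """Autocomplete label with the term type in brackets."""
    return f"{term} ({THESAURUS[term][0]})"

TERM_INDEX = build_index({"obj1": ["China", "bronze", "flagon"], "obj2": ["China", "vessel"]})
```

Because the broader hierarchy is indexed, a search for "vessel (object)" also finds the flagon in this sketch.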
- The Keyword and Thesaurus searches will use a single search box
- If the user autocompletes (using down-arrow or the mouse), a Thesaurus term is selected and another box opens to the right
- If the user does not autocomplete, then it's an FTS Keyword.
- If the user enters several Thesaurus terms and a Keyword (the last box), both are used.
So this gives a combined search: Keyword+Thesaurus.
Rationale: if we have 2 searches, we want to be able to use them together (conjunction)
- In addition to terms and keyword, we should allow date-range
"number" or "number-number" means "range of creation date"
eg "China (place)" "bronze (material)" "1200-1600"
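A minimal sketch of how the boxes of the single search box could be classified, assuming each box arrives as a (text, autocompleted) pair. Treating a single number as a one-year range is an assumption, and `classify_box`, `build_query` and the keyword "dragon" are hypothetical:

```python
import re

# "number" or "number-number" (assumed to be years of up to four digits).
DATE_RANGE = re.compile(r"^(\d{1,4})(?:-(\d{1,4}))?$")

def classify_box(text, autocompleted):
    """Classify one search box as a thesaurus term, a date range, or a keyword."""
    text = text.strip()
    if autocompleted:
        return ("thesaurus", text)
    match = DATE_RANGE.match(text)
    if match:
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        return ("date_range", (start, end))
    return ("keyword", text)

def build_query(boxes):
    """Collect all boxes into one conjunctive query description."""
    query = {"thesaurus": [], "date_range": [], "keyword": []}
    for text, autocompleted in boxes:
        kind, value = classify_box(text, autocompleted)
        query[kind].append(value)
    return query

query = build_query([
    ("China (place)", True),
    ("bronze (material)", True),
    ("1200-1600", False),
    ("dragon", False),
])
```

The sketch does not enforce that the keyword is the last box; that would be a UI constraint.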
- complex FR search allows most precise matching, is most flexible
- but may be too complex for users, and goes against faceting.
Facets present a fixed number of slots, whereas a complex search (and/or) does not
- Fixed ("linearized") form with drop-downs (conjunction-of-disjunctions), or flexible (grammar-based)?
- Decision: RS3 will use a fixed form in a "wizard" style, i.e. for each FR show the definition on the right
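A minimal sketch of what a conjunction-of-disjunctions form means when evaluated: every row of the form must be satisfied (and), and a row is satisfied by any one of its alternatives (or). The objects, the query and `matches` are hypothetical illustrations:

```python
def matches(obj, query):
    """query is a list of rows (AND); each row lists (relation, value) alternatives (OR)."""
    return all(
        any(value in obj.get(rel, set()) for rel, value in row)
        for row in query
    )

# Hypothetical objects described by FR name -> set of values.
painting = {"of type": {"Painting"}, "from place": {"London"}}
coin = {"of type": {"Coin"}, "from place": {"Paris"}}

# (Painting or Drawing) and (London or Paris)
cnf_query = [
    [("of type", "Painting"), ("of type", "Drawing")],
    [("from place", "London"), ("from place", "Paris")],
]
```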
Our offer proposes to use Natural Language Processing for RS4. This could include:
- Automatic semantic annotation from free text (eg "curatorial notes").
Unlike Controlled Natural Language that's controlled by a grammar and the user cannot type free text, here the idea is to extract named entities and other knowledge from free text (our Semantic Annotation and Search group is about half of Ontotext)
- Verbalization, i.e. generate textual descriptions from semantic knowledge, using results from MOLTO.
This could be combined with FTS...
TODO: merge this to Explore after Search
- Given an object and checkbox selection of some of the facets, we can construct a conjunctive search
- Given a complex query, we cannot construct a set of facets
- Another idea: given an object result set, find the people involved with them, then more objects related to the same people (RelFinder)
- RS should not allow a user to formulate an invalid query.
- But preventing the user from formulating an empty (over-constrained) query is much harder, since it'd have to run all possible continuations and see which are empty.
Preventing empty searches is easy in a faceting system because the facets are a fixed number of precomputed slots, but very hard in a flexible query system
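A minimal sketch of why facets avoid empty searches: the counts are computed from the current result set, so only values that occur in at least one result are ever offered. `RESULTS`, the facet names and both functions are hypothetical:

```python
from collections import Counter

# Hypothetical result set; each object lists its values per facet.
RESULTS = [
    {"id": "obj1", "material": ["bronze"], "place": ["China"]},
    {"id": "obj2", "material": ["bronze"], "place": ["Japan"]},
    {"id": "obj3", "material": ["clay"], "place": ["China"]},
]

def facet_counts(results, facets):
    """Count how many results carry each value of each facet."""
    counts = {facet: Counter() for facet in facets}
    for obj in results:
        for facet in facets:
            for value in obj.get(facet, ()):
                counts[facet][value] += 1
    return counts

def refine(results, facet, value):
    """Keep only the results that carry the selected facet value."""
    return [obj for obj in results if value in obj.get(facet, ())]

counts = facet_counts(RESULTS, ["material", "place"])
clay_only = refine(RESULTS, "material", "clay")
clay_counts = facet_counts(clay_only, ["material", "place"])
```

After refining to clay, "Japan" no longer appears among the place values, so the user cannot select a combination with no results.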
- Support different Search Techniques (based on Ontology, using Thesauri, fine filtering using Facets)
- Browsing as well as detailed searching
- Integrated with Tools (eg after search, be able to Annotate)
- Interlinked terminology (co-referenced)
The search function to incorporate:
- FC's and FR's
Vlado: we plan to use Search Tools#Exhibit for this
- Summary results
Vlado: i.e. a result list. Again we plan to use Exhibit which has tabular, pagination, timeline and geographical mapping
- Detailed Results
Vlado: currently investigating generation of the object detailed view with RForms / SHAME
- Access to Annotation
Vlado: will be able to annotate any object or field
- Browse from results to similar or connected
Vlado: will be able to go to related objects. How do we define "similar"
Dominic: At what point is it appropriate to use or switch to facets? Does the size of the dataset determine this to some extent?
Vlado: FR returns a result set, Exhibit provides faceting over it.
- Exhibit2 supports up to 1k results (same limit as in Europeana) since it uses browser memory
- Exhibit3 supports up to 100k through intermediate server that handles the results. But transferring all these results from RDF may be slow...
FRs are described at CRM Search.
- Dominic: Following our conversation it does not seem to be a big issue adding FRs, since they are only searching parameters, so I assume that if we have more relevant FRs these can simply be added to the list. Please confirm. I am working to the original Doerr matrix within "New Framework"
- Vlado 20120321: I've implemented 2 FRs (Thing "refers to or is about" Place, Thing "from" Place). It took 42 rules and 9 axioms, and about 1 day (plus 1d for the framework around it: I implemented a simple "weaving" script and a syntactic simplification to write one rule per line).
For example, the FROM relationship can apply to PLACE, EVENT, TIME, ACTOR and THING. There are a number of others including:
- Has type
- Is part of
- Is similar or same as
- Has met
- From, has founder or has parent
- Is origin of, founder of, or parent of
- Refers to
- Is referred by
These FRs relate to particular concepts and some relationships are only relevant to some concepts so this would have to be reflected in the logic of the search widget.
- Dominic: All I need to understand is the reason for leaving these other relationships out and only including “has type” and “from”. There may be very good reasons for this, but in order to comment further I need to understand your current thinking so that I can adjust my thinking accordingly and that we are all on the same page. I hope that makes sense.
- Vlado: RS3.1 implemented a few FRs to the best of our understanding, because the document "FRs of Thing" was not available.
RS3.3 will implement a lot more FRs, according to Martin's definitions (with corrections), and taking into account the data we have (RKD and BM)
A thought just occurred to me that you may want to have basic relationships and advanced relationships and turn this setting on and off, so that on basic setting you aren’t confronted with the full range of relationships – particularly if they are less relevant. Again, I would need your advice on this as I don’t know if this is sensible or not.
We need to understand how search leads onto browse.
I would envisage a user environment where a user, once they have got their result can either select it and work on it (annotate) and/or then use it as the basis for another search or can decide to use it as the basis for browsing, in which case they should be allowed to alter certain parameters and at some point be allowed to alter the rules (see vocabulary mapping requirements). For example, they may wish to add a vocabulary mapping to see if the results change.
My assumption is that you can also search records by their annotations which have some of their own criteria.
I am writing something to send through, but should I keep my comments high level or is there a specific thing that people want me to comment on? Do you want any representative pictures or have we got enough of these?
- A user may wish to search for stone tablets with inscriptions from a particular culture. The search system should allow the user to continually refine the search until the appropriate level of recall is achieved. This may be through a combination of techniques (combining text search, with controlled searching and relationships, perhaps also using a facet UI)
- Question: how does the system allow a user to narrow down their search to a manageable number of results? (akin to the basic search?)
- Once a result set is achieved the user should be able to browse these records and drill into them for further information (from summary to detail).
- The user should then be able to use a result record as an anchor for exploring (advanced search). This may require a different set of relationships. For example, I may have started with a painting, but what if I want to ask for connections to other object types? Some of the criteria that I used may be useful, others may not. Time, place and people may be useful but material may not. If I decide to change object type I would want to see other object types that have some relationship to the original result.
- Question: What are the parameters that users can alter to explore different facets of a base record? – akin to advanced search?
- Sometimes I may want to retain the object type but want to explore other aspects like technique, iconography, etc. What are the connections to paintings created by other artists? Sometimes I want to take a particular concept and take it beyond my current domain.
The idea I had, which is represented on page 45 of the functional requirement, is that once you have a result you can use it as an anchor to look for other records with some type of relationship. This involves first narrowing down with the search feature and then broadening out again, but along a particular relationship branch. So you might decide to search for Rembrandt paintings; you then get a result set. You select a particular record and then ask to see if there are any relationships to other object types based on time and place, or iconography etc. This would be like the Fact Forge relationship system except you only have one thing and you choose the relationship, rather than having two things and the system looking up the relationships between the two. Once the relationship path has provided some results you may then want to use one of the results to do the same thing and find another connected thing, perhaps based on a different relationship path, and so on.
For example, in RelFinder on FactForge you can put in “British Museum” and “British Library” and Fact Forge provides different relationship paths between the two. In my RelFinder relationship system you only have “British Museum” and you choose a path, say “has building type” or “is origin of”, which might link to other buildings or other things. (You may want to restrict to the same object type or to any object type.) The system would then look at those relationship paths and see what is connected. It might find the British Library or the Natural History Museum, both of which had their origins in the British Museum. So, instead of thing1 + thing2 = these relationships, you have thing1 + these relationships = thing1,2,3,..n.
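A minimal sketch of the "thing1 + relationship = things" idea, assuming the data is available as simple (subject, relation, object) triples. The triples and `explore` are hypothetical illustrations, not the FactForge or RelFinder API:

```python
# Hypothetical triples: (subject, relation, object).
TRIPLES = [
    ("British Museum", "is origin of", "British Library"),
    ("British Museum", "is origin of", "Natural History Museum"),
    ("British Museum", "has type", "Museum"),
    ("British Library", "has type", "Library"),
]

def explore(anchor, relation, triples):
    """Follow one chosen relation from the anchor and return what it connects to."""
    return sorted(obj for subj, rel, obj in triples if subj == anchor and rel == relation)

# One anchor plus one chosen relationship yields the connected things.
connected = explore("British Museum", "is origin of", TRIPLES)
```

Any of the returned things can then serve as the next anchor, possibly with a different relation.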
I guess that the interface could be more like a traditional faceted search than the RelFinder representation if we want to reference back to the original document.