{jira:RS-755}: spec
{jira:RS-690}: implementation
{toc}

h1. Introduction

It turns out that while search using Fundamental Relations ([FR Implementation-old|FR Implementation-old]) is fast, fetching the display fields of the search results is a major cause of slow [Search Performance], because:
- We need to fetch the fields of all search results (up to a given limit, e.g. 500 or 1000) in order to pass them to Exhibit for faceting
- There are quite a number of display fields (see below)
- Display fields are obtained from several alternative places for the different datasets (RKD and BM)
- The queries must be chosen very carefully. They are described at [RS Server Spec#TODO] (this description may move elsewhere)

h2. Conventions

- Below we use a pseudo-SPARQL notation. The selected object is denoted as ?E, intermediate variables as ?varN, result variables as ?Var
- We show several (usually 2) alternative paths per property, coming from the different datasets
- We return strings even for thesaurus terms (not the URIs of these terms) since these can be used directly by the frontend (Exhibit).
-- Below we show this using prefLabel. For RKD this depends on: {jira:RS-1040}
-- (?) Mitac, do we want to make this change now?
- All display fields can contain multiple values, except where noted below, so the alternative paths should be combined as a union (or)
- rdf:type is *not* needed
- We skip the crm: prefix because Confluence renders it as a smiley. It *must* be used in the actual queries

h2. Prefixes
{noformat}
@prefix rso: <http://www.researchspace.org/ontology/> .
@prefix rst-identifier: <http://www.researchspace.org/thesaurus/identifier/> .
@prefix rst-note: <http://www.researchspace.org/thesaurus/note/> .
@prefix bm: <http://collection.britishmuseum.org/id/> .
@prefix bmo: <http://collection.britishmuseum.org/id/ontology/> .
{noformat}

h1. Display Fields

| *Field* | *Paths* | *Notes* |
| ?Dataset | if (?E P50_has_current_keeper bm:the-british-museum or ?E P52_has_current_owner bm:the-british-museum) \\
then "BM" else "RKD" | Used to select the RForm template.\\
For BM it is defined according to [FR Implementation-old#FC70_Thing for RS|FR Implementation-old#FC70_Thing for RS].\\
For RKD it is defined by exclusion (anything that is not BM), since the owners/keepers vary: The Metropolitan (New York), Rijksmuseum, National Gallery (London), Mauritshuis |
| ?Type | ?E bmo:PX_object_type ?Type or \\
?E rso:P2_has_object_type ?Type | Object type. Even this field may be multi-valued |
| ?PrefID | ?E P48_has_preferred_identifier ?id. \\
{?id rdfs:label ?PrefID} UNION {?id P3_has_note ?PrefID} | Prepended to the title
{jira:RS-1362} |
| ?UID | ?E P1_is_identified_by ?uid1. ?uid1 P2_has_type rst-identifier:nuxeo_uid; P3_has_note ?UID. | Service property needed by the current implementation. Single-valued |
| ?Title | ?E P102_has_title ?title1. ?title1 P2_has_type rst-note:title-primary; P3_has_note ?Title.\\
or {?E P102_has_title ?title1} LIMIT 1. ?title1 rdfs:label ?Title.\\
or ?E bmo:PX_physical_description ?Title | RKD has a single title-primary.\\
BM has no indication of the primary title, so the first title is used (e.g. via a subquery with LIMIT 1). Many objects (e.g. coins) have no title, so PX_physical_description is used instead.\\
The title is shortened to 30 characters, keeping the last word whole.
{jira:RS-1362} |
| ?Material | ?E P45_consists_of ?material1. ?material1 skos:prefLabel ?Material | Material |
| ?Technique | ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. ?subprod P32_used_general_technique ?technique1. \\
?technique1 skos:prefLabel ?Technique | Technique |
| ?Creator | ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. ?subprod P14_carried_out_by ?creator1. \\
?creator1 skos:prefLabel ?Creator | Produced by |
| ?PlaceCreated | ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. ?subprod P7_took_place_at ?placeCreated1. \\
?placeCreated1 skos:prefLabel DISTINCT(?PlaceCreated) | Apply DISTINCT since the same place may be mentioned twice (2 different production processes) |
| ?PlaceFound | ?E P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery. \\
?find P7_took_place_at ?placeFound1. ?placeFound1 skos:prefLabel ?PlaceFound | Place of Discovery (findspot) |
| ?Places | ?placeCreated1 P88i_forms_part_of ?place1 or ?placeFound1 P88i_forms_part_of ?place1. \\
?place1 skos:prefLabel DISTINCT(?Places) | All super-places of "place created" or "place found", plus the places themselves. Used by Exhibit for faceted search. |
| ?DateCreated | ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. {?subprod P4_has_time-span ?dateCreated1} LIMIT 1. \\
?dateCreated1 P82_at_some_time_within ?DateCreated | We take only the first ?prod with P4_has_time-span.\\
(!) This will often return 2 dates "from-to" (P82a and P82b), especially for BM objects.\\
TODO Jana: check if they are returned in the right order. \\
TODO Jana, Mitac: how to return them? I suggest separating them with a dash |
| ?DateFound | ?E P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery.\\
?find P4_has_time-span ?dateFound1. ?dateFound1 P82_at_some_time_within ?DateFound | (!) This will often return 2 dates (same questions as above) |
| ?Date | if ?DateCreated then ?DateCreated else ?DateFound | Used in Exhibit facet and timeline |
| ?Image | Use rso:FR_main_representation as defined in [FR Implementation#Thing-Image|FR Implementation#Thing-Image]
{jira:RS-1539} | Single value.
- RKD returns a Nuxeo GUID while BM returns an external URL. These 2 cases can be distinguished at the frontend
- ?img rso:P3_has_image_file is the filename, while P1_is_identified_by\[P2_has_type=nuxeo_uid\]/P3_has_note is a GUID needed to retrieve the image from Nuxeo |
| ?Images | Use rso:FR138i_representation as defined in [FR Implementation#Thing-Image|FR Implementation#Thing-Image]
{jira:RS-1539} | All images. This is *not* used in search results, only in the CompleteMO view |

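A minimal sketch of two row-level rules from the table above: shortening ?Title to 30 characters without cutting a word, and choosing ?Date from ?DateCreated or ?DateFound. It assumes one reading of "keeping the last word whole" (words that do not fit are dropped, and a single over-long first word is kept whole), and it joins a from-to pair with a dash as suggested in the ?DateCreated row, assuming the dates already arrive in the right order (still a TODO). The class and method names are hypothetical, and plain strings stand in for RDF literals.

{code:java}
import java.util.List;

/** Hypothetical helpers for two row-level rules from the display-field table. */
class DisplayFieldRules {

    /** Shortens a title to at most maxLen characters without cutting a word in half. */
    static String shortenTitle(String title, int maxLen) {
        if (title == null) return null;
        String t = title.strip();
        if (t.length() <= maxLen) return t;
        // Last space at or before position maxLen: everything before it fits.
        int cut = t.lastIndexOf(' ', maxLen);
        if (cut < 0) {
            // Assumption: a first word longer than maxLen is kept whole.
            int end = t.indexOf(' ');
            return end < 0 ? t : t.substring(0, end);
        }
        return t.substring(0, cut).stripTrailing();
    }

    /** ?Date is ?DateCreated if present, else ?DateFound; a from-to pair is joined with a dash. */
    static String date(List<String> dateCreated, List<String> dateFound) {
        List<String> d = (dateCreated != null && !dateCreated.isEmpty()) ? dateCreated : dateFound;
        if (d == null || d.isEmpty()) return null;
        // Assumption: values already arrive in from-to order (still a TODO in the table).
        return String.join(" - ", d);
    }
}
{code}
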
h2. Place vs Hierarchical Places

?Place returns each place's own name along the hierarchy chain, not a compound name that includes the super-places.
- The reason is that our thesauri already make place names unique, e.g. Paris, Texas vs Paris, France.
- So there is no need for the heavyweight variant North America > USA > Texas > Paris vs Europe > France > IleDeFrance > Paris
- BM places: out of 157967 concepts in thesaurusandplace_*.trig there are only 342 duplicate prefLabels:
{noformat} grep -h prefLabel th* | perl -pe "s{\s+skos:prefLabel }{}" | sort | uniq -d | wc -l {noformat}
- I've checked one pair and it's legitimate: the 2 terms are in different thesauri: Place vs State.
-- idThes:x117928 skos:inScheme idThes:state; skos:prefLabel "Vatican City".
-- idThes:x40886 skos:inScheme idThes:place; skos:prefLabel "Vatican City".

?Places returns all hierarchical places, eg "Europe\|Bulgaria\|Sofia"

The whole Places hierarchy is loaded and cached using:

{code}
select ?x ?y where {?x a crm:E53_Place; skos:broader ?y. }
{code}
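
Once the ?x/?y pairs returned by this query are cached, the hierarchical ?Places value (e.g. "Europe\|Bulgaria\|Sofia") can be computed in memory by walking up skos:broader from a place to the root. The sketch below is hypothetical: it assumes each place has a single broader place, keys places by their prefLabel for brevity (a real implementation would key by URI and look up labels separately), and stops on cyclic data.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory cache of the Places hierarchy (child -> broader). */
class PlaceHierarchy {
    private final Map<String, String> broaderOf = new HashMap<>();

    /** Records one row of the cached query, i.e. ?x skos:broader ?y. */
    void addBroader(String place, String broader) {
        // Assumption: only the first broader place seen for a place is kept.
        broaderOf.putIfAbsent(place, broader);
    }

    /** Returns the chain root-first, separated by "|", e.g. "Europe|Bulgaria|Sofia". */
    String encode(String place) {
        Deque<String> chain = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        // Walk up the hierarchy; the seen-set stops the loop on cyclic data.
        for (String p = place; p != null && seen.add(p); p = broaderOf.get(p)) {
            chain.addFirst(p);
        }
        return String.join("|", chain);
    }
}
{code}
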
Vlado suggested the following query for retrieving the chain of parents of individual places:

{code}
SELECT ?label ?subTopicOf {
  {?x crm:P88i_forms_part_of [skos:prefLabel ?label; skos:broader [skos:prefLabel ?subTopicOf]]}
  UNION
  {?x skos:prefLabel ?label; skos:broader [skos:prefLabel ?subTopicOf]}
  FILTER (?x in (<http://collection.britishmuseum.org/id/place/x45461>,
                 <http://collection.britishmuseum.org/id/place/x66969>))
}
{code}

h2. Display Fields implementation


h3. API

- created a new enum DisplayField with the names specified in the table above. It is similar to the older Shortcut enum; the values were preserved where the DisplayField and Shortcut enums match
- added a new method, SearchAPI.frSearchNew(), which should be used instead of frSearch()
- the results are Entities, on which Entity.getDisplayField(DisplayField) should be called. This is similar to the older Entity.getShortcutField(Shortcut), except that it returns Literal\[\] instead of Entity\[\]. As Jana noted, this will cause problems with the Place hierarchy, which is currently implemented via getAncestors(URI)

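For orientation, a hypothetical sketch of the API shape described above: a DisplayField enum named after the table rows, and an entity that returns every value of a field as an array, since display fields can be multi-valued. String stands in for the real Literal type, and setDisplayField is an invented helper for populating the sketch; this is not the actual SearchAPI code.

{code:java}
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical mirror of the DisplayField enum; constant names follow the table above. */
enum DisplayField {
    Dataset, Type, PrefID, UID, Title, Material, Technique, Creator,
    PlaceCreated, PlaceFound, Places, DateCreated, DateFound, Date, Image, Images
}

/** Hypothetical entity; String stands in for the Literal type of the real API. */
class Entity {
    private final Map<DisplayField, String[]> values = new EnumMap<>(DisplayField.class);

    /** Invented helper, used only to populate the sketch. */
    void setDisplayField(DisplayField field, String... fieldValues) {
        values.put(field, fieldValues.clone());
    }

    /** Returns all values of a field (display fields may be multi-valued), or an empty array. */
    String[] getDisplayField(DisplayField field) {
        String[] v = values.get(field);
        return v == null ? new String[0] : v.clone();
    }
}
{code}
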
h3. Implementation details

- SearchAPIImpl.java
- created a declarative table, A_DISPAY_FIELD, which is interpreted in loadDisplayFields()
- when several alternatives are possible, we select several variables via SPARQL, then do the merging in Java
- e.g. for the ?Title field, we use 3 different vars:
|| Var || SPARQL ||
| Title1 | ?E crm:P102_has_title ?title1. ?title1 crm:P2_has_type rst-note:title-primary; crm:P3_has_note ?Title1. |
| Title2 | ?E crm:P102_has_title ?title1. ?title1 rdfs:label ?Title2. |
| Title3 | ?E bmo:PX_physical_description ?Title3 |
- for the ?Places field we use 4 different vars:
|| Var || SPARQL ||
| Place1 | ?E crm:P108i_was_produced_by ?prod. ?prod crm:P9_consists_of ?subprod. ?subprod crm:P7_took_place_at ?placeCreated1. ?placeCreated1 crm:P88i_forms_part_of ?place1. ?place1 skos:prefLabel ?Places1. |
| Place2 | ?E crm:P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery. ?find crm:P7_took_place_at ?placeFound1. ?placeFound1 crm:P88i_forms_part_of ?place1. ?place1 skos:prefLabel ?Places2. |
| Places3 | ?E crm:P108i_was_produced_by ?prod. ?prod crm:P9_consists_of ?subprod. ?subprod crm:P7_took_place_at ?Places3. |
| Places4 | ?E crm:P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery. ?find crm:P7_took_place_at ?Places4. |

- the results are combined via two methods:
-- trySeveral() checks several fields in order and takes the first non-empty value, e.g. trySeveral("Title1", "Title2", "Title3") takes the first value of "Title1" if it has a non-empty array of values, otherwise the first value of "Title2", and so on.
-- combineSeveral() collects all values of each field and keeps only the unique ones, e.g. combineSeveral("Place1", "Place2") takes all values of Place1, then all values of Place2, and removes the duplicates.

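A minimal sketch of the two merge strategies. The real methods take only field names; here the SPARQL result row is passed explicitly as a map from variable name to its values, and plain strings stand in for literals. Both the row parameter and the value type are assumptions.

{code:java}
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical versions of trySeveral()/combineSeveral(); the row parameter is an assumption. */
class FieldMerging {

    /** First value of the first variable that has a non-empty array of values, else null. */
    static String trySeveral(Map<String, String[]> row, String... vars) {
        for (String var : vars) {
            String[] v = row.get(var);
            if (v != null && v.length > 0) return v[0];
        }
        return null;
    }

    /** All values of all given variables, in order, with duplicates removed. */
    static String[] combineSeveral(Map<String, String[]> row, String... vars) {
        Set<String> unique = new LinkedHashSet<>(); // keeps first-seen order
        for (String var : vars) {
            String[] v = row.get(var);
            if (v != null) unique.addAll(Arrays.asList(v));
        }
        return unique.toArray(new String[0]);
    }
}
{code}

A LinkedHashSet is used so that the merged values keep a stable, first-seen order, which keeps facet values deterministic across requests.
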
*Update 14 Nov, 2012* (discussion Mitac/Jana)
- getDisplayField() will load the field if not loaded already
- ?Places will be returned in a hierarchical manner, for Exhibit - all Ancestors will be encoded in the Literal, separated with "\|", e.g. "Europe\|Bulgaria\|Sofia"
- there will be a separate method Entity.ngetImages() which will return all BM images and the main Rembrandt image

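A hypothetical sketch of the lazy loading agreed in the update: the first call to getDisplayField for a field runs a loader (standing in for the SPARQL query) and caches the result, so later calls for the same field do not hit the repository again. The loader signature, the class name and the reduced field enum are assumptions.

{code:java}
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical sketch of lazy loading in getDisplayField(). */
class LazyEntity {
    /** Reduced field set, for illustration only. */
    enum DisplayField { Title, Places, Images }

    private final String uri;
    // Assumed loader: runs the SPARQL for one entity and one field and returns its
    // values (an empty list, never null, when the field has no values).
    private final BiFunction<String, DisplayField, List<String>> loader;
    private final Map<DisplayField, List<String>> cache = new EnumMap<>(DisplayField.class);

    LazyEntity(String uri, BiFunction<String, DisplayField, List<String>> loader) {
        this.uri = uri;
        this.loader = loader;
    }

    /** Loads the field on first access; later calls return the cached values. */
    List<String> getDisplayField(DisplayField field) {
        return cache.computeIfAbsent(field, f -> List.copyOf(loader.apply(uri, f)));
    }
}
{code}
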
h2. Print Display Fields

The display fields are printed in the following order, on one line.
If a field is missing, the corresponding label/punctuation is skipped.
If all fields of one of the items below are missing, its common (prefix) label is skipped as well.
- ?Type ": " ?Title ". "
- "Created: " ?Creator ", " ?PlaceCreated ", " ?DateCreated ". "
- "Material: " ?Material ", "
- "Technique: " ?Technique ", "
- "Found: " ?PlaceFound ", " ?DateFound ". "
- ?Image: display as image (thumbnail)

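The printing rules above can be sketched as follows, assuming each field has already been merged into a single string (multi-valued fields joined beforehand) and that ?Image is rendered separately as a thumbnail. All names are hypothetical; the LAYOUT list mirrors the items above.

{code:java}
import java.util.List;
import java.util.Map;

/** Hypothetical printer for the one-line summary; ?Image is rendered separately. */
class DisplayLine {
    /** A field and the punctuation printed after it. */
    record Part(String field, String suffix) {}

    /** Parts sharing a common prefix label such as "Created: ". */
    record Group(String prefix, List<Part> parts) {}

    static final List<Group> LAYOUT = List.of(
        new Group("", List.of(new Part("Type", ": "), new Part("Title", ". "))),
        new Group("Created: ", List.of(new Part("Creator", ", "),
                new Part("PlaceCreated", ", "), new Part("DateCreated", ". "))),
        new Group("Material: ", List.of(new Part("Material", ", "))),
        new Group("Technique: ", List.of(new Part("Technique", ", "))),
        new Group("Found: ", List.of(new Part("PlaceFound", ", "), new Part("DateFound", ". "))));

    /** fields maps a field name to its already-joined value; absent or empty means missing. */
    static String print(Map<String, String> fields) {
        StringBuilder line = new StringBuilder();
        for (Group g : LAYOUT) {
            StringBuilder body = new StringBuilder();
            for (Part p : g.parts()) {
                String v = fields.get(p.field());
                if (v == null || v.isEmpty()) continue; // skip the field and its punctuation
                body.append(v).append(p.suffix());
            }
            // Skip the prefix label when every field of the group is missing.
            if (body.length() > 0) line.append(g.prefix()).append(body);
        }
        return line.toString().strip();
    }
}
{code}

Punctuation is attached to each field exactly as listed, so, for example, a Material group followed by a Technique group still carries its trailing comma; whether to trim such separators is left open here.
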
h2. Display Field Facets

The following fields are used in Exhibit facets:
- ?Type
- ?Material
- ?Technique
- ?Creator
- ?Places
- ?Date