
RS-755 : spec
RS-690 : implementation

Introduction

Search using Fundamental Relations (FR Implementation-old) is fast, but fetching the display fields of search results is a major cause of slow Search Performance, because:

  • We need to fetch the fields of all search results (up to a given limit: 500 or 1000), in order to pass them to Exhibit for faceting
  • There are quite a few display fields (see below)
  • Display fields are obtained from several alternative places for the different datasets (RKD and BM)
  • The queries must be chosen very carefully. There is a description at RS Server Spec#TODO (maybe will move elsewhere)

Conventions

  • Below we use a pseudo-SPARQL notation. The selected object is denoted as ?E, intermediate variables as ?varN, result variables as ?Var
  • We show several (usually 2) alternative paths per property, coming from the different datasets
  • We return strings even for thesaurus terms (not the URIs of these terms) since these can be used directly by the frontend (Exhibit).
    • Below we show this using prefLabel. For RKD this depends on: RS-1040
    • Mitac, do we want to make this change now?
  • All display fields can contain multiple values, except where noted below. So the alternatives should be taken as a union (or)
  • rdf:type is not needed
  • We skip the crm: prefix because it renders as a smiley in Confluence. It must be used in the actual queries

Prefixes

@prefix rso:            <http://www.researchspace.org/ontology/> .
@prefix rst-identifier: <http://www.researchspace.org/thesaurus/identifier/> .
@prefix rst-note:       <http://www.researchspace.org/thesaurus/note/> .
@prefix bm:             <http://collection.britishmuseum.org/id/> .
@prefix bmo:            <http://collection.britishmuseum.org/id/ontology/> .

Display Fields

Each entry gives the Field, then its Paths and Notes.

?Dataset
  Paths: if (?E P50_has_current_keeper bm:the-british-museum or ?E P52_has_current_owner bm:the-british-museum)
         then "BM" else "RKD"
  Notes: Used to select RForm template.
         For BM, it is defined according to FR Implementation-old#FC70_Thing for RS.
         For RKD, it is defined by exception, since the owners/keepers vary between The Metropolitan (New York), Rijksmuseum, National Gallery (London), Mauritshuis

?Type
  Paths: ?E bmo:PX_object_type ?Type or
         ?E rso:P2_has_object_type ?Type
  Notes: Object type. Even this may be multi-valued

?PrefID
  Paths: ?E P48_has_preferred_identifier ?id.
         {?id rdfs:label ?PrefID} UNION {?id P3_has_note ?PrefID}
  Notes: Prepended to the title
         RS-1362

?UID
  Paths: ?E P1_is_identified_by ?uid1. ?uid1 P2_has_type rst-identifier:nuxeo_uid; P3_has_note ?UID.
  Notes: Service property needed by current implementation. Single-valued

?Title
  Paths: ?E P102_has_title ?title1. ?title1 P2_has_type rst-note:title-primary; P3_has_note ?Title.
         or {?E P102_has_title ?title1} LIMIT 1. ?title1 rdfs:label ?Title.
         or ?E bmo:PX_physical_description ?Title
  Notes: RKD has a single title-primary.
         BM doesn't indicate a primary title, so use the first one (eg using a subquery and LIMIT 1). Many objects (eg coins) don't have a title, so use PX_physical_description.
         We shorten to 30 chars but leave the last word whole
         RS-1362

?Material
  Paths: ?E P45_consists_of ?material1. ?material1 skos:prefLabel ?Material
  Notes: Material

?Technique
  Paths: ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. ?subprod P32_used_general_technique ?technique1.
         ?technique1 skos:prefLabel ?Technique
  Notes: Technique

?Creator
  Paths: ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. ?subprod P14_carried_out_by ?creator1.
         ?creator1 skos:prefLabel ?Creator
  Notes: Produced by

?PlaceCreated
  Paths: ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. ?subprod P7_took_place_at ?placeCreated1.
         ?placeCreated1 skos:prefLabel DISTINCT(?PlaceCreated)
  Notes: Apply DISTINCT since the same place may be mentioned twice (2 different production processes)

?PlaceFound
  Paths: ?E P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery.
         ?find P7_took_place_at ?placeFound1. ?placeFound1 skos:prefLabel ?PlaceFound
  Notes: Place of Discovery (findspot)

?Places
  Paths: ?placeCreated1 P88i_forms_part_of ?place1 or ?placeFound1 P88i_forms_part_of ?place1.
         ?place1 skos:prefLabel DISTINCT(?Places)
  Notes: All super-places of "place created" or "place found" + the places themselves. Used by Exhibit for faceted search.

?DateCreated
  Paths: ?E P108i_was_produced_by ?prod. ?prod P9_consists_of ?subprod. {?subprod P4_has_time-span ?dateCreated1} LIMIT 1.
         ?dateCreated1 P82_at_some_time_within ?DateCreated
  Notes: We take only the first ?prod with P4_has_time-span.
         This will often return 2 dates "from-to" (P82a and P82b), especially for BM objects.
         TODO Jana: check if they are returned in the right order.
         TODO Jana, Mitac: how to return them? I suggest separated with a dash

?DateFound
  Paths: ?E P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery.
         ?find P4_has_time-span ?dateFound1. ?dateFound1 P82_at_some_time_within ?DateFound
  Notes: This will often return 2 dates (same questions as above)

?Date
  Paths: if ?DateCreated then ?DateCreated else ?DateFound
  Notes: Used in Exhibit facet and timeline

?Image
  Paths: Use rso:FR_main_representation as defined in FR Implementation#Thing-Image
         RS-1539
  Notes: Single value.
           • RKD returns a Nuxeo GUID while BM returns an external URL. These 2 cases can be distinguished at the frontend
           • ?img rso:P3_has_image_file is the filename, while P1_is_identified_by[P2_has_type=nuxeo_uid]/P3_has_note is a GUID needed to retrieve the image from Nuxeo

?Images
  Paths: Use rso:FR138i_representation as defined in FR Implementation#Thing-Image
         RS-1539
  Notes: All images. This is not used in search results, but in CompleteMO view
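
The ?Title note above says titles are shortened to 30 chars while leaving the last word whole. The following is a minimal sketch of one way to read that rule, not the actual implementation. It assumes that when the cut falls inside a word, the cut is moved forward to the end of that word, so the result may be slightly longer than the limit. The class and method names (TitleShortener, shorten) are hypothetical.

```java
final class TitleShortener {
    private TitleShortener() {}

    /**
     * Shortens a title to about maxLen characters.
     * Assumption: if the cut falls inside a word, extend it to the end of
     * that word, so the last word is never chopped.
     */
    static String shorten(String title, int maxLen) {
        // Nothing to do for missing titles, non-positive limits or short titles.
        if (title == null || maxLen <= 0 || title.length() <= maxLen) {
            return title;
        }
        int end = maxLen;
        // Only extend when the character just before the cut belongs to a word.
        if (!Character.isWhitespace(title.charAt(end - 1))) {
            while (end < title.length() && !Character.isWhitespace(title.charAt(end))) {
                end++;
            }
        }
        return title.substring(0, end).trim();
    }
}
```

If the intended reading is instead to cut back to the last whole word that fits within 30 chars, the loop would walk backwards from the cut rather than forwards.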

Place vs Hierarchical Places

?Place returns each place's own name along the hierarchy chain, not a compound name that includes the super-places.

  • The reason is that the thesauri that we have take care to make the place name unique, eg Paris, Texas vs Paris, France.
  • So there is no need to use the heavy-weight variant of North America> USA> Texas> Paris vs Europe> France> IleDeFrance> Paris
  • BM places: out of 157967 concepts in thesaurusandplace_*.trig there are only 342 duplicate prefLabels:
     grep -h prefLabel th* | perl -pe "s{\s+skos:prefLabel }{}" | sort | uniq -d | wc -l 
  • I've checked one pair and it's legitimate: the 2 terms are in different thesauri: Place vs State.
    • idThes:x117928 skos:inScheme idThes:state; skos:prefLabel "Vatican City".
    • idThes:x40886 skos:inScheme idThes:place; skos:prefLabel "Vatican City".
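
The grep/sort/uniq pipeline above counts the distinct prefLabels that occur more than once. As an illustration only, the same count can be sketched in Java over a list of labels that has already been extracted. The class and method names (DuplicateLabels, countDuplicates) are hypothetical, and parsing the .trig files is out of scope here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateLabels {
    private DuplicateLabels() {}

    /** Counts distinct labels that appear more than once (like: sort | uniq -d | wc -l). */
    static long countDuplicates(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            // Increment the occurrence count for this label.
            counts.merge(label, 1, Integer::sum);
        }
        long duplicates = 0;
        for (int count : counts.values()) {
            if (count > 1) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```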

?Places returns all hierarchical places, eg "Europe|Bulgaria|Sofia"

The whole Places hierarchy is loaded and cached using 

Vlado suggested using the following query if we need to retrieve the chain of parents for an individual place:

Display Fields implementation

API

  • created a new enum DisplayField with names as specified in the table above. It is similar to the older Shortcut enum; values were preserved wherever the DisplayField and Shortcut enums match
  • added a new method to SearchAPI, frSearchNew(), which should be used instead of frSearch()
  • the results are Entities, for which Entity.getDisplayField(DisplayField) should be called. This is similar to the older Entity.getShortcutField(Shortcut), except that it now returns Literal[] instead of Entity[]. As Jana noted, this will cause problems with the Place hierarchy, which is currently implemented via getAncestors(URI)

Implementation details

  • SearchAPIImpl.java
  • created declarative table, A_DISPAY_FIELD; interpreted in loadDisplayFields()
  • when several options are possible, we select several fields via SPARQL, then do the merging in Java
  • e.g. for the ?Title field, we use 3 different vars
    Var SPARQL
    Title1 ?E crm:P102_has_title ?title1. ?title1 crm:P2_has_type rst-note:title-primary; crm:P3_has_note ?Title1.
    Title2 ?E crm:P102_has_title ?title1. ?title1 rdfs:label ?Title2.
    Title3 ?E bmo:PX_physical_description ?Title3
  • for the ?Places field we use 4 different vars:
    Var SPARQL
    Place1 ?E crm:P108i_was_produced_by ?prod. ?prod crm:P9_consists_of ?subprod. ?subprod crm:P7_took_place_at ?placeCreated1. ?placeCreated1 crm:P88i_forms_part_of ?place1. ?place1 skos:prefLabel ?Places1.
    Place2 ?E crm:P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery. ?find crm:P7_took_place_at ?placeFound1. ?placeFound1 crm:P88i_forms_part_of ?place1. ?place1 skos:prefLabel ?Places2.
    Places3 ?E crm:P108i_was_produced_by ?prod. ?prod crm:P9_consists_of ?subprod. ?subprod crm:P7_took_place_at ?Places3.
    Places4 ?E crm:P12_occurred_in_the_presence_of ?find. ?find a bmo:EX_Discovery. ?find crm:P7_took_place_at ?Places4.
  • the results are combined via two methods:
    • trySeveral() checks several fields in order and takes the first value of the first field that has any values, e.g. trySeveral("Title1", "Title2", "Title3") checks whether "Title1" has a non-empty array of values and takes the first one; otherwise it checks "Title2" in the same way, and so on.
    • combineSeveral() collects all values of each field and keeps only the unique ones, e.g. combineSeveral("Place1", "Place2") takes all values for Place1, then all values for Place2, combines them, and removes duplicates.
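
The two merge methods described above can be sketched as follows. This is an illustration under stated assumptions, not the code in SearchAPIImpl.java: the query result for one entity is modelled as a Map from var name to a list of string values, and the class name FieldMerger and both method signatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FieldMerger {
    private FieldMerger() {}

    /** Returns the first value of the first var that has any values, or null if none has. */
    static String trySeveral(Map<String, List<String>> row, String... vars) {
        for (String var : vars) {
            List<String> values = row.get(var);
            if (values != null && !values.isEmpty()) {
                return values.get(0);
            }
        }
        return null;
    }

    /** Returns the unique values of all given vars, in first-seen order. */
    static List<String> combineSeveral(Map<String, List<String>> row, String... vars) {
        // LinkedHashSet drops duplicates while keeping insertion order.
        Set<String> unique = new LinkedHashSet<>();
        for (String var : vars) {
            List<String> values = row.get(var);
            if (values != null) {
                unique.addAll(values);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

Keeping first-seen order is a design choice of this sketch; the description above only requires that the combined values be unique.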

Update 14 Nov, 2012 (discussion Mitac/Jana)

  • getDisplayField() will load the field if not loaded already
  • ?Places will be returned in a hierarchical manner, for Exhibit - all Ancestors will be encoded in the Literal, separated with "|", e.g. "Europe|Bulgaria|Sofia"
  • there will be a separate method Entity.getImages() which will return all BM images and the main Rembrandt image
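
The "|"-separated encoding of ?Places mentioned above can be sketched like this. It is a rough illustration that assumes the hierarchy is available as a map from a place label to its parent label; the real implementation works with getAncestors(URI), and the names PlacePath, encode and parentOf are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PlacePath {
    private PlacePath() {}

    /** Encodes a place and all its ancestors, root first, separated with "|". */
    static String encode(String place, Map<String, String> parentOf) {
        Deque<String> chain = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        // Walk up the hierarchy; the seen set guards against cycles in the data.
        for (String p = place; p != null && seen.add(p); p = parentOf.get(p)) {
            chain.addFirst(p); // ancestors end up in front of their descendants
        }
        return String.join("|", chain);
    }
}
```

With Sofia mapped to Bulgaria and Bulgaria mapped to Europe, encode("Sofia", parentOf) yields "Europe|Bulgaria|Sofia".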

Print Display Fields

The display fields are printed in the following order, on one line.
If a field is missing, the corresponding label/punctuation is skipped.
If all fields in the line are missing, the common (prefix) label is skipped.

  • ?Type ": " ?Title ". "
  • "Created: " ?Creator ", " ?PlaceCreated ", " ?DateCreated ". "
  • "Material: " ?Material ", "
  • "Technique: " ?Technique ", "
  • "Found: " ?PlaceFound ", " ?DateFound ". "
  • ?Image: display as image (thumbnail)
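
The skipping rules above can be sketched as a small helper that builds one segment of the printed line. This is only an illustration, assuming each field value is a single string (or null when missing) and that each field carries the punctuation that follows it. The names DisplayLine and segment are hypothetical.

```java
final class DisplayLine {
    private DisplayLine() {}

    /**
     * Builds one segment of the display line.
     * valueSuffixPairs alternates: value, suffix, value, suffix, ...
     * A missing (null or empty) value drops its suffix too; if every value
     * is missing, the prefix label is dropped as well.
     */
    static String segment(String prefix, String... valueSuffixPairs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < valueSuffixPairs.length; i += 2) {
            String value = valueSuffixPairs[i];
            if (value != null && !value.isEmpty()) {
                sb.append(value).append(valueSuffixPairs[i + 1]);
            }
        }
        return sb.length() == 0 ? "" : prefix + sb;
    }
}
```

For example, segment("Created: ", creator, ", ", placeCreated, ", ", dateCreated, ". ") with a missing place prints the creator and date only. Note that under this literal reading, a segment whose last field is missing ends with ", " rather than ". ".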

Display Field Facets

The following fields are used in Exhibit facets:

  • ?Type
  • ?Material
  • ?Technique
  • ?Creator
  • ?Places
  • ?Date
  1. Jul 25, 2013

    In future, parallel processing of searches may be needed. Need to understand this.