Other aspects of semantic search

Search vs Faceting vs Display

Notes from Vlado

  • We want the search to cast a wide net, so it should traverse the thesaurus hierarchy upward (eg PS7T includes super-places)
  • We can use the search relations to also return the facets, rather than writing further complex SPARQL queries to fetch them.
    Exhibit doesn't have a concept of "hierarchical facet" (confirmed by Jana), so we need to return all hierarchical values to it (eg using PS7T, not just PS7).
  • Display data of each returned object should be small and specific.
    Should traverse a small subset of the search relations, and not traverse the thesaurus.
    Eg the Author of Susanna is Rembrandt (created part1: the painting), but not Willem de Vries (who created part2: the frame)
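
The upward traversal in the first note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BROADER map and the function name with_super_places are hypothetical, and the "Europe" level is added purely for illustration. In the real system the traversal is done by the search relation (eg PS7T) in the repository, not in client code.

```python
# Hypothetical thesaurus fragment: each place maps to its direct super-place.
# (Illustrative data only; the real hierarchy lives in the thesaurus.)
BROADER = {
    "Amsterdam": "Netherlands",
    "The Hague": "Netherlands",
    "Netherlands": "Europe",
}


def with_super_places(place, broader=BROADER):
    """Return the place followed by all of its super-places, nearest first."""
    chain = []
    seen = set()  # guards against accidental cycles in the hierarchy
    current = place
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = broader.get(current)
    return chain


if __name__ == "__main__":
    print(with_super_places("Amsterdam"))
```

A search for any value in the returned chain would then match an object recorded only against the most specific place.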

Design update:

  • We selected Exhibit for this because it provides faceting of the search results and display of results.
  • Facets include: object type, title, creator, year, creation place, technique, material. These are computed with FRs
  • Facets are multi-valued, eg Material includes both part1 (painting support) and part2 (frame)
  • Facets must include all hierarchical thesaurus values, so eg a painting in Amsterdam is also counted in Netherlands
  • Display (list/gallery view) should include fewer fields, and these should be single-valued. Fields are described in:
    [Semantic Search Spec#Semantic Search screen] https://jira.ontotext.com/secure/EditComment!default.jspa?id=45465&commentId=70459
  • Display fields are unified with Facets. Where a facet (FR value) is available we use it; otherwise we use a custom display field fetched with a separate SPARQL query
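
The difference between a multi-valued facet and a single-valued display field can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the record layout, the field names, the material values and the two helper functions are hypothetical, and the "take the value from the main part" rule is just one possible way to obtain a single display value. Only the Rembrandt / Willem de Vries example comes from the notes above.

```python
# Illustrative record for one object; layout and material values are hypothetical.
susanna = {
    "title": "Susanna",
    "parts": [
        {"role": "painting", "creator": "Rembrandt", "material": "panel"},
        {"role": "frame", "creator": "Willem de Vries", "material": "wood"},
    ],
}


def facet_values(obj, field):
    """All distinct values of a field across every part (multi-valued facet)."""
    values = []
    for part in obj["parts"]:
        value = part.get(field)
        if value is not None and value not in values:
            values.append(value)
    return values


def display_value(obj, field, main_role="painting"):
    """Single value taken from the main part only (display field)."""
    for part in obj["parts"]:
        if part["role"] == main_role:
            return part.get(field)
    return None


if __name__ == "__main__":
    print(facet_values(susanna, "material"))
    print(display_value(susanna, "creator"))
```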

Facet Value Deduplication

Consider the following cases re a painting. The FR rso:PS7T_took_place_at will infer duplicate facts, but the desired counts are:

Case                                                                      | Inferred                           | Desired Counts
created in Amsterdam, current location is Amsterdam                       | Twice Amsterdam, twice Netherlands | 1 Amsterdam, 1 Netherlands
created in Amsterdam, current location is The Hague (both in Netherlands) | Twice Netherlands                  | 1 Amsterdam, 1 The Hague, 1 Netherlands

(Mariana was doubtful whether one painting should be counted in both Amsterdam and The Hague. But that is correct since PS7T is a wider relation, and doesn't carry the meaning of "a physical object can currently be in only one place")

Therefore we must eliminate duplicate facet values. This could happen in:

  • The SPARQL answer. I doubt this is enough: I think SPARQL can return the same triple twice
  • Java result-set filtering by putting it in a Java dictionary
  • JSON result-set "squashing" since it's loaded in a JavaScript dictionary
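
The dictionary-based options can be sketched as follows, in Python rather than Java or JavaScript for brevity. This is a minimal illustration, not the chosen implementation: the row layout (painting, place), the identifiers and both function names are hypothetical, and the rows simply mirror the "Inferred" column of the two cases above.

```python
from collections import Counter

# Inferred (painting, place) rows with duplicates, mirroring the two cases.
# Case 1: created in Amsterdam, current location Amsterdam.
case1 = [
    ("painting1", "Amsterdam"), ("painting1", "Netherlands"),  # via creation
    ("painting1", "Amsterdam"), ("painting1", "Netherlands"),  # via location
]
# Case 2: created in Amsterdam, current location The Hague.
case2 = [
    ("painting2", "Amsterdam"), ("painting2", "Netherlands"),  # via creation
    ("painting2", "The Hague"), ("painting2", "Netherlands"),  # via location
]


def dedupe(rows):
    """Drop repeated rows, keeping first-seen order (dict keys are unique)."""
    return list(dict.fromkeys(rows))


def facet_counts(rows):
    """Count each place once per painting after removing duplicate rows."""
    return Counter(place for _, place in dedupe(rows))


if __name__ == "__main__":
    print(dict(facet_counts(case1)))
    print(dict(facet_counts(case2)))
```

After deduplication each painting contributes at most one count to each facet value, which matches the "Desired Counts" column.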

Resolution (Mitac):

  • It is not possible to have two copies of the same statement in OWLIM. The statement exists only once, and internal data records whether it is explicit or inferred. This applies even when the statement is both inferred and added explicitly
  • Duplicates might be possible for literal values if these are explicitly added
  • In any case, "select distinct" in SPARQL can take care of this