Name Size Creator Creation Date Comment  
Microsoft Word Document ResearchSpace Search (201204) 0.2-V... 8.18 MB Vladimir Alexiev Apr 24, 2012 14:33  

ResearchSpace Semantic Search Specification

 

Date        Ver     Who               What
2012-04-08  0.1     Dominic Oldman    Created
2012-04-21  0.1-VA  Vladimir Alexiev  Commented
2012-04-20  0.2     Dominic Oldman    Extended
2012-04-24  0.2-VA  Vladimir Alexiev  Commented

 

 

1               Introduction

2               Adherence to Client Business Requirements

3               Standard Search

3.1               Search Sentence - Overview

3.2               Standard Search – details

4               Advanced Explore and Advanced Search

Appendix 1 - Example Queries

Appendix 2 - Authority control in Merlin (Peter Main 19/08/2009)

Appendix 3 – Search Sentence Process

Appendix 4 – Timeline and Geographical mapping


1                 Introduction

 

Search is described on pages 29, c38, 43 – 46 and 92 of the ResearchSpace Requirements & Specification v2 document. That document describes the tool at a high level and includes the following:

 

Ontology Search - The ontology search would allow searching by CRM ontology terms. (The project is using the Fundamental Concepts and Relationships devised by Martin Doerr)

VA: Right, FR search

 

Taxonomy searching - Co-reference of terms used by the different organizations submitting data.

VA: Right, Thesaurus search. Co-reference is out of RS3 scope since there are no planned tasks to match thesauri. Various matching tools and RDFized thesauri are available for this. See comment in App2.

 

Related Results - The search environment should suggest different avenues of exploration by showing related results using different categories

 

Precision Searching - concentrate on searching material on a particular object or subject and therefore narrow down the potential results from the outset.

VA: Same as FR search?

 

External relationships - The potential relationships to be discovered should extend to data sources outside the ResearchSpace sphere

 

Keyword Searching - Free text searching of literals

VA: Right, FTS search. Our FTS index also indexes (multilingual) labels of thesauri values

 

2                 Adherence to Client Business Requirements

VA: Roger that :-)

 

( https://docs.google.com/a/researchspace.org/viewer?a=v&pid=sites&srcid=cmVzZWFyY2hzcGFjZS5vcmd8cmVzZWFyY2hzcGFjZXxneDo3OTJiM2U1OGQxNmRhNDIz )

Please refer to ResearchSpace Business Requirements & Specification, Date: May 2010, Version: 2. These elements are part of client acceptance testing.

 

Relevant Business rules 

 

Business Rule 2 : Established Data – Existing data and assets, for example accessible organisational collection data and associated images, should be uploaded to the shared ResearchSpace repository and be available to all projects. Some multimedia assets will require some access restrictions and this reflects the practical reality that some media assets are more likely to be subject to conditions and restrictions. However, ResearchSpace would always encourage free and unfettered access to cultural heritage data.

 

Business Rule 3: Open System Standards – The ResearchSpace tools should not be dependent on proprietary APIs. It should be possible to take any tool developed for use with open standard RDF data and incorporate that tool into ResearchSpace without substantial redevelopment. Similarly, tools developed specifically for ResearchSpace should be useable with little or no modification outside the ResearchSpace environment. This business rule ensures the open nature of ResearchSpace and keeps the RDF research tool model simple and accessible.

 

Business Rule 5 : Ontology Standard - In order to obtain a good level of data harmonisation and therefore allow effective exploration of data supplied from different sources, the ontology standard for all data will be the CIDOC Conceptual Reference Model (CRM).

 

Business Rule 6: Use Cases – The main ResearchSpace elements of data, collaboration, analysis, and web publication should be integrated but also available as separate functions, to encourage a wide range of project and related use.

 

Business Rule 7: Open Source – New software created will be released to the community freely as open source. Existing open source tools should, if possible, be utilised for ResearchSpace development.


3                 Standard Search

 

The ResearchSpace system will need to provide different search mechanisms for different users. Some will require very simple search interfaces and some will require more advanced and precise searching.  Standard search will utilise the Fundamental Relationships and categories concept developed at FORTH. These can be used in many different implementations and UI presentations depending upon context.

 

This document refers to a part of the Stage 3 search interfaces which will be the default search mechanism used by a general user. The system is referred to as the search sentence, but at this stage this does not represent a grammatical rules-based search system (and the complexities that this would entail); rather, it refers to the horizontal ‘sentence like’ nature of the presentation.

 

3.1         Search Sentence - Overview

 

1.      The user will conduct a keyword search that uses an index across text and controlled terms.

VA: The FTS search looks at all words no matter where they occur (including thesauri!). The Thesaurus search auto-completes against all thesauri and is more restrictive: the term must occur in the object; and in a FR of the object. So we can combine the two searches, but they still use different indexes and very different mechanisms.

2.      The results will be displayed with the search sentence available for the user to refine the results if they wish.

3.      The first item will be the term typed into the keyword search but this could be replaced.

4.      Search term boxes are auto-complete, and on selecting a term the thesaurus should be displayed in a box next to the term (similar to a mouse over event) with information about the term. This help method will be applied to other drop down items like Boolean operators and relationship terms.

5.      Terms can be refined using AND, OR (and any other allowed Boolean operators) as well as WHICH – a special keyword for using relationship connectors.

6.      Boolean operators should have their own scope note box which provides an example of usage. The WHICH keyword should also have an appropriate scope note box.

7.      WHICH - refers to the use of a fundamental relationship in the next entry box.

8.      Only fundamental relationships and thesauri terms that apply (based on the operators and relationships used at the entry position) should be available.

9.      If a fundamental relationship is specified then only valid terms can be auto-completed in the next box.

10. When selecting a fundamental relationship this too should have a scope note to explain how it is applied and the types that it can be applied to.

11. The system should allow for further configuration of relationships and operators.

VA: Given the complexity of FRs and the need to reload the repository when OWLIM rules are added, I'd say "further development"
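
The restriction described in the list above (a selected fundamental relationship limits which thesaurus terms the next box may auto-complete) can be sketched as follows. This is only an illustration under stated assumptions: the `FR_RANGE` table, the `Term` class and the sample terms are hypothetical, "made of" is an invented FR name, and the real system would auto-complete against the thesaurus index rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Term:
    label: str
    thesaurus: str  # thesaurus type shown on screen, e.g. "Place"

    def display(self) -> str:
        # The thesaurus type is always shown, to distinguish a term from a keyword.
        return f"{self.label} [{self.thesaurus}]"


# Hypothetical mapping from a fundamental relationship to the thesaurus
# types it accepts (its "range"). Names are illustrative only.
FR_RANGE = {
    "from place": {"Place"},
    "about place": {"Place"},
    "made of": {"Material"},
}

# Hypothetical sample terms standing in for the thesaurus index.
TERMS = [
    Term("The Hague", "Place"),
    Term("Canvas", "Material"),
    Term("Cotton", "Material"),
    Term("China", "Place"),
]


def autocomplete(prefix: str, fr: Optional[str] = None,
                 terms: List[Term] = TERMS) -> List[Term]:
    """Return terms starting with `prefix`; if an FR is given, keep only
    terms whose thesaurus type is in that FR's range."""
    allowed = FR_RANGE[fr] if fr is not None else None
    p = prefix.casefold()
    return [
        t for t in terms
        if t.label.casefold().startswith(p)
        and (allowed is None or t.thesaurus in allowed)
    ]
```

Without an FR, `autocomplete("c")` offers terms from every thesaurus; with `fr="from place"` only Place terms remain.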

 

 

 

 

 

 

 



 

 

3.2         Standard Search – details

 

N

Function

phase

Comment (VA)

1

The design should conform to the BVA design above, and this document adds to the features in the design. Where a feature is in the design but not in this document, it should be assumed and added. For any differences please call the author.

RS3.4

ok

2

The search term builds horizontally from drop down boxes as each entry is made

RS3.4

How about year search, using numbers "from-to"?

3

It should be possible to edit the boxes and re-submit searches.

RS3.4

Propose to have edit only for keywords/terms. FR clauses will be deleted & reinserted

4

It should be possible to delete boxes or add new ones and re-submit searches.

RS3.4

Delete will work on complete FR clauses (FR+terms); or keywords/terms (if used without FR).

If we adopt flexible Grammar, deleting/adding will be possible only at the end

5

An entry can be either a free text search or a controlled search depending upon whether a controlled term is fully selected.

For example a user could input “cotton” AND ‘cotton’ to search both free text and material controlled term

RS3.4

Bad example: the keyword would also find the term, so the term is superfluous. Better example: "cotton and Canvas [Material]" where the first is a keyword, the second a term.

6

Terminology items will be identified by the thesaurus they originate from e.g.

Glass (object)

Glass (material)

RS3.4

Ok. thesaurus Type should be always on-screen, to indicate this is a Term search, and different from Keyword search

7

All dropdown items will have scope notes in boxes that appear on a mouse over (tooltip).

·                   Terminology – will have the thesauri title and the scope note

·                   Boolean operators - will have a definition and an example

·                   WHICH – will have a definition and an example

·                   Relationships will provide examples based on the predicates that they are aggregating

RS3.4

·                   Term: if we have additional details (e.g. rdfs:comment): there aren't any in RKD thesauri.

·                   Boolean: is it really necessary to explain what AND/OR mean?

·                   WHICH: I still don't understand what it is for, and imho nobody has explained it clearly. Why can't the user just select the FR? I think it's a parasitic word and not needed

·       FR: And most of all to define it (e.g. the definition of Thing From Place is several sentences!)

8

Intelligent Sentences. The search sentence will only provide relationships that apply to the subject that has just been entered.

?

Let's discuss. Eg if I enter term "The Hague [Place]", what should happen? Should it prepend a box with relations "any" (default), "from place", "about place"??

9

It should be possible to add a thumbnail image to scope note to illustrate the term. Therefore it should be possible in some later configuration to associate representative images to thesauri terms

RS4

No images in RKD/BM thesauri, AFAIK

10

When the term is selected the next box in the sentence appears

RS3.4

Yes, there's no need for the user to enter AND since that's the only top-level connective

11

A controlled term or free text word is always followed by a box to “refine“

??

"Refine" what? Describe in detail how this would work

12

A WHICH is always followed by a “define” which has a drop down of relationships supported by that part of the search sentence

 

In RS3 all FRs will be about Thing so all apply all the time. I don't see the need for WHICH

13

Only authority files / thesauri that are supported by the relationship should be available in the next autocomplete box

RS3.4

Ok. Each FR should know its range: Thesaurus Type(s) or Date Range

14

If there is a Boolean limitation for OR’s then only controlled terms from the same thesaurus as the connected term will be available for auto-complete

RS3.4

Yes: in OR the FR stays the same

15

The Boolean AND can be used with terminology from any Authority

RS3.4

After AND the user  enters a keyword, term, or selects a new FR that determines the thesaurus Types of the next term

16

The search settings will allow a user to change the datasets that are being searched. A user can select all available datasets, selected datasets including the project dataset. Defaults can also be set. E.g.:

Select All

Select this project

Rembrandt

Cranach

etc

RS4

The plan is to address multi-project and security features in RS3.5.

But we need to define "data set" since the spec talks about graphs (data spaces) and there are only 2 from the viewpoint of a given project: shared and project space.

Need to decide what we use Named Graphs for.

Need to define what happens when data is annotated then changed inside a project

17

The search settings should be able to include external semantic sources. (Another Museum Endpoint, Dbpedia…)

Note: The relationship mapper functionality will probably need to incorporate external taxonomies

RS4

Out of RS3 scope, but IMHO very important for RS4 since we want to leverage available resources: Wikipedia, LOD (including DBpedia, FreeBase), and anything else found through Google.

E.g. SEMLIB (see Related projects) allows Annotation based on either internal thesauri, or external sources like FreeBase, and leverages a FB search API

18

The absence of an object type denotes any object

RS3.4

Clear (and no need to say it)

19

Search results should be displayed with an image and summary metadata (Currently, Title, maker/artist, date, location, material, technique ).

 

 

RS3.4

Need to define "Display Fields":

·       Universally across CIDOC objects (in RSO extension ontology)

·       As specialized (subset) FRs

·       including Main Image

20

Display Fields should be configurable: select from the main metadata fields and we should be able to draw from a larger list.

A dropdown with the field possibilities should be available and provide multi-selection.

You should also be able to turn all metadata text off. See also map and timeline functionality.

RS4

What other fields do you contemplate beyond the ones mentioned above?

Then let's see how we can define them universally across CIDOC objects

21

It should be possible to filter results with facets. (Note that design only presents core concept facets).  We should be able to make use of other controlled authorities (material, culture, etc – see the annex 2 – when available). An example is the Finnish site at p.62 of the Business Requirements ( www.museosuomi.fi )

RS3.5?

Exhibit handles the faceting.

Do we need to transfer selected facets into the Search (i.e. a Refine Search operation), forming an AND/OR FR search?

Will facets coincide with FRs?

22

It should be possible to select items in the summary results pages for inclusion in the data basket (copy & link)

RS3.4

Link is clear. But what's Copy?

23

It should be possible to select an item in the summary result list and click through to see the object (full details). (Note: we currently only have the data annotation tool that does this and we will need to implement a non-research-tool detailed results page with a similar design)

RS3.4

(Not sure what you mean: after Search you can also see the object details)

24

Configuration should be available to change the relationship terms to other more meaningful terms (masks). For example: “From” could refer to relationships that are not obvious to the user.

??

What do you mean by terms/masks?

Do we want to create divergence by letting people give their own "translations" of FRs? I think this goes against CIDOC's desire for standardization/ unification.

If we show them the scope notes, they'll quickly learn the FRs.

25

Saving the Search should provide a dialogue box that allows the search to be named and for a description for the search.

RS3.4/5

Need to define an ontology for searches and a mechanism to save them in RDF, and be able to put them in the Basket

26

Note on Time line and GeoMap Functions: These features will need to be fully specified as applications in their own right and additional specifications will be required. Only features relevant to search functionality are described in this document.

For search these plug-ins should simply be seen as a view replacing the standard summary thumbnail review but with some additional contextual user controls.

RS3.1

 

RS4

Timeline: available since RS3.1.

 

GeoMap: requires a mapping from RKD/BM Places to GeoNames, or another mechanism to provide coordinates (see App4 for discussion)

27

The user should be able to switch to a timeline or map view (and between these views) [VA: ??], which provides a visualisation but has the same functionality as the standard results screen, e.g. selection, facet filtering, presentation of metadata fields etc.

The exhibit plug-in simply replaces the thumbnail view but all other features are retained with some additional exhibit controls.

RS3.1

Yes, Exhibit accomplishes this

28

The plotted results should have the same summary information as the standard results screen, and a click should provide a popup box with the thumbnail and additional selection functions as in the main results screen. (The popup effectively provides the same as the thumbnail results, for adding to the data basket etc.) Clicking on a result plotted on a timeline or map should show the object's details in a popup similar to the thumbnail view on the standard search screen.

RS3.1

ok. Please reword to remove redundancy

29

Timeline results Mapping: Should allow the user to change the time line scale (days, months, weeks, years – on two levels)

RS3.1

Yes, Exhibit accomplishes this

30

Time line and Map data representation: The object should be represented by a suitable marker similar to that in Google Maps.

 

RS3.4

 

31

Text next to the marker should be configurable by the user and default to the object title. A dropdown with the field possibilities should be available and provide multi-selection.

It should be possible to turn text on and off.              

RS3.5?

Jana?

32

Geo Results Mapping: Should allow zooming into particular parts of the map.

RS4

 

33

Individual results can be saved to the data basket as they can on the main results screen.

RS3.4

 

34

Explore Refine – The search sentence should allow the user to change the parameters to alter the results. This should include the ability to:

·                   Replace terms with different ones, for example replacing Rembrandt with Flinck

·                   Add additional search criteria, for example add Flinck so that the search is for Rembrandt OR Flinck or Rembrandt AND Flinck.

·                   Remove criteria

RS3.4

Please merge with 3,4

35

Use a result as a basis for exploring. For example a result could be used as a root for finding other items based on relationships with different metadata fields.

For example TODO: The results may have found various objects some of which were created by a particular person

RS3.5?

(Called "Related Results" in the intro).

I think this is very similar to "Refine Search" described above, but uses a single object instead of the selected facets.

Should we use all facet values of the object, or only the lowest-level, or what?

36

Results are paginated and the user must use a paging system. (Presented in rows with paging for large result sets.)

In the Time Line and Map views all results are plotted but the user can change the scales or specify a limited result. (Top 50 / 100 etc)

RS 3.1, RS 4

RS3 uses Exhibit2 for result-set presentation and supports the first 1k results.

RS4 can leverage Exhibit3 (yet another server) to support 100k results.

Various views are supported (rows, thumbnails = lightbox).
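
Rows 14 and 15 of the table above (OR only connects controlled terms from the same thesaurus, while AND may connect terminology from any authority) can be sketched as a simple check over a search sentence. This is a minimal illustration, not the project's implementation: the `Entry` class and `check_connectives` function are hypothetical, and treating two free-text keywords joined by OR as valid is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    text: str
    thesaurus: Optional[str] = None   # None means a free-text keyword
    connective: Optional[str] = None  # "AND"/"OR" linking to the previous entry


def check_connectives(sentence: List[Entry]) -> List[str]:
    """Return a list of problems; an empty list means the sentence is valid."""
    problems = []
    for prev, cur in zip(sentence, sentence[1:]):
        if cur.connective not in ("AND", "OR"):
            problems.append(f"'{cur.text}': expected AND or OR before this entry")
        elif cur.connective == "OR" and cur.thesaurus != prev.thesaurus:
            # Row 14: OR is limited to terms from the same thesaurus.
            problems.append(
                f"'{cur.text}': OR needs a term from '{prev.thesaurus}', "
                f"got '{cur.thesaurus}'"
            )
        # Row 15: AND places no restriction on the thesaurus.
    return problems


ok = [
    Entry("paper", "Material"),
    Entry("stone", "Material", "OR"),
    Entry("cup", "Object type", "AND"),
]
bad = [
    Entry("paper", "Material"),
    Entry("cup", "Object type", "OR"),
]
```

Here `check_connectives(ok)` returns an empty list, while `check_connectives(bad)` reports the OR that crosses from Material to Object type.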

4                 Advanced Explore and Advanced Search

To be specified in another document

Appendix 1 - Example Queries

 

Catalogue type questions

Dominic, please fix the numbering and references below, then I'll show how these would be implemented in FR/term searches

 

1.      China vessels made of bronze

2.      Vessels made of bronze from China

3.      Bronze vessels from China

1.      Same as 4. , but excluding objects which are prints (2 objects)

2.      ceramic objects made between 200 BC and 100 AD - (13517 objects)

3.      Topographic representations of Greece (894 objects)

4.      Either topographic representations of Greece, or objects where Greece is mentioned in an inscription

5.      4.9 As for 8., but restricted to Greece specifically (i.e. excluding any narrower terms of Greece this time) (19 objects)

 

Conservation type questions

NONE of these are covered by FRs since Conservation is not Fundamental (in CIDOC's understanding ;-). Same as the "Exhibited at/on" example: we'd need Event FRs and FR composition

 

1.      Which objects have both been conserved and scientifically examined?

2.      Which bronze or brass objects have been conserved in the last year?

3.      Which brands of cleaner have been used to treat marble?

4.      What is the chronological sequence of treatments applied to this object?

5.      What medieval brooches have been examined scientifically since 1980?

 


Appendix 2 - Authority control in Merlin (Peter Main 19/08/2009)

 

The Merlin catalogue record contains many fields where data entry is controlled by a thesaurus or by a flat authority file. The thesauri are also used to control searching using the hierarchies (e.g. searching for “vessel” will find records where only “cup” is mentioned). In some cases, more than one field is controlled by the same authority (e.g. Place of manufacture, Findspot and Associated place are all controlled by one Place thesaurus). This document describes what we have in place at present. This does not mean that we are not looking for improvements in the longer term. For example, the limitation of two broader terms in the Place thesaurus we find irksome, and we would prefer it to be polyhierarchical. We would also wish to make use of latitude/longitude information which we do not currently have.

VA: We have good knowledge of the BM thesauri, since SSL implemented them and we checked them out while working on the offer. [ BM Thesauri ] describes 11 thesauri having 62k terms (45 in Places).

VA: I think BM should strongly consider an effort to match BM Thesauri to more widely used ones. E.g. see [ Thesauri Tools#Cultural Heritage LOD ] concerning places:

- Getty TGN has 89k, RKD has 22k, Rijksmuseum has 11k

- GeoNames has 8M (but these are modern)

- then you have efforts like Lexicon of Greek Names, Google Places, Pleiades…

VA, on latitude/longitude information:

- GeoNames has that.

- Location uncertainty of ancient places is non-trivial (e.g. we could know from a historic source that place Z is "between X and Y", but not exactly where)

- See "Integration of Coordinate Information in CIDOC CRM": Gerald Hiebel, Øyvind Eide and Mark Fichtner, CRM SIG mlist 11/7/2011

 

Standard Thesauri

 

The thesauri in use are as follows:

 

·       Object type (e.g. pin, cup)

·       Material (e.g. paper, stone)

·       Technique of manufacture (e.g. carved, incised)

·       Material Culture/Period (e.g. 13th dynasty, Late Minoan)

·       Ware (specialised thesaurus for pottery, e.g. Black Glaze Ware, Samian)

·       School (used for artworks, e.g. Italian, Aesthetic Movement)

·       Escapement type (specialist thesaurus for clocks and watches)

·       Subject (e.g. animal, acupuncture)

·       Ethnic Name (e.g. Aztec, Yoruba)

 

The thesaurus structure is standard, but does not use all fields available in BS standards. We use:

·       term

·       term discriminator

·       broader term(s)

·       related term(s)

·       use-for terms

·       display term

·       scope note

·       whether the term has been authorised

The thesauri are polyhierarchical (i.e. they allow multiple broader terms)
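
The hierarchical searching mentioned in this appendix (searching for “vessel” also finds records where only “cup” is mentioned) can be sketched as a narrower-term expansion over a polyhierarchical thesaurus. This is an illustration only: the `BROADER` table and the records are made-up sample data (only "vessel", "cup" and "pin" come from the text), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# term -> its broader terms; a term may have several (polyhierarchy).
# Sample data for illustration only.
BROADER: Dict[str, List[str]] = {
    "cup": ["vessel"],
    "goblet": ["cup", "drinking equipment"],
    "pin": ["fastening"],
}


def narrower_closure(term: str, broader: Dict[str, List[str]] = BROADER) -> Set[str]:
    """Return the term together with all of its direct and indirect narrower terms."""
    children = defaultdict(set)
    for t, parents in broader.items():
        for p in parents:
            children[p].add(t)
    seen = {term}
    stack = [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def search(term: str, records: Dict[str, List[str]],
           broader: Dict[str, List[str]] = BROADER,
           include_narrower: bool = True) -> List[str]:
    """Return ids of records indexed with the term (or any narrower term)."""
    wanted = narrower_closure(term, broader) if include_narrower else {term}
    return sorted(rid for rid, terms in records.items() if wanted & set(terms))


records = {"obj1": ["cup"], "obj2": ["vessel"], "obj3": ["pin"]}
```

With this data, `search("vessel", records)` matches both `obj1` and `obj2`, while passing `include_narrower=False` restricts the match to the exact term, as in the Appendix 1 query that excludes narrower terms of Greece.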

 


Place thesaurus

 

This thesaurus is for geographical places. Its structure is the same as the standard thesauri except that:

 

·       only two broader terms (up to one each of modern and archaic types) are allowed

·       there are two additional fields: Place name type (i.e. modern or archaic) and Place Type (i.e. one of a series of codes distinguishing continents, countries, villages and the like)
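
The two differences listed above can be sketched as a small record type with a validation function. This is a hedged illustration: the `Place` class, its field names and the sample values are hypothetical, and it assumes that "modern" or "archaic" is recorded per broader-term link, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NAME_TYPES = ("modern", "archaic")


@dataclass
class Place:
    name: str
    name_type: str   # Place name type: "modern" or "archaic"
    place_type: str  # Place Type code (continent, country, village, ...)
    # (broader term, type of that broader term); assumed representation
    broader: List[Tuple[str, str]] = field(default_factory=list)


def check_place(place: Place) -> List[str]:
    """Return a list of rule violations; an empty list means the entry is valid."""
    errors = []
    if place.name_type not in NAME_TYPES:
        errors.append(f"unknown place name type: {place.name_type}")
    kinds = [kind for _, kind in place.broader]
    if len(kinds) > 2:
        errors.append("more than two broader terms")
    for kind in NAME_TYPES:
        if kinds.count(kind) > 1:
            errors.append(f"more than one {kind} broader term")
    for kind in kinds:
        if kind not in NAME_TYPES:
            errors.append(f"unknown broader term type: {kind}")
    return errors


valid = Place("Athens", "modern", "city",
              [("Greece", "modern"), ("Attica", "archaic")])
invalid = Place("Athens", "modern", "city",
                [("Greece", "modern"), ("Europe", "modern")])
```

`check_place(valid)` returns no errors, whereas `check_place(invalid)` reports the second modern broader term.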

 

 

Flat authorities

 

As well as a large number of simple drop-down lists, there are two important flat authorities:

Biographical: Used to record information about individuals and institutions. This includes the following fields:

 

·       Name(s), and for each one title, name type, date range name was in use

·       Display name

·       Life date(s)

·       Private address

·       Public address

·       Gender

·       Nationality

·       Profession

·       School (for artists)

·       Biography

·       Bibliography

·       Copyright details

·       Whether the term has been authorised

There is no hierarchy associated with biographical records.

 

Bibliographical

 

Used to record information about publications. This does not conform to any of the library standards, and is quite simplistic in comparison.

 

It contains the following fields:

 

·       Citation

·       Title

·       Author/Editor(s)

·       Collective title

·       Series title

·       Place of Publication

·       Date of Publication

·       Journal

·       Publisher

Appendix 3 – Search Sentence Process

You describe a combination of all 3 searches (Keyword, Term and FR).

- But is there benefit in combining simple (Keyword+ Term) with complex (FR)?

- Will the simple always come first?

- what's the meaning of e.g. "The Hague [Place] AND fromPlace The Hague" (Term AND FR)?
- if you want to combine all 3, we need a more detailed description of the interaction

 

Appendix 4 – Timeline and Geographical mapping

DO: In the business document these components are apps in their own right (and I assume should be part of a separate spec at some point) but timeline and geographical representations are built into the search result screen. To what extent should these be described as part of the search specification? Are these, in this specification, just different ways of showing results without much user interaction, but would these also form part of the more developed timeline and geo tools?

 

VA: I think we should start a separate spec for Result Set Handling.

·                   Timeline: we do it based on time of creation, which matches the FR "Thing From Event → has_time-span → at_some_time_within". But more needs to be done to accommodate dates specified to different levels of precision, see date vs gYearMonth vs gYear

·                   Geographic: cannot do it before we have geomapped Place thesauri (see comments in App2)

For RS4 we should think about: permalinks for timeline/geographic arrangements, saving them to Basket, Annotating (image annotation over a geographic map!)
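
The point above about dates of different precision (date vs gYearMonth vs gYear) can be sketched by turning each lexical form into an explicit interval, so that all three can be plotted on the same timeline. This is a minimal sketch under stated assumptions: the `time_span` function is hypothetical, it handles only four-digit AD years without timezone suffixes, and BC dates (which Python's `datetime.date` cannot represent) would need a different representation.

```python
import calendar
import re
from datetime import date
from typing import Tuple

_LEXICAL = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def time_span(lexical: str) -> Tuple[date, date]:
    """Map 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to an inclusive (start, end) interval."""
    m = _LEXICAL.fullmatch(lexical)
    if not m:
        raise ValueError(f"unsupported date form: {lexical!r}")
    year = int(m.group(1))
    if m.group(3):  # full date: a one-day interval
        day = date(year, int(m.group(2)), int(m.group(3)))
        return day, day
    if m.group(2):  # year and month: the whole month
        month = int(m.group(2))
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    # year only: the whole year
    return date(year, 1, 1), date(year, 12, 31)
```

For example, `time_span("1642")` covers 1 January to 31 December 1642, while `time_span("1642-07-15")` is a single day.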
