- Implemented Enhancements
- Mixing Properties
- Creation vs Activity
- Event vs Period
- Combinatorial Explosion
- Part Made Of
- FR Names and Scope Notes
- Search Result Cutoff
- Autocomplete Performance
- FTS Leaks
- Unexpected FR Consequences
- Cannot Reproduce, Invalid, Undecided Issues
- Planned Improvements
- Owner vs Keeper
- By, Met Actor
- Nationality and Ethnic Group
- About (Depicted, Refers to)
- From Place/Findspot
- Thing present at Event or from Period
- About Event
- Preferred FRs
- Keyword-Date-Identifier Search
- Identifier Regexp
- Objects with Images
- Better Autocomplete Ranking
- Hierarchical Place Facet
- Don't loop over place hierarchy for About
- About vs Keyword
- Direct Object Load
- Potential Enhancements
- Dimension Search
- Grammar-Based Search
- Show Matched Sub-FR in Results
- Association Codes Sub-FRs
- Searchable Object Kinds
- Search Scope
- Other Future Enhancements
- TODO Austin's Assessment
- TODO Jonathan's feedback
- TODO Re-collect from Jira
Compared to the original FORTH definition, the FR Implementation has made many fixes, which often result in simpler network diagrams.
After the initial implementation of FRs, however, we found various problems and omissions in the FRs themselves.
Because of the complex nature of this topic, all further discussion and decisions should take place on this page.
The rest of the page is divided into sections that reflect the status of enhancements. As planning decisions are made, enhancements will be moved between sections.
New BM COL search tool: http://www.britishmuseum.org/system_pages/beta_collection_introduction/beta_collection_search_results.aspx
Compare RS search features to COL search. Dominic: I think I prefer the features in the RS search!
Notes on specific aspects:
- Objects with images -> (to be done)
- Ethnic Group -> by Actor (done)
- School Of -> influenced by Actor (to be done)
- Escapement Type -> is/has/about type (done)
is not sufficient to catch all cases because it doesn't allow mixed properties, so it should be reformulated like this:
allowing mixed iterations of these properties.
There are many cases like this.
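The exact property paths are not preserved on this page, so here is only a generic illustration of why a path that iterates each property separately (SPARQL `p1*/p2*`) misses cases that a mixed iteration (`(p1|p2)*`) catches. It is a hypothetical Python sketch over a toy triple list; the short property names `P9i` and `P10` and the node names are placeholders, not repository data.

```python
from collections import deque

def _closure(triples, nodes, props):
    """Breadth-first closure: every node reachable from `nodes` via edges labelled in `props`."""
    seen = set(nodes)
    queue = deque(nodes)
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in props and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

def reachable_sequential(triples, start, first, second):
    """Like the SPARQL path first*/second*: all `first` hops, then all `second` hops."""
    return _closure(triples, _closure(triples, {start}, {first}), {second})

def reachable_mixed(triples, start, props):
    """Like the SPARQL path (p1|p2|...)*: the properties may alternate freely."""
    return _closure(triples, {start}, set(props))

# Toy data (hypothetical): a -P9i-> b -P10-> c -P9i-> d
toy = [("a", "P9i", "b"), ("b", "P10", "c"), ("c", "P9i", "d")]
print(sorted(reachable_sequential(toy, "a", "P9i", "P10")))  # ['a', 'b', 'c']
print(sorted(reachable_mixed(toy, "a", {"P9i", "P10"})))      # ['a', 'b', 'c', 'd']
```

Node `d` is only found by the mixed iteration, because reaching it requires going back to `P9i` after a `P10` hop.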
ObjectCreation – (P9B.forms_part_of) (0,n) -> ObjectCreation
The latter should be E7.Activity otherwise you won't catch this:
BM has use cases "Thing was produced in Period/Culture and in Political State", which are mapped to:
P10 can be used, since Production is a period (Production<Modification<Activity<Event<Period),
and the MatCult/PoliticalState terms are also Periods.
I think this should fall in the FR "Thing has met/is from/was present at Event" (uses P12i_was_present_at and P9i_forms_part_of, and has range E5_Event) which should be generalized to "Thing is from Event/Period".
For this, we have to examine/change several classes/properties:
- range: I think all these are legitimate: "Thing refers to or is about Late Christian Period", "Thing from Late Christian Period", "Thing destroyed in Late Christian Period", "Thing created in Late Christian Period", "Thing modified in Late Christian Period", "Thing used in Late Christian Period", so half of section "Thing-Event" of FRthing.docx should be changed to speak of E4.Period instead of E5.Event
- the specific properties (eg created) are all sub-properties of P12, which requires an E5.Event not mere E4.Period.
But by following the hierarchy (P9i), we can transition from E5 to E4
Here's the property path: P12-(E5)-P9i-(E4). Note: P108 implies P12, since P108<P31<P12.
- P9i_forms_part_of or P10_falls_within?
Various FRs chase the event part hierarchy (P9i) but not the "falls within" hierarchy (P10).
For the most part this is appropriate since P10 "does not imply any logical connection between the two periods"
But in this BM case P10 sounds more appropriate to me than P9i: Production "falls within" Late Christianity, but Late Christianity hardly "consists" of production events of Christian objects
- So overall, I think that the definition "Thing from Event" should be changed to "Thing from Event/Period".
Should be fixed now
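Both hierarchy arguments in this section (Production is a Period; P108 implies P12) are transitive-closure checks. Below is a minimal Python sketch, assuming the relevant CRM fragments are hard-coded as dicts. It ignores multiple inheritance: in CRM, E12_Production also has E63_Beginning_of_Existence as a superclass, which is omitted here.

```python
# Direct super-entity of each term (only the fragments cited above; single inheritance assumed).
SUPERCLASS = {
    "E12_Production": "E11_Modification",
    "E11_Modification": "E7_Activity",
    "E7_Activity": "E5_Event",
    "E5_Event": "E4_Period",
}
SUPERPROPERTY = {
    "P108_has_produced": "P31_has_modified",
    "P31_has_modified": "P12_occurred_in_the_presence_of",
}

def ancestors(term, parent_of):
    """Return the term followed by all its transitive super-terms."""
    chain = [term]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain

def is_a(term, candidate, parent_of):
    """True if `candidate` is `term` itself or one of its super-terms."""
    return candidate in ancestors(term, parent_of)

print(is_a("E12_Production", "E4_Period", SUPERCLASS))  # True
print(is_a("P108_has_produced", "P12_occurred_in_the_presence_of", SUPERPROPERTY))  # True
```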
Subjects (eg RKD Keywords) and Iconography (eg RKD IconClass) often cannot be interpreted as Person, Place, Event because they represent ideas. Thus they are often interpreted as "types" (concepts).
- So I included P67_refers_to in FR2_has_type.
- I also included P128_carries: Physical Thing carries a Conceptual Thing that P67_refers_to a concept.
FR names and scope notes are now controlled by meta-data at FR Names Table (rdfs:label and rdfs:comment respectively).
After being edited in Confluence, the metadata is regenerated with Emacs and saved to thesaurus-meta.ttl.
We cut off the search results at 500, because of Exhibit limitations and because a user isn't likely to explore more objects than that in the list.
But there's a bug: maybe we cut off earlier than that:
- bibliography references (bibo:Document) not marked as skos:Concept: Josh bug
Resolved by marking them as skos:Concept, and cutting off at bibo:Document
- images shared between objects: this is expected
Resolved by cutting off at FC70_Object
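One way to read "cutting off at bibo:Document" and "cutting off at FC70_Object" is as boundaries of a graph walk from the object root: such nodes are still reached, but their own outgoing links are not followed, so a shared image cannot pull in a second object's data. The Python sketch below assumes that reading; the adjacency-list graph and type map are hypothetical, and this is not the actual search or FTS code.

```python
from collections import deque

# Hypothetical cutoff set: nodes of these types are reached but not expanded.
CUTOFF_TYPES = {"bibo:Document", "FC70_Object"}

def collect_molecule(root, edges, types, cutoff=CUTOFF_TYPES):
    """Breadth-first walk from `root` that does not expand cutoff-typed nodes.

    edges: dict node -> list of neighbour nodes
    types: dict node -> set of rdf:type values
    """
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # The root is always expanded; any other cutoff-typed node is a leaf.
        if node != root and types.get(node, set()) & cutoff:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, with `obj1 -> img -> obj2 -> obj2_text` and both objects typed FC70_Object, the walk from `obj1` stops at `obj2` and never reaches `obj2_text`.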
- In the original definition, "Thing from Actor" includes P52_has_current_owner only. We added P50_has_current_keeper
P50_has_current_keeper doesn't appear in any FR at present, so "current keeper is Ancient Egypt and Sudan department" cannot be formulated.
- "Owner" is not originally defined as a FR. Questions:
- Should it be only P52_has_current_owner (title holder)?
If so, we don't need a new FR; we just need to include P52 in the [FR Names Table] with hasRange=E39_Actor.
- Or maybe we should also include P50_has_current_keeper (custody holder)?
- How about the historic properties: P51_has_former_or_current_owner and P49_has_former_or_current_keeper, respectively?
- How about historic events: E8_Acquisition and P22_transferred_title_to?
One of the "met" clauses works this way.
In good data, a historic event should also be expressed as P51_has_former_or_current_owner, but this is not yet done in BM Association Mapping, because BM codes don't always make it clear whether the other party was the owner or not.
- If we include events, how about fancy loops like in "met" in RS-1000@jira?
- FR "owner/keeper": P52, P50
- FR "all owners/keepers": P51, P49, and the events E8_Acquisition, E10_Transfer_of_Custody
What's the difference between 'met' and 'by'?
I am not sure that the 'from' FR is quite right in terms of balancing recall and precision. In particular, it returns objects by people born at a place together with objects created at that place. Rembrandt was born in Leiden, and therefore if you ask for objects from Leiden you get all of Rembrandt's works, which seems wrong. We need to tease this FR apart to have one specifically for objects created by someone from X. It may be that some FRs are not as aggregated as others (if at all), but they should be oriented towards what users would expect.
- "by": produced, produced part, made inscription, modified
DONE: "met" as a very broad relation to Agent: met Actor in the same event, or was involved in acquisition or custody, or was owner/keeper. It includes:
- "all owners/keepers"
- TODO: "influenced/motivated" is not yet included
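FRs such as "met" aggregate other FRs ("all owners/keepers"). A small recursive expansion makes that composition explicit. This is a hypothetical Python sketch: the dict layout is an assumption, the owner/keeper entries use the properties and classes listed above, and "met" is reduced to a single placeholder property (P12i_was_present_at) plus its nested FR rather than its full definition.

```python
# Hypothetical FR definitions: direct CRM properties/classes plus nested FRs.
FR_DEFS = {
    "owner/keeper": {
        "props": {"P52_has_current_owner", "P50_has_current_keeper"},
        "includes": [],
    },
    "all owners/keepers": {
        "props": {"P51_has_former_or_current_owner", "P49_has_former_or_current_keeper",
                  "E8_Acquisition", "E10_Transfer_of_Custody"},
        "includes": [],
    },
    # Placeholder only: the real "met" FR is a much larger network.
    "met": {"props": {"P12i_was_present_at"}, "includes": ["all owners/keepers"]},
}

def expand(fr, defs=FR_DEFS, _seen=None):
    """Return the flat set of properties/classes an FR aggregates, following nested FRs."""
    _seen = set() if _seen is None else _seen
    if fr in _seen:  # guard against accidental include cycles
        return set()
    _seen.add(fr)
    result = set(defs[fr]["props"])
    for inner in defs[fr]["includes"]:
        result |= expand(inner, defs, _seen)
    return result
```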
"By Actor" should find things made by a given Nationality or Ethnic Group.
We mapped it to
Eg here's a query to find "Italian" objects:
|object||created||transferred title from|
|CEM313033||Marchese Giovanni Pietro Campana|
We remapped it to rso:hasDomain E39_Actor. Here's a query to find such productions.
|production||ethname||ethnic group||other examples (tested)|
|object/EOC122081/production/7||idThes:x84612||"Papua New Guinean"||EOC111302, EOC117041 , EOC118123|
- How to find coins related to Augustus? Coins whose production is Motivated By Emperor Commodus?
- Production Motivated by
(Similarly, Emperor Commodus motivated the production of CGR307142)
- How can I search for "in the Manner/style of :: Constable, John"? (That's an English painter of special interest to Yale)
- Association code "AL: Manner/Style of" is mapped to Influenced By
- There are no FRs involving P17_was_motivated_by, even in the full version (FORTH TR429-2012)
- Its superproperty P15_was_influenced_by is used only in "Event by Actor", but we're doing only FRs of Thing
- Maps to BM Association Mapping#Influenced By & BM Association Mapping#Production Authority
- new FR "influenced/motivated". Note: this will also include "Motivated" because it's a subproperty
DONE: FR Implementation#Thing influenced-motivated by Actor- FR15_influenced_by
- Example: Depicted (Portrait of)
- Aboutness (P62_depicts, P67_refers_to, P129_is_about, P138_represents).
- We got "Thing refers to or is about Place", but "Thing refers to Actor" is not yet done
- Define FR67_refers_to_actor based on p44 of TR429
- Cross-check against FR67_refers_to_or_is_about
- Roughly maps to BM Association Mapping#Associated Person
- new FR "about actor"
Thing with findspot "Qasr Ibrim Nubia" cannot be found. The data is:
Problem: the Find event for an object is not a standard CRM class. BMO defines an extension class bmo:EX_Discovery. The original definition used something made up (C2.Finding) which is not defined anywhere:
- Thing from Place- FR7_from_place: "without C2.Finding which is undefined"
- Thing found or acquired at Place- WONTDO: "C2.Finding is not defined"
- FR "found at"
DONE: FR Enhancements#Thing found at Place- FR12_found_at
- FR "from": exclude "birth place of the creator" and "residence of the group". Include Findspot
- FR "created at"
- FR "was/is located in": add moved/acquired/found
Material Culture is mapped to crm:E4_Period and should be available through FR "present at" (FR12_was_present_at) having scope note
"Thing was present at (has met, is from) event/period"
The "BM Event" thesaurus includes named historic events, eg 'Capture of Olinda de Phernambuco'.
It should be searchable with FR67_about_event and FR12_was_present_at, but currently is not.
The reason is that it hasDomain=E7_Activity, but should be E4_Period (a superclass thereof)
DONE: two FRs:
- about event (eg historic event)
- present at Event (eg Exhibition) or from Period (eg Etruscan)
We need the most common relationships to be selected by default in the FR search.
This is currently done for Agent only: 'by' is preferred to 'met'.
But we want a more general solution driven by extra metadata in the FR Names Table:
- The integer hasOrder determines the order of FRs in the dropdown
- hasOrder=1 is selected by default (except for the 4 special FRs; see next section)
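The hasOrder rule above can be sketched as a sort plus a default pick. This is a minimal Python illustration, assuming the FR Names Table metadata has already been loaded into simple records; the FR URIs used in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FR:
    uri: str
    label: str
    has_order: int  # value of hasOrder from the FR Names Table

def dropdown(frs, special=frozenset()):
    """Sort FRs for the dropdown by hasOrder and pick the default selection.

    FRs whose URI is in `special` (e.g. the text-entry FRs) are never the default.
    """
    ordered = sorted(frs, key=lambda fr: fr.has_order)
    default = next((fr for fr in ordered if fr.has_order == 1 and fr.uri not in special), None)
    return ordered, default

# Hypothetical Agent FRs: 'by' is preferred to 'met'.
frs = [FR("rso:FR_met", "met", 2), FR("rso:FR_by", "by", 1), FR("rso:FR_keyword", "keyword", 99)]
ordered, default = dropdown(frs, special={"rso:FR_keyword"})
print([fr.label for fr in ordered], default.label)  # ['by', 'met', 'keyword'] by
```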
There are 3 FRs that allow text entry (rso:hasRange=rdfs:Literal): date, identifier and keyword
- When the user enters a text query in:
- The top-right search box
- Or in the search sentence (and does not autocomplete)
- Trim spaces from the start & end of the query
- Do smart regexp matching of the text query:
- Allow switching between these 3 cases
- Use a dropdown including "keyword", "identified by" and "date interval"
- Select the matched alternative by default
- Allow the user to override by selecting "keyword" or "identified by" (but don't allow "date interval" unless it matches the Date Regexp)
- "keyword" is at the bottom of all FR dropdowns. If the user selects "keyword", the dropdown changes to the same 3 alternatives
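The selection logic above can be sketched as a small classifier that trims the query, picks the default alternative, and only offers "date interval" when the date pattern matches. Both regexps here are simplified placeholders written for this illustration, not the project's Date or Identifier Regexp.

```python
import re

# Hypothetical placeholders: a year or year range, and a space-free token with a digit/punctuation.
DATE_RE = re.compile(r"^-?\d{1,4}(\s*-\s*-?\d{1,4})?$")
ID_RE = re.compile(r"^(?=.*[\d.,:/*-])\S+$")

def classify(query):
    """Return (default_choice, allowed_choices) for a free-text query."""
    q = query.strip()  # trim spaces from the start and end
    allowed = ["keyword", "identified by"]
    if DATE_RE.match(q):
        allowed.append("date interval")  # offered only when the date pattern matches
        default = "date interval"
    elif ID_RE.match(q):
        default = "identified by"
    else:
        default = "keyword"
    return default, allowed

print(classify("  1500-1600 "))  # ('date interval', ['keyword', 'identified by', 'date interval'])
print(classify("oak panel"))     # ('keyword', ['keyword', 'identified by'])
```

The date check runs first, so a 4-digit year never reaches the identifier branch, while a registration number such as 2008,1008.931 fails the date pattern and defaults to "identified by".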
Should we remove the Objects/Images/Forums dropdown from top-right? It doesn't work
- we can't catch ids including spaces (all grcatno and some otherid examples) without risking misinterpreting multiword keyword queries as id queries
- we want the query to include at least one digit/punctuation, and then maybe some letters
- if the query has only digits, should be >=5 digits (4 digits is a year).
This query regexp implements the above considerations:
Note: *? is a non-greedy (lazy) quantifier, which causes the regexp to match in only one way (reduces backtracking)
|assetid||BM Asset Id||69989; 687142|
|bigno||BM Big Number||EA99; EA10898; EA10558,14|
|cmcatno||BM Coins & Medals Number||BA1p222.1023; BC.1067; EO8p330.1134; MI1p21.76N; MI1p36.154B|
|codexid||BM Codex Id||3494578|
|grcatno||BM Greece & Roman Number||Bronze 716; CIL VI 8467; Vase A793.3; Sculpture C159; CIL VI 166 = 30706; CIL XV 6350, 66; CIL VI 2602*; Waywell 1978 368; Old Catalogue No. 123|
|otherid||BM Other number||BS.10058; BS.4799.c; BS.6639a; Hay 468; I.Brit.Mus.Gr. I App:7text edition...; Kropp M; P. Ramesseum 13; Sheet.1; S.1272Sheet.10; Sallier 4Sheet.9; Sheet.1P. Ramesseum 3|
|prn||BM Public Reference Number||PPA44216; RRM192762; YCA66158|
|regno||BM Registration number||S.6223; S.2534; .49; .18192.a; .10466.19; 1977,1105.53; 1920,1115.1.2; 1979,0108.59.A; 1878,1217.123-140; Oc1946,1027.5|
Sources used for the samples above:
- GLAM Wiki: BM Refs
- BM: Help using the Museum number and provenance search
- manual_assertions.ttl, search for thesIdentifier
- get identifier types:
- get all identifiers, so I can extract regexes out of them
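The project's regexp is not reproduced on this page. As an illustration only, here is a hypothetical Python regexp that encodes the three identifier rules stated earlier (no spaces; at least one digit or punctuation character; a digits-only query needs at least 5 digits). It is exercised below on space-free samples from the table.

```python
import re

# Illustrative reconstruction of the stated rules, NOT the project's actual regexp.
IDENTIFIER_RE = re.compile(
    r"""
    ^(?!\d{1,4}$)              # reject 1-4 digit numbers: those are treated as years
    (?=\S*?(?:\d|[^\w\s]))     # lazily look ahead for a digit or punctuation character
    \S+$                       # the whole query is one token without spaces
    """,
    re.VERBOSE,
)

def looks_like_identifier(query):
    return IDENTIFIER_RE.match(query.strip()) is not None

for sample in ["69989", "EA10898", "BS.4799.c", "1977,1105.53", "1600", "Hay 468"]:
    print(sample, looks_like_identifier(sample))
```

As the rules imply, multiword ids such as "Hay 468" are rejected, which is the accepted trade-off for never treating multiword keyword queries as ids.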
- Keyword search doesn't really work for searching identifiers because of the way the FTS index is setup: it breaks words on punctuation, not only on space.
- PRN (eg CGR307294) is found fine using keyword search
- regno (eg 2008,1008.931) was mistaken for a date search (should be fixed now)
- otherid (eg 74.3.7/1) cannot be found by keyword search
That's why we have a specific FR "identified by", which uses exact match, not FTS query
Ability to restrict search to objects that have an image.
- UI: checkbox in front of the search sentence:
- Dominic: approved (Jana thinks it doesn't quite fit into the search sentence)
- Search name: if the flag is set, append " (with images)" at the end
- Add handling of special FR rso:FR138i_has_representation in SearchObjects method
(FRSearchRestriction with frUri=rso:FR138i_has_representation and value='true')
- If requested, add subquery exists(?Images) where ?Images is defined in Search Result Fields
(note it is different for BM vs RKD, so a UNION query is needed)
- add checkbox "with images" (this is the name of rso:FR138i_has_representation)
- if checked, add rso:FR138i_has_representation to the SearchObjects API call
- add rso:FR138i_has_representation in Search Serialization (for history) and restore from JSON
Try to reuse current FRSearchRestriction, by adding frUri=rso:FR138i_has_representation and value true (false=no such FR)
- if present, add "(with images)" at the end of the search name
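The client-side steps above can be sketched as follows. FRSearchRestriction, its frUri/value fields and rso:FR138i_has_representation come from the text; the helper functions and the JSON layout are hypothetical, chosen only to show the name suffix and the serialization round trip.

```python
import json
from dataclasses import dataclass, asdict

HAS_IMAGES_FR = "rso:FR138i_has_representation"

@dataclass
class FRSearchRestriction:
    frUri: str
    value: str

def build_search(base_name, restrictions, with_images):
    """Add the 'with images' restriction and the name suffix when the checkbox is set."""
    restrictions = list(restrictions)
    name = base_name
    if with_images:
        restrictions.append(FRSearchRestriction(HAS_IMAGES_FR, "true"))
        name += " (with images)"
    return name, restrictions

def serialize(name, restrictions):
    """Hypothetical JSON layout for search history."""
    return json.dumps({"name": name, "restrictions": [asdict(r) for r in restrictions]})

def deserialize(text):
    """Restore the search and the checkbox state (absent restriction means unchecked)."""
    data = json.loads(text)
    restrictions = [FRSearchRestriction(**r) for r in data["restrictions"]]
    with_images = any(r.frUri == HAS_IMAGES_FR and r.value == "true" for r in restrictions)
    return data["name"], restrictions, with_images
```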
The hierarchical place facet in Exhibit sometimes shows parent-child places (eg Easter Island-Rano Raraku) as siblings
Currently FR67_refers_to_or_is_about loops down the place hierarchy (over P89i_contains). This makes for some unintuitive results:
- EPF112795 is a picture of a lake that depicts Europe -> inferred to be about Europe>Bulgaria>Banya Bulgaria
- PPA361578 represents Oceania -> inferred to be about Oceania>Polynesia>Easter Island>Rano Raraku
See the discussion in Jira. I think About should not loop over the place hierarchy, neither up (co-variant) nor down (contra-variant)
RS-1486 Pending decision by Dominic
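The decision boils down to one flag: whether an About search for a place also matches objects about places it contains (via P89i_contains). A minimal Python sketch over a toy in-memory index; the data structures and object ids are hypothetical, not the repository query.

```python
def places_within(place, contains):
    """The place plus all places transitively contained in it (P89i_contains)."""
    result, stack = {place}, [place]
    while stack:
        for child in contains.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

def about(place, about_index, contains, loop_down=True):
    """Objects about `place`; with loop_down, also objects about any contained place."""
    targets = places_within(place, contains) if loop_down else {place}
    return {obj for obj, places in about_index.items() if places & targets}

contains = {"Oceania": ["Polynesia"], "Polynesia": ["Easter Island"], "Easter Island": ["Rano Raraku"]}
about_index = {"obj1": {"Rano Raraku"}, "obj2": {"Oceania"}}
print(sorted(about("Oceania", about_index, contains)))                   # ['obj1', 'obj2']
print(sorted(about("Oceania", about_index, contains, loop_down=False)))  # ['obj2']
```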
- If I search using "about" I get 191 hits
- If I search using keyword I get about 94 hits
- "About" loops down the place hierarchy so it incorporates many indirect hits. But I now think it shouldn't, see prev section.
- FTS indexes all text fields, not just controlled terms. Eg "Easter Island" is included in object/EPF109333, although there's no link to the term, since these words are mentioned in bmo:PX_curatorial_comment
- FTS walks the object graph, so indirect paths also contribute to FTS, eg
object - P128_carries -> object/image/1 - P138_represents -> Place
- FTS DOES NOT include broader terms nor altLabels: we thought there's already too many words in the index, and it's not specific enough.
This means "Easter Island" is not automatically included in an object that mentions "Rano Raraku".
Dominic, please comment:
- Should we include broader terms in FTS?
- Should we include altLabels in FTS?
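To make the two questions concrete, here is a toy Python model of which words a controlled term contributes to a full-text index, with broader terms and altLabels as opt-in flags. It is a hypothetical sketch: the dicts stand in for skos:prefLabel, skos:broader and skos:altLabel, the broader chain is assumed to be acyclic, and this is not how the real index is configured.

```python
def index_terms(object_terms, labels, broader, alt_labels,
                include_broader=False, include_alt=False):
    """Return the words an object's controlled terms contribute to a toy FTS index."""
    words = set()
    for term in object_terms:
        words.add(labels[term])
        if include_alt:
            words.update(alt_labels.get(term, []))
        if include_broader:
            parent = broader.get(term)
            while parent is not None:  # walk up the skos:broader chain (assumed acyclic)
                words.add(labels[parent])
                parent = broader.get(parent)
    return words

labels = {"t1": "Rano Raraku", "t2": "Easter Island", "t3": "Polynesia"}
broader = {"t1": "t2", "t2": "t3"}
alt_labels = {"t2": ["Rapa Nui"]}
print(sorted(index_terms(["t1"], labels, broader, alt_labels, include_broader=True)))
```

With both flags off, an object linked only to "Rano Raraku" contributes no "Easter Island" words, which matches the current behaviour described above.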
Use case: I sometimes know the object that I want to load into the tool and going through search simply to get to it is not very efficient. It would be useful to have a quick object number mechanism for loading a record into a tool.
Vlado: this is a combination of:
- Search FR "Identified by", and use the top-right search box for [Keyword-Date-Identifier Search] as a convenience
- "if single result, jump into it instead of showing a singleton result list".
Jana: removing the intermediate step (Exhibit) is >2p/d and seems to be lower business value
- This is useful for all searches, not only Identifier search
- An Identifier search doesn't necessarily return 1 object (eg regno may be repeated/reused)
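The "jump if single result" rule applies to any search and degrades gracefully when an identifier is reused. A minimal Python sketch; the function name and the returned action tuples are hypothetical.

```python
def after_search(results):
    """Hypothetical navigation rule: open a single hit directly, otherwise show the list."""
    if len(results) == 1:
        return ("open_object", results[0])
    # Zero hits, or several hits (e.g. a reused regno), still go to the result list.
    return ("show_results", results)

print(after_search(["obj1"]))          # ('open_object', 'obj1')
print(after_search(["obj1", "obj2"]))  # ('show_results', ['obj1', 'obj2'])
```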
These enhancements may be implemented in the future, or maybe never.
- how to specify the 3 components of a dimension: type, unit, number
- does it matter what the dimension applies to (not all are at the Object level)
- conversions between units
- range search
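The open questions above (type, unit, number; unit conversion; range search) can be made concrete with a small model that normalizes lengths to a base unit before comparing. This is a hypothetical Python sketch: the conversion table covers only a few length units, and it ignores what the dimension applies to.

```python
from dataclasses import dataclass

# Hypothetical conversion table to a base unit (millimetres), lengths only.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}

@dataclass
class Dimension:
    dim_type: str  # e.g. "height", "width", "diameter"
    unit: str
    number: float

    def in_mm(self):
        return self.number * TO_MM[self.unit]

def matches(dim, dim_type, low, high, unit):
    """Range search: is `dim` of the given type within [low, high] expressed in `unit`?"""
    if dim.dim_type != dim_type:
        return False
    return low * TO_MM[unit] <= dim.in_mm() <= high * TO_MM[unit]

print(matches(Dimension("height", "in", 10), "height", 25, 26, "cm"))  # True (254 mm)
```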
FRs collapse the richness of CRM relations into a few select alternatives. Although those are complex widely-reaching networks, they reflect FORTH's understanding about what is interesting.
The current "search sentence" UI allows only simple searches: AND of ORs, regarding Things.
- FRs are not composable, so one can only combine at the root (Thing). One cannot use "Composite FRs", eg
things Exhibited by the Metropolitan
things Exhibited in Sofia
things Auctioned in Antwerp
- There are various bugs caused by the pseudo-smartness of this UI, eg
A search based on Controlled Natural Language (grammar-based) will resolve these problems.
Because the FR is an aggregation it would be useful to identify the matched Sub-FR (actual relationship) in the search results.
So take "by Ruffo, Don Antonio". The FR "By" would in this case refer to the fact that Ruffo commissioned the painting (rather than painted it or something else). It would be great if this were identified clearly in the summary results: the result would have "commissioned by Ruffo, Don Antonio" under it. It would then be good if you could use this as a search link to see what other works were commissioned by Ruffo.
The association codes would be useful for incorporation into the search system.
As per BM Association Mapping, assoc codes are mapped to P2_has_type, either of:
- association reification of a specific CRM property, or of
- specific CRM event type (less often).
In BM COL, assoc codes can be used as a facet over a selected entity (eg person):
For RS the first usage would be to refine the FRs to a hierarchy. Partial example:
- This is related to Show Matched Sub-FR in Results since it would also require some extension of the FR system to expose more than "aggregations of FRs"
- Also, it would require building the hierarchical structure above: currently association codes are in a flat skos:inScheme, but we need to put them in separate schemes, or create a skos:broader hierarchy. The hierarchy needs to be matched to the appropriate CRM property.
TODO Vlado: I wrote a longish email to Josh & Dominic around Dec 2012, need to dig it out
We currently search only for Things (museum objects) and small application objects like annotations & bookmarks.
- Should we be able to search for people, places, events (such as Exhibitions), what else?
- Should we have a unified search, or per-object-kind searches?
- The answer should be substantiated with consideration what data we have about them.
- In RKD we don't have enough data about these things to create separate searches for them
- Although Auctions/Exhibitions are not independent objects, we could have a search by sub-object, eg:
search for Event in the object's lifetime, with given characteristics (Type, Date, Actor..)
- In BM thesauri there's plenty of data about people (eg life dates, field of activity, short bio), institutions, places.
So BM authorities provide better scope for this
- Lists of Objects relating to the thing (eg the inverse of "Thing from Place") could be interesting
- Adding other search kinds will complicate the search, display of results, and general UI handling
- Per-kind search could be accommodated by the FR framework: you first select the FC (Thing, Actor, etc), which determines the applicable FRs.
- Different object kinds have different fields, so result display and faceting of mixed result lists would be a major challenge
- Even unifying the Search Result Fields of RKD and BM museum objects was no small feat
- I agree that ResearchSpace will need to deal with different content types. These might be items that are contained in authority files (people, for example).
- It will also need to produce results for other records that have associated metadata, including archive records. This may be determined by the URI: an object has an /id/object/ URI, but an archive record may have /id/archive. This would require a different representation when the user clicks on the result, but this would also be true of different object types that have different metadata (a coin as opposed to a painting). The RKD may have an object record collection.rkd.org/id/object/12344, but we may need to use /id/object/2d/12344 and/or /id/object/archive/23903, which we would know to be different record representations from the RKD.
- We may need to associate different RForm representations with different domain names and ID types.
Vlado: yes, we use different RKD and [BM RForm].
Dominic: Search options should allow a user to change the datasets that are being searched: all available datasets, or selected datasets including the project dataset. Eg:
- All data
- This project's data
- RKD data
- BM data
Defaults can also be set.
Named Graphs (G in the quad <S,P,O,G>): we have not yet decided what we will use them for, but they can be used for only one purpose.
- Josh uses one G per object, so he can easily implement data updates (delete old object graph; then reinsert whole object).
- We could strip Josh's per-object G and use a fixed G on import (RKD vs BM): a non-trivial task
- Inference cannot be limited per graph. Given a coreferenced term, I cannot define an FR to select or unselect named graphs.
Nested (Virtual) Repositories (as implemented in OWLIM 5.3) is a better choice if we want to limit reasoning to subsets. The administrator would set up several repositories as combinations of real repositories, and the switch will select amongst them.
Post Filtering is the simplest solution if we're happy to limit to subsets of objects only:
- mark each object root somehow (eg RKD objects are rso:E22_Museum_Object, and BM objects P50_has_current_keeper the-british-museum)
- add an implicit search clause to look for the marker corresponding to the selected scope
- one benefit is that the filtering can be more granular and dynamic, eg
- "Coins and Medals Department": implement as P50_has_current_keeper <thesauri/department/C>
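Post filtering amounts to appending one implicit triple pattern to the generated query. The Python sketch below builds that clause from a scope setting. The marker ideas come from the list above, but the `crm:`, `rso:` and `bm:` prefixes, the `bm:the-british-museum` IRI and the scope keys are hypothetical, and PREFIX declarations are assumed to be supplied elsewhere.

```python
# Hypothetical scope markers; PREFIX declarations are assumed to be added elsewhere.
SCOPE_MARKERS = {
    "all": None,
    "rkd": "?object a rso:E22_Museum_Object .",
    "bm": "?object crm:P50_has_current_keeper bm:the-british-museum .",
    "bm-coins": "?object crm:P50_has_current_keeper <thesauri/department/C> .",
}

def add_scope(where_clauses, scope):
    """Append the implicit scope clause (if any) to a list of SPARQL WHERE patterns."""
    marker = SCOPE_MARKERS[scope]
    return list(where_clauses) + ([marker] if marker else [])

def build_query(where_clauses, scope):
    body = "\n  ".join(add_scope(where_clauses, scope))
    return "SELECT DISTINCT ?object WHERE {\n  " + body + "\n}"

print(build_query(["?object rso:FR_by ?actor ."], "rkd"))
```

Because the marker is just another pattern, finer-grained scopes (such as a single department) only need a new entry in the table.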
The original spec speaks about Project Dataspace vs Shared Dataspace, not about Dataset (i.e. origin of the data).
If by "Scope" we mean Data origin, then we'd have these settings:
- All data
- British Museum data
- Rembrandt data
If by "Scope" we mean Project Dataspace, then
- there will be only 2 settings, called eg:
- This project's data
- This project and common data
- But RS3 does not yet have multiple projects, and that's a big task involving Security and Groups
Reply to all of these
I more or less followed the Confluence test cases for Coreferencing search on RS3.5.
- Auto-complete and suggestions are great, but there are so many that it is difficult to figure out which to choose. This feature is helpful, especially for the Dutch names, but will be frustrating for some users. It would be great if there were an easy way to differentiate immediately between People, Places, Things, etc. Maybe colour codes? Or icons?
Vlado: The kind of term and thesaurus is seen in the notation, eg [BM Place]
- For example if I type in "Hague" - a list of possible people named Hague comes up, with one Place (BM place/RKD place). When I select "Hague, The" - the search stays at "by Hague, The" rather than "met, Hague, The". When I change the search to "met, Hague, The" - I get no results. I only get results with "keyword".
- I admit the "met" search took time to figure out - and we will have to find a way of making this simple for users!
- Also, the co-referencing of the Thesaurus is not always easy, and Rembrandt data is only really searchable with Rembrandt terminology: if I search for keyword "oak", this does not come up with anything, as their term is "panel (oak)"
Vlado: Yep, RS currently has only 1 person and 15 major cities that are coreferenced manually
- Does the timeline view only work for RKD data? On BM data I get either "no results" or a "working" screen...
Vlado: For a long time BM data didn't have proper XSD dates in the appropriate properties. This should work now
There was a very good email by Jonathan (I think), forwarded by Dominic. To be added here, broken into subsections.
Asked amongst other things: "how do you search by bibliography"
Do another pass through search issues posted by Dominic, move to subtasks of RS-1321@jira, make more subsections