{excerpt}Imprecision in E60 Number and E61 Time Primitive{excerpt}
{toc}
h1. Vladimir's questions
Sent to CRM SIG [mailto:crm-sig@ics.forth.gr]
Very often in the museum domain measurements are imprecise, so dimensions must be expressed as an interval.

h3. Imprecise Dimension

E54 Dimension says "The properties of the class E54 Dimension allow for expressing the numerical approximation of the values of an instance of E54 Dimension".
My understanding is that this can only happen through: E54 Dimension. P90 has value: E60 Number
E60 Number says "... including *intervals* of these values to express *limited precision*".

h3. Time Spans

Regarding time spans, CIDOC CRM allows imprecision to be expressed in two ways:

h4. Imprecise Duration

E52 Time-Span. P83 had at least duration. E54 Dimension
E52 Time-Span. P84 had at most duration. E54 Dimension

IMHO this pair of properties is unnecessary, since:
- E54 Dimension already accommodates (or should accommodate) imprecision, see [#Imprecise Dimension]
- If we have this pair, then shouldn't we also split P43 has dimension in two (has minimum dimension, has maximum dimension)?
- The pair allows "P91 has unit" of the two Dimensions to differ, which I think is unnecessary
("between 1 and 2 cm" is used often, but who'd say "between 1 cm and 1 meter"?)

h4. Imprecise Start/End

As depicted in the CRM Tutorial (online at [slide27@crmt]), two properties express the inner and outer bounds of a Time-Span:
E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (inner bound)
E52 Time-Span. P82 at some time within: E61 Time Primitive (outer bound)

Each of the bounds has start/end. This is confirmed by the spec:
E61 Time Primitive says "... interval logic to express *date ranges*"

h3. RDFS/OWL implementations

Let's see what the current RDFS/OWL implementations of CIDOC CRM offer
(none of them allows E54 Dimension to express a numerical approximation, as required in [#Imprecise Dimension]):

h4. OWL2 DL proposal

[http://bloody-byte.net/rdf/cidoc-crm/core_5.0.1.rdf]

{code}
<owl:DatatypeProperty rdf:about="http://purl.org/NET/cidoc-crm/core#P90_has_value">
  <rdfs:domain rdf:resource="http://purl.org/NET/cidoc-crm/core#E54_Dimension"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  <skos:scopeNote xml:lang="en">This property allows an E54 Dimension to be approximated by an E60 Number primitive.</skos:scopeNote>
</owl:DatatypeProperty>
{code}

h4. OWL DL

[http://erlangen-crm.org/current]
P90_has_value is a Data Property

h4. RDFS

[http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs]
"The primitive values "E60 Number"... are interpreted as rdf: literal."
{code}
<rdf:Property rdf:ID="P90F.has_value">
  <rdfs:domain rdf:resource="#E54.Dimension"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>
{code}

h3. BMX

Seme4 defined a CRM extension for the British Museum (called BMX), see [http://crm.rkbexplorer.com].
It defines several extension properties (prefix PX):

h4. PX.min_value, PX.max_value

PX.min_value, PX.max_value are defined as subPropertyOf P90F.has_value.
- If you assert e.g. min_value=35 and max_value=45, a reasoner would infer
*both* has_value=35 and has_value=45, which I think is strange.
Instead I'd leave has_value independent, and set it to the average of min_value and max_value using some calculation.
- This implements the requirement [#Imprecise Dimension], but is it faithful to CIDOC CRM?
CIDOC CRM says the imprecision should be captured in the range of P90.has_value (E60 Number), not through parallel properties.
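
The inference described in the first bullet can be reproduced with a minimal sketch of the RDFS subPropertyOf rule (rdfs7). Triples are plain Python tuples, the dimension identifier ex:height1 is hypothetical, and only a single level of subPropertyOf is handled, so this is an illustration rather than a full reasoner.

```python
# Sketch of RDFS rule rdfs7: (s p o) and (p subPropertyOf q) entail (s q o).
# Property names follow the BMX discussion; "ex:height1" is a hypothetical IRI.

SUB_PROPERTY_OF = {
    "PX.min_value": "P90F.has_value",
    "PX.max_value": "P90F.has_value",
}


def entail_superproperties(triples):
    """Return the asserted triples plus those inferred via subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        if parent is not None:
            inferred.add((s, parent, o))
    return inferred


asserted = {
    ("ex:height1", "PX.min_value", 35),
    ("ex:height1", "PX.max_value", 45),
}
closure = entail_superproperties(asserted)
has_values = sorted(o for s, p, o in closure if p == "P90F.has_value")
print(has_values)  # [35, 45]
```

Both bounds end up as values of P90F.has_value, which is exactly the effect questioned above.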

h4. PX.time-span_earliest, PX.time-span_latest

PX.time-span_earliest, PX.time-span_latest are defined as properties of E52.Time-Span.
- (Actually these are defined merely as rdf:Property and don't specify the domain and range.)
- These properties are superfluous, given P81 ongoing throughout and P82 at some time within.
- They don't allow capturing the outer and inner bounds, as per [#Imprecise Start/End].
- They are unrelated to CIDOC CRM properties, so the extension is not CRM Compatible.
A compatibility condition from the [CRM Intro|http://personal.sirma.bg/vladimir/crm/introduction.html#extensions] is:
"all properties of the extension are either subsumed by CRM properties, or are part of a path for which a CRM property is a shortcut"

h3. Looking for Solution

CIDOC CRM leaves an important question (imprecise dimensions) unspecified,
hidden in the scope notes of primitives E60 Number and E61 Time Primitive.
This shouldn't be dismissed as a "mere RDF implementation issue", since it is important for practical CRM interoperability.

What would be the best way to represent imprecision?

h4. Proposal

If we define E60 Number and E61 Time Primitive as RDF classes, that would imply minimal changes to CIDOC CRM.
- E60.Number with dataProperties crm:min_value, crm:max_value, and rdf:value (average or expected)
- E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and maybe rdf:value
- (see [#Imprecise Duration]) The pair P83.had_at_least_duration and P84.had_at_most_duration should be merged into one property has_duration
Is there a better way?
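
To make the proposal concrete, here is a minimal sketch of E60 Number modelled as a class with min_value, max_value and an optional expected value. The Python class names IntervalNumber and Dimension and the method may_equal are hypothetical and exist only for illustration; they are not part of CIDOC CRM or of any RDFS/OWL implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntervalNumber:
    """Hypothetical stand-in for E60 Number modelled as a class."""
    min_value: float
    max_value: float
    value: Optional[float] = None  # expected value, if one is known

    def __post_init__(self):
        if self.min_value > self.max_value:
            raise ValueError("min_value must not exceed max_value")
        if self.value is not None and not (
            self.min_value <= self.value <= self.max_value
        ):
            raise ValueError("value must lie within [min_value, max_value]")

    def may_equal(self, x: float) -> bool:
        """True if x is compatible with the recorded imprecision."""
        return self.min_value <= x <= self.max_value


@dataclass(frozen=True)
class Dimension:
    """Hypothetical stand-in for E54 Dimension."""
    has_value: IntervalNumber  # P90 has value
    has_unit: str              # P91 has unit


width = Dimension(IntervalNumber(1.0, 2.0), "cm")  # "between 1 and 2 cm"
print(width.has_value.may_equal(1.5))  # True
```

Because the unit lives on the Dimension and not on each bound, the two bounds cannot disagree about it.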

h4. Disadvantage

I'm sure that people who expect P57 has number of parts to be a simple xsd:integer
will be very unhappy to suddenly find a class E60.Number (and rightly so\!)
But E60.Number also gives examples of complex numbers, 3D coordinates, etc. So it really is not a literal; it needs to be a class.

Your comments/advice will be appreciated.
I googled "E60 site:[http://lists.ics.forth.gr/pipermail/crm-sig]" and couldn't find relevant discussion.

h1. Martin's response

Dear Vladimir,
Thank you very much for your important questions. As a general remark I'd like to remind you that the CIDOC CRM as a standard is an ontology in the narrower sense, a formal model approximating a human conceptualization, and not a standard database schema. Any implementation, in particular any RDF Schema, is again an approximation of this conceptualization. The CRM has a much wider scope and longer life-cycle than RDF. In Relational Databases, quite different issues occur.
The Definition of the CIDOC CRM makes very clear that "Primitive Values" are dependent on the capabilities of the respective IT infrastructure.
These details cannot be standardized in the same way as the CRM, because they change over shorter periods of time than the ones for which we want conceptual (not bitwise) interoperability.
Therefore the CRM refers loosely to concepts of time and number in a mathematical sense. So far, no database implementation is compatible with all mathematical numerical systems. Rather, we can make mathematical models of the database implementations and by that devise algorithms to mediate between different implementations.

bq. E60 Number says "... including *intervals* of these values to express *limited precision*".

This means that you have, according to your application, to specialize the respective concepts and available primitive values. Different Dimensions need different numeric systems.

bq. 2. Imprecise Duration
bq. E52 Time-Span. P83 had at least duration. E54 Dimension
bq. E52 Time-Span. P84 had at most duration. E54 Dimension
bq. IMHO this pair of properties is unnecessary, since:
bq. - E54 Dimension already accommodates (or should accommodate) imprecision, see

Good point! We'll make this an issue.

bq. If we have this pair, then shouldn't we also split P43 has dimension in two (has minimum dimension, has maximum dimension)?

Not so good, because "Dimension" is the concept of the actual dimension of something at some time, and the interval is the uncertainty about it. It is not that the dimension itself would vary. In cases of multi-dimensional values, such as color vectors (HSI) etc., the uncertainty may be an odd area. Restricting that to minimum-maximum in the ontology would make such more complex cases incompatible with the CRM. Time, in contrast, has one dimension (except in science fiction).
The CRM follows the principle of "minimal commitment" by Thomas Gruber here.

bq. 3. Imprecise Start/End
bq. E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (inner bound)
bq. E52 Time-Span. P82 at some time within: E61 Time Primitive (outer bound)
bq. Each of the bounds has start/end. This is confirmed by the spec:
bq. E61 Time Primitive says "... interval logic to express *date ranges*"

There are several large-scale Relational Databases that have implemented precisely that.

bq. 4. OWL2 DL proposal

This is not work of CRM-SIG or ISO, but adequate, see below.

bq. 5. OWL DL

This is not work of CRM-SIG or ISO either

bq. 6. RDFS
bq. "The primitive values "E60 Number"... are interpreted as rdf: literal.

This is work of CRM-SIG. It is adequate for data transport, because RDF does not have the necessary constructs, and in a literal we can encode any numbering system. This is the standard way in which, for instance, xsd:dateTime is added to RDF.

bq. Seme4 defined a CRM extension for the British Museum (called [BMX|http://crm.rkbexplorer.com]). It defines several extension properties (prefix PX):
bq. 7. PX.min_value, PX.max_value as subPropertyOf P90F.has_value.
bq. - If you assert e.g. min_value=35 and max_value=45, that would infer *both* has_value=35 and has_value=45, which I think is strange.

This is not strange. If you read the Definition of the CRM carefully, it is clearly stated that in an implementation multiple values of the same unique property have to be interpreted as alternatives. Hence, the result is correct. Both values are possible, like multiple fathers...

bq. Instead I'd leave has_value independent, and set it to the average of min_value and max_value using some calculation

An average of an uncertainty interval does not, in general, make sense. It only makes sense if a hypothesis about the nature of the deviation from the true value exists, which requires knowledge of the measurement process.

bq. - This implements the requirement 1, but is it faithful to CIDOC CRM?
bq. CIDOC CRM says the imprecision should be captured in the range of P90.has_value (E60 Number), not through parallel properties

Sure, but we do not have (any more) the machines that provide interval values. Necessarily, we can only write transformation algorithms between different solutions. The CRM does not intend to standardize the impossible.

bq. 8. PX.time-span_earliest, PX.time-span_latest as properties of E52.Time-Span.
bq. - they are unrelated to CIDOC CRM properties, so the extension is not CRM Compatible.
bq. A compatibility condition from the CRM Intro is:
bq. "all properties of the extension are either subsumed by CRM properties, or are part of a path for which a CRM property is a shortcut"

The CRM does not prescribe any property. Not implementing inner bounds does not violate compatibility.
Note that the subsumption requirement ends at primitive values, because they are out of scope of the CRM (this should maybe be stated more explicitly).
"Subsumed by CRM properties" must be seen algorithmically, since the CRM is not bound to a particular KR language. We can write an algorithm that transforms instances of pairs of PX.time-span_earliest, PX.time-span_latest into instances of P81, encoding the interval into a literal with the intended meaning of a Time Primitive. Thereby data transport and data transformation are supported.

If we want to query in addition a real database implementation for dates, we need a practical implementation.
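
The transformation described above, from pairs of PX.time-span_earliest and PX.time-span_latest to one interval literal on P81, could be sketched as follows. The "start/end" literal syntax, the property spelling P81.ongoing_throughout, the function name merge_earliest_latest and the ex: identifiers are assumptions made for this illustration; the text above does not fix a concrete encoding.

```python
EARLIEST = "PX.time-span_earliest"
LATEST = "PX.time-span_latest"
TARGET = "P81.ongoing_throughout"  # target property as named in the text above


def merge_earliest_latest(triples):
    """Replace each earliest/latest pair by one interval literal.

    The literal is written as "start/end" (an assumed encoding).
    Time-spans that carry only one of the two bounds are passed through
    unchanged, as are all other triples.
    """
    earliest, latest, out = {}, {}, []
    for s, p, o in triples:
        if p == EARLIEST:
            earliest[s] = o
        elif p == LATEST:
            latest[s] = o
        else:
            out.append((s, p, o))
    for s in sorted(set(earliest) | set(latest)):
        if s in earliest and s in latest:
            out.append((s, TARGET, f"{earliest[s]}/{latest[s]}"))
        elif s in earliest:
            out.append((s, EARLIEST, earliest[s]))
        else:
            out.append((s, LATEST, latest[s]))
    return out


source = [
    ("ex:ts1", "rdf:type", "E52.Time-Span"),
    ("ex:ts1", EARLIEST, "1850-01-01"),
    ("ex:ts1", LATEST, "1860-12-31"),
    ("ex:ts2", EARLIEST, "1900-01-01"),
]
for triple in merge_earliest_latest(source):
    print(triple)
```

The inverse direction (splitting the literal back into two bounds) is equally mechanical, which is what makes mediation between implementations possible.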

bq. CIDOC CRM leaves an important question (imprecise dimensions) unspecified, hidden in the scope notes of primitives E60 Number and E61 Time Primitive. This shouldn't be dismissed as a "mere RDF implementation issue", since it is important for practical CRM interoperability.

Practical interoperability is a task of applications. The CRM-SIG does not "dismiss" that. It is highly interested in that. But it will definitely not propose a standard serving a particular encoding form and database, which then causes incompatibilities with other implementations.

It is a task for particular implementer communities to provide their solutions and suggest them for adoption by others. If a consensus is achieved on this level, CRM-SIG will make recommendations.

bq. If we define E60 Number and E61 Time Primitive as RDF classes, that would imply minimal changes to CIDOC CRM.
bq. - E60.Number with dataProperties crm:min_value, crm:max_value, and rdf:value (average or expected)
bq. - E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and maybe rdf:value

This causes the maximal number of joins, which is highly inefficient for querying and data entry, and introduces properties at the end of the chain that cannot be subsumed by existing CRM properties. In my eyes this is the worst case: a query retrieves one instance of P90, but one cannot write a SPARQL condition directly on its value, so it solves nothing except for the paper exercise of "minimal change" to the CRM.

bq. - (see 2) The pair P83.had_at_least_duration and P84.had_at_most_duration should be merged to one property has_duration

yes

bq. 10. I'm sure that people who expect P57 has number of parts. to be a simple xsd:integer will be very unhappy to suddenly find a class E60.Number (and rightly so!)

Please note that what the user finds in a user interface is explicitly not the concern of the CRM. ONLY because such concerns have been excluded, the CRM could ever be standardized.
{warning}Your GUI has to provide the adequate filters. The CRM has NEVER been recommended as a data entry form!{warning}

bq. But E60.Number also gives examples of complex numbers, 3D coordinates, etc... So it really is not a literal, it needs to be a class

Exactly. This is why an implementation has to specialize E60 Number on a case by case basis. The CRM does not want to deal with that.

bq. I googled "E60 site:http://lists.ics.forth.gr/pipermail/crm-sig" and couldn't find relevant discussion.

Most discussions are in the meetings. You may like to read the meeting minutes.

Best wishes and thank you for your comments!

h3. Proposed recommendation
Martin: We do have a recommendation for RDF implementations of P81, P82, which is out for vote.
(i) See attached: [Imprecise Begin-End (and general)^Recommendation_time_spans.docx], [Imprecise Begin-End (and general)^time_spans.rdfs]
!time_spans_begin_end.png|width=500!

h1. How to represent start/finish (min/max) times of an Event
Using [P116 starts (is started by)@crm] is wrong. It just brings another E2 into the picture, without getting us closer to capturing the time. P116 says: "This property allows the starting point for a E2 Temporal Entity to be situated by reference to the starting point of *another* temporal entity of longer duration. This property is only necessary if the time span is *unknown*"

We have three options:
# Use "P82a begin of the begin", "P82b end of the end" (domain E52.Time-Span, range xsd:dateTime), as defined by Martin in response to my questions.
It is important to note that xsd:dateTime can express years BC by [allowing a negative sign|http://www.w3.org/TR/xmlschema-2/#signallowed]
# Use [PX.time-span_earliest_int|http://crm.rkbexplorer.com/description/bm-extensions/PX.time-span_earliest_int], [PX.time-span_latest_int|http://crm.rkbexplorer.com/description/bm-extensions/PX.time-span_latest_int] (domain E52.Time-Span, range xsd:integer) from BMX.
Inferior solution since the properties are not standard, nor is the used range
# Define a class Time Primitive with properties min_date, max_date (my proposal).
Inferior solution, since it involves a useless level of indirection
# Mariana:
Time edges
crm:E2_Temporal_Entity crm:P116_starts crm:E2_Temporal_Entity .
crm:E2_Temporal_Entity crm:P4_has_time_span crm:E52_Time-Span .
crm:E52_Time-Span crm:P81_ongoing_throughout crm:E61_Time_Primitive/(String, or xsd:date) .
crm:E2_Temporal_Entity crm:P115_finishes crm:E2_Temporal_Entity .
crm:E2_Temporal_Entity crm:P4_has_time_span crm:E52_Time-Span .
crm:E52_Time-Span crm:P81_ongoing_throughout crm:E61_Time_Primitive/(String, or xsd:date) .

no new properties have been introduced.

Guess we'll go with 1 :)

h3. Adopted Solution

For the time being we are going with option 1 :-)
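
As a rough illustration of option 1, the sketch below formats xsd:dateTime literals (including negative years for dates BC) and emits a Turtle fragment with "P82a begin of the begin" and "P82b end of the end". The local names crm:P82a_begin_of_the_begin and crm:P82b_end_of_the_end, the crm: and xsd: prefixes (assumed to be declared elsewhere), the ex:span1 identifier and both function names are assumptions for this example. Month and day are not validated.

```python
def xsd_datetime(year, month=1, day=1, hour=0, minute=0, second=0):
    """Format an xsd:dateTime lexical form; negative years get a leading '-'."""
    if year == 0:
        # XML Schema 1.0 (the version linked above) prohibits year 0000.
        raise ValueError("year 0000 is not allowed in XML Schema 1.0")
    sign = "-" if year < 0 else ""
    return (
        f"{sign}{abs(year):04d}-{month:02d}-{day:02d}"
        f"T{hour:02d}:{minute:02d}:{second:02d}"
    )


def time_span_turtle(span_iri, begin_of_begin, end_of_end):
    """Emit a Turtle fragment for one E52 Time-Span with its outer bounds."""
    return (
        f"{span_iri} a crm:E52_Time-Span ;\n"
        f'    crm:P82a_begin_of_the_begin "{begin_of_begin}"^^xsd:dateTime ;\n'
        f'    crm:P82b_end_of_the_end "{end_of_end}"^^xsd:dateTime .'
    )


begin = xsd_datetime(-500)                      # "-0500-01-01T00:00:00"
end = xsd_datetime(-400, 12, 31, 23, 59, 59)    # "-0400-12-31T23:59:59"
print(time_span_turtle("ex:span1", begin, end))
```

Because both bounds are plain typed literals on the Time-Span itself, a query can filter on them directly without an extra join.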