Date range search spec

Regexp

  • When the user enters a text query, RS removes leading/trailing spaces, then tries to recognize it as a Date Range or Identifier (see FR Enhancements#Keyword-Date-Identifier Search)
  • the Date Range regexp is
    ^\d+ *(bc|BC)?( *- *\d+ *(bc|BC)?)?$

    but not

    ^\d{5,}$

    which means:

    • one date "from" or two dates "from-to", with optional spaces around "-"
    • each date is a year written as one or more digits; an optional "BC" suffix (or lowercase "bc") marks it as "Before Christ" (negative)
    • the complete query is matched (^...$)
    • a query consisting of 5 or more digits (and only digits) is not a Date query (it's an identifier query)
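
The recognition rules above can be sketched in Python as follows. This is only an illustration, not the RS implementation: the function name classify and the labels it returns are hypothetical, and the "BC" suffix is assumed to be optional, as the examples in this spec imply.

```python
import re

# Patterns transcribed from the rules above; the "BC" suffix is assumed optional.
DATE_RANGE = re.compile(r"^\d+ *(bc|BC)?( *- *\d+ *(bc|BC)?)?$")
IDENTIFIER = re.compile(r"^\d{5,}$")


def classify(query: str) -> str:
    """Return 'identifier', 'date_range' or 'keyword' (hypothetical labels)."""
    q = query.strip()  # remove leading/trailing whitespace first
    # Five or more bare digits are an identifier, even though they would
    # also match the date pattern, so this check must come first.
    if IDENTIFIER.match(q):
        return "identifier"
    if DATE_RANGE.match(q):
        return "date_range"
    return "keyword"
```

The identifier check runs before the date check because the "but not" rule takes priority over the Date Range pattern.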

Examples

Examples of allowed syntax for date ranges (each line lists equivalent spellings)

  • 1000bc, 1000BC, 1000 BC
  • 1234 (but NOT 12345, which is an identifier query)
  • 1000bc-1000, 1000 bc -1000, 1000 BC - 1000

Design Considerations

RS-920
Translation:

  • "from" and "to" are translated to proper xsd:gYear, which uses "-" instead of "BC" and must have at least 4 digits, eg
    -10000 -1000 -0100 -0010 -0001 0000 0001 0010 0100 1000 10000
  • "0" is not a valid year: a "from" of 0 is translated to "-0001" and a "to" of 0 is translated to "0001", eg
    query    translated to
    000      0001
    5bc-0    -0005; 0001
    0-100    -0001; 0100
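
A minimal Python sketch of this translation follows. It makes several assumptions: the query has already been recognized as a date range, a single date is treated as the range "from-from", and "BC" is optional. The names RANGE, to_gyear and translate are hypothetical, and the capture groups are added to the regexp only for illustration.

```python
import re

# Date Range pattern with capture groups added for illustration.
RANGE = re.compile(r"^(\d+) *(bc|BC)?(?: *- *(\d+) *(bc|BC)?)?$")


def to_gyear(digits: str, bc: bool, is_from: bool) -> str:
    """Format one year as a gYear-style string with at least 4 digits."""
    year = int(digits)
    if year == 0:
        # There is no year 0: "from" snaps to 1 BC, "to" snaps to 1 AD.
        return "-0001" if is_from else "0001"
    return ("-" if bc else "") + f"{year:04d}"


def translate(query: str) -> tuple[str, str]:
    """Translate a date-range query into a (from, to) pair of year strings."""
    m = RANGE.match(query.strip())
    if m is None:
        raise ValueError(f"not a date range query: {query!r}")
    d1, bc1, d2, bc2 = m.groups()
    if d2 is None:  # only "from" was given: treat it as "from-from"
        d2, bc2 = d1, bc1
    return (to_gyear(d1, bc1 is not None, True),
            to_gyear(d2, bc2 is not None, False))
```

For example, translate("5bc-0") gives ("-0005", "0001") and translate("0-100") gives ("-0001", "0100"), matching the rows above.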

How to make a date (or date-range) query. The task is non-trivial, because:

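  • An object may have P82_at_some_time_within, or both P82a_begin_of_the_begin and P82b_end_of_the_end.
    P82a and P82b are subproperties of P82, so in the latter case a query on P82 returns two values
  • We want all objects whose interval "P82a-P82b" intersects the search interval "from-to".
    This can be translated to
    {from <= P82 <= to}
    UNION
    {P82a <= from <= to <= P82b}
    The first branch matches objects with any P82 value (including P82a or P82b) inside the search interval; the second matches objects whose interval contains the whole search interval.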
    
  • The query searches along this property path: P108i_was_produced_by / P4_has_time-span / P82_at_some_time_within
  • Objects could specify dates as date vs gYearMonth vs gYear (RKD has date and gYear, BM has date, none have gYearMonth)
  • BM data used to have various strings (not proper XSD types) in P82/a/b, but that's fixed now
  • BM data suffers from this (but it affects results display, not search)
    RS-1140
  • OWLIM can compare two values of the same type (date, gYearMonth or gYear), but not values of mixed types:
    OWLIM-680
  • We must use the OWLIM Literal Index, else the query is too slow
    RS-1711
    • We don't want to use a function to coerce stored dates to a uniform representation (eg date)
    • Currently this index works only for conjunctions (not disjunction, negation, etc), so we must use UNION and not ||
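
The two-branch UNION described above can be checked against the usual interval-intersection test with a small Python sketch. Years are modelled as plain integers here, which is a simplification of the stored date/gYear literals. The function names are hypothetical, and the comparison uses from <= to because a single-date query sets to = from.

```python
def matches(p82_values, p82a, p82b, frm, to):
    """Two-branch test mirroring the UNION: either branch is sufficient."""
    # Branch 1: some P82 value lies inside the search interval. P82a and P82b
    # are assumed to appear among p82_values, as they are subproperties of P82.
    in_range = any(frm <= v <= to for v in p82_values)
    # Branch 2: the object's interval contains the whole search interval.
    contains = (p82a is not None and p82b is not None
                and p82a <= frm <= to <= p82b)
    return in_range or contains


def intersects(a, b, frm, to):
    """Standard test for overlap of closed intervals [a, b] and [frm, to]."""
    return a <= to and frm <= b


def check(limit=6):
    """Compare both tests on every pair of small, well-ordered intervals."""
    years = range(-limit, limit + 1)
    for a in years:
        for b in years:
            if a > b:
                continue
            for f in years:
                for t in years:
                    if f > t:
                        continue
                    if matches([a, b], a, b, f, t) != intersects(a, b, f, t):
                        return False
    return True
```

For a search interval of 1-100, an object dated 50-150 matches through the first branch, and an object dated -50 to 150 matches only through the second.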

Date Search Query

  • set $1 := from ($1 and $2 are used in the query below)
  • set $2 := if to then to else from (i.e. if the user entered only from, treat it as date range "from-from")
  • The query is

Testing/debugging

(You don't need to read the section below, unless you need to modify the query above)

Basic query

Here's a query that returns both BM and RKD objects

Basic Query Count

select (count(distinct ?E) as ?c) returns 11266.
Because of RS-1164, each object is returned 3 times, eg

  <http://collection.britishmuseum.org/id/object/YCA79334>
  <http://collection.britishmuseum.org/id/object/Y_80820>
  <http://collection.britishmuseum.org/id/object/3385734>

The true number is (11266+11*2)/3 = 3762
If these duplicates are a problem, comment on the issue above (RS-1164).

Reducing Date Range

Reducing the date range to years 1-100 reduces the number of results to 699 (true number 233), as expected

Adding Second Case

The above doesn't cover the case P82a <= from <= to <= P82b, where the object's interval contains the whole search interval.
So we add an OPTIONAL(?d1 ?d2) and 2 comparison clauses.

This returns 702 (true 234), so there must be one extra object whose production straddles years 1-100

Finding the Extra Object

We find this extra object:

Do we really need gYear?

Only the 11 RKD objects have gYear, but this alternative doesn't slow down the search, so let's leave it in for other data sets.
(On the other hand, other data sets might also have gYearMonth)

11 RKD objects
Nothing