Date range search spec
- When the user enters a text query, RS removes leading/trailing spaces, then tries to recognize it as a Date Range or Identifier (see FR Enhancements#Keyword-Date-Identifier Search)
- the Date Range regexp is
- one date "from" or two dates "from-to", with optional spaces around "-"
- each date is a year written as one or more digits, optionally followed by "BC" (or lowercase "bc"), which stands for "Before Christ" and makes the year negative
- the complete query is matched (^...$)
- a query consisting of >=5 digits (and only digits) is not a Date query (it's an identifier query)
Examples of allowed syntax for date ranges (each line lists equivalent spellings):
- 1000bc, 1000BC, 1000 BC
- 1234, but NOT 12345
- 1000bc-1000, 1000 bc -1000, 1000 BC - 1000
- "from" and "to" are translated to proper xsd:gYear values, which use "-" instead of "BC" and must have at least 4 digits, eg
- "0" is not a valid year. A "from" of 0 is translated to "-0001" and a "to" of 0 is translated to "0001", eg
query | translated to
000 | 0001
5bc-0 | -0005; 0001
0-100 | -0001; 0100
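The recognition and translation rules above can be sketched in Python. This is a minimal illustration, not the RS implementation: the regular expression is reconstructed from the bullet points and examples rather than copied from RS, and the names `DATE_RANGE_RE`, `to_gyear` and `parse_date_range` are hypothetical. It assumes that only the spellings "BC" and "bc" are accepted and that a single-date query yields no "to" value.

```python
import re

# One date: digits, optional space, optional BC/bc (assumed spellings).
_DATE = r"([0-9]+)\s*(BC|bc)?"
# "from" or "from-to", with optional spaces around "-".
DATE_RANGE_RE = re.compile(rf"{_DATE}(?:\s*-\s*{_DATE})?")


def to_gyear(digits, bc, is_from):
    """Translate one parsed date to a gYear-style string (at least 4 digits)."""
    year = int(digits)
    if year == 0:
        # "0" is not a valid year: "from" becomes -0001, "to" becomes 0001.
        return "-0001" if is_from else "0001"
    sign = "-" if bc else ""
    return f"{sign}{year:04d}"


def parse_date_range(query):
    """Return (from, to) as strings, or None if the query is not a date query.

    "to" is None when the user entered a single date.
    """
    q = query.strip()
    # Five or more digits and nothing else: an identifier, not a date.
    if re.fullmatch(r"[0-9]{5,}", q):
        return None
    m = DATE_RANGE_RE.fullmatch(q)  # the complete query must match
    if m is None:
        return None
    d1, bc1, d2, bc2 = m.groups()
    start = to_gyear(d1, bc1, is_from=True)
    end = to_gyear(d2, bc2, is_from=False) if d2 is not None else None
    return start, end
```

For example, `parse_date_range("5bc-0")` gives `("-0005", "0001")`, while `parse_date_range("12345")` gives `None`, so that query falls through to identifier search.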
How to make a date (or date-range) query. The task is non-trivial, because:
- An object may have either P82_at_some_time_within, or both P82a_begin_of_the_begin and P82b_end_of_the_end.
P82a and P82b are subproperties of P82, so in the latter case we'd get two P82 values for the object
- We want all objects whose interval "P82a-P82b" intersects the search interval "from-to".
This can be translated to
- The query searches along this property path: P108i_was_produced_by / P4_has_time-span / P82_at_some_time_within
- Objects could specify dates as date vs gYearMonth vs gYear (RKD has date and gYear, BM has date, none have gYearMonth)
- BM data used to have various strings (not proper XSD types) in P82/a/b, but that's fixed now
- BM data suffers from this (but it affects results display, not search)
- OWLIM can compare two values of the same type (date, gYearMonth or gYear), but not values of mixed types:
- We must use the OWLIM Literal Index, else the query is too slow
- We don't want to use a function to coerce stored dates to a uniform representation (eg date)
- Currently this index works only for conjunctions (not disjunction, negation, etc), so we must use UNION and not ||
- set $1 := from ($1 and $2 are used in the query below)
- set $2 := if to then to else from (i.e. if the user entered only from, treat it as date range "from-from")
- The query is
(You don't need to read the section below, unless you need to modify the query above)
Here's a query that returns both BM and RKD objects
select (count(distinct ?E) as ?c) returns 11266:
because of RS-1164 each object is returned 3 times, eg
The true number is about 3762, i.e. (11266+11*2)/3 rounded down
If these duplicates are a problem, comment on the issue above (RS-1164).
Reducing the date range to years 1-100 reduces the number of results to 699 (true number 233), as expected
The query above doesn't cover the case where the object's interval contains the whole search interval, i.e. P82a <= from < to <= P82b.
So we add an OPTIONAL(?d1 ?d2) and 2 comparison clauses.
This returns 702 (true 234), so there must be one extra object whose production straddles years 1-100
We find this extra object:
Only the 11 RKD objects have gYear, but this alternative doesn't slow down the search, so let's leave it in for other data sets.
(On the other hand, other data sets might also have gYearMonth)
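The intersection logic discussed in this section can be sketched in Python. This is only an illustration of the reasoning, not the SPARQL query itself: it assumes that the base query matches objects with a P82 value (P82, P82a or P82b) inside the search interval, and that the added clauses catch the straddling case. Years are plain integers here (negative for BC), which sidesteps the date/gYear typing issue, and the name `intersects` is hypothetical.

```python
def intersects(begin, end, search_from, search_to=None):
    """Check whether the interval begin-end intersects from-to.

    begin/end stand for P82a/P82b; an object with only a single P82 value
    can be modelled with begin == end. Years are ints, negative for BC.
    """
    # If the user entered only "from", treat it as the range "from-from".
    if search_to is None:
        search_to = search_from
    # Assumed base case: some endpoint lies inside the search interval.
    endpoint_inside = any(search_from <= d <= search_to for d in (begin, end))
    # Added case: the object's interval straddles the whole search interval.
    straddles = begin <= search_from and search_to <= end
    return endpoint_inside or straddles
```

With a search range of years 1-100, `intersects(-50, 150, 1, 100)` is true only because of the straddling test, which corresponds to the one extra object found above.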