
Summary

  • Initially, the search worked in two steps: 1) select the entities that match the given criteria (FR restrictions); 2) obtain the display fields for each entity via several additional SPARQL queries (one or two per display field; two when the BM display fields differed from the Rembrandt ones).
  • Two variations of the search were then implemented: 1) an all-in-one query with the restrictions and the display fields together, which caused a Cartesian explosion; 2) select the entities, then run one query per entity that selects all the display fields. Both variations were slower than the original search.
  • Then we tried two new approaches: 1) a SPARQL 1.1 GROUP_CONCAT query and 2) a CONSTRUCT query, which was expected to construct the graphs.
  • Vlado's notes:
    • SELECT suffers from Cartesian product expansion (eg 5 techniques * 5 materials = 25 rows per object)
    • Tried GROUP_CONCAT(DISTINCT ?technique): surprisingly goes exponential with the number of fields
    • Tried CONSTRUCT instead of SELECT (return one graph per object instead of Cartesian product of rows per object): again exponential; OWLIM doesn't eliminate duplicate statements from the result graph
  • Finally, we tried several per-field queries (10 or so) instead of per-entity ones, so we end up with a query like:

    Currently, the last approach seems to be the fastest.
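
To make the Cartesian expansion from Vlado's notes concrete, here is a minimal Python sketch (not the actual search code). It assumes a single object with two multi-valued display fields; the field names and values are made up for illustration. It compares the number of rows an all-in-one SELECT would return with the number of rows returned when each field is fetched separately, and shows that the per-field rows can be regrouped into one record per object.

```python
from itertools import product

# Hypothetical multi-valued display fields for ONE object (made-up values).
fields = {
    "technique": ["t1", "t2", "t3", "t4", "t5"],
    "material": ["m1", "m2", "m3", "m4", "m5"],
}


def all_in_one_rows(fields):
    """Rows a single SELECT binding all fields together would return:
    one row per combination of values (Cartesian product)."""
    names = list(fields)
    return [dict(zip(names, combo)) for combo in product(*fields.values())]


def per_field_rows(fields):
    """Rows returned when each field is fetched on its own:
    one (field, value) row per value."""
    return [(name, value) for name, values in fields.items() for value in values]


def merge(rows):
    """Regroup per-field rows into a single record for the object."""
    record = {}
    for name, value in rows:
        record.setdefault(name, []).append(value)
    return record


combined = all_in_one_rows(fields)
separate = per_field_rows(fields)

print(len(combined))  # 25 rows: 5 techniques * 5 materials
print(len(separate))  # 10 rows: 5 techniques + 5 materials
print(merge(separate) == fields)  # True
```

With more fields the first count grows as a product and the second as a sum, which is consistent with the per-field approach being the fastest so far.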

Sample query

We used a sample FTS query ("Portrait") that returns 152 results. Executing the restriction itself is not an issue; the performance problem lies in fetching the results (the display fields).

The select part (w/o the display fields) is:

Description of the GROUP_CONCAT and CONSTRUCT approaches:

Group concat

Timing - with increasing number of display fields in the where/select clauses

  • 1.5s with just dataset, RDFType, E55
  • 47s with everything else w/o imgUrl/imgUrl2
  • 1165s (~19 min) with everything

Construct

Timing - with increased number of display fields

  • 8s for the small query with 1 optional Tech & dataset, rdf:type, e55
  • 708s full query (results: 12,695,990)
  • 2644 s, removed Title (results: 2,813,704)
  • 11s, removed RDFType, results: 128,404
  • 3.2s, removed imgURLs (results: 1,704)

Per display fields queries

We tested result sets of 150, 1,000 and 3,000 results. For 3,000, however, the parser crashes with a stack overflow, so we currently have timings only for 150 and 1,000 results. The timings are shown per field, in ms:

Field           150 results, ms   1,000 results, ms
dataset                     190                1493
RDFType                     218                1317
E55                         141                1503
uid                         191                1541
imgUrl21                   2653                3540
Title1                       40                 469
Title2                      135                 964
Support21                   115                 843
Tech1                       128                1004
Support1                    124                1019
Artist1                     123                 985
Place1                      124                1000
DateCreated1                118                 922
Totals                     4300               16600
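
As a quick check of the totals and of where the time goes, the per-field timings can be summed in a few lines of Python. This is only a sketch over the numbers reported above; the dictionary layout is an assumption made for illustration.

```python
# Per-field timings in ms, copied from the table above:
# (time for 150 results, time for 1,000 results)
timings = {
    "dataset": (190, 1493),
    "RDFType": (218, 1317),
    "E55": (141, 1503),
    "uid": (191, 1541),
    "imgUrl21": (2653, 3540),
    "Title1": (40, 469),
    "Title2": (135, 964),
    "Support21": (115, 843),
    "Tech1": (128, 1004),
    "Support1": (124, 1019),
    "Artist1": (123, 985),
    "Place1": (124, 1000),
    "DateCreated1": (118, 922),
}

total_150 = sum(t[0] for t in timings.values())
total_1000 = sum(t[1] for t in timings.values())
print(total_150, total_1000)  # 4300 16600

# Which field dominates the 150-result run, and by how much?
slowest = max(timings, key=lambda field: timings[field][0])
share_150 = timings[slowest][0] / total_150
share_1000 = timings[slowest][1] / total_1000
print(slowest, round(share_150 * 100), round(share_1000 * 100))  # imgUrl21 62 21
```

The image URL field accounts for roughly 62% of the total time for 150 results and about 21% for 1,000 results, so it is the first candidate for a closer look.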

Current search performance

Query               Results   Time, ms
LocatedIn: London         2        297
FTS: Portrait           249       2438
LocatedIn: Hangue         6        267
Date 1000-2000           11       2912

There might be a problem with the imageURL field (e.g. I suspect that for BM objects it selects not just the main image but additional images as well).
