- Initially, the search was implemented in two steps: 1) select the entities that match the given criteria (FR restrictions); 2) obtain the display fields for each entity via additional SPARQL queries (one or two per display field; two when the BM display fields differed from the Rembrandt ones).
- Two variations of the search were implemented: 1) an all-in-one query combining the restrictions and the display fields, which caused a Cartesian explosion; 2) a query selecting the entities, followed by one query per entity that selected all of its display fields. Both variations were slower than the original search.
- Then we tried two new approaches: 1) a SPARQL 1.1 GROUP_CONCAT query and 2) a CONSTRUCT query, which was expected to construct one graph per object.
- Vlado's notes:
- SELECT suffers from Cartesian product expansion (e.g. 5 techniques × 5 materials = 25 rows per object).
- Tried GROUP_CONCAT(DISTINCT ?technique): surprisingly, the query time grows exponentially with the number of fields.
- Tried CONSTRUCT instead of SELECT (returning one graph per object instead of a Cartesian product of rows per object): again exponential; OWLIM doesn't eliminate duplicate statements from the result graph.
- Finally, we tried issuing one query per display field (about 10 queries) instead of one per entity, so that each query fetches a single display field for all matched entities.
Currently, the last approach seems to be the fastest.
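The difference between these approaches can be illustrated with a small in-memory model, with no triple store involved. For one object with several multi-valued display fields, a single SELECT that binds every field returns one row per combination of values (the product of the per-field counts), while one query per field returns one row per value (their sum). GROUP_CONCAT collapses the product back into a single row but, per the notes, that does not make it cheap to compute. This is a minimal sketch: the object and its values are made up, and it models result sizes only, not how OWLIM evaluates the queries.

```python
from itertools import product


def select_rows(fields: dict[str, list[str]]) -> list[dict[str, str]]:
    """Rows a single SELECT would return for one object.

    Each variable binds one value of its field, so every combination of
    values appears once: the row count is the product of the value counts.
    A field with no values yields no rows (as a non-OPTIONAL pattern would).
    """
    names = list(fields)
    combos = product(*(fields[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def group_concat(rows: list[dict[str, str]], sep: str = " ") -> dict[str, str]:
    """Collapse the rows into one, like GROUP_CONCAT(DISTINCT ?x) per field.

    Values are joined in first-seen order here; SPARQL itself does not
    guarantee any particular order inside GROUP_CONCAT.
    """
    if not rows:
        return {}
    collapsed = {}
    for name in rows[0]:
        distinct: list[str] = []
        for row in rows:
            if row[name] not in distinct:
                distinct.append(row[name])
        collapsed[name] = sep.join(distinct)
    return collapsed


def per_field_rows(fields: dict[str, list[str]]) -> int:
    """Rows returned when each field is fetched by its own query: the sum."""
    return sum(len(values) for values in fields.values())


if __name__ == "__main__":
    # Made-up object matching the 5 techniques x 5 materials example.
    obj = {
        "technique": [f"technique{i}" for i in range(5)],
        "material": [f"material{i}" for i in range(5)],
    }
    rows = select_rows(obj)
    print(len(rows))            # 25: one SELECT, 5 x 5 combinations
    print(per_field_rows(obj))  # 10: two per-field queries, 5 + 5 rows
    print(group_concat(rows)["material"])
```

The point is the shape of the growth: a third field with five values would make the single SELECT return 125 rows for this object, while the per-field queries would return 15, which fits the observation that the per-field approach is currently the fastest.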
We used a sample full-text search (FTS) query ("Portrait") that returns 152 results. Executing the restriction itself is not an issue; the performance problem lies in fetching the results (the display fields).
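For a sample like this one, the per-field approach from the list above could be sketched as follows: once the restriction query has returned the entity URIs, one SELECT per display field fetches that field for all entities at once, and the client merges the partial results into one record per entity. The field names, property paths and example.org URIs are hypothetical placeholders, and passing the entities through a SPARQL 1.1 `VALUES` block is an assumption (the actual queries may repeat the restriction instead). The sketch only builds query text and merges results; it does not talk to a store.

```python
from collections import defaultdict

# Hypothetical display fields -> property paths (placeholders only).
FIELD_PATHS = {
    "title": "<http://example.org/hasTitle>",
    "technique": "<http://example.org/usedTechnique>",
    "imgUrl": "<http://example.org/hasImage>",
}


def field_query(field: str, path: str, entities: list[str]) -> str:
    """One SELECT for a single display field, covering all matched entities."""
    values = " ".join(f"<{uri}>" for uri in entities)
    return (
        f"SELECT ?entity ?value WHERE {{\n"
        f"  VALUES ?entity {{ {values} }}\n"
        f"  ?entity {path} ?value .\n"
        f"}}"
    )


def merge(
    results_by_field: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, list[str]]]:
    """Merge (entity, value) rows from each field query into per-entity records."""
    records: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for field, rows in results_by_field.items():
        for entity, value in rows:
            # Skip duplicate values, since the store may return them.
            if value not in records[entity][field]:
                records[entity][field].append(value)
    return {entity: dict(fields) for entity, fields in records.items()}


if __name__ == "__main__":
    entities = ["http://example.org/obj/1", "http://example.org/obj/2"]
    queries = {f: field_query(f, p, entities) for f, p in FIELD_PATHS.items()}
    print(queries["title"])
    # Pretend these rows came back from the store.
    rows = {
        "title": [("http://example.org/obj/1", "Portrait of a man")],
        "technique": [
            ("http://example.org/obj/1", "etching"),
            ("http://example.org/obj/1", "drypoint"),
        ],
    }
    print(merge(rows))
```

With this layout the number of round trips is one restriction query plus one query per field, regardless of how many entities match, and each field query's result grows with that field's value count rather than with the product of all fields.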
The select part (without the display fields) takes:
- 1.5 s with just dataset, RDFType and E55
- 47 s with everything else except imgUrl/imgUrl2
- 1,165 s (~19 min) with everything
- 8 s for the small query with one optional Tech field plus dataset, rdf:type and E55
- 708 s for the full query (results: 12,695,990)
- 2,644 s with Title removed (results: 2,813,704)
- 11 s with RDFType removed (results: 128,404)
- 3.2 s with imgURLs removed (results: 1,704)
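If the removals in the last four measurements are cumulative (each run drops one more field from the previous query, which is an assumption not stated in the measurements), the Cartesian-product model from Vlado's notes gives a rough reading of the result counts: dropping a field divides the row count by something like that field's average number of values per object. The sketch below only computes those ratios from the reported counts. Because the total is a sum over objects of per-object products, the ratios are weighted averages, not exact per-field cardinalities.

```python
# Result counts reported above, in the order the fields were removed.
# Assumption: each removal is applied on top of the previous one.
RUNS = [
    ("full query", 12_695_990),
    ("removed Title", 2_813_704),
    ("removed RDFType", 128_404),
    ("removed imgURLs", 1_704),
]


def removal_ratios(runs: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Row-count ratio before/after each removal (approximate field multiplicity)."""
    return [
        (label, prev_count / count)
        for (_, prev_count), (label, count) in zip(runs, runs[1:])
    ]


if __name__ == "__main__":
    for label, ratio in removal_ratios(RUNS):
        print(f"{label}: rows divided by ~{ratio:.1f}")
```

Under these assumptions the ratios come out at roughly 4.5 (Title), 22 (RDFType) and 75 (imgURLs); the large factor for the image URLs would be consistent with the suspicion about the imageURL field noted below.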
Speeds were tested for result sets of 150, 1,000 and 3,000 results. However, for 3,000 the parser crashes with a stack overflow, so we currently have results only for 150 and 1,000. The results are shown per field, with times in ms for 150 and 1,000 results:
||Field||150 results (ms)||1,000 results (ms)||
There may be a problem with the imageURL field: I suspect that for BM objects it selects several images rather than just the main one.