Intro
- as part of RS 20120502 Mellon Demo, they wanted to see some query working across different servers
- we'll be able to provide only tabular data as a demo; is this sufficient?
- (too bad Yale's deployed OWLIM only now, they wanted a very similar thing, and maybe the idea comes from them)
- BarryN helped with SPARQL Federation syntax
- the queries will be executed on the RS endpoint: SPARQL Federation (SERVICE) provides no way to specify a user/password, and the RS endpoint is protected by one, so RS cannot be called as a remote SERVICE
Potential Endpoints
name | interactive query | SERVICE | notes |
---|---|---|---|
ResearchSpace | http://researchspace.ontotext.com/openrdf-workbench/repositories/susana/query | http://researchspace.ontotext.com/openrdf-sesame/repositories/susana | |
Yale | http://collection.britishart.yale.edu/openrdf-workbench/repositories/owlim/query | http://collection.britishart.yale.edu/openrdf-sesame/repositories/owlim | Has no artist names, ulan:nnn don't really correspond to ULAN IDs, doesn't seem to have |
BritishMuseum | http://collection.britishmuseum.org/Sparql | unknown | "Syntax=SparqlResults/XML&Query=..." gives error Invalid parameters. Without Syntax= gives An error occured with retrieving the results of the query |
DBpedia/FactForge | http://factforge.net/sparql | http://factforge.net/sparql | distinguishes result formats by Accept header, eg curl -H "Accept:application/sparql-results+json" "http://factforge.net/sparql?query=SELECT+*+{%3Fs+%3Fp+%3Fo}+LIMIT+10" |
- the current plan is to use RS and DBpedia.
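The curl call in the table can be reproduced from a script. The following is a minimal sketch that only builds the request and does not send it; the helper name `build_select_request` is hypothetical, and it assumes the endpoint accepts the query as a `query` GET parameter, as the curl example suggests.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(endpoint, query,
                         accept="application/sparql-results+json"):
    """Build (but do not send) a GET request for a SPARQL SELECT query.

    The result format is chosen through the Accept header, mirroring the
    curl example for FactForge.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


req = build_select_request("http://factforge.net/sparql",
                           "SELECT * {?s ?p ?o} LIMIT 10")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it; that step is left out so the sketch runs without network access.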
Test Queries
Vital data from DBpedia
(on RS) Rembrandt's vital data, obtained from DBpedia
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dbp-prop: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT * WHERE { SERVICE <http://factforge.net/sparql> { ?act a foaf:Person. ?act rdfs:label "Rembrandt"@en. OPTIONAL {?act dbp-prop:birthdate ?birthDate} OPTIONAL {?act dbp-prop:deathdate ?deathDate} OPTIONAL {?act dbp-prop:birthplace ?birthPlace. FILTER(lang(?birthPlace)="en")} OPTIONAL {?act dbp-prop:deathplace ?deathPlace. FILTER(lang(?deathPlace)="en")} }}
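Since the demo only needs tabular output, the JSON results of a query like the one above can be flattened into rows. This is a sketch under the assumption that the endpoint returns the standard SPARQL JSON results layout (`head.vars` and `results.bindings`); the sample document is hand-written for illustration and is not actual endpoint output. Variables left unbound by an OPTIONAL become empty cells.

```python
import json


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON results document into (columns, rows)."""
    doc = json.loads(results_json)
    columns = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable that an OPTIONAL did not bind is simply absent.
        rows.append([binding.get(var, {}).get("value", "") for var in columns])
    return columns, rows


# Hand-written sample, not real endpoint output.
SAMPLE = """{"head": {"vars": ["act", "birthDate", "deathPlace"]},
 "results": {"bindings": [
   {"act": {"type": "uri", "value": "http://dbpedia.org/resource/Rembrandt"},
    "birthDate": {"type": "literal", "value": "1606-07-15"}}
 ]}}"""

columns, rows = bindings_to_rows(SAMPLE)
print(columns)
for row in rows:
    print(row)
```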
Things in RS
(on RS) thing, creator, name
(uses SPARQL 1.1 property path syntax)
PREFIX crm: <http://erlangen-crm.org/current/> select ?obj ?act ?name { ?obj crm:P108i_was_produced_by / crm:P14_carried_out_by ?act. ?act crm:P131_is_identified_by / crm:P3_has_note ?name. }
Paintings in RS
(on RS) painting, title, creator (of any part), name
PREFIX crm: <http://erlangen-crm.org/current/> select ?obj ?title ?act ?name { ?part crm:P108i_was_produced_by / crm:P14_carried_out_by ?act. ?part crm:P46i_forms_part_of ?obj. ?obj crm:P102_has_title / crm:P3_has_note ?title. ?act crm:P131_is_identified_by / crm:P3_has_note ?name. }
Rembrandt paintings in DBpedia
(on FF) Rembrandt paintings, with their EN title
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> select * { <http://dbpedia.org/resource/Rembrandt> <http://rdf.freebase.com/ns/visual_art/visual_artist/artworks> ?work. ?work rdfs:label ?title. FILTER(lang(?title)="en")}
- Note1: the range of <http://dbpedia.org/property/works> is a string, not the URI of the painting?!
- PREFIX fb: <http://rdf.freebase.com/ns/> is not useful since the property uses slashes, so fb:visual_art/visual_artist/artworks is not valid syntax. (FF wrongly suggests the property is visual_art.visual_artist.artworks)
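When queries are generated by a script, the prefix problem above can be avoided by falling back to the full IRI whenever the local part is not a plain name. The sketch below is deliberately conservative: the helper `sparql_term` and its pattern are assumptions for illustration, and the SPARQL 1.1 grammar accepts more local names than this (including backslash-escaped characters), which has not been tried against these endpoints.

```python
import re

# Conservative pattern: letters, digits, underscore, hyphen only.
SAFE_LOCAL = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def sparql_term(iri, prefixes):
    """Return prefix:local when the local part is plain, else <iri>."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            local = iri[len(namespace):]
            if SAFE_LOCAL.match(local):
                return f"{prefix}:{local}"
    return f"<{iri}>"


PREFIXES = {
    "fb": "http://rdf.freebase.com/ns/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

artworks = sparql_term(
    "http://rdf.freebase.com/ns/visual_art/visual_artist/artworks", PREFIXES)
label = sparql_term("http://www.w3.org/2000/01/rdf-schema#label", PREFIXES)
print(artworks)
print(label)
```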
Demo Queries
Things from RS, together with vital data from DBpedia
Painting authors from RS, add vital data (date/place of birth/death) from DBpedia
- The only matches (from RDF Search and Explore) are:
- http://factforge.net/resource/dbpedia/Rembrandt
- http://factforge.net/resource/dbpedia/Jacobus_Houbraken ("Jacob Houbraken" in RS)
- http://factforge.net/resource/dbpedia/George_Hendrik_Breitner ("Breitner, George Hendrik" in RS)
- so we have to add some sameAs assertions:
PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX dbpedia: <http://dbpedia.org/resource/> PREFIX rkd-artist: <http://rkd.nl/thesaurus/artist/> INSERT DATA { dbpedia:Rembrandt owl:sameAs rkd-artist:Rembrandt. dbpedia:George_Hendrik_Breitner owl:sameAs rkd-artist:Breitner_George_Hendrik. dbpedia:Jacobus_Houbraken owl:sameAs rkd-artist:Houbraken_Jacob. }
- then query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX dbp-prop: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX crm: <http://erlangen-crm.org/current/> select * { ?obj crm:P108i_was_produced_by / crm:P14_carried_out_by ?act. ?act crm:P131_is_identified_by / crm:P3_has_note ?name. SERVICE <http://factforge.net/sparql> { ?act a foaf:Person. OPTIONAL {?act dbp-prop:birthdate ?birthDate} OPTIONAL {?act dbp-prop:deathdate ?deathDate} OPTIONAL {?act dbp-prop:birthplace ?birthPlace. FILTER(lang(?birthPlace)="en")} OPTIONAL {?act dbp-prop:deathplace ?deathPlace. FILTER(lang(?deathPlace)="en")} }}
- we can also get all alternative names from DBpedia, but that's too much (eg Rembrandt has 6-7)
OPTIONAL {?act rdfs:label ?DBPname. FILTER(lang(?DBPname)="en")}.
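If more matches turn up, the sameAs update above can be generated from a small table instead of being edited by hand. This is only a sketch: the mapping holds the three matches listed above, the function name `build_same_as_update` is hypothetical, and it assumes every local name is valid in a prefixed name.

```python
# DBpedia local name -> RKD artist local name (the three matches found so far).
MATCHES = {
    "Rembrandt": "Rembrandt",
    "George_Hendrik_Breitner": "Breitner_George_Hendrik",
    "Jacobus_Houbraken": "Houbraken_Jacob",
}


def build_same_as_update(matches):
    """Build a SPARQL INSERT DATA update with one owl:sameAs triple per match."""
    lines = [
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>",
        "PREFIX dbpedia: <http://dbpedia.org/resource/>",
        "PREFIX rkd-artist: <http://rkd.nl/thesaurus/artist/>",
        "INSERT DATA {",
    ]
    for dbpedia_name, rkd_name in matches.items():
        lines.append(
            f"  dbpedia:{dbpedia_name} owl:sameAs rkd-artist:{rkd_name} .")
    lines.append("}")
    return "\n".join(lines)


update = build_same_as_update(MATCHES)
print(update)
```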
Things from the RS and Yale collections (UNION)
(on RS) thing, title
PREFIX crm: <http://erlangen-crm.org/current/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?thing ?title { {?thing crm:P102_has_title / crm:P3_has_note ?title} UNION {SERVICE <http://collection.britishart.yale.edu/openrdf-sesame/repositories/owlim> {?thing crm:P102_has_title ?t. ?t rdfs:label ?title . FILTER (?title != "") }}}
There are some bugs with the federated query parser, so keep the above query exactly as it is:
- property path syntax (crm:P102_has_title / rdfs:label) doesn't work
- deleting the space after (?title != "") doesn't work
Vlado's Presentation
Dominic's Site
http://annotate.oldman.me.uk/federate.html
Dominic Oldman - April 2012
Semantic Federated Search - Test developed by Yale Center for British Art, the British Museum and Ontotext
This web page retrieves results from two different decentralised databases with a single federated search query. There is no middleware or program code to connect the two data sources. One database contains data provided from the RKD's Rembrandt Project and is located in Sofia, Bulgaria. The other store belongs to the Yale Center for British Art at Yale University (YCBA), New Haven in the United States. This web site is located in Amsterdam. This test is part of work being undertaken by the British Museum and YCBA. Additional RDF stores will be added and functionality developed over time.
The main features of the test are:
- No middleware is used for the search. There is one query searching two separately located data stores.
- The data is semantically harmonised simply by conforming to the CIDOC-CRM ontology. Both sets of data agree about the meaning of certain concepts.
- There are no hardcoded links between the datasets. They have been created separately and no RDF triples have been inserted to assert any direct connections.
- The search illustrates that both sets of data agree about the meaning of a painting title, even though these concepts may have been implemented differently within the respective organisations.
- The web page has been tested with Firefox but may work with later versions of Internet Explorer.
- British Museum data will be included in due course.
- There are no database optimisations. The stores have only recently been installed.
Abstract: Data harmonisation without any middleware, without any hard-coded links, and based on a common semantic understanding of what constitutes a title for a painting. In this context it is not very complicated, but it shows the principle that can be applied. The tools in ResearchSpace are about, on the one hand, improving information and knowledge and, on the other hand, in doing so, making the connections between the data much stronger and based on scholarly annotation as well as catalogue and scientific information.
The academy and the museum are combined!
Function:
- There is a drop down with some keywords
- You select a word and the keyword is injected into an AJAX call to a federated search (Python CGI script) across the titles from the Rembrandt semantic database and the Yale semantic database. This is based on the CRM understanding of a title.
- The web page retrieves the data from http://www.rembrandtdatabase.org in Bulgaria and http://collection.britishart.yale.edu at Yale.
- The images are sourced on the basis of these object identifiers (although more artificially in terms of Bulgaria)
- You can clearly see the result of the keyword search.
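The keyword injection step described above could look roughly like the sketch below. It is not Dominic's actual script, which is not shown here. It reuses the RS/Yale UNION query from the Demo Queries section, adds a title filter, and escapes the keyword so that quotes cannot break out of the string literal. The names `sparql_string_literal` and `build_title_search` are hypothetical, and the filter assumes the endpoint supports the SPARQL 1.1 functions CONTAINS and LCASE; it has not been checked against the parser bugs noted earlier.

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


# Same shape as the RS/Yale UNION query, plus a filter on the title.
QUERY_TEMPLATE = (
    'PREFIX crm: <http://erlangen-crm.org/current/> '
    'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> '
    'SELECT ?thing ?title { '
    '{?thing crm:P102_has_title / crm:P3_has_note ?title} UNION '
    '{SERVICE <http://collection.britishart.yale.edu/openrdf-sesame/repositories/owlim> '
    '{?thing crm:P102_has_title ?t. ?t rdfs:label ?title . FILTER (?title != "") }} '
    'FILTER (CONTAINS(LCASE(STR(?title)), __KEYWORD__)) }'
)


def build_title_search(keyword):
    """Insert the lower-cased, escaped keyword into the federated query."""
    return QUERY_TEMPLATE.replace(
        "__KEYWORD__", sparql_string_literal(keyword.lower()))


query = build_title_search('Self "Portrait"')
print(query)
```

A plain placeholder is used instead of `str.format` because the braces in the SPARQL text would otherwise need doubling.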
Reading material
- From: andreas.schwarte@fluidops.com
http://lists.w3.org/Archives/Public/public-lod/2011Oct/0022.html
Forwarded by: lec.maj@yale.edu
Date: Wed, 5 Oct 2011 10:39:33 +0200
Cc: public-lod@w3.org
There is currently some development going on in this area. Please find a short overview below.
First, there is Sesame. We are currently working on an integration of the SPARQL 1.1 federation extensions into core Sesame. A release of the new version is planned in around 2-3 weeks.
Then there is FedX, a federation SAIL for Sesame with sophisticated optimizations for federated query processing. Although FedX does not yet support SPARQL 1.1, it is capable of executing SPARQL queries against a federation of pre-configured SPARQL endpoints. Note that FedX automatically performs source selection (on the registered sources), and thus does not need the endpoints specified explicitly via SERVICE by the user. FedX provides a command line interface to allow for fast interaction, and an integration of SPARQL 1.1 is scheduled for the next few weeks.
Besides these projects there are some research projects dealing with federated query processing: SPLENDID (UKoblenz, paper at COLD 2011), SPARQL DQP (UPM) and DARQ (distributed ARQ, discontinued 2006).
- Vlado: TODO: add references to projects presented at ESWC 2012 query federation workshop