Open Annotation Collaboration (OAC) at http://www.OpenAnnotation.org
Intro
OAC is a Mellon-funded project to develop the OAC ontology. Principal participants:
- Los Alamos National Laboratory
- The University of Queensland
- University of Maryland
- University of Illinois at Urbana-Champaign
- Maryland Institute for Technology in the Humanities
Phases:
- Phase1: Jul 2009-Dec 2010. Final report (Apr 2011)
- Phase2: started Nov 2011
Resources
- Guiding Principles (latest revised version)
- Beta Data Model Guide, 10 August 2011
- oac.rdf ontology, saved to SVN. The one served by the namespace URI http://www.openannotation.org/ns/ is old (alpha3)
- Examples:
- Hubble: images from the Hubble telescope
- Scholarly Editions
- Gatsby: commentaries on The Great Gatsby by F.Scott Fitzgerald
- CNN
- Google Group. I've studied most of it
- Reading list (recent papers). I haven't read any of them yet (about 11)
- Use cases and User narratives
Other resources
- list of Annotation tools, esp. Image annotation
- another list of Annotation tools
Benefits
- Based on general Guiding Principles that hopefully will ensure longevity, sustainability and relevance to various domains
- Treats Annotation, Target (item being annotated) and Body (item comprising the annotation) as separate and distinct
- Provides for annotations of text, structured data, images, other multimedia
- Provides examples of application in widely varying domains, with great illustrations
Disadvantages
- the model is still beta, not finalized
- recommends storing RDF data as an encoded string (cnt:chars) (sec 3.5, 3.7.3.1, 3.9.1), which IMHO is a very bad idea
- does not yet define any way to publish, share, search, discover, subscribe to annotations:
- to transfer: "Dereferencing the HTTP URI of the Annotation document results in an RDF serialization of an instance of this data model. Any of the RDF serialization formats are permissible, however RDF/XML is recommended"
- "companion publish/discovery document" (not yet available)
- "In order to increase the likelihood of adoption, and in alignment with the goal of sharing annotations, no client-server protocol for publishing/updating/deleting annotations will be specified. Rather, the specifications will take a perspective whereby clients publish annotations to the Web and make them discoverable using common Web approaches"
Implementations
Annotation implementations conforming to OAC:
- Ajax XML Encoder was the first app to do OAC on a pilot basis
- big name: “The MITHological AXE: Multimedia Metadata Encoding with the Ajax XML Encoder”
- live demo: includes text, image, audio, video. Not very impressive
- Zotero AXE demo video: integration of AXE to annotate image captured in Zotero
- Shared Canvas
- SEMLIB
- YUMA
In development
- Annotation Supporting Collaborative Development of Scholarly Editions
- Annotation and Transcription Tools for Digital Medieval Manuscripts
- VideoAnnotator: in development, open source available
- Annotation of Digital Emblematica
- Maphub Phase II: Web portal for annotating online historic maps ("Georeferencing and annotating digitized, high-resolution historic maps"). All annotations will be represented in the OAC data model and become Web resources that can be referred to.
Created by Cornell, builds upon YUMA, built using Ruby on Rails. Open source.
- Annotator for Fedora: developed at Brown University, not yet available
Watch Projects and pages for future announcements
OAC Overview
OAC resume/cheat sheet
Baseline Model
An oac:Annotation consists of oac:Body (sometimes called "content") and oac:Target (the resource being annotated).
Property | Meaning |
---|---|
oac:hasBody | from Annotation to Body |
oac:hasTarget | from Annotation to Target |
- There can be Multiple Targets, eg this Body relates to 3 Targets:
Fitzgerald makes key mistakes in accounting for Daisy's chronology:
- We are told at one point in Chapter 4, that...
- We also know from elsewhere in Chapter 4, that ...
- And we are told at one point in Chapter 1, that ...
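A minimal sketch of the baseline model with multiple Targets, assuming triples are kept as plain Python tuples and prefixed names stand in for full URIs. All ex: identifiers and both helper functions are made up for illustration; they are not part of OAC.

```python
# Triples are plain (subject, predicate, object) tuples.
# Prefixed names such as "oac:hasBody" stand in for full URIs.
# All "ex:" identifiers are hypothetical.

def make_annotation(annotation, body, targets):
    """Return the triples for one Annotation with one Body and 1..n Targets."""
    triples = {
        (annotation, "rdf:type", "oac:Annotation"),
        (annotation, "oac:hasBody", body),
        (body, "rdf:type", "oac:Body"),
    }
    for target in targets:
        triples.add((annotation, "oac:hasTarget", target))
        triples.add((target, "rdf:type", "oac:Target"))
    return triples


def targets_of(triples, annotation):
    """List the Targets of an Annotation, sorted for stable output."""
    return sorted(o for s, p, o in triples
                  if s == annotation and p == "oac:hasTarget")


# One Body (the comment on Daisy's chronology) relating to 3 Targets.
gatsby = make_annotation(
    "ex:anno1",
    "ex:daisy-chronology-comment",
    ["ex:gatsby-ch4-passage1", "ex:gatsby-ch4-passage2", "ex:gatsby-ch1-passage1"],
)
print(targets_of(gatsby, "ex:anno1"))
```

The Annotation node is the only thing that ties the Body to its Targets, which is why all three stay separate resources.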
Additional Properties
Property | Meaning |
---|---|
dc:title | name of the Annotation |
dcterms:creator | creator of the Annotation |
dcterms:created | time and date at which the Annotation was created |
oac:annotates | from Body to Target |
Annotation Predicates
Unofficial proposal for various annotation "dispositions" that are sub-properties of oac:annotates, eg:
- advises: advisesPreferredEdition advisesSupportingReference advisesFurtherResearch
- categorizes: categorizesBySubject categorizesByTemporal categorizesBySpatial
- changes: changesAddsErrata changesSuggestsEdit
- comments:
- compares: orders contrasts correlates
- explains
- questions: questionsFact questionsOpinion questionsConsistency
- replies: repliesAgreement repliesDisagreement repliesMixedOpinion
- seeAlso: seeAlsoOtherSources seeAlsoConfirming
Annotation Type
Defines specialized sub-classes of oac:Annotation. We also use one: rso:DataAnnotation.
Currently there is only one (the additional Annotation Type List is empty)
Type | Meaning |
---|---|
oac:Reply | an Annotation (A2) that is a reply to another Annotation (A1). A2.hasTarget=A1 |
- Problem: A2 has no direct pointer to the ultimate target (eg image), nor to the root of the discussion thread.
(A W3C proposal for thread structure includes such a pointer: http://www.w3.org/2001/03/thread#root.)
So if you need to find all annotations in a thread, or having the same ultimate target, you need to walk a chain.
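To make the cost concrete, here is a rough sketch of the chain walk. It assumes each annotation has a single target, held in a plain dict; the function names and ex: identifiers are hypothetical.

```python
def ultimate_target(has_target, annotations, start):
    """Follow oac:hasTarget from a reply until a non-annotation resource is reached.

    has_target: dict mapping each annotation to its single target (a simplification).
    annotations: set of resources known to be annotations.
    """
    seen = set()
    current = start
    while current in annotations:
        if current in seen:
            raise ValueError(f"reply cycle at {current}")
        seen.add(current)
        current = has_target[current]
    return current


def thread_members(has_target, annotations, root_target):
    """All annotations whose chain ends at root_target (one walk per annotation)."""
    return sorted(a for a in annotations
                  if ultimate_target(has_target, annotations, a) == root_target)


# A1 annotates an image, A2 replies to A1, A3 replies to A2; B1 is unrelated.
has_target = {
    "ex:A1": "ex:image",
    "ex:A2": "ex:A1",
    "ex:A3": "ex:A2",
    "ex:B1": "ex:other",
}
annotations = set(has_target)

print(ultimate_target(has_target, annotations, "ex:A3"))
print(thread_members(has_target, annotations, "ex:image"))
```

With a direct pointer to the ultimate target on every reply, `thread_members` would be a single lookup instead of a walk per annotation.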
Serialization, Inline
"Dereferencing the HTTP URI of the Annotation document results in an RDF serialization of OAC".
- This follows Linked Data principles
- It's not entirely clear how much of the annotation graph should be returned. The included example returns only oac:Body, using RDFa->RDF conversion
Inline Body, Data Annotations, Constraints
Uses the W3C spec Representing Content in RDF to put encoded data in a node (Body, Constraint):
Term | Meaning |
---|---|
cnt:ContentAsText | signifies that content is available as encoded data |
cnt:chars | representation of the resource, as plain text, in a certain encoding |
cnt:characterEncoding | character encoding of cnt:chars (or cnt:bytes?), such as "utf-8" or "ascii" |
dc:format | data format of the body, eg "N3" for RDF/N3 or Turtle, "SVG" for an SvgConstraint |
I have argued that representing RDF data as cnt:chars is a bad idea, since it defeats semantic repository indexes and query optimization.
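One way to see the argument, as a toy sketch: a tuple-based triple store with a simple pattern matcher. The `match` helper, the ex: data and the way the "native" body is attached are all assumptions for illustration; a real repository would use indexes and probably a named graph for the body.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Body stored inline as an encoded string (the cnt:chars approach).
inline = {
    ("ex:anno1", "oac:hasBody", "ex:body1"),
    ("ex:body1", "rdf:type", "cnt:ContentAsText"),
    ("ex:body1", "cnt:characterEncoding", "utf-8"),
    ("ex:body1", "dc:format", "N3"),
    ("ex:body1", "cnt:chars", "ex:painting1 ex:createdBy ex:rembrandt ."),
}

# The same body content stored as an ordinary triple.
native = {
    ("ex:anno1", "oac:hasBody", "ex:body1"),
    ("ex:painting1", "ex:createdBy", "ex:rembrandt"),
}

# The statement inside cnt:chars is just a literal, so the pattern finds nothing.
print(match(inline, p="ex:createdBy"))
# Stored natively, the same statement is found by an ordinary pattern.
print(match(native, p="ex:createdBy"))
```

To query the inline variant, a client would have to fetch every cnt:chars literal and parse it, which is exactly the work the repository index is supposed to avoid.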
Fragments and Constraints
Fragments and Constraints allow one to select a part of a resource as annotation Body and/or Target, not just the entire resource. This important topic accounts for about half the spec.
- Section in the main Data Model
- Extended Constraint Type List
Fragments
Fragments are denoted as "URI#Fragment" and include:
- text: char= start,end; line= start,end
- X/HTML: id or a.name; xpointer=
- PDF: page= n; viewrect= left,top,width,height
- image: xywh= left,top,width,height (ie x,y,w,h)
- video: t= kind:start,end
- where: kind: Normal Play Time (npt, the default), SMPTE (smpte), or Wall Clock (clock)
- start,end: are in seconds for npt; smpte uses timecodes and clock uses ISO 8601 timestamps
Property | Meaning |
---|---|
dcterms:isPartOf | links the fragment Target to the full resource. Typically: <URI#fragment> dcterms:isPartOf <URI> |
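A small sketch of how a client might take such a fragment URI apart, using only the Python standard library. It handles just the xywh= and t= forms with plain pixel values; the function names and example.org URIs are made up.

```python
from urllib.parse import urldefrag, parse_qs


def parse_fragment(uri):
    """Split URI#fragment and decode xywh= and t= fragments into dicts."""
    base, fragment = urldefrag(uri)
    params = parse_qs(fragment)
    result = {"resource": base}
    if "xywh" in params:
        # x, y: offset of the region's corner; w, h: its size (pixels assumed).
        x, y, w, h = (int(v) for v in params["xywh"][0].split(","))
        result["region"] = {"x": x, "y": y, "w": w, "h": h}
    if "t" in params:
        value = params["t"][0]
        kind, _, rest = value.partition(":")
        # No recognised kind prefix means Normal Play Time.
        if kind not in ("npt", "clock") and not kind.startswith("smpte"):
            kind, rest = "npt", value
        start, _, end = rest.partition(",")
        result["time"] = {"kind": kind, "start": start, "end": end}
    return result


def part_of_triple(uri):
    """Link the fragment Target to the full resource with dcterms:isPartOf."""
    base, _ = urldefrag(uri)
    return (uri, "dcterms:isPartOf", base)


print(parse_fragment("http://example.org/map.jpg#xywh=10,20,300,400"))
print(parse_fragment("http://example.org/clip.ogv#t=npt:10,20"))
print(part_of_triple("http://example.org/map.jpg#xywh=10,20,300,400"))
```

Start and end times are kept as strings, since their syntax depends on the kind.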
Constraints
Constraints represent the resource part in a structured way. Some reuse AO Selectors ( aos: ). They are represented as subclasses of oac:Constraint and include:
- text: oac:OffsetRangeConstraint: aos:offset (starting char), aos:range (number of chars)
- text: oac:PrefixPostfixConstraint: aos:prefix (before selection), aos:exact (the selection), aos:postfix (after selection)
- image: oac:SvgConstraint: part is delimited by an SVG shape: path, rect, circle, ellipse, line, polyline, polygon, g(roup)
- g(roup) is used only to create a part with a hole (eg donut)
- point (marker) cannot be used?
- the SVG content is obtained thus:
if the oac:SvgConstraint node is also cnt:ContentAsText, then from property cnt:chars
else the node's URI is dereferenced and the server should return the content
- time: oac:WebTimeConstraint: oac:when: to annotate the version of a resource as of the given time.
(When you cite a URL in a scientific paper, good style asks you to say "Accessed on ...", this is the same idea)
- context: oac:ContextConstraint: oac:inContextOf: to annotate a resource only in the context of a bigger resource.
Eg "this image is too bright in the context of that web page"
- rdf: constraint-specific predicates that specify the resource part in a structured way
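The two text constraints can be sketched as follows. This is my reading of how a client would apply them; the function names are hypothetical and only the aos: property names come from the list above.

```python
def select_by_offset(text, offset, length):
    """oac:OffsetRangeConstraint: aos:offset is the starting char, aos:range the number of chars."""
    return text[offset:offset + length]


def select_by_context(text, prefix, exact, postfix):
    """oac:PrefixPostfixConstraint: locate `exact` between aos:prefix and aos:postfix.

    Returns the start offset of `exact`, or -1 if the context does not occur.
    """
    pos = text.find(prefix + exact + postfix)
    return -1 if pos < 0 else pos + len(prefix)


text = "In my younger and more vulnerable years my father gave me some advice"

print(select_by_offset(text, 6, 7))
print(select_by_context(text, "more ", "vulnerable", " years"))

# If the document is edited, the offset constraint silently selects the wrong
# text, while the prefix/postfix constraint still finds the selection.
edited = "Note: " + text
print(select_by_offset(edited, 6, 7))
print(select_by_context(edited, "more ", "vulnerable", " years"))
```

The offset form is cheaper to evaluate, while the prefix/postfix form is more robust against changes elsewhere in the resource.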
RDF Resource Constraint
I made a proposal for OAC constraint extension to annotate an RDF property-instance:
RDF Resource Constraints are used to point to RDF statements. They match the framework of Data Model Guide section 3.7.3.2 and are attached to the Constraint (node C-1 on Figure 7.3.2).
They reuse the RDF Reification vocabulary.
Term | Meaning |
---|---|
rdf:Statement | Signifies that the constraint is about an RDF statement. The statement conceptually "belongs" to the target T-1 (the object of oac:constrains). |
rdf:subject | Subject of the statement being annotated, required. May coincide with the target T-1. |
rdf:predicate | Property being annotated, required. |
rdf:object | Object of the statement being annotated, optional. If missing, the annotation is about all statements with the given subject and predicate, and none of them in particular. |
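A sketch of how the optional rdf:object could be interpreted when resolving such a constraint against the data. The dict representation of the constraint, the `statements_for` helper and the ex: data are assumptions for illustration.

```python
def statements_for(constraint, triples):
    """Return the statements a constraint points to.

    rdf:subject and rdf:predicate are required; rdf:object is optional and,
    when missing, the constraint covers every statement with that subject
    and predicate.
    """
    subject = constraint["rdf:subject"]
    predicate = constraint["rdf:predicate"]
    obj = constraint.get("rdf:object")
    return sorted(t for t in triples
                  if t[0] == subject and t[1] == predicate
                  and (obj is None or t[2] == obj))


# Hypothetical data about one museum object.
triples = {
    ("ex:painting1", "ex:createdBy", "ex:rembrandt"),
    ("ex:painting1", "ex:createdBy", "ex:workshop"),
    ("ex:painting1", "ex:title", "Night Watch"),
}

about_one = {
    "rdf:subject": "ex:painting1",
    "rdf:predicate": "ex:createdBy",
    "rdf:object": "ex:rembrandt",
}
about_all = {
    "rdf:subject": "ex:painting1",
    "rdf:predicate": "ex:createdBy",
}

print(statements_for(about_one, triples))
print(statements_for(about_all, triples))
```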
However, I have now come up with a cleaner representation (see below).
RS Data Annotation in OAC
The previous version of RS Annotation Design used a vocabulary based on CRM and extended with RSO.
We have now decided to use OAC, and the correspondence is shown below.
The key to understanding is this: Annotation Points map exactly to oac:Target (target of a link!).
There are two cases: the target is the entire Museum Object (MO) or a Statement.
Case 1: Annotate Entire MO
The target is rso:E22_Museum_Object (the entire MO)
| RSO+CRM | OAC+Reification | meaning; comments |
|---|---|---|
| crm:E13_Attribute_Assignment | rso:DataAnnotation | annotation |
| rso:root | oac:hasTarget | link to target (being the MO) |
| | oac:Target, rso:E22_Museum_Object | annotation target (being the MO) |
| | oac:hasBody | link to body |
| | oac:Body | annotation body |
Case 2: Annotate statement
The target is rdf:Statement with optional rdf:object. The target uses dcterms:isPartOf (as in Fragments) to point to the entire MO
| RSO+CRM | OAC+Reification | meaning; comments |
|---|---|---|
| crm:E13_Attribute_Assignment | rso:DataAnnotation | annotation |
| | oac:hasTarget | link to target |
| | oac:Target, rdf:Statement | annotation target (being a statement) |
| rso:root | dcterms:isPartOf | Museum Object being annotated |
| crm:P140_assigned_attribute_to | rdf:subject | subject of statement being annotated |
| rso:property | rdf:predicate | property-instance being annotated |
| rso:object | rdf:object | object/value annotated/proposed |
| rso:other_object | same | old object/value criticised/justified |
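As a rough illustration of Case 2, the triples for one statement annotation could be assembled like this. The helper and all ex: identifiers are hypothetical, and the URI of the target node is simply passed in, since the URI scheme is still undecided. The link to the MO uses dcterms:isPartOf, as in Fragments.

```python
def statement_annotation(annotation, target, museum_object,
                         subject, predicate, obj=None):
    """Build the OAC+Reification triples for annotating one statement.

    The target is both an oac:Target and an rdf:Statement; rdf:object is optional.
    """
    triples = {
        (annotation, "rdf:type", "rso:DataAnnotation"),
        (annotation, "oac:hasTarget", target),
        (target, "rdf:type", "oac:Target"),
        (target, "rdf:type", "rdf:Statement"),
        (target, "dcterms:isPartOf", museum_object),
        (target, "rdf:subject", subject),
        (target, "rdf:predicate", predicate),
    }
    if obj is not None:
        triples.add((target, "rdf:object", obj))
    return triples


with_object = statement_annotation(
    "ex:anno1", "ex:anno1-target", "ex:painting1",
    subject="ex:painting1", predicate="ex:createdBy", obj="ex:rembrandt",
)
without_object = statement_annotation(
    "ex:anno2", "ex:anno2-target", "ex:painting1",
    subject="ex:painting1", predicate="ex:createdBy",
)

for triple in sorted(with_object):
    print(triple)
```

Because the target always carries dcterms:isPartOf, all annotations on a given MO can be found without knowing which statement they address.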
Common properties
The rest of the properties apply to both cases:
| RSO+CRM | OAC+Reification | meaning; comments |
|---|---|---|
| | rso:DataAnnotation | Won't use Annotation Type oac:Reply: see problem described there |
| rso:reply_to | same | point to original oac:Annotation; keep oac:hasTarget pointing towards MO |
| | oac:hasBody | link to body |
| | oac:Body | annotation body |
| rso:has_link | same | |
| rso:P3_has_title | same | title. Make subproperty of dcterms:title |
| rso:P3_has_description | same | description |
| rso:P2_reply_disposition | same | Won't use Annotation Predicates because they are only proposed, not part of the spec |
| rso:P2_other_disposition | same | |
| rso:P2_annotation_status | same | |
| crm:P14_carried_out_by | dcterms:creator | creator |
| crm:P4_has_time-span. P82a_begin_of_the_begin | dcterms:created | date/time created |
Graphical Comparison
A graphical example of Case 2 (annotation of a Statement) is shown below, comparing the old and the new way.
Annotation with RSO and CRM
Annotation with OAC and Reification
Diff Comparison
TODO
- decide on URI scheme. In particular, will the AP (rdf:Statement) have a nuxeo:uid or a hand-crafted URI based on its components?