
Nomisma is a Linked Data (LD) numismatic thesaurus/site; NUDS is an XML schema for describing coins
Contact: Ethan Gruber: ewg4xuva@gmail.com:
- at American Numismatic Society (Nomisma, Numishare)
- previously at UVirginia
20121109 Dominic
How long would it take to convert Nomisma (NUDS) to CRM (ResearchSpace style)?
- http://nomisma.org/nuds/numismatic_database_standard
- http://wiki.numismatics.org/nuds:nuds: latest documentation
(There's potential for including numismatics in RS)
20121112 Vlado
Some questions/observations:
1. Is this to define a mapping, or also implement a converter?
2. When do they plan to have an XSD?
"Eventually, XML schemas will be created to enforce validation of NUDS-XML records"
There are numerous unclear points:
- "The <nuds> element has a recordType attribute...":
  it is not clear what that element's type is, or whether 1 file (root) can include info about only 1 coin
- "The undertypeDesc can hold anything found in typeDesc":
  but what is "undertype"?
- "typeDesc for physical objects can, instead of containing elements, use the xlink:href or some other attribute to point to a URI for a conceptual type"
  "some other"? Really??
- what goes in <typeSeries/>, <symbol/> ?
- are the given xlink:role complete lists, or only a sampling?
- "xlink:href and URI to source also possible":
  where else could this URI be located, except in xlink:href ?
- which of the elements are repeatable? Is there an "envelope" element to hold repetitions (that's an XML best practice).
  E.g. I guess <acqinfo> is repeatable, but has no envelope
- what goes in?
- where is <digRep> located?
- are there (or are there plans to create) cross-field validation rules, e.g.
  <physloc/> can be used only if recordType="physical"
  If not, is it ok if the result is an inconsistent (e.g. unlinked) CRM RDF graph?
- should there be some correlation between:
  <acqinfo><acquiredFrom/> and <previousColl>?
  <acqinfo><sale*> (3 elements) and <previousColl><sale*>
3. Alternatively, would there be a task to dig through a number of existing records and figure out the schema from there?
What is that number?
4. Will someone be able to provide answers/clarifications in a timely manner?
- E.g. what's <saleItem/>? I guess a lot number, but can't be sure.
5. The CRM modeling won't be quite simple, because:
- it will include physical & conceptual objects:
  "The <nuds> element has a recordType attribute for defining physical/conceptual (an XML representation of a coin type rather than a physical coin itself)."
- it will involve some elements of FRBR, consider
  - <publicationStmt>, <revisionStmt>, <rightsStmt>
  - "If the object is conceptual, digRep can contain <associatedObject xlink:href=""/> for each associated coin"
20121206 Ethan Gruber
To address a few things:
- Several attributes can occur on any element in the document: "certainty" and the xlink attributes. Values in xlink:role will not be enforced in the schema so that institutions can define their own roles. The roles seen in the wiki are the most common you would see associated with coins (and defined with nomisma ids, e.g., http://nomisma.org/id/engraver).
- An undertype is used with a physical coin that has been countermarked by another entity for re-circulation. For example, Arabs reused Byzantine coins after putting a punch mark into them. The typeDesc in this case describes the characteristics of the now-Arab coin while the undertypeDesc describes the characteristics of the Byzantine coin type beneath the punch mark.
- saleItem = lot number
- digRep is located outside of the descMeta, in the <nuds> root. The root element contains nudsHeader, descMeta, and digRep.
- The root element contains information about one coin (or coin type). There is no mechanism for grouping multiple records in a single XML file.
- I will probably implement cross-field validation. The XSD schema has been a long time coming. I will begin working on a draft today now that NUDS is in a fairly stable state.
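As an illustration of the cross-field rule raised earlier (<physloc/> only when recordType="physical"), here is a minimal sketch in Python using only the standard library. It is not how NUDS validation is actually implemented. The function name check_physloc_rule, the sample records, and the placement of <physloc> inside <descMeta> are assumptions made for the example; elements are matched by local name so the sketch does not depend on any particular NUDS namespace URI.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return an element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def check_physloc_rule(xml_text: str) -> list:
    """Return a list of error messages; an empty list means the rule holds."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "nuds":
        return ["root element is not <nuds>"]
    record_type = root.get("recordType")
    # Look for <physloc> anywhere below the root (its exact position is assumed unknown).
    has_physloc = any(local_name(el.tag) == "physloc" for el in root.iter())
    if has_physloc and record_type != "physical":
        return ['<physloc> is present but recordType is %r, not "physical"' % record_type]
    return []


# Hypothetical sample records, written only for this example.
OK_RECORD = '<nuds recordType="physical"><descMeta><physloc>Tray 4</physloc></descMeta></nuds>'
BAD_RECORD = '<nuds recordType="conceptual"><descMeta><physloc>Tray 4</physloc></descMeta></nuds>'

print(check_physloc_rule(OK_RECORD))
print(check_physloc_rule(BAD_RECORD))
```

A check like this cannot be expressed in XSD 1.0 alone, which is why it would sit next to the schema rather than inside it.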
20121207 Vlado
Nomisma Notes
I’ve looked at http://nomisma.org and it's a really well-conceived collaborative effort to build a semantic database of numismatic info.
(Dominic, I'll add Nomisma in the RS confluence in "related projects".)
It uses simple but effective approaches. An example entity:
- http://nomisma.org/id/igch0262 is a page about hoard 262 described in "Inventory of Greek Coin Hoards", including data, links and map.
- Press Alt-V to see the source. It's HTML+RDFa.
- Select "RDF triples" (or go to this URL) to see the RDF data
Questions:
- are these pages crafted by hand? Or is there some automation, or maybe Wiki markup?
- if it's a large collaborative effort, how do you ensure quality, i.e. that everyone sticks to the same format?
- is there an ontology (data model) that this data is based on, or documented by?
Comments:
- The RDF (3) includes a lot less info than the page (1 & 2). E.g.:
  - RDF says only nm:findspot "38.183333 22.183333" while the page also says "Diakofto, on the coast c. 20 km. E of Aegium, Achaea, 1965".
    While the nearby places may be recoverable from the coordinate info, the find year is irretrievably lost
  - all other free text is also lost, e.g. "Burial: c. 146 B.C. (T), Contents: 3000+ AR"
- The RDF omits info that is not completely structured. E.g.
  - this is lost from RDF because we don't know the exact Aetolian mint that made these 63 coins:
    "Aetolian League: 63 triob."
  - the number of coins from each mint is not present. Therefore one cannot ask the total value of hoards, or find hoards including specific types of coins
- The HTML has structuring info expressed through indentation (that's why the data is in PRE) that is lost in the RDF.
  - The HTML says this:
      Hoard igch0262 includes:
        Sicyon: 459 triob
          CNG Coin Shop 775210
          CNG Coin Shop 832025
          CNG Coin Shop 831994
        Achaean League: 1601 triob.
          CNG Coin Shop 775269
        Messene: 6 triob.
          CNG Coin Shop 790011
          CNG Coin Shop 877551
  - while the RDF says this:
      The following were made in mint Sicyon:
        CNG Coin Shop 775210
        CNG Coin Shop 832025
        CNG Coin Shop 831994
      Hoard igch0262 includes:
        CNG Coin Shop 775269
        CNG Coin Shop 790011
        CNG Coin Shop 877551
  So the RDF fails to state that the Sicyon coins were found in that hoard
Potential improvements:
- A more comprehensive data model can capture more of the info in RDF
- I think that these problems can be resolved by deploying SMW (semantic media wiki) and creating page templates and semantic forms that prompt the users to capture more of the info in a structured way
- Of course, a big question is how to migrate the existing data to a new model/system.
  If all entities (pages) are formatted consistently then it should be possible to capture more of the info:
  - recover more of the structure from the indentation
  - make more links (e.g. to places) through text mining
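To make "recover more of the structure from the indentation" concrete, here is a rough sketch in Python. It assumes the PRE text uses a regular three-level layout (hoard line, then "Mint: count denomination" lines, then coin lines); that layout, the SAMPLE text, the function name hoard_triples, and the ex: predicate names are all assumptions for illustration and are not Nomisma's actual vocabulary.

```python
import re

# Assumed layout of the PRE block: 0 spaces = hoard, 2 = mint group, 4 = coin.
SAMPLE = """\
Hoard igch0262 includes:
  Sicyon: 459 triob
    CNG Coin Shop 775210
    CNG Coin Shop 832025
    CNG Coin Shop 831994
  Achaean League: 1601 triob.
    CNG Coin Shop 775269
  Messene: 6 triob.
    CNG Coin Shop 790011
    CNG Coin Shop 877551
"""

MINT_LINE = re.compile(r"^(?P<mint>[^:]+):\s*(?P<count>\d+)\s+\w+\.?$")


def hoard_triples(text):
    """Return (triples, counts) recovered from an indented hoard listing."""
    triples = []   # (subject, predicate, object) tuples
    counts = {}    # (hoard, mint) -> number of coins stated in the text
    hoard = mint = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        body = line.strip()
        if indent == 0:
            hoard = body.split()[1]          # "Hoard igch0262 includes:" -> "igch0262"
            mint = None
        elif indent == 2:
            match = MINT_LINE.match(body)
            if match:
                mint = match.group("mint")
                counts[(hoard, mint)] = int(match.group("count"))
        else:
            # Every coin is linked to the hoard, including those with a known mint.
            triples.append((hoard, "ex:contains", body))
            if mint is not None:
                triples.append((body, "ex:mint", mint))
    return triples, counts


triples, counts = hoard_triples(SAMPLE)
```

With the hoard link and the per-mint counts kept, queries such as "which hoards contain Sicyon coins" or "how many coins does this hoard hold in total" become answerable from the structured data alone.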
NUDS questions
How long would it take to convert Nomisma (NUDS) to CRM (ResearchSpace style)?
I assume that means only NUDS XML, not the Nomisma pages?
1. Is this to define a mapping, or also implement a converter?
Could you please answer this?
Overall, this is a quite interesting topic, and a lot of data structuring work is done.
And some more questions (for discussion):
- Is the best approach to convert to CRM+FRBRoo? What will we do with this data afterwards?
- Are there any data entry solutions for NUDS XML data?
NUDS Data Entry Forms
I looked around and I see that you created an EAD data entry solution using Orbeon XForms:
http://wiki.numismatics.org/eaditor:introduction_to_admin_home
I guess you also created the other one (XEAC) since it looks similar, though I couldn't find your name on it.
http://wiki.numismatics.org/xeac:xeac
I guess a popular way to create NUDS is through conversion from EAD, since I see in one of the published examples:
http://papalcoinage.org/display/papal0002.xml "Migrated from EAD to NUDS (2012-01-28)"
And inside I see Orbeon namespaces, e.g. http://www.orbeon.com/oxf/processors
All this is very interesting! I have a lot of experience with XForms in various Customs/Excise projects.
20121207 Ethan Gruber
Many of the id types on Nomisma are still in a proof of concept stage. Nomisma was set up several years before I joined the American Numismatic Society by Sebastian Heath, and I have little influence over data models and architecture. The hoards on Nomisma (from IGCH, for example) were a demonstration that they could be defined by URIs. The HTML description is taken nearly verbatim from a printed book, with a little additional markup. In the conversion to RDF, you are right that much of the semantic meaning is lost. We have a grant application in to the NEH, and we will soon go back to the drawing board with some of the data models and the architecture of Nomisma. Currently only two or three people have access to editing data on Nomisma, and this is done by editing the text directly (which has resulted in XHTML validation problems in the past). We will likely go the XForms route in editing. This will give us greater control over validation, among other things.
I am mainly responsible for NUDS/XML and for the development of another XML schema for describing coin hoards, as well as software development of Numishare and derivative projects (like OCRE). The hoard schema is derived from NUDS, and very much in a draft state. I am developing it with data provided by Kris Lockyear for Roman Republican coin hoards. The model, and software designed to publish it, can interact heavily with coin types defined by Nomisma. Additionally, users can input typological attributes in the NUDS namespace for coin types not defined on Nomisma (or when it is necessary to store physical attributes, like weight and diameter).
Here is an example record: http://admin.numismatics.org/chrr/id/INF.xml
The file is little more than a bit of data about the findspot (a Geonames URI defining a place in Italy) and a list of contents (and associated numeric counts), mainly defined by Nomisma URIs for Roman Republican coin types.
- The HTML page (http://admin.numismatics.org/chrr/id/INF) gathers the RDF from Nomisma dynamically (and Apache Cocoon will cache it). This might be a little slower than storing the coin type data in a local database, but the advantage is that the people responsible for maintaining the hoard data do not need to concern themselves with maintaining coin type data.
- Data can be extracted rapidly directly from Nomisma to perform comparative analyses as well
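The trade-off described here (resolve coin type data from Nomisma on demand and cache it, instead of keeping a local copy) can be sketched as follows. This is only an illustration of the pattern in Python, not how Numishare or Apache Cocoon implement it; the class CachingResolver, the fake_fetch stand-in, and the example.org URI are hypothetical.

```python
from typing import Callable, Dict


class CachingResolver:
    """Fetch a description for a URI once, then serve it from memory."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # function doing the real (slow) lookup
        self._cache: Dict[str, str] = {}
        self.misses = 0                  # number of real lookups performed

    def get(self, uri: str) -> str:
        if uri not in self._cache:
            self.misses += 1
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]


def fake_fetch(uri: str) -> str:
    """Stand-in for an HTTP request returning RDF for a coin type."""
    return "<rdf for %s>" % uri


resolver = CachingResolver(fake_fetch)
first = resolver.get("http://example.org/type/1")
second = resolver.get("http://example.org/type/1")   # served from the cache
```

The hoard record only has to store the type URI and a count; everything else about the type stays with whoever maintains it.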
The backend for Numishare is an XForms framework using Orbeon. I've been using Orbeon for probably three or four years now. We use it not only for creating and editing NUDS, but also publishing records to the datastore and indexing them into Solr, as well as interacting with REST APIs provided by various projects, like VIAF, Geonames, and, most importantly, querying Nomisma and storing its URIs in the xlink:href attributes in NUDS.
For now, I would recommend putting conversion of hoards to CRM on hold so that we can focus on mapping NUDS to CRM+FRBR. This would enable us to provide our collections in the alternative form of CRM for a ResearchSpace trial run.
20121012 Vladimir
"We use Orbeon not only for creating and editing NUDS"
A fundamental problem with CRM is that there are no established data viewing/editing solutions.
And perhaps there cannot be, since how do you enter a Man-Made Object?
Successful data forms are specific for each specific domain (one for Painting, completely different for Coin, quite different for Hoard).
So the question is not just how to convert some domain data (e.g. NUDS) to CRM, but then how to display/edit it.
The transformation should somehow "demarcate" the source information elements in the resulting RDF/RDFa graph/document.
"but also publishing records to the datastore and indexing them into Solr, as well as interacting with REST APIs provided by various projects, like VIAF, Geonames, and, most importantly, querying Nomisma"
I guess you use eXist for datastore?
I see at http://code.google.com/p/eaditor/ that you use/follow Dan McCreary's XRX architecture/pattern.
- http://code.google.com/p/xrx/
- http://www.danmccreary.com/xrx/index.html
- http://www.danmccreary.com/training/xrx/index.html
I studied to some extent these wikibooks. A couple are also available in PDF:
- http://en.wikibooks.org/wiki/XForms:
very useful, hundreds of examples
- http://en.wikibooks.org/wiki/XQuery
- http://en.wikibooks.org/wiki/XRX
I also studied some XML Prague 2012 proceedings, in particular
- JSONiq: XQuery for JSON, JSON for XQuery
- Treating JSON as a subset of XML: Using XForms to read and submit JSON
Ethan's papers/presentations:
- XForms for Libraries, An Introduction paper in Code4Lib Journal 11: September 2010
- Linking Roman Coins presentation
20121211 Ethan Gruber
We're using eXist as our datastore. eXist comes pre-installed within Orbeon as a default, so we have opted for using it. The architecture isn't quite XRX. There is some xquery in the administrative back-end of Numishare, but the search functions have been separated from the datastore. We get much better performance out of Solr than we would building a public user interface that uses xquery against the eXist database. If you are interested in reading more about Numishare's architecture, I can send you a paper that will be published in the 2012 proceedings for the Computer Applications in Archaeology conference.
As for CRM, it may be theoretically possible to use XForms. CRM is a sophisticated model, so I don't think that one form could be used to edit every permutation of CRM (paintings, sculpture, coins, etc.). However, I wonder if schematron (http://www.schematron.com/) might be useful for defining the models for various domains of CRM. It may be possible to generate editing interfaces dynamically based on schematron rules. Xquery is useful in the administrative back-end in this case because it would allow you to connect a painting with an artist that already exists in the system, for example. My knowledge of CRM is very basic, so it's difficult for me to comment on what may or may not be possible, but certainly XForms (and Orbeon) allow for some pretty complex web service interactions.
20121212 Vladimir
Currently we use heavily modified RForms (code.google.com/p/rforms) to render CRM RDF.
"As for CRM, it may be theoretically possible to use XForms. CRM is a sophisticated model, so I don't think that one form could be used to edit every permutation of CRM (paintings, sculpture, coins, etc.)."
Yes. We'd need different "application profiles" to have useful data entry forms.
A thought that keeps banging inside my head: the same "units of work" (templates defining business sub-objects) should be used to specify:
- XML->RDF conversion
- RDF visualization
- RDF editing
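A very rough sketch of the "same units of work" idea: one template table drives both the XML->RDF conversion and the labels a viewing/editing form would show. Everything here is an assumption for illustration: the TEMPLATES entries, the choice of crm:P2_has_type for a typeDesc link, the placeholder ex:associatedObject predicate, and the sample record. Elements are matched by local name, and only xlink:href values are converted.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical templates: each one describes a business sub-object once.
TEMPLATES = [
    {"element": "typeDesc", "label": "Coin type", "predicate": "crm:P2_has_type"},
    {"element": "associatedObject", "label": "Associated coin", "predicate": "ex:associatedObject"},
]


def local_name(tag: str) -> str:
    """Return an element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def to_triples(xml_text: str, subject: str) -> list:
    """XML->RDF: emit one triple per templated element that carries xlink:href."""
    by_element = {t["element"]: t for t in TEMPLATES}
    triples = []
    for el in ET.fromstring(xml_text).iter():
        template = by_element.get(local_name(el.tag))
        href = el.get(XLINK_HREF)
        if template and href:
            triples.append((subject, template["predicate"], href))
    return triples


def form_fields() -> list:
    """Visualization/editing: the same templates give (label, element) pairs."""
    return [(t["label"], t["element"]) for t in TEMPLATES]


# Hypothetical record written only for this example.
RECORD = """<nuds xmlns:xlink="http://www.w3.org/1999/xlink" recordType="physical">
  <descMeta><typeDesc xlink:href="http://example.org/type/1"/></descMeta>
</nuds>"""

print(to_triples(RECORD, "http://example.org/coin/1"))
print(form_fields())
```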
"I wonder if schematron (http://www.schematron.com/) might be useful for defining the models for various domains of CRM. It may be possible to generate editing interfaces dynamically based on schematron rules"
If you need cross-field validation, XSD is not enough and you have to use Schematron to validate (or maybe RelaxNG, but I'm not that familiar with it), or XForms to enforce the rules when entering data.
The match between Schematron XPath rules & assert messages, and XForms MIPs (model item properties, especially "constraint" and "calculate") and alerts is obvious.
- There was an early attempt to generate Schematron->XForms at IBM alphaWorks:
  http://www.ibm.com/developerworks/xml/library/x-xfrmschematron/
  but I tried it at the time and it didn’t work well.
  Now both the XML Forms Generator and Visual XForms Designer are gone; I may have some stashed JARs...
- here's a good presentation of Schematron->XForms generation
  http://infinitesque.net/2007/XML2007-OgbCla07/validation.html
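To show how closely a Schematron assert and an XForms constraint line up, here is a small sketch in Python that renders one rule definition both ways. It is not the IBM generator or the approach from the presentation above; the Rule class, the function names, and the XPath in PHYSLOC_RULE (including the nuds: prefix and the use of .// because the position of physloc is not known) are assumptions. The namespace URIs are those of ISO Schematron and XForms, and the bind uses the XForms 1.1 nodeset attribute.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SCH = "http://purl.oclc.org/dsdl/schematron"
XF = "http://www.w3.org/2002/xforms"
ET.register_namespace("sch", SCH)
ET.register_namespace("xf", XF)


@dataclass(frozen=True)
class Rule:
    context: str   # XPath selecting the node the rule applies to
    test: str      # XPath that must be true for that node
    message: str   # text shown when the test fails


def to_schematron(rule: Rule) -> str:
    """Render the rule as <sch:rule><sch:assert>."""
    rule_el = ET.Element("{%s}rule" % SCH, {"context": rule.context})
    assert_el = ET.SubElement(rule_el, "{%s}assert" % SCH, {"test": rule.test})
    assert_el.text = rule.message
    return ET.tostring(rule_el, encoding="unicode")


def to_xforms_bind(rule: Rule) -> str:
    """Render the same rule as an <xf:bind> with a constraint MIP."""
    bind_el = ET.Element("{%s}bind" % XF, {"nodeset": rule.context, "constraint": rule.test})
    return ET.tostring(bind_el, encoding="unicode")


# Illustrative rule only; not taken from any published NUDS schema.
PHYSLOC_RULE = Rule(
    context="/nuds:nuds",
    test="not(.//nuds:physloc) or @recordType='physical'",
    message="physloc may be used only when recordType is 'physical'",
)

print(to_schematron(PHYSLOC_RULE))
print(to_xforms_bind(PHYSLOC_RULE))
```

The rule message has no slot on the bind itself; in a form it would go into the xf:alert of the control bound to that node.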
In our Customs & Excise projects we use our own tool called Sirma RAD (SRAD), because there weren’t very good XForms implementations at the time, and because we need some extra features.
E.g. here’s a “R”equired rule and its implementation with 1 line of XPath
Here is the rule enforced at the user interface (rightmost field is marked as Required in yellow, and the tooltip shows the rule and all other info about the field)
The same rule definition is validated using the same implementation when we receive an XML message.
Here’s the generated error message according to DG TAXUD specifications (notice the last 2 fields):
All of the above are XML technologies, but how about RDF?
It may be possible to adapt XForms to work over an RDF graph; “combed” from a root object down through all its sub-objects.
But that’s a much bigger topic…
An alternative is RDFa editors, of which there is a bunch…