Counting and analysis of repository content
- Total statements
- Statements per property. The maximum result limit is 200, so we retrieve these in two portions.
- Class instances (one instance can have many rdf:type values!)
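The counts listed above can be sketched in Python over an in-memory list of (subject, predicate, object) tuples. The SPARQL query strings and the paging helper are hypothetical. They assume that the counts come from a SPARQL endpoint returning at most 200 rows per query, and that the number of distinct properties or classes is known in advance, so that LIMIT/OFFSET splits the result into portions. The example URIs are placeholders.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
PAGE_LIMIT = 200  # maximum number of result rows per query (from the notes)

# Hypothetical aggregate queries; ORDER BY keeps LIMIT/OFFSET paging stable.
PER_PROPERTY = "SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p"
PER_CLASS = "SELECT ?c (COUNT(*) AS ?n) WHERE { ?s a ?c } GROUP BY ?c"


def count_per_property(triples):
    """Number of statements for each predicate."""
    return Counter(p for _, p, _ in triples)


def count_per_class(triples):
    """Number of rdf:type statements for each class. An instance with
    several rdf:type values is counted once under each of its classes."""
    return Counter(o for _, p, o in triples if p == RDF_TYPE)


def paged_queries(base_query, order_by, total_rows, limit=PAGE_LIMIT):
    """Split an aggregate query into pages of at most `limit` rows."""
    return [
        f"{base_query} ORDER BY {order_by} LIMIT {limit} OFFSET {offset}"
        for offset in range(0, total_rows, limit)
    ]


if __name__ == "__main__":
    # Placeholder data: one object with three rdf:type values.
    triples = [
        ("ex:obj1", RDF_TYPE, "crm:E22_Man-Made_Object"),
        ("ex:obj1", RDF_TYPE, "crm:E1_CRM_Entity"),
        ("ex:obj1", RDF_TYPE, "owl:Thing"),
        ("ex:obj1", "bmo:PX_physical_description", '"bronze"'),
    ]
    print("total statements:", len(triples))  # 4
    print(count_per_property(triples))
    print(count_per_class(triples))
    # e.g. 350 distinct properties -> two portions (OFFSET 0 and OFFSET 200)
    for query in paged_queries(PER_PROPERTY, "DESC(?n) ?p", total_rows=350):
        print(query)
```

Ordering by the count and then by the property name gives a deterministic order, so consecutive portions neither overlap nor skip rows.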
We provide historical data, but focus on the latest data (BM-triples.xls of 2012-12).
Without owl:sameAs expansion: 89995389 triples (2.9M = 3.1% fewer triples)
- rdf:type=58426160 is 62.9% of all triples (see breakdown below)
- Object (business) & thesauri triples are 26.0+4.9=30.9%, of which we can assume objects are 21% and thesauri 10%.
- FRs=5751214 are 6.2% of all triples, or 29% of business triples
- bmo:PX_physical_description=25584 ~ rso:FC70_Thing=23993 is 3x the number of objects (8k)!? This is due to owl:sameAs
- owl:sameAs=72010 is 9x the number of objects (8k).
Each object has 3 sameAs URIs (a,b,c), which causes 9 statements: aa bb cc ab bc ca ba cb ac
That's what an equivalence relation will do to you.
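The n² growth can be checked with a small sketch. It takes owl:sameAs links as (x, y) pairs, groups the URIs into equivalence classes with a union-find, and emits every ordered pair within each class, which is what materialising owl:sameAs as a reflexive, symmetric and transitive relation produces. The URIs are placeholders, and this illustrates the arithmetic only; it is not the repository's reasoner.

```python
from itertools import product


def same_as_closure(links):
    """Reflexive, symmetric, transitive closure of owl:sameAs links,
    returned as a set of (x, y) pairs. Uses a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two equivalence classes

    classes = {}
    for x in parent:
        classes.setdefault(find(x), []).append(x)

    closure = set()
    for members in classes.values():
        # a class with n members yields n * n ordered pairs
        closure.update(product(members, repeat=2))
    return closure


if __name__ == "__main__":
    abc = same_as_closure([("ex:a", "ex:b"), ("ex:b", "ex:c")])
    print(len(abc))  # 9: aa bb cc ab ba bc cb ac ca
    print(len(same_as_closure([("ex:x", "ex:y")])))  # 4: 2 reflexive + 2 symmetric
    print(8000 * 9)  # 72000, close to the 72010 observed
```

Three URIs per object give 9 statements. If about 8k objects each had three URIs, the result would be 8,000 × 9 = 72,000 statements, close to the 72010 counted. Two URIs give 4 statements, the same pattern as the skos:exactMatch example.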
- skos:inScheme=357283 ~ skos:Concept=357318 is the total number of thesaurus terms
- skos:exactMatch=4495 come from RKD. E.g. rkd-plaats:renaix and rkd-plaats:renaix give 4 triples (2 symmetric, 2 reflexive)
- _:nodeXX=23528903: 40.3% (of rdf:type) are useless OWL DL restriction types (blank nodes)
We could eliminate these (about 25% of all triples) by:
- Deleting such statements after loading the ontologies and before loading the data
- Writing a Perl script to cut ECRM down to RDFS + inverse properties (what Doerr wanted) + transitive properties
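The first option amounts to removing rdf:type statements whose class is a blank node. The following is a minimal sketch of that filter, assuming terms are plain strings and blank nodes are written with the `_:` prefix (as in `_:nodeXX`). Against a live store, a SPARQL 1.1 Update such as `DELETE { ?s a ?c } WHERE { ?s a ?c FILTER(isBlank(?c)) }` expresses the same deletion. Whether a particular store allows inferred statements to be deleted this way is an assumption to check.

```python
RDF_TYPE = "rdf:type"


def is_blank(term):
    """True for blank node labels such as _:node123."""
    return term.startswith("_:")


def drop_blank_types(triples):
    """Split triples into (kept, dropped). Dropped are the rdf:type
    statements whose object is a blank node, i.e. an anonymous OWL
    restriction class; blank-node objects of other predicates are kept."""
    kept, dropped = [], []
    for triple in triples:
        _, predicate, obj = triple
        if predicate == RDF_TYPE and is_blank(obj):
            dropped.append(triple)
        else:
            kept.append(triple)
    return kept, dropped


if __name__ == "__main__":
    # Placeholder data.
    triples = [
        ("ex:obj1", RDF_TYPE, "crm:E1_CRM_Entity"),
        ("ex:obj1", RDF_TYPE, "_:node42"),              # restriction type: dropped
        ("ex:obj1", "crm:P43_has_dimension", "_:d1"),   # ordinary blank node: kept
    ]
    kept, dropped = drop_blank_types(triples)
    print(len(kept), len(dropped))  # 2 1
```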
- CRM classes=30864964: 52.8% (of rdf:type); the count decreases going down the class hierarchy (as expected):
owl:Thing=3627096 ~ crm:E1_CRM_Entity=3626903
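The percentages above can be cross-checked from the raw counts. The notes do not state the overall total directly, so this sketch infers it from the rdf:type share (62.9%). Because that share is rounded, the inferred total (roughly 92.9M) is only approximate. On that basis, the blank-node restriction types come to roughly 25% of all triples.

```python
# Figures quoted in the notes (BM-triples.xls of 2012-12).
TYPE_TRIPLES = 58_426_160      # rdf:type statements
TYPE_SHARE = 0.629             # rdf:type is 62.9% of all triples
WITHOUT_SAMEAS = 89_995_389    # total without owl:sameAs expansion
FRS = 5_751_214                # FR statements
BLANK_TYPES = 23_528_903       # rdf:type _:nodeXX (OWL DL restrictions)
CRM_CLASSES = 30_864_964       # rdf:type to CRM classes


def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The overall total is not quoted; infer it from the rdf:type share.
TOTAL = TYPE_TRIPLES / TYPE_SHARE

if __name__ == "__main__":
    print("total (inferred):", round(TOTAL))
    added = TOTAL - WITHOUT_SAMEAS
    print("sameAs expansion adds:", round(added), f"({pct(added, TOTAL)}%)")
    print("FRs of all triples:", pct(FRS, TOTAL), "%")
    print("blank-node types of rdf:type:", pct(BLANK_TYPES, TYPE_TRIPLES), "%")
    print("blank-node types of all triples:", pct(BLANK_TYPES, TOTAL), "%")
    print("CRM classes of rdf:type:", pct(CRM_CLASSES, TYPE_TRIPLES), "%")
```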