
{excerpt}Counting and analysis of repository content{excerpt}
{toc}
{attachments:patterns=.*xls}

h2. Counting
- Total statements
{noformat}
select (count(*) as ?c) {?s ?p ?o}
{noformat}
- Statements per property.
A query returns at most 200 rows (max limit=200), so we fetch the results in two portions:
{noformat}
select ?p (count(*) as ?c) {?s ?p ?o} group by ?p order by ?p
select ?p (count(*) as ?c) {?s ?p ?o} group by ?p order by ?p offset 200
{noformat}
- class instances (one instance has many rdf:type!)
{noformat}
select ?t (count(*) as ?c) {?s rdf:type ?t} group by ?t order by ?t
{noformat}
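The two-portion retrieval of the per-property counts can be generalized to any number of pages. The following is a minimal sketch, not the procedure actually used here: the helper name paged_queries and the total of 400 rows are assumptions for illustration, and only the 200-row limit comes from the text above. The "order by" clause matters, since without a stable order the pages may overlap or skip rows.

```python
def paged_queries(base_query, page_size, total_rows):
    """Yield copies of base_query with limit/offset appended, one per page.

    base_query must already contain an "order by" clause so that
    consecutive pages are cut from the same stable ordering.
    """
    for offset in range(0, total_rows, page_size):
        yield f"{base_query} limit {page_size} offset {offset}"


# Query text taken from the section above.
BASE = "select ?p (count(*) as ?c) {?s ?p ?o} group by ?p order by ?p"

# page_size=200 is the row limit mentioned above;
# total_rows=400 is an assumed upper bound on the number of distinct properties.
queries = list(paged_queries(BASE, page_size=200, total_rows=400))

for q in queries:
    print(q)
```

Each generated string can be pasted into the query form as is; the first page uses offset 0, which is equivalent to omitting the offset.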

h2. Analysis
We provide historical data, but focus on the latest data (triples-BM.xls of 2012-12).

h3. Properties
!triples-BM-props.png!
Without sameAs expansion: 89995389 (2.9M=3.1% fewer triples)

- rdf:type=58426160 is 62.9% of all triples (see breakdown below)
- Object (business) & thesauri triples are 26.0+4.9=30.9%, of which we can assume objects are 21% and thesauri 10%.
- FRs=5751214 are 6.2% of all triples, or 29% of business triples
- bmo:PX_physical_description=25584 ~ rso:FC70_Thing=23993 is 3x the 8k objects. This is due to owl:sameAs: each statement is repeated for every equivalent URI of the object.
- owl:sameAs=72010 is 9x the 8k objects.
Each object has 3 sameAs URIs (a,b,c), which causes 9 statements: aa bb cc ab bc ca ba cb ac
That's what an equivalence relation will do to you.
- skos:inScheme=357283 ~ skos:Concept=357318 is the total number of thesaurus terms
- skos:exactMatch=4495 come from RKD. E.g. rkd-plaats:renaix and rkd-plaats:renaix give 4 triples (2 symmetric, 2 reflexive)
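The sameAs and exactMatch counts above follow from simple arithmetic: an equivalence class of n URIs expands to n*n statements (n reflexive plus n*(n-1) between distinct URIs). The sketch below illustrates this; the function name equivalence_closure and the placeholder URIs a, b, c, x, y are assumptions for illustration, and the code does not reproduce how the repository actually computes the expansion.

```python
from itertools import product


def equivalence_closure(uris):
    """Return all ordered (subject, object) pairs over one equivalence class.

    This includes the reflexive pairs (a, a) and both directions (a, b), (b, a).
    """
    return list(product(uris, repeat=2))


# Three equivalent URIs per object, as in the sameAs bullet above.
same_as = equivalence_closure(["a", "b", "c"])
print(len(same_as))  # 9

# Two matched terms, as in the exactMatch example: 2 reflexive + 2 symmetric.
exact_match = equivalence_closure(["x", "y"])
print(len(exact_match))  # 4

# Dividing the sameAs count by 9 gives the approximate number of objects.
print(round(72010 / 9))  # 8001
```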

h3. Types
!triples-BM-types.png!
- _:nodeXX=23528903: 40.3% useless OWL DL restriction types
{noformat}
crm:En_Whatever rdf:type [owl:Restriction...]
{noformat}
We could eliminate these (about 25% of all triples, i.e. 40.3% of the 62.9% that are rdf:type) by:
-# Delete such statements *after* loading the ontologies and *before* loading the data
{noformat}
delete where {?e rdfs:subClassOf ?t. ?t a owl:Restriction}
{noformat}
-# Write a Perl script to cut down ECRM to RDFS+inverse (what Doerr wanted) + transitive
- CRM classes=30864964: 52.8%. The counts decrease going down the class hierarchy, as expected:
owl:Thing=3627096 ~ crm:E1_CRM_Entity=3626903
crm:E77_Persistent_Item=3092726
crm:E2_Temporal_Entity=240162
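The percentages in this section can be rechecked from the raw counts. The sketch below uses only figures quoted on this page (the rdf:type count, the restriction and CRM class counts, and the 62.9% share of rdf:type among all triples); the variable names are illustrative, and the share of all triples is derived by multiplying two rounded percentages, so it is approximate.

```python
# Counts quoted in the Properties and Types sections.
RDF_TYPE = 58_426_160       # all rdf:type statements
RESTRICTIONS = 23_528_903   # rdf:type statements pointing to _:nodeXX restrictions
CRM_CLASSES = 30_864_964    # rdf:type statements pointing to CRM classes
RDF_TYPE_SHARE = 0.629      # rdf:type as a fraction of all triples (62.9%)

# Shares within the rdf:type statements.
restriction_share_of_types = RESTRICTIONS / RDF_TYPE
crm_share_of_types = CRM_CLASSES / RDF_TYPE

# Restriction types as a fraction of all triples (approximate).
restriction_share_of_all = restriction_share_of_types * RDF_TYPE_SHARE

print(f"restrictions: {restriction_share_of_types:.1%} of rdf:type")
print(f"CRM classes:  {crm_share_of_types:.1%} of rdf:type")
print(f"restrictions: {restriction_share_of_all:.0%} of all triples")
```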

h3. Statements and MB
{viewxls:name=triples-BM.xls|sheet=statements}