
Measure the speed of the different steps in the Repository Creation application


The first dataset has 6k objects; the second has 44k objects (many of them are linked by sameAs, though, so only about half show up as Museum objects in the UI).
The two test runs were measured on different machines (Mitac's laptop vs. the Cr4 server), so a new clean nightly run on Cr4 with the small set is needed. Still, some points are already obvious:

  • the thesauri index has ~250k objects and takes about a minute
  • the main index has 6k (or 44k) objects and takes 0.5h (2.5h), because the molecules are larger and each molecule has to be navigated
  • The speed of Lucene indexing is:
    item          size   time    speed
    thesauri      250k   1min    250k/min
    6k objects    6k     30min   200/min
    44k objects   44k    2.5h    300/min
  • The speed of adding objects is: 700/min vs. 1300/min for the larger dataset; this also includes the overhead of parsing
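
The indexing speeds quoted above are simply item count divided by elapsed time. A minimal sketch of that arithmetic follows, using the rounded counts and durations from the bullets above (0.5h and 2.5h for the main index). The helper name `items_per_minute` is illustrative and not part of the Repository Creation application.

```python
def items_per_minute(items: int, minutes: float) -> float:
    """Throughput as items processed per minute."""
    return items / minutes


# (item count, elapsed minutes), rounded figures taken from the notes above
runs = {
    "thesauri": (250_000, 1),
    "6k objects": (6_000, 30),      # 0.5h
    "44k objects": (44_000, 150),   # 2.5h
}

rates = {name: items_per_minute(n, m) for name, (n, m) in runs.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} items/min")
```

With these inputs the 44k run comes out just under 300/min, which matches the rounded figure quoted for the larger dataset.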

New Mapping


                            8k            115k
museum objects              23,993        115,559
explicit statements         14,697,760    24,749,779
total statements            99,139,808    192,550,603
entities                    7,720,698     10,355,104

File sizes                  8k (MB)       115k (MB)
thesaurus FTS index size    24            27
object FTS index size       126           13,312
total repo size             4,506         21,504
repo-indices                4,356         8,165

Loading times               8k (sec)      115k (sec)    8k (hrs)    115k (hrs)
object FTS indexing         1,317         48,673        0.4         13.5
add objects                 5,626         99,354        1.6         27.6
add images                  3,129         3,063         0.9         0.9
thesaurus FTS index         171           167           0.0         0.0
add ontologies/thesauri     1,282         1,151         0.4         0.3
Total                       11,525        152,408       3           42
  • objects: the 8k repo has sameAs, so the objects are tripled. E.g. Lucene indexes 24k objects, not 8k
  • statement expansion (ratio of total to explicit statements) has grown from 5.5-6.5x to about 8x; this needs to be investigated
  • the 115k repo uses the new objects, but old thesauri/images files
  • FTS indexing is quite fast. But FTS size is still too large
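
The derived figures in this section (the expansion ratio and the totals in hours) can be recomputed from the raw numbers in the table. The sketch below does that; the dictionary and variable names are illustrative only, and the inputs are copied from the table above.

```python
# Statement counts per repo, copied from the table above
explicit = {"8k": 14_697_760, "115k": 24_749_779}
total = {"8k": 99_139_808, "115k": 192_550_603}

# Expansion factor: total statements per explicit statement
expansion = {repo: total[repo] / explicit[repo] for repo in explicit}

# Loading times in seconds per step, copied from the table above
load_sec = {
    "8k": {
        "object FTS indexing": 1_317,
        "add objects": 5_626,
        "add images": 3_129,
        "thesaurus FTS index": 171,
        "add ontologies/thesauri": 1_282,
    },
    "115k": {
        "object FTS indexing": 48_673,
        "add objects": 99_354,
        "add images": 3_063,
        "thesaurus FTS index": 167,
        "add ontologies/thesauri": 1_151,
    },
}

total_sec = {repo: sum(steps.values()) for repo, steps in load_sec.items()}
total_hrs = {repo: sec / 3600 for repo, sec in total_sec.items()}

for repo in explicit:
    print(f"{repo}: expansion {expansion[repo]:.2f}x, "
          f"load {total_sec[repo]:,} s = {total_hrs[repo]:.1f} h")
```

The per-step seconds sum to the Total row of the table (11,525 and 152,408), and the ratio computed from the 115k column is a little under 8.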

Full Set

See BM Data Volumetrics#Full Set

  • The storage location was on a RAM drive; the repo took 55G out of 64G. According to previous experiments on other servers, loading a repo on a RAM drive is several times faster
  • storage size: 50+GB
  • adding BM objects: start speed 132 obj/s, end speed 26 obj/s; approx. 20h total time
  • ~2M BM objects according to the nuxeo ID file (not 1.5M as we said before)
  • lots of DBPedia thesauri items without labels; we don't know where those came from
  • ~407,000 thesauri items, indexed in 3min
  • nuxeo ids added in 2600s (<1h)
  • failed w/ Exception during Rembrandt paintings
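
As a rough sanity check on the ~20h figure, the start and end speeds give a lower and an upper bound on the time needed to add all objects. The sketch below assumes the full ~2M objects were attempted; since the run failed with an exception, the number actually loaded may be lower. The helper name `hours_at` is illustrative.

```python
OBJECTS = 2_000_000   # ~2M BM objects per the nuxeo ID file (assumed all attempted)
START_SPEED = 132     # obj/s at the start of the load
END_SPEED = 26        # obj/s at the end of the load


def hours_at(speed_obj_per_s: float, objects: int = OBJECTS) -> float:
    """Hours needed to add `objects` at a constant speed."""
    return objects / speed_obj_per_s / 3600


best_case_h = hours_at(START_SPEED)   # if the start speed had been sustained
worst_case_h = hours_at(END_SPEED)    # if the whole load ran at the end speed

print(f"bounds: {best_case_h:.1f} h to {worst_case_h:.1f} h")
```

Under that assumption the observed ~20h sits close to the slow bound, which suggests the load ran near the end speed for most of its duration.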