
{toc}

h1. Background

* The OWLIM repository is created from the files in trunk/data, which need to be updated to the latest version
* entity-api is the project responsible for creating the repository
* We use two separate scripts, one to generate the repository and another to deploy it, because repository creation can fail for various reasons
* The creation script also updates the entity-api and data projects to the latest version
* The SVN projects are checked out to /nidata/researchspace/trunk on the dev server

h2. Summary

The whole update is executed as follows (but read the next section for the details first):
* log in to the dev server
* run the following commands, checking that each step succeeded before starting the next:
{code}
cd /nidata/researchspace/trunk/entity-api
bin/create-repo
nohup bin/ctap > ctap.txt &
# wait for ctap to finish (about 2.5h) and check ctap.txt before deploying
bin/deploy-cr4
{code}

h2. Update instructions

- Log in (ssh) to the dev server; see \[[0 Contacts]\] for login details
- Go to the entity-api folder
{code}
cd /nidata/researchspace/trunk/entity-api
{code}

- Create the repository
{code}
bin/create-repo   # takes about 10 mins
{code}
- Check that it says BUILD SUCCESSFUL near the end
Successful output looks like this:
{code}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10 seconds
[INFO] Finished at: Thu Dec 01 18:26:26 EET 2011
[INFO] Final Memory: 93M/235M
[INFO] ------------------------------------------------------------------------
adding: repositories/susana/ (stored 0%)
adding: repositories/susana/storage/ (stored 0%)
adding: repositories/susana/storage/entities.index (deflated 100%)
adding: repositories/susana/storage/pos.index (deflated 96%)
adding: repositories/susana/storage/backup (deflated 89%)
adding: repositories/susana/storage/predicates (deflated 100%)
adding: repositories/susana/storage/entities-doc (deflated 81%)
adding: repositories/susana/storage/entities (deflated 91%)
adding: repositories/susana/storage/pso.index (deflated 96%)
adding: repositories/susana/storage/fts/ (stored 0%)
adding: repositories/susana/storage/fts/storage.index (deflated 96%)
adding: repositories/susana/storage/fts/storage.tree (deflated 93%)
adding: repositories/susana/storage/owlim.properties (deflated 72%)
adding: repositories/susana/storage/pos (deflated 99%)
adding: repositories/susana/storage/entities.hash (deflated 100%)
adding: repositories/susana/storage/pso (deflated 99%)
{code}

- Add the annotation points (takes about 2.5h; this is why we start it with "nohup")
{code}
nohup bin/ctap > ctap.txt &
{code}

- Check if everything went ok
{code}
tail -20 ctap.txt
{code}
-- It should say "BUILD SUCCESSFUL" at the end
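
The manual check above can also be scripted. The following is a minimal sketch, assuming the log ends with the Maven banner shown earlier; build_succeeded is a hypothetical helper name and is not part of entity-api.

```shell
# Hypothetical helper (not part of entity-api): succeeds only if the last
# N lines of a log contain the "BUILD SUCCESSFUL" banner.
build_succeeded() {
    log_file="$1"
    lines="${2:-20}"   # default mirrors the "tail -20" check above
    # Return 2 when the log is missing or unreadable, e.g. ctap not started yet.
    [ -r "$log_file" ] || return 2
    tail -n "$lines" "$log_file" | grep -q "BUILD SUCCESSFUL"
}
```

For example, running "build_succeeded ctap.txt && echo OK" prints OK only when the banner is present. For the create-repo output a larger line count may be needed, because the "adding:" lines follow the banner.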

- If both the creation and the annotations went OK, deploy either as "susana" or as "susana.new":
-- For "susana"
{code}
bin/deploy-cr4
{code}
-- For "susana.new"
{code}
bin/_deploy-new
{code}
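
If the deploy step is wrapped in a script, the choice between the two targets can be made explicit. This is only a sketch of the mapping described above; deploy_script_for is a hypothetical helper name, and the script paths are relative to the entity-api folder.

```shell
# Hypothetical helper: print the deploy script for a given repository name,
# using the two names and scripts listed in this section.
deploy_script_for() {
    case "$1" in
        susana)     echo "bin/deploy-cr4" ;;
        susana.new) echo "bin/_deploy-new" ;;
        *)          echo "unknown repository: $1" >&2; return 1 ;;
    esac
}
```

Failing on an unknown name, rather than falling back to a default, avoids deploying over "susana" by accident.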

Note. Both deploy scripts stop and then start the Tomcat instance where the openrdf-workbench is deployed.

Note. If you are building on a machine with less memory, you may benefit from reducing the \-Xmx12g JVM parameter to around \-Xmx3g in the create-repo script.
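
One way to preview the heap change before editing the script is sketched below. It assumes the flag appears literally in the script as \-Xmx followed by a number and a unit letter; with_heap is a hypothetical helper name.

```shell
# Hypothetical helper: print a script with every -Xmx<size> flag replaced.
# Writes to stdout only; the input file is left untouched.
with_heap() {
    script="$1"
    new_size="$2"    # e.g. 3g
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${new_size}/g" "$script"
}
```

For example, "with_heap bin/create-repo 3g | diff bin/create-repo -" shows exactly which lines would change.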


h2. Important steps in BM Repository Creation

- Set "FR-Implementation" as the BM repository ruleset

- Insert both the ontology and the BM data within SystemTransaction

- Add the ontology to the BM repository:
Loading the ontology \*.ttl files (listed in String array A) from the main loading directory, ../data

- Fixing QUDT units:
Executing a SPARQL insert query:
insert {?u a skos:Concept; skos:inScheme unit: } where {?u qudt:symbol ?s}

- Add the main BM thesauri:
Loading the thesauri \*.trig files from the server directory data/BM/thesauri

- Simplify ECRM: remove owl:Restriction:
Executing the query "delete where {?e rdfs:subClassOf ?t. ?t a owl:Restriction}",
following "RS owl:Restriction" (RS-1279, Optimize repo loading):
delete blank nodes of type owl:Restriction (will reduce size by 24%).
Note: The DELETE doesn't work well, so in RS-1279 "variant 2 becomes the preferred one". RS-1370 makes an ontology without owl:Restriction, where this DELETE step is not needed ([https://confluence.ontotext.com/display/ResearchSpace/ECRM+Simplification]).

- Fix thesauri labels:
It is now fixed (does not generate extra pref labels). See RS-1040.

- Create thesauri index:
Selecting by rdf:type skos:Concept to .uri files,
then applying the Lucene parameters and creating the index via the ASK queries included in "thes.lucene":
luc:moleculeSize = "2";
luc:index = "uris";
luc:includeEntities = "";
luc:includePredicates = "<[http://www.w3.org/2000/01/rdf-schema#label]>";
luc:languages = "en,nl,none";
luc:useRDFRank = "yes";

- Add main BM data
Loading from /BM/data/data.zip

- Add paintings
Executing /InsertMainImageQueries.sparql, which inserts rdf:type rso:E38_Main_Image statements
for certain listed ".jpg" files, found via rso:P3_has_image_file

- Fix RS-1375 issues:
Delete orphan Image (asset) statements;
Currently disabled.

- Adding BM images
Loading from /BM/data/images.zip

- Create main index
Replaced by the following steps:

- Counting total objects
Counting objects of rdf:type rso:FC70_Thing

- Calculating Thesaurus Counts
Saving them in a HashMap for later use

- Saving Thesaurus counts to RDF Knowledge Base repository

- Creating Autocomplete Index
Using StandardAnalyzer with Version.LUCENE_35
Iterating over a SPARQL query that selects thesauri by skos:inScheme and, optionally, crm:P3_has_note, skos:scopeNote, skos:altLabel and rso:numberOfUses