Attachments:

  • ParseXML.java (Java source, 26 kB; tna, Nov 24, 2011 15:18): slight re-working of mitac's Java code so that it outputs UTF-8 and takes parameters
  • migration-aproach-mitac-kalin.zip (ZIP archive, 22 kB; Dimitar Manov, Nov 17, 2011 18:12): supporting files for the Java/DOM approach
  • data-mig-TalenD-notes-20111114.txt (text file, 3 kB; Vladimir Alexiev, Nov 18, 2011 09:34)

Preparation

Approach

Mitac, Kalin, SSL please write considerations

Preferred approach: Java DOM parser with output to RDF (Turtle or N-Triples)

  • parse the XML using a DOM parser
  • convert each XML element to a list of statements (type, note, attributes and relations)
  • output the statements to a file
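
The three steps above can be sketched roughly as follows. This is only an illustration, not the code from the attachments: the class name XmlToNTriples, the http://example.org/migration/ namespace, the node/N subject IRIs and the has_ and note property names are all hypothetical. It also assumes element and attribute names are plain names without namespace prefixes, and it writes non-ASCII characters directly into literals, which N-Triples 1.1 allows when the file is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlToNTriples {
    // Hypothetical namespace; the real migration would use its own vocabulary.
    static final String NS = "http://example.org/migration/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Parses the XML with a DOM parser and returns one N-Triples line per statement. */
    static List<String> convert(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        visit(doc.getDocumentElement(), out, new int[] {0});
        return out;
    }

    /** Emits the statements for one element and returns its subject term. */
    private static String visit(Element el, List<String> out, int[] counter) {
        String subject = "<" + NS + "node/" + (counter[0]++) + ">";
        // Type statement: derived from the tag name.
        out.add(subject + " <" + RDF_TYPE + "> <" + NS + el.getTagName() + "> .");
        // Attribute statements: one literal per XML attribute.
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.add(subject + " <" + NS + a.getNodeName() + "> "
                    + literal(a.getNodeValue()) + " .");
        }
        StringBuilder text = new StringBuilder();
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                // Relation statement: link the parent to each child element.
                String child = visit((Element) c, out, counter);
                out.add(subject + " <" + NS + "has_" + c.getNodeName() + "> " + child + " .");
            } else if (c.getNodeType() == Node.TEXT_NODE) {
                text.append(c.getNodeValue());
            }
        }
        // Note statement: the element's own text content, if any.
        String note = text.toString().trim();
        if (!note.isEmpty()) {
            out.add(subject + " <" + NS + "note> " + literal(note) + " .");
        }
        return subject;
    }

    /** Quotes a string as an N-Triples literal, escaping the characters that must be escaped. */
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }
}
```

Calling XmlToNTriples.convert() on a small document returns the statements in document order; writing them to a file is then one line per statement.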

Example code:

Excerpt from migration - simple fields; the code is refactored
Excerpt from migration - nested/collection fields; the code is not refactored

In progress

  • thesaurus usage is just a mockup: it is not implemented, and the cases where an element is not found or multiple elements are found are not handled
  • refactoring of the initial code is only partially done; the aim was to show the approach (in the code examples above), not to write the full migration
  • Files: migration-aproach-mitac-kalin.zip
    • Sample output (OUTPUT.txt)
    • Java files (*.java)
  • Please note that the use of DataOutputStream writeBytes() in the code above is problematic: it writes only the low-order byte of each character, so we lose the benefit of Java's built-in UTF-8 support and non-ASCII text is corrupted. Calling BufferedWriter write() on a writer opened with UTF-8 encoding avoids the issue.
  • I propose refactoring the code to parametrize calls as much as possible, so that, for instance, apart from the parseFrame method, much could be driven by a properties file. This grows out of the proposal to use a properties file for setting the thesaurus references.
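
The encoding point above can be checked with a small comparison. This is a sketch only; the class and method names are made up for illustration and do not come from the attached code. It writes the same string once with DataOutputStream writeBytes() and once with a BufferedWriter over an OutputStreamWriter that names UTF-8 explicitly.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class Utf8OutputDemo {
    /** Writes via DataOutputStream.writeBytes(), which keeps only the low byte of each char. */
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        }
        return buf.toByteArray();
    }

    /** Writes via a BufferedWriter whose underlying writer encodes as UTF-8. */
    static byte[] viaWriter(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(buf, StandardCharsets.UTF_8))) {
            out.write(s);
        }
        return buf.toByteArray();
    }
}
```

For a Cyrillic letter such as "\u0416", viaWriteBytes() produces a single byte that no longer identifies the character, while the bytes from viaWriter() decode back to the original string. Naming the charset explicitly matters, because a writer created without one uses the platform default encoding.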
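
The properties-file proposal above might look something like the following. The key scheme (thesaurus.<field>), the class name ThesaurusConfig and the example URL are assumptions made for illustration; nothing here reflects an agreed format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ThesaurusConfig {
    private final Properties props = new Properties();

    /** Loads the configuration from properties-file text (hypothetical key scheme). */
    ThesaurusConfig(String propertiesText) throws IOException {
        props.load(new StringReader(propertiesText));
    }

    /** Returns the thesaurus reference configured for an XML field, or null if none is set. */
    String thesaurusFor(String field) {
        return props.getProperty("thesaurus." + field);
    }
}
```

With a line such as "thesaurus.material = http://example.org/thesaurus/material" in the file, thesaurusFor("material") returns that URL, and a field without an entry returns null, so the migration code can decide how to handle unmapped fields. In practice the text would be read from a file with a UTF-8 reader rather than passed in as a string.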