- read Rembrandt data to learn the data. A DB schema (not very complex) and sample data in Excel are enclosed
- read Rembrandt to CRM- old for a draft sample record converted to CRM.
- the diagrams are not entirely correct
- IMHO easiest to understand is the RDF Turtle:
(SSL: to get SVN accounts, post a task in the Jira component "00 Infra" giving the names and emails)
Color-coded HTML (exported from Emacs), not the final version
- read Data Migration and Ingestion Tools for tools that may be useful.
Studying these tools should determine the approach
- I still need to write a spec of the migration (should be done Nov 9 right after my vacation).
The notes in the last column of the Excel sheet Rembrandt data#Reduced Sample Record should give you some idea of the tricky parts
Mitac, Kalin, SSL: please write your considerations
- parse the XML using DOM parser
- convert each XML tag to a list of statements (type, note, attributes and relations)
- output the statements to a file
- thesaurus usage is only a mockup: lookups are not implemented, and the cases where no element or multiple elements are found are not resolved
- refactoring of the initial code is only partially done; we wanted to show the approach (in the code examples above), not to write the full migration
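The three steps listed above (DOM parse, tag to statements, write to a file) can be sketched roughly as follows. This is only an illustration of the approach, not the code from the attached Java files: the class `StatementSketch`, its methods, the sample XML, and the one-line statement format are all assumptions made for the example, and the thesaurus lookup is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: names and statement format are illustrative only.
class StatementSketch {

    /** Step 1 and 2: parse the XML with a DOM parser and collect statements. */
    static List<String> toStatements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            visit(doc.getDocumentElement(), null, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Emits type, relation, attribute and note statements for one element, then recurses. */
    private static void visit(Element el, String parent, List<String> out) {
        String name = el.getTagName();
        out.add("type " + name);
        if (parent != null) {
            out.add("relation " + parent + " -> " + name);
        }
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.add("attribute " + name + "." + a.getNodeName() + " = " + a.getNodeValue());
        }
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c instanceof Element) {
                visit((Element) c, name, out);
            } else if (c.getNodeType() == Node.TEXT_NODE && !c.getNodeValue().trim().isEmpty()) {
                // Non-blank text content is recorded as a note on the enclosing element.
                out.add("note " + name + " = " + c.getNodeValue().trim());
            }
        }
    }

    /** Step 3: write one statement per line, encoded as UTF-8. */
    static void writeStatements(List<String> statements, Path target) {
        try {
            Files.write(target, statements, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException("Could not write " + target, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample record, not taken from the Rembrandt data.
        String xml = "<record id=\"1\"><title>Night Watch</title></record>";
        Path target = Files.createTempFile("statements", ".txt");
        writeStatements(toStatements(xml), target);
        System.out.println("Wrote " + target);
    }
}
```

A real migration would replace the flat strings with CRM/RDF statements and resolve thesaurus references while visiting each element.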
- Files migration-aproach-mitac-kalin.zip
- Sample output (OUTPUT.txt)
- Java files (*.java)
- Please note that the use of DataOutputStream writeBytes() in the code above is problematic: it writes only the low-order byte of each character, so non-ASCII text is corrupted and we lose the benefit of Java's built-in UTF-8 support. If BufferedWriter write() is used instead (over a writer configured for UTF-8), we do not have the issue.
- I propose refactoring the code to parametrize the calls as much as possible, so that, for instance, apart from the parseFrame method much could be driven by a properties file. This grows out of the proposal to use a properties file for setting the thesaurus references.
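The two points above can be illustrated together in a small sketch. It compares `DataOutputStream.writeBytes()` with a `BufferedWriter` over a UTF-8 `OutputStreamWriter`, and shows thesaurus references being read from a properties file. The class `OutputSketch`, its methods, and the `thesaurus.<tag>` key convention are assumptions made for the example; the actual property names are still to be decided.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch: names and property keys are illustrative only.
class OutputSketch {

    /** Problematic variant: writeBytes() keeps only the low-order byte of each char. */
    static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Safer variant: a BufferedWriter over an OutputStreamWriter that encodes as UTF-8. */
    static byte[] viaBufferedWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Loads configuration from properties-file text (a real run would read a file). */
    static Properties load(String propertiesText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    /** Returns the thesaurus configured for an XML tag, or null if none is configured. */
    static String thesaurusFor(Properties config, String tag) {
        return config.getProperty("thesaurus." + tag);
    }

    public static void main(String[] args) {
        String text = "\u017d"; // LATIN CAPITAL LETTER Z WITH CARON, a non-ASCII character
        System.out.println("writeBytes length:     " + viaWriteBytes(text).length);
        System.out.println("BufferedWriter length: " + viaBufferedWriter(text).length);

        Properties config = load("thesaurus.material=materials\n");
        System.out.println("material -> " + thesaurusFor(config, "material"));
    }
}
```

Keeping the tag-to-thesaurus mapping in a properties file means new references can be added without recompiling; only structure-specific logic such as parseFrame would stay in code.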