KIM provides the ability to (de)serialize annotated documents in the XML Corpus Encoding Standard (XCES) format. The serialized document contains document features as well as content with inline annotations.
This serialization lets you store documents in a more human-readable format than the default GATE XML, where content and annotations are stored separately. It is also easy to extend and customize for other formats; KIM itself uses this mechanism to write documents as XHTML.
GATE documents are (de)serialized by the SerializationXCES class, which provides the "read" and "write" methods and is built on SAX event handling.
Writing (serialization) is performed by XCESWriter.write(KIMDocument, ContentHandler). It takes a KIM document and fires a sequence of events for writing document features, pieces of content, and annotations. The given ContentHandler handles these events (see the ContentHandler interface for the full list of methods). For example, SerializationXCES uses a handler that simply prints the XML to a file.
For reading (deserialization), SerializationXCES uses a custom ContentHandler, which handles the SAX events and restores the document.
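To make the event-driven writing more concrete, here is a minimal sketch of a handler that prints incoming SAX events as XML text. It uses only the standard org.xml.sax API. The class name PrintingHandler, the "annotation" element, and its "class" attribute are assumptions made for illustration; they are not KIM's actual handler or the real XCES element names. The demo method fires a few events by hand, standing in for what a writer would emit.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler (not part of KIM): writes each SAX event as XML text. */
final class PrintingHandler extends DefaultHandler {
    private final Writer out;

    PrintingHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        StringBuilder tag = new StringBuilder("<").append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            tag.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        emit(tag.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        emit("</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        emit(escape(new String(ch, start, length)));
    }

    // Wraps I/O failures in SAXException so they travel through the SAX pipeline.
    private void emit(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Fires a few events by hand and returns the XML text that was written. */
    static String demo() {
        StringWriter sink = new StringWriter();
        PrintingHandler handler = new PrintingHandler(sink);
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "class", "class", "CDATA", "Person"); // assumed attribute
        try {
            handler.startElement("", "annotation", "annotation", atts);
            char[] text = "John & Mary".toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement("", "annotation", "annotation");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

Because the handler only depends on a Writer, the same class could target a file, a string buffer, or a network stream.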
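The reading direction can be sketched in the same spirit. The handler below is an assumption-laden illustration, not KIM's deserializer: it treats every element as an inline annotation, strips the tags to recover the plain content, and records each annotation's character offsets. The class name RestoringHandler and the sample element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler (not part of KIM): rebuilds content and annotation offsets. */
final class RestoringHandler extends DefaultHandler {
    /** Plain text content with all tags stripped. */
    final StringBuilder content = new StringBuilder();
    /** Recovered annotations, formatted as "type[start,end)". */
    final List<String> annotations = new ArrayList<>();
    // Start offsets of elements that are open but not yet closed.
    private final Deque<Integer> openStarts = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openStarts.push(content.length());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        int start = openStarts.pop();
        annotations.add(qName + "[" + start + "," + content.length() + ")");
    }

    /** Parses inline-annotated XML with the JDK's SAX parser and returns the filled handler. */
    static RestoringHandler parse(String xml) {
        RestoringHandler handler = new RestoringHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

For example, RestoringHandler.parse("<doc>Hello <Person>John</Person>!</doc>") recovers the content "Hello John!" and records the Person annotation at offsets 6 to 10. Annotations are listed in the order their end tags are seen, so inner annotations come before the elements that enclose them.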
Serialization to XHTML
When documents are displayed in the document detail screen, the content and annotations need to be in HTML format. For instance, each annotation is surrounded with an <a> tag. To achieve this, KIM uses the XCESWriter.write method with a custom handler (XHTMLHandler), which transforms XML tags to XHTML. The XHTML handler does not write HTML directly, but forwards the resulting events to a given content handler. This enables developers to add custom handlers to the chain. The default KIM interface uses a simple XML handler.
A common technique for controlling content layout and styling is to add so-called markup annotations. You can add custom markup with JAPE rules or other methods while documents are being annotated. KIM treats an annotation as markup if it is in the "Markup" annotation set.
For instance, if a sentence contains two persons, you can add a "two_persons" markup annotation, which is later transformed to <div class="two_persons"> and can be styled with CSS. Alternatively, add a "div" annotation with a "class" feature whose value is "two_persons".
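The forwarding pattern described above can be illustrated with the standard XMLFilterImpl class, which passes every event on to the content handler it was given. This is a sketch under assumptions, not KIM's XHTMLHandler: the class names AnchorWrappingHandler and TagRecorder are hypothetical, and mapping the annotation type to a CSS class is only one possible choice.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in for an XHTML-producing handler; rewrites each element as <a>. */
final class AnchorWrappingHandler extends XMLFilterImpl {
    AnchorWrappingHandler(ContentHandler next) {
        setContentHandler(next); // XMLFilterImpl forwards events to this handler
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Assumption: keep the annotation type as a CSS class on the generated tag.
        AttributesImpl htmlAtts = new AttributesImpl();
        htmlAtts.addAttribute("", "class", "class", "CDATA", qName);
        super.startElement("", "a", "a", htmlAtts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement("", "a", "a");
    }

    /** Fires one annotation through the chain and returns what the downstream handler saw. */
    static String demo() {
        TagRecorder sink = new TagRecorder();
        AnchorWrappingHandler handler = new AnchorWrappingHandler(sink);
        try {
            handler.startElement("", "Person", "Person", new AttributesImpl());
            handler.characters("John".toCharArray(), 0, 4);
            handler.endElement("", "Person", "Person");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.seen.toString();
    }
}

/** Minimal downstream handler that records the tags and text it receives. */
final class TagRecorder extends DefaultHandler {
    final StringBuilder seen = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        seen.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            seen.append(' ').append(atts.getQName(i))
                .append("=\"").append(atts.getValue(i)).append('"');
        }
        seen.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        seen.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        seen.append(ch, start, length);
    }
}
```

Since the transforming handler never writes output itself, any other ContentHandler can take the place of TagRecorder at the end of the chain.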
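The two conventions can be summarized in a few lines of code. The helper below is purely illustrative: the class MarkupToHtml and its method are hypothetical, and the real transformation in KIM may differ in detail. It only shows that both ways of expressing the markup lead to the same opening tag.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper (not part of KIM): maps a markup annotation to an opening tag. */
final class MarkupToHtml {
    static String openingTag(String type, Map<String, String> features) {
        String cssClass = features.get("class");
        if (cssClass != null) {
            // Second convention: the annotation type names the element,
            // and the "class" feature supplies the CSS class.
            return "<" + type + " class=\"" + cssClass + "\">";
        }
        // First convention: the annotation type itself becomes the class of a <div>.
        return "<div class=\"" + type + "\">";
    }

    public static void main(String[] args) {
        System.out.println(openingTag("two_persons", Collections.<String, String>emptyMap()));
        System.out.println(openingTag("div", Collections.singletonMap("class", "two_persons")));
    }
}
```

Both calls in main produce <div class="two_persons">, so the choice between the two conventions is mostly a matter of how the annotations are easiest to generate.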
Custom markup GAPP
Sometimes the number of markup annotations needed to represent documents can grow significantly. In this case, we do not recommend generating and storing them when you add documents. Instead, KIM allows you to run a custom GATE application each time a document is displayed. This gives you the freedom to generate the document markup dynamically, based on the document's existing annotations. To achieve this, you have to set this property in kwebui.config:
When displaying a document, you can generate any type of annotation in markup.gapp. These annotations are not stored after the page is rendered.
Custom HTML Handler
If you need to modify the markup KIM produces, you can extend the chain of SAX handlers with a custom one. By default, when displaying documents, the XCESWriter.write method internally uses a GateToXCESHandler to handle the SAX events. After some processing, the events are forwarded to the XHTMLHandler (passed as a parameter to the "write" method), which uses a simple XML handler to write tags to a given stream. Here is how it is done in DocumentDetailScreen:
In order to add a custom handler, you can do this:
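As a rough illustration of what such a custom handler might look like, the sketch below uses only the standard org.xml.sax API and does not reproduce KIM's own wiring. The handler sits between an upstream XHTML-producing handler and the final XML writer and adds a target="_blank" attribute to every <a> tag. The class names TargetBlankHandler and EventLog are hypothetical, and how the handler is passed to XHTMLHandler or XCESWriter.write is left out because it depends on the KIM API.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical custom handler: adds target="_blank" to <a> tags, forwards everything else. */
final class TargetBlankHandler extends XMLFilterImpl {
    TargetBlankHandler(ContentHandler next) {
        setContentHandler(next); // next handler in the chain, e.g. the XML writer
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("a".equals(qName)) {
            // Copy the attributes, since the caller may reuse its Attributes object.
            AttributesImpl copy = new AttributesImpl(atts);
            copy.addAttribute("", "target", "target", "CDATA", "_blank");
            super.startElement(uri, localName, qName, copy);
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Simulates upstream XHTML events and returns what reaches the end of the chain. */
    static String demo() {
        EventLog sink = new EventLog();
        TargetBlankHandler handler = new TargetBlankHandler(sink);
        AttributesImpl linkAtts = new AttributesImpl();
        linkAtts.addAttribute("", "class", "class", "CDATA", "Person");
        try {
            handler.startElement("", "p", "p", new AttributesImpl());
            handler.startElement("", "a", "a", linkAtts);
            handler.characters("John".toCharArray(), 0, 4);
            handler.endElement("", "a", "a");
            handler.endElement("", "p", "p");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.seen.toString();
    }
}

/** Minimal final handler that records the events it receives as text. */
final class EventLog extends DefaultHandler {
    final StringBuilder seen = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        seen.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            seen.append(' ').append(atts.getQName(i))
                .append("=\"").append(atts.getValue(i)).append('"');
        }
        seen.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        seen.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        seen.append(ch, start, length);
    }
}
```

Extending XMLFilterImpl keeps the custom handler small: only the events that need to change are overridden, and all others pass through to the next handler unchanged.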