
Once annotated, articles for the recommendation service can be uploaded to the /content endpoint or added directly to Solr. Annotating articles is outside the scope of this document.

Article schema

Recommendation articles consist of fields, each with a name and a value. The fields used by the recommendation engine are:

  • id (required) - the id of the article; it can be an arbitrary string.
  • title (optional) - the title of the article.
  • summary (optional) - a short summary of the article.
  • content (optional) - the text content of the article.
  • published (optional) - the date and time when the article was published. Allowed formats are described here.
  • url (optional) - the source URL of the article, if applicable.
  • tags (optional) - a space-separated list of tags for this article. Anything can be a tag, but we usually use the instance ids of the entities mentioned in the article. For example, an article mentioning New York, London and New York again will have "http://dbpedia.org/resource/New_York_City http://dbpedia.org/resource/London http://dbpedia.org/resource/New_York_City" in its tags field. Use this field for named entities only; general "important" words and phrases from the text belong in the keyphrases field.
  • keyphrases (optional) - a space-separated list of keyphrases. The format is the same as for tags: anything can be a keyphrase, but we generally use generated URIs such as http://data.ontotext.com/publishing/topic/Metal or http://data.ontotext.com/publishing/topic/Oil_refiner. Again, named entities go in tags, and general important words and phrases go in this field.
Recommendations rely mostly on the tags and keyphrases fields, so it is important to populate them.
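Because tags and keyphrases are single space-separated strings, each URI must itself be free of whitespace, and repeated mentions are simply repeated (as New York is in the tags example). The following is a minimal sketch of building those values; the helper name join_uris is hypothetical and not part of the recommendation service.

```python
# Hypothetical helper (not part of the service): join URIs into the
# space-separated string expected by the "tags" and "keyphrases" fields.
def join_uris(uris):
    for uri in uris:
        # A URI containing whitespace would be split into several tags.
        if not uri or any(ch.isspace() for ch in uri):
            raise ValueError(f"URI must be non-empty and contain no whitespace: {uri!r}")
    # Duplicates are kept on purpose, matching the New York example above.
    return " ".join(uris)

tags = join_uris([
    "http://dbpedia.org/resource/New_York_City",
    "http://dbpedia.org/resource/London",
    "http://dbpedia.org/resource/New_York_City",
])
keyphrases = join_uris([
    "http://data.ontotext.com/publishing/topic/Metal",
    "http://data.ontotext.com/publishing/topic/Oil_refiner",
])
```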

Recommendation articles are naturally represented as simple JSON objects whose keys are the field names listed above.
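A minimal sketch of such an object, built in Python and serialized to JSON. All values are illustrative. The published field is left out because its allowed formats are documented separately, and check_article is a hypothetical local helper that only compares an object against the field list above; it is not how the service itself validates input.

```python
import json

# Illustrative article: every value below is made up for the example.
article = {
    "id": "test-document-1",
    "title": "Example article title",
    "summary": "A short summary of the article.",
    "content": "The full text content of the article goes here.",
    "url": "http://example.com/articles/test-document-1",
    "tags": "http://dbpedia.org/resource/New_York_City http://dbpedia.org/resource/London",
    "keyphrases": "http://data.ontotext.com/publishing/topic/Oil_refiner",
}

# Field names taken from the schema list above.
KNOWN_FIELDS = {"id", "title", "summary", "content", "published", "url", "tags", "keyphrases"}

def check_article(doc, require_id=True):
    """Hypothetical check: return a list of problems (empty if the object fits)."""
    problems = []
    if require_id and not doc.get("id"):
        problems.append("missing required field: id")
    for name in sorted(doc):
        if name not in KNOWN_FIELDS:
            problems.append(f"field not used by recommendation: {name}")
    return problems

print(json.dumps(article, indent=2))
```

Passing require_id=False fits uploads where the id is supplied in the URL instead of the JSON body.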

Uploading articles through the /content endpoint (recommended)

If the recommendation engine is deployed at http://recommendation/app, then the following actions are available through http://recommendation/app/content:

  • POST /content/<id> to upload a single article with the specified id. In that case, you can omit the id field from the JSON; if both are present, the id from the URL takes precedence.
  • POST /content to upload a list of articles.
  • GET /content/<id> to retrieve the article with the specified id in the format above (for example, GET /content/test-document-1).
  • DELETE /content/<id> to delete the article with the specified id.
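The four actions above can be sketched with Python's standard library. This sketch only constructs the requests; pass one to urllib.request.urlopen to actually send it. The base URL is the example deployment from this page. Percent-encoding the id (ids can be arbitrary strings, including "/") is an assumption about how the service expects such ids, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Example deployment URL from the text; adjust to your installation.
BASE = "http://recommendation/app/content"
JSON_HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}

def _item_url(doc_id):
    # Assumption: arbitrary ids are percent-encoded, including "/".
    return f"{BASE}/{urllib.parse.quote(doc_id, safe='')}"

def upload_one(doc_id, article):
    """POST /content/<id>: the id in the URL wins, so drop it from the body."""
    body = {k: v for k, v in article.items() if k != "id"}
    return urllib.request.Request(_item_url(doc_id), data=json.dumps(body).encode("utf-8"),
                                  headers=JSON_HEADERS, method="POST")

def upload_many(articles):
    """POST /content with a list of article objects."""
    return urllib.request.Request(BASE, data=json.dumps(articles).encode("utf-8"),
                                  headers=JSON_HEADERS, method="POST")

def fetch(doc_id):
    """GET /content/<id>."""
    return urllib.request.Request(_item_url(doc_id),
                                  headers={"Accept": "application/json"}, method="GET")

def delete(doc_id):
    """DELETE /content/<id>."""
    return urllib.request.Request(_item_url(doc_id),
                                  headers={"Accept": "application/json"}, method="DELETE")
```

For example, urllib.request.urlopen(fetch("test-document-1")) would retrieve the article from a running deployment.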

Notes:

  • For the /content endpoint, PUT is equivalent to POST, and the two can be used interchangeably.
  • The Content-Type header for PUT and POST requests should be application/json.
  • It is recommended (though not required) to set the Accept: application/json header on all requests.
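The header rules in these notes fit in a small table-like function. This is a hypothetical helper, not part of the service; rejecting methods other than GET, POST, PUT and DELETE is an assumption based only on the methods this page lists.

```python
def headers_for(method):
    """Hypothetical helper: headers the notes suggest for a /content request."""
    method = method.upper()
    # Assumption: only the methods listed on this page are used with /content.
    if method not in {"GET", "POST", "PUT", "DELETE"}:
        raise ValueError(f"method not listed for /content: {method}")
    headers = {"Accept": "application/json"}  # recommended for every request
    if method in {"POST", "PUT"}:  # PUT behaves like POST on /content
        headers["Content-Type"] = "application/json"
    return headers
```

Because PUT and POST are interchangeable here, they get identical headers.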

Uploading directly to Solr (not recommended)

For uploading data to Solr, refer to the relevant Solr documentation. The article format is the same as for the /content endpoint.
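As a rough sketch, Solr's JSON update handler accepts a JSON array of documents POSTed to the core's /update path, with commit=true to make them visible immediately. The Solr host, port and core name below are hypothetical placeholders; the request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical Solr location and core name: replace with your deployment's values.
SOLR_CORE_URL = "http://localhost:8983/solr/articles"

def solr_add_request(articles, commit=True):
    """Build a JSON update request adding `articles` (same format as /content)."""
    url = f"{SOLR_CORE_URL}/update"
    if commit:
        url += "?commit=true"  # commit so the documents become searchable
    return urllib.request.Request(
        url,
        data=json.dumps(articles).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```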

Reference: relevant Solr schema.xml sections:

Fields:

The onto_tags field type used above:
