Overview

Ontotext provides a crawling tool called RSS Feeder, which is integrated with KIM. It lets you manage the RSS feeds that are crawled and whose content is populated into a running KIM server. The RSS Feeder is nevertheless a standalone application: it is distributed and runs separately from KIM.

Download and install

When you download KIM, a download link for the RSS Feeder should be available as well.

After acquiring the zip file, extract it to a folder of your choice. To start the RSS Feeder, navigate to the extracted folder and execute:

  • bin/ncf.sh for Linux/Mac
  • bin/ncf.bat for Windows

By default, the RSS Feeder starts with no configured feeds. You can add some through KIM's management section.
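
The start-up step can also be scripted. The following is a minimal sketch in Python that picks the launcher script for the current platform and runs it from the extracted folder. The helper names launcher_for and start_feeder are hypothetical, and the sketch assumes that bin/ncf.sh can be run with sh (substitute bash if the script requires it).

```python
import os
import platform
import subprocess


def launcher_for(system: str) -> str:
    """Return the launcher script path, relative to the extracted folder."""
    # bin/ncf.bat is used on Windows, bin/ncf.sh on Linux and Mac.
    name = "ncf.bat" if system == "Windows" else "ncf.sh"
    return os.path.join("bin", name)


def start_feeder(install_dir: str) -> subprocess.Popen:
    """Start the RSS Feeder from install_dir (hypothetical helper)."""
    system = platform.system()
    script = launcher_for(system)
    # Assumption: the shell script runs under sh; the batch file under cmd.
    command = ["cmd", "/c", script] if system == "Windows" else ["sh", script]
    return subprocess.Popen(command, cwd=install_dir)
```

Calling start_feeder with the path of the extracted folder starts the RSS Feeder as a child process and returns immediately.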

The RSS Feeder uses a universal boilerplate removal algorithm based on a patched Java port of Arc90's Readability.

Manage RSS Feeds from KIM

This tutorial assumes you already have a local KIM server running.

  1. Navigate to the KIM UI - http://localhost:8080/KIM
  2. Click on Manage, as shown in the picture below. The default credentials are user: admin, pass: admin.
  3. Now click on Manage RSS feeds.
  4. If your setup is correct, you will see a page where you can add, remove, and search through the currently configured feeds.
  5. If the RSS Feeder isn't connected to KIM, a page indicating this will be displayed instead.

Advanced RSS Feeder configuration

Configure KIM server location

Often the KIM server and the RSS Feeder do not run on their default host and port, and they may be located on different machines. To configure where the KIM server is running, edit the file kim_connection.properties, which is located in the config folder of the RSS Feeder distribution.
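
Before editing the file, it can help to list the settings it currently contains. The sketch below is a minimal reader for Java-style .properties text, written in Python. The function name read_properties is hypothetical, and the parser is deliberately simple: it does not handle line continuations or escape sequences. It makes no assumption about which keys kim_connection.properties defines; it only prints whatever is there.

```python
def read_properties(text: str) -> dict:
    """Parse simple Java-style .properties text into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not line or line.startswith(("#", "!")):
            continue
        # The key ends at the first '=' or ':' separator.
        for i, ch in enumerate(line):
            if ch in "=:":
                key, value = line[:i], line[i + 1:]
                break
        else:
            key, value = line, ""
        props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    # Path taken from the text above, relative to the RSS Feeder folder.
    with open("config/kim_connection.properties", encoding="utf-8") as fh:
        for key, value in read_properties(fh.read()).items():
            print(f"{key} = {value}")
```

Run it from the root of the RSS Feeder distribution so that the relative path resolves.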

Configure default feeds

The default feeds are numerous and cover general topics such as finance, healthcare, technology, and politics. You may prefer to subscribe to only a few specific feeds instead. The default feeds are configured in the file feeds.xml, which is also located in the config folder of the RSS Feeder distribution. To erase them, remove all defined feed tags.
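
Removing the tags by hand is straightforward, but for a long file it can be scripted. The following Python sketch removes every element with a given tag name from an XML document. It assumes the elements are literally named feed, as the wording above suggests; the actual layout of feeds.xml is not documented here, so check the file first. The function name remove_feeds is hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_feeds(xml_text: str, tag: str = "feed") -> str:
    """Return xml_text with every element named `tag` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first: removing children while iterating
    # over the live tree would skip entries.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    path = "config/feeds.xml"  # location given in the text above
    with open(path, encoding="utf-8") as fh:
        cleaned = remove_feeds(fh.read())
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(cleaned)
```

ElementTree does not preserve XML comments or the original formatting, so keep a backup copy of feeds.xml before overwriting it.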
