{toc}


h2. Overview

Ontotext provides a crawling tool called RSS Feeder, which integrates with KIM: it crawls the RSS feeds you manage and populates their content into a running KIM server. The RSS Feeder is nevertheless a standalone application, distributed and run separately from KIM.

h2. Download and install

The page from which you download KIM should also provide a download link for the RSS Feeder.

After downloading the zip file, extract it to a folder of your choice. To start the RSS Feeder, navigate to the extracted folder and execute:
* bin/ncf.sh for Linux/Mac
* bin/ncf.bat for Windows
{note}By default the RSS Feeder starts with no configured feeds. You can add some through KIM's management section.{note}
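
If you want to start the RSS Feeder from a script rather than by hand, the platform choice above can be sketched in Python. This is only a minimal sketch: the helper names {{launcher_for}} and {{start_feeder}} and the installation path are hypothetical, and only the two script names come from the distribution.

```python
import platform
import subprocess
from pathlib import Path


def launcher_for(install_dir, system=None):
    """Return the start script for an OS name (defaults to this machine)."""
    system = system or platform.system()
    # bin/ncf.bat on Windows, bin/ncf.sh on Linux/Mac.
    script = "ncf.bat" if system == "Windows" else "ncf.sh"
    return Path(install_dir) / "bin" / script


def start_feeder(install_dir):
    """Start the RSS Feeder from its extracted folder and return the process."""
    root = Path(install_dir).resolve()
    # Run with the extracted folder as working directory, as in the manual steps.
    return subprocess.Popen([str(launcher_for(root))], cwd=str(root))
```

Calling {{start_feeder("/path/to/extracted/folder")}} (a placeholder path) would then launch the matching script. On Linux/Mac the script must be executable for this to work.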

The RSS Feeder will run with a universal boilerplate removal algorithm based on a patched Java port of Arc90's readability.

h2. Manage RSS Feeds from KIM

This tutorial assumes you already have a local KIM server running.
# Navigate to the KIM UI - [http://localhost:8080/KIM]
# Click on *Manage*, as shown in the picture below. The default credentials are user: admin, pass: admin.
!manage.png|thumbnail,border=1,align=center!
# Now click on *Manage RSS feeds*.
!management.png|thumbnail,border=1,align=center!
# If your setup is correct, you will see a page like the one below, where you can add, remove, and search through the currently configured feeds.
!tracked-feeds.png|thumbnail,border=1,align=center!
# If the RSS Feeder isn't connected to KIM, the following page is displayed instead:
!rss-feed-manager-error.png|thumbnail,border=1,align=center!
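
If the error page appears, a first thing to rule out is that the KIM UI address itself is unreachable. The sketch below is an illustration only, not part of KIM or the RSS Feeder: the function name {{kim_is_reachable}} is hypothetical, and it merely checks that something answers HTTP requests at the given URL. It does not verify that the RSS Feeder is connected.

```python
import urllib.error
import urllib.request


def kim_is_reachable(url="http://localhost:8080/KIM", timeout=5.0):
    """Return True if any HTTP response comes back from the given address."""
    # Bypass any system proxy so a local server is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

Run it on the machine hosting the RSS Feeder, replacing the URL if your KIM server is not on the default host and port.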

h2. Advanced RSS Feeder configuration

h3. Configure KIM server location

The KIM server and the RSS Feeder often do not run on their default host and port, and are frequently located on different machines. To configure where the KIM server is running, edit the file _kim_connection.properties_, located in the _config_ folder of the RSS Feeder distribution.
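
Before editing, it can help to list what the file currently contains. The following is a minimal sketch, assuming the file uses plain Java-style key/value lines without line continuations or escape sequences. The function names are hypothetical, and no property keys are assumed: the script prints whatever keys it finds.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple 'key=value' or 'key:value' lines into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        # Split on the first '=' or ':' separator, whichever comes first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            settings[line] = ""  # a key with no value
            continue
        cut = min(positions)
        settings[line[:cut].strip()] = line[cut + 1:].strip()
    return settings


def show_connection_settings(feeder_dir):
    """Print the settings found in config/kim_connection.properties."""
    path = Path(feeder_dir) / "config" / "kim_connection.properties"
    # Java .properties files are traditionally ISO-8859-1 encoded.
    text = path.read_text(encoding="iso-8859-1")
    for key, value in parse_properties(text).items():
        print(f"{key} = {value}")
```

Pass the folder where you extracted the RSS Feeder to {{show_connection_settings}}, then edit the reported keys in a text editor.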

h3. Configure default feeds

You may not want the default feeds: there are quite a lot of them, and they cover general topics such as finance, healthcare, technology, and politics. You will probably prefer to subscribe to only a few specific feeds. The default feeds are configured in the file _feeds.xml_, which is also located in the _config_ folder of the RSS Feeder distribution. To erase them, remove all defined _feed_ tags.
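
Removing the tags by hand is straightforward, but for a long file it could also be scripted. The sketch below assumes that the feeds are plain {{feed}} elements without an XML namespace. The layout of _feeds.xml_ beyond that is not assumed, and the function names are hypothetical. Back up the file first, since rewriting it this way discards XML comments and may change its formatting.

```python
import xml.etree.ElementTree as ET


def remove_feed_elements(root):
    """Remove every <feed> element below root; return how many were removed."""
    removed = 0
    # Collect the parents first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "feed":
                parent.remove(child)
                removed += 1
    return removed


def clear_default_feeds(feeds_xml_path):
    """Strip all <feed> elements from the file in place and return the count."""
    tree = ET.parse(feeds_xml_path)
    count = remove_feed_elements(tree.getroot())
    tree.write(feeds_xml_path, encoding="utf-8", xml_declaration=True)
    return count
```

For example, {{clear_default_feeds("config/feeds.xml")}} run from the RSS Feeder folder would leave the surrounding structure of the file in place and drop only the feed definitions.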