Version 6 by Stefan Enev
on Apr 11, 2012 14:18.

compared with
Version 7 by Stefan Enev
on Mar 12, 2013 19:29.

h2. Overview

We have a crawling tool called RSS News Collector, which is integrated with KIM. Thus, you can manage RSS feeds to be crawled and populated to a running KIM server. However, it is not a part of the KIM distribution and runs as a separate application.
Ontotext provides a crawling tool called RSS Feeder, which is integrated with KIM. Thus, you can manage RSS feeds to be crawled and populated to a running KIM server. However, it is a standalone application which is distributed and runs separately from KIM.

h2. Download and install

We've been using the RSS News Collector only internally, so it is not publicly available for download. There are 2 options:
* build it from source
** checkout [https://svn.ontotext.com/svn/kim/trunk/artifacts/rss-feeder/]
** {code}mvn clean install{code}
** a zip package will be generated in $\{project.checkout.dir\}/target/rss-feeder-3.5-SNAPSHOT-bin.zip
* download it from Hudson - [http://hudson.ontotext.com/view/KIM/job/kim-distribution/ws/rss-feeder/target/rss-feeder-3.5-SNAPSHOT-bin.zip] (User: onto-dev, Pass: tutmanik1)
When downloading KIM, there should be a download link to the RSS Feeder as well.

After acquiring the zip file, extract it to a folder of your choice. To start the tool, navigate to the extracted folder and run:
* bin/ncf.sh for Linux/Mac
* bin/ncf.bat for Windows
{note}The RSS News Collector does not implement availability tactics such as ping/echo. It tries to connect to KIM only once, so start your KIM server before the RSS News Collector\!{note}
{note}By default, the RSS Feeder starts with no configured feeds. You can add feeds through KIM's management section.{note}
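
The install and start steps above can be sketched as a small shell script. This is only an illustration: the {{ZIP}} and {{DEST}} variables and the {{launcher_for}} helper are hypothetical names, the archive name is taken from the build output mentioned earlier and may differ for your download, and the archive may unpack into a subfolder that you need to enter first.

```shell
#!/bin/sh
# Sketch: unpack the RSS Feeder archive and pick the launcher for this OS.
# ZIP and DEST are assumed names; adjust them to match your download.
ZIP="${ZIP:-rss-feeder-3.5-SNAPSHOT-bin.zip}"
DEST="${DEST:-./rss-feeder}"

# Print the launcher path (relative to the extracted folder) for an OS name.
launcher_for() {
  case "$1" in
    Linux|Darwin) echo "bin/ncf.sh" ;;
    *)            echo "bin/ncf.bat" ;;
  esac
}

# Unpack only when the archive is actually present.
if [ -f "$ZIP" ]; then
  mkdir -p "$DEST"
  unzip -q -o "$ZIP" -d "$DEST"
fi

# The tool connects to KIM only once, so KIM has to be running first.
echo "Start KIM first, then run: $(launcher_for "$(uname -s)")"
```

The script only prints the launcher to run instead of starting it, so that you can confirm the KIM server is up beforehand.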

The RSS News Collector will run with default configuration - universal boilerplate removal algorithm and a list of popular news agencies' RSS feeds (~60).
The RSS Feeder will run with a universal boilerplate removal algorithm based on a patched Java port of Arc90's readability.

h2. Manage RSS Feeds from KIM
# If your setup is correct, you will see a page like this, where you can add, remove, and search through the currently configured feeds.
!tracked-feeds.png|thumbnail,border=1,align=center!
# If the RSS Feeder isn't connected to KIM, the following page is displayed:
!rss-feed-manager-error.png|thumbnail,border=1,align=center!

h2. Advanced RSS Feeder configuration

h3. Configure KIM server location

It is very common for the KIM server and the RSS Feeder not to run on their default host and port, and to be located on different machines. To configure where the KIM server is running, edit the file _kim_connection.properties_, which is located in the _config_ folder of the RSS Feeder distribution.
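
Before editing the connection settings, it can help to keep a backup and look at the entries that are currently set. The following is a minimal sketch; {{CONFIG_DIR}} and {{show_properties}} are hypothetical names, and the actual property keys are whatever your copy of the file contains (none are assumed here).

```shell
#!/bin/sh
# Sketch: back up the connection settings and list the entries to edit.
# CONFIG_DIR is an assumed path to the config folder of the extracted distribution.
CONFIG_DIR="${CONFIG_DIR:-./config}"

# Print the lines of a properties file that are neither comments nor blank.
show_properties() {
  grep -v -e '^[[:space:]]*[#!]' -e '^[[:space:]]*$' "$1"
}

if [ -f "$CONFIG_DIR/kim_connection.properties" ]; then
  # Keep a copy so the original settings can be restored.
  cp "$CONFIG_DIR/kim_connection.properties" "$CONFIG_DIR/kim_connection.properties.bak"
  show_properties "$CONFIG_DIR/kim_connection.properties"
fi
```

After changing the file, restart the RSS Feeder so that it reconnects using the new settings; as noted above, it attempts the connection only once at startup.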

h3. Configure default feeds

The default feeds are numerous and cover general topics such as finance, healthcare, technology and politics, so you may prefer to subscribe to only a few specific feeds. The default feeds are configured in the file _feeds.xml_, which is also located in the _config_ folder of the RSS Feeder distribution. To erase them, remove all defined _feed_ tags.
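
Removing the _feed_ tags by hand is tedious when there are many of them. The sketch below shows one way to do it in bulk, assuming that {{perl}} is installed and that each feed is a {{<feed>}} element (self-closing or with a closing tag) without nested self-closing children. The exact layout of _feeds.xml_ is not documented here, so check the result before using it; {{strip_feeds}} and {{FEEDS}} are hypothetical names.

```shell
#!/bin/sh
# Sketch: print feeds.xml with every <feed> element removed.
# Assumes <feed .../> or <feed ...>...</feed>; any wrapper element is left intact.
FEEDS="${FEEDS:-./config/feeds.xml}"

strip_feeds() {
  # -0777 reads the whole file at once so multi-line elements are matched too.
  perl -0777 -pe 's{<feed\b[^>]*?/>\s*|<feed\b[^>]*>.*?</feed>\s*}{}gs' "$1"
}

if [ -f "$FEEDS" ]; then
  cp "$FEEDS" "$FEEDS.bak"                 # keep the original feed list
  strip_feeds "$FEEDS.bak" > "$FEEDS"      # write the emptied configuration
fi
```

The original file is kept as {{feeds.xml.bak}}, so individual feeds can be copied back later.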