
Overview

RSS News Collector is a crawling tool that is integrated with KIM. It lets you manage the RSS feeds that are crawled and whose content is populated into a running KIM server. It is not part of the KIM distribution, however, and runs as a separate application.

Download and install

We've been using the RSS News Collector only internally, so it is not publicly available for download. There are 2 options:

After acquiring the zip file, extract it to any folder. To start the collector, navigate to the extracted folder and execute:

  • bin/ncf.sh for Linux/Mac
  • bin/ncf.bat for Windows

Note: The RSS News Collector does not implement availability tactics such as ping/echo. It tries to connect to KIM only once, so start your KIM server before you start the RSS News Collector.
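
Because the collector connects only once, it can help to script the start order. The following is a minimal sketch, not part of the collector itself: it polls a TCP port until something accepts connections and only then launches the start script. The function names, the timeout values, and the choice of localhost:8080 (taken from the KIM UI address) are assumptions; the collector may connect to KIM on a different port, and an open port does not guarantee that KIM has finished initializing.

```python
import os
import socket
import subprocess
import time
from pathlib import Path


def wait_for_port(host: str, port: int, timeout_s: float = 120.0,
                  interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def start_collector_when_kim_is_up(collector_dir: str, host: str = "localhost",
                                   port: int = 8080) -> subprocess.Popen:
    """Hypothetical helper: launch the collector once host:port is reachable."""
    if not wait_for_port(host, port):
        raise RuntimeError(f"Nothing is listening on {host}:{port}")
    # Script names come from the list above; ncf.sh must be executable.
    name = "ncf.bat" if os.name == "nt" else "ncf.sh"
    script = Path(collector_dir).resolve() / "bin" / name
    return subprocess.Popen([str(script)], cwd=collector_dir)
```

Calling start_collector_when_kim_is_up with the path of the extracted folder would block until the port answers and then start the collector from that folder.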

By default, the RSS News Collector runs with a universal boilerplate removal algorithm and a list of about 60 RSS feeds from popular news agencies.

Manage RSS Feeds from KIM

This tutorial assumes you already have a local KIM server running.

  1. Navigate to the KIM UI at http://localhost:8080/KIM
  2. Click Manage. The default credentials are user: admin, pass: admin.
  3. Click Manage RSS feeds.
  4. If your setup is correct, you will see a page where you can add, remove, and search through the currently configured feeds.
  5. If the RSS News Collector isn't connected to KIM, a different page is displayed instead.

Advanced RSS News Collector configuration

Configure KIM server location

The KIM server and the RSS News Collector often do not run on their default host and port, and they may be located on different machines. To configure where the KIM server is running, edit the file kim_connection.properties in the config folder of the RSS News Collector distribution.
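
Before editing, it can be useful to see which settings the file currently defines. The sketch below is a simplified reader for Java-style .properties text. It assumes plain key=value or key:value lines and ignores line continuations and escape sequences. The function name is hypothetical, and no key names are assumed, since this page does not list them.

```python
import re


def read_properties(text: str) -> dict:
    """Parse simple Java-style properties text into a dict of strings."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (lines starting with '#' or '!').
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' or ':' and drop whitespace around it.
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        props[parts[0]] = parts[1] if len(parts) > 1 else ""
    return props
```

Passing the contents of config/kim_connection.properties to read_properties returns the current keys and values, which shows what can be changed without guessing at names.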

Configure default feeds

The default feeds are numerous and cover general topics such as finance, healthcare, technology, and politics, so you may prefer to subscribe to only a few specific feeds. The default feeds are configured in the file feeds.xml, which is also located in the config folder of the RSS News Collector distribution. To erase them, remove all defined feed tags.
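
Removing the feed tags can also be scripted. This is a rough sketch that assumes the feeds are XML elements literally named feed somewhere below the root of feeds.xml; the real layout of the file is not described here, so check it first. The function name is hypothetical. Note that ElementTree discards comments and the XML declaration when it rewrites the document, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def remove_all_feeds(xml_text: str, tag: str = "feed") -> str:
    """Return the XML text with every element named `tag` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

To apply it, read config/feeds.xml as text, pass it to remove_all_feeds, and write the result back. The individual feeds can then be added again through the Manage RSS feeds page.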
