
Overview
RSS News Collector is a crawling tool integrated with KIM. It lets you manage the RSS feeds that are crawled and loaded into a running KIM server. It is not part of the KIM distribution and runs as a separate application.
Download and install
The RSS News Collector has so far been used only internally, so it is not publicly available for download. There are two ways to obtain it:
- build it from source
  - check out https://svn.ontotext.com/svn/kim/trunk/artifacts/rss-feeder/
  - a zip package will be generated in ${project.checkout.dir}/target/rss-feeder-3.5-SNAPSHOT-bin.zip
- download it from Hudson - http://hudson.ontotext.com/view/KIM/job/kim-distribution/ws/rss-feeder/target/rss-feeder-3.5-SNAPSHOT-bin.zip (User: onto-dev, Pass: tutmanik1)
After obtaining the zip file, extract it to any location. To start the collector, navigate to the extracted folder and execute:
- bin/ncf.sh for Linux/Mac
- bin/ncf.bat for Windows
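The extract-and-locate step can also be scripted. The following is a minimal sketch, not part of the distribution: the function name `extract_and_find_launcher` is made up for illustration, and because the internal layout of the zip is not described here, the code searches the extracted tree for `bin/ncf.sh` or `bin/ncf.bat` rather than assuming a fixed top-level folder.

```python
import sys
import zipfile
from pathlib import Path


def extract_and_find_launcher(zip_path, dest, platform=sys.platform):
    """Extract the package and return the path of the matching launcher script.

    Hypothetical helper for illustration; returns None if no launcher is found.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    # Pick the launcher named in this page for the current platform.
    script = "ncf.bat" if platform.startswith("win") else "ncf.sh"

    # The zip layout is an unknown here, so search the whole extracted tree.
    matches = sorted(Path(dest).rglob(f"bin/{script}"))
    if not matches:
        return None

    launcher = matches[0]
    if script.endswith(".sh"):
        # zipfile does not restore Unix permission bits, so mark it executable.
        launcher.chmod(launcher.stat().st_mode | 0o111)
    return launcher
```

The returned path can then be passed to `subprocess.run`, with the working directory set to the extracted folder, to start the collector.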
The RSS News Collector does not implement availability tactics such as ping/echo. It tries to connect to KIM only once, so start your KIM server before you start the RSS News Collector.
By default, the RSS News Collector runs with a universal boilerplate removal algorithm and a list of about 60 RSS feeds from popular news agencies.
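Because the collector attempts the KIM connection only once, a small wrapper can wait until the KIM host accepts TCP connections before launching it. This is a sketch under assumptions: `wait_for_port` is a hypothetical helper, and the port the collector actually connects to may differ from the web UI port 8080 that appears in the KIM URL on this page, so check kim_connection.properties for the real value.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening, not that
    the listener is a fully initialised KIM server.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the port is open; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper script would call `wait_for_port` with the KIM host and port and run `bin/ncf.sh` (or `bin/ncf.bat`) only when it returns `True`.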
Manage RSS Feeds from KIM
This tutorial assumes you already have a local KIM server running.
- Navigate to the KIM UI - http://localhost:8080/KIM
- Click on Manage. The default credentials are user: admin, pass: admin.
- Now click on Manage RSS feeds.
- If your setup is correct, a page appears where you can add, remove, and search through the currently configured feeds.
- If the RSS News Collector isn't connected to KIM, a different page is displayed instead.
Advanced RSS News Collector configuration
Configure KIM server location
The KIM server and the RSS News Collector often do not run on their default host and port, and they may also be located on different machines. To configure where the KIM server is running, edit the file kim_connection.properties, which is located in the config folder of the RSS News Collector distribution.
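When the same change has to be applied on several machines, the edit can be scripted. The sketch below rewrites one key in the text of a Java-style properties file and leaves comments and other lines untouched. The key names inside kim_connection.properties are not listed on this page, so `kim.host` and `kim.port` in the sample are hypothetical placeholders, and `set_property` is an illustrative helper that handles only simple `key=value` lines (no line continuations or escapes).

```python
def set_property(text, key, value):
    """Return properties text with `key` set to `value`; append it if absent."""
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if (not is_comment and "=" in stripped
                and stripped.split("=", 1)[0].strip() == key):
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)  # keep comments and unrelated keys as they are
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical file content: the real key names may be different.
SAMPLE = "# KIM connection\nkim.host=localhost\nkim.port=1099\n"
updated = set_property(SAMPLE, "kim.host", "kim.example.org")
```

To apply it, read the real file from the config folder, check which keys it actually defines, and write the returned text back after making a backup copy.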
Configure default feeds
The default feed list is long and covers general topics such as finance, healthcare, technology, and politics. If you prefer to subscribe to only a few specific feeds, edit the file feeds.xml, which is also located in the config folder of the RSS News Collector distribution. To clear the defaults, remove all defined feed tags.
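Removing the feed tags by hand works, but for a long list a short script may be less error-prone. This is a minimal sketch that assumes the entries are XML elements literally named `feed`. The sample document and its `url` attribute are invented for illustration, since the actual schema of feeds.xml is not shown here, and `strip_feeds` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def strip_feeds(xml_text, tag="feed"):
    """Return the XML text with every element named `tag` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in [c for c in parent if c.tag == tag]:
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Invented sample: the real feeds.xml may use different attributes or nesting.
SAMPLE = """<feeds>
  <feed url="http://example.com/a.rss"/>
  <feed url="http://example.com/b.rss"/>
</feeds>"""

cleaned = strip_feeds(SAMPLE)
```

ElementTree discards XML comments and the XML declaration when parsing this way, so keep a backup of the original feeds.xml before overwriting it.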