GraphDB on the Cloud

Version 1 by petar.kostov
on Jul 04, 2013 16:05.

compared with
Current by marin.dimitrov
on Jan 27, 2015 19:23.

\\ {toc}
\\ !owlim_logo.png|height=88,width=240!
Semantic Repository for RDF(S) and OWL

h1. *OWLIM-SE on Amazon EC2*
h1. Introduction

The self-managed version of GraphDB is a hosted database in the Cloud providing all the power of a scalable triple store as a pay-by-the-hour service through [Amazon Web Services|]. GraphDB (Standard Edition) can be purchased as an AMI running on EC2 instances with 4-core / 32 GB RAM or 8-core / 64 GB RAM.

h2. *{_}User Guide{_}*\\
Our customers often tell us that they want to develop and test in the cloud before bringing projects in-house. Now you can do that without buying GraphDB licenses or provisioning hardware first. GraphDB in the Cloud is well suited to running limited-time projects or low-volume experiments in a production-quality setting without an investment in hardware.

All GraphDB instances are designed to store data on user-supplied Amazon EBS volumes (network attached storage), so that your data is persisted and safe even if the instance is not running. GraphDB in the Cloud is accessible via standard [RESTful APIs|] and SPARQL endpoints.

h3. *Table of Contents*
h2. Amazon Web Services

The following [Amazon Web Services|] concepts are related to running GraphDB on the AWS cloud:

h3. Introduction

{color:#000000}The process of using OWLIM on the AWS cloud involves the following steps:{color}
# Buying the OWLIM product on AWS
# Starting an EC2 instance with the OWLIM AMI
# Attaching an existing EBS volume to the EC2 instance
# Logging into the EC2 instance via SSH
# Mounting the EBS volume on the filesystem of the OWLIM instance
# If necessary, configuring OWLIM
# Starting the OWLIM process
* [AWS Marketplace|] is an online marketplace which makes it possible for customers to use its "1-Click deployment" to instantly launch pre-configured software and services on the AWS cloud infrastructure and pay only for what they use by the hour
** The GraphDB software is available as a product on the AWS Marketplace.
* [AMI|] (Amazon Machine Image) provides a virtual server image which can be instantly launched on the AWS cloud
** GraphDB provides such an AMI, and customers can provision it on virtual instances running on AWS.
* [EC2|] (Elastic Compute Cloud) is the computing infrastructure where AMIs are launched as _virtual instances_. [Security groups|] configure the firewalls controlling the network traffic to a running virtual EC2 instance. [Key pairs|] are used to encrypt and decrypt login information and must be used for accessing a running EC2 instance.
** The GraphDB AMI will be provisioned as an EC2 virtual instance and a security group will be used to restrict network access to the instance, based on the user preferences
** the user will use the private key pair to log into the running EC2 virtual instance with GraphDB
* [EBS|] (Elastic Block Store) provides network attached storage volumes that can be used with running EC2 instances
** the EBS volume is created via and managed by the user's own AWS account. The user is responsible for data volume maintenance tasks such as: volume expansion, snapshots, backup & restore.
* [on-demand|] EC2 instances are charged by the hour with no long-term commitments or upfront payments, while [reserved|] EC2 instances provide a cheaper alternative to on-demand instances for longer-term use. *Note* that GraphDB *SHOULD NOT* be deployed on spot instances, since they can be terminated abruptly, which can lead to database file corruption.

Step No.1 needs to be performed _only once_ and afterwards customers may start/stop OWLIM on AWS whenever needed. After an EC2 instance with OWLIM installed is activated and the OWLIM process is started the customer may access the OWLIM server via the public IP address of the particular EC2 instance as a standard SPARQL endpoint, RESTful service (SPARQL graph store protocol) or the JMX monitoring and control port (if set up).
h2. Pricing Details

h3. Prerequisites
GraphDB in the AWS cloud is available in various server configurations:

|| instance type \\ || virtual cores \\ || RAM (GB) \\ || GraphDB price ($/hour) \\ || EC2 cost ($/hour) ||
| M3-L \\ | 2 \\ | 8 \\ | 0.35 \\ | 0.10 - 0.14 (reserved/on-demand) \\ |
| R3-L \\ | 2 \\ | 15 \\ | 0.40 \\ | 0.11 - 0.18 (reserved/on-demand) |
| R3-XL \\ | 4 \\ | 30 \\ | 0.75 \\ | 0.22 - 0.35 (reserved/on-demand) |
| R3-2XL \\ | 8 \\ | 61 \\ | 1.40 \\ | 0.44 - 0.70 (reserved/on-demand) |

The EC2 cost depends on the type of instance being used: [on-demand instances|] are optimal only for short-term and occasional use, while [reserved instances|] are optimal for longer-term and more frequent use.

{warning}Note that GraphDB in the AWS cloud *SHOULD NOT* be deployed on _spot instances_, since they can be abruptly terminated and this may lead to data corruption{warning}
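As a rough illustration of how the two hourly charges in the table combine, the following minimal sketch adds the GraphDB price to the EC2 cost for a given number of hours. The figures are copied from the table; the mapping of the table's short names (M3-L, R3-L, ...) to EC2 API names (m3.large, r3.large, ...) and the function name {{estimate_cost}} are assumptions made for this example. EBS usage and data transfer charges are not included.

```python
# Hourly prices in USD, copied from the pricing table.
# The EC2 entry is a (reserved, on-demand) pair, as listed there.
PRICES = {
    "m3.large":   {"graphdb": 0.35, "ec2": (0.10, 0.14)},
    "r3.large":   {"graphdb": 0.40, "ec2": (0.11, 0.18)},
    "r3.xlarge":  {"graphdb": 0.75, "ec2": (0.22, 0.35)},
    "r3.2xlarge": {"graphdb": 1.40, "ec2": (0.44, 0.70)},
}


def estimate_cost(instance_type, hours, on_demand=True):
    """Estimate the total cost in USD of running GraphDB for `hours` hours.

    Hypothetical helper: excludes EBS storage and data transfer charges.
    """
    entry = PRICES[instance_type]
    reserved_rate, on_demand_rate = entry["ec2"]
    ec2_rate = on_demand_rate if on_demand else reserved_rate
    return round((entry["graphdb"] + ec2_rate) * hours, 2)


if __name__ == "__main__":
    # Ten hours on an on-demand r3.xlarge instance.
    print(estimate_cost("r3.xlarge", 10))
```

Actual AWS prices vary by region and over time, so treat the result as an order-of-magnitude estimate only.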

h2. Prerequisites

In order to use OWLIM GraphDB on AWS you need the following:
# A valid *AWS account* (including a valid SSH key pair).
# An existing *EBS* [|] *volume* with an *ext3* filesystem on it
## OWLIM will store data on the EBS volume (and *not* on the local/ephemeral storage of the EC2 instance) so that the data is preserved between EC2 instance restarts. EBS volumes can vary between 1GB and 1TB in size.
## Note that the EBS volume has to be in the *same AWS availability zone* [|] as the EC2 instance that will run OWLIM.
## Also note that the EBS volume must be in *"available" state*, i.e. not already attached to another EC2 instance.
## The filesystem on the EBS volume should be accessible *(read/write) from the* *{_}tomcat{_}* *Linux user account.*
# A valid [AWS account|]
# EC2 *security group* [|] [security group|] in the same region as the EC2 instance and configured as follows. Alternatively, the security group can be created and configured at instance launching time:
## Port 22 open to the IPs which will be administering the EC2 instance
## Port 8080 open to the IPs which need to access OWLIM GraphDB (upload data, SPARQL queries, Workbench, etc)
# An EC2 key pair [|] [key pair|] in the same availability region, used for user authentication on the EC2 instance. Alternatively, the key pair can be created at EC2 instance launching time.
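The constraints listed above (matching availability zone, volume in "available" state, ports 22 and 8080 open) can be checked before launching. The sketch below works on plain Python values rather than on a live AWS account; the dictionary keys {{zone}} and {{state}} and the function name are hypothetical and chosen only for this illustration.

```python
REQUIRED_PORTS = (22, 8080)  # SSH administration and GraphDB/OWLIM HTTP access


def check_prerequisites(volume, instance_zone, open_ports):
    """Return a list of problems; an empty list means the checks passed.

    `volume` is a hypothetical dict with "zone" and "state" keys,
    `instance_zone` is the availability zone planned for the EC2 instance,
    `open_ports` is the set of ports opened in the security group.
    """
    problems = []
    if volume["zone"] != instance_zone:
        problems.append("EBS volume and EC2 instance are in different availability zones")
    if volume["state"] != "available":
        problems.append("EBS volume is not in the 'available' state")
    for port in REQUIRED_PORTS:
        if port not in open_ports:
            problems.append(f"port {port} is not open in the security group")
    return problems


if __name__ == "__main__":
    volume = {"zone": "us-east-1a", "state": "available"}
    for problem in check_prerequisites(volume, "us-east-1a", {22}):
        print(problem)
```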

h3. Pricing and Billing Details
h1. Setup

{color:#000000}Using OWLIM on the AWS cloud involves the following charges (Table 1):{color}
* One time setup fee
* On-demand charge for each hour of use
{color:#000000}The process of configuring and starting GraphDB in the AWS cloud involves the following steps:{color}
# Activating the GraphDB product on the AWS Marketplace _(one time step)_
# Starting an EC2 instance with the GraphDB AMI
# Logging into the running EC2 instance via SSH
# Mounting the EBS data volume on the filesystem of the running EC2 instance
# Starting the GraphDB server
# Creating and configuring repositories with the Workbench
# Verifying that everything is correctly configured and running

Additionally AWS will charge you for:
* EC2 instance hours, depending on the type of instance used and the region
* Data transfer in/out of the AWS datacenters
* EBS usage
The following diagram shows the sequence of steps to be followed:  

{color:#4f81bd}{*}{_}Note that currently OWLIM is available only on the following EC2 instance types{_}{*}{color} [|] {color:#4f81bd}{*}_:_{*}{color}
* {color:#4f81bd}{*}{_}M2 2XL (34GB RAM, 4 virtual cores, 64 bit, Not EBS optimised)_{*}{color}
* {color:#4f81bd}{*}{_}M2 4XL (68GB RAM, 8 virtual cores, 64 bit,_{*}{color} {color:#4f81bd}{*}{_}{+}EBS optimized{+}{_}{*}{color} {color:#4f81bd}{*}{_}1,000{_}{*}{color} {color:#4f81bd}{*}{_}Mbps)_{*}{color}

| | OWLIM cost | AWS cost | Total cost |
| One time charge | {color:#0070c0}$45.00{color} | \- | $45.00 |
| 1 hour of OWLIM usage on a M2-2XL (Linux) EC2 instance | {color:#0070c0}$0.87{color} | Depends on AWS region \\
($0.82 for US East) | $0.87 + M2-2XL cost for the selected region \\
($1.69 for US East) |
| 1 hour of OWLIM usage on a M2-4XL (Linux) EC2 instance | {color:#0070c0}$1.05{color} | Depends on AWS region \\
($1.64 for US East) | $1.05 + M2-4XL cost for the selected region \\
($2.69 for US East) |
| Data transfer IN to Amazon | {color:#0070c0}$0.00{color} | $0.00 | $0.00 |
| Data transfer OUT from Amazon | {color:#0070c0}$0.00{color} | Depends on AWS region and data volume [|] \\
($0.12 per GB for US East up to 10TB) | AWS transfer OUT cost for the selected region and data volume \\
($0.12 per GB for US East up to 10TB) |
{color:#4f81bd}{*}Table 1 - a summary of the usage costs for OWLIM on various EC2 instances{*}{color}
After an EC2 instance with GraphDB is activated and the GraphDB server is started the customer may access it via the public IP address of the particular EC2 instance as:
* [OpenRDF / rdf4j|] RESTful service, including a standard SPARQL endpoint
* [GraphDB Workbench|] web based administration tool for configuring, querying and monitoring a running GraphDB database

At any time you can check the current costs for using OWLIM on the AWS cloud from the AWS Management Console at [|]

h3. Buying the OWLIM Product on AWS

Follow the steps:
1. Go to [|]


h2. Buying the GraphDB Product on the AWS Marketplace

* Sign In to the [AWS Marketplace|] portal
* Search for the _GraphDB Cloud_ product
* alternatively, access the product pages on the Marketplace directly:
** [GraphDB 6.1|]
** ...


{color:#4f81bd}{*}Figure 1 OWLIM on AWS start page{*}{color}
* A detailed product preview page including pricing options shows up. Select the _Continue_ button to proceed to the purchase preview screen.

2. Follow the _"Start Using OWLIM"_ link which will redirect you to the Amazon DevPay site where you can buy the OWLIM product on AWS

3. Review the pricing options for the various AWS regions and EC2 instance types
* The purchase preview screen offers two options for launching the product: _1-Click Launch_ and _Manual Launch_. The following sections walk through manually launching the product via the EC2 Console, describing the various configuration options and their default values.

!1.pricing_description.1.PNG|border=1! !launchProduct.png|border=1!

h2. EC2 Instance Configuration & Startup

{color:#4f81bd}{*}Figure 2 OWLIM product description on AWS{*}{color}
* Choose an instance type - one of: *m3.large*, *r3.large*, *r3.xlarge*, *r3.2xlarge*
* Add storage. The GraphDB AMI is bundled with a pair of EBS volumes - one for the application and one for the data storage. The latter can be reused beyond the life-cycle of the product usage and initially it contains no data. There are several important parameters which might be adjusted at this step:
** Volume size - by default it will allocate 4GiB (sufficient for approximately 15 million triples) but depending on the estimated needs the size should be adjusted prior to volume creation
** Volume type - affects the IO performance (SSD vs Magnetic drives)
** Delete on Termination *SHOULD NOT* be selected. Otherwise the data will be lost after machine termination
** Device name (*/dev/sdf*) *SHOULD NOT* be changed
** If there already exists a data volume from previous use of the system, remove the second volume configuration row and attach the old volume manually when the instance is already running (as */dev/sdf*)
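The only sizing figure given above is that the default 4GiB volume is sufficient for approximately 15 million triples. A minimal sketch of sizing a larger volume from that ratio follows. It assumes storage grows linearly with the number of triples, which is a simplification, and the {{headroom}} factor and function name are hypothetical.

```python
import math

# From the default above: 4 GiB is sufficient for roughly 15 million triples.
TRIPLES_PER_GIB = 15_000_000 / 4


def estimate_volume_gib(expected_triples, headroom=1.5):
    """Estimate an EBS data volume size in GiB for `expected_triples`.

    Assumes linear growth (a simplification) and multiplies by `headroom`
    to leave spare capacity. Never returns less than the 4 GiB default.
    """
    needed = expected_triples / TRIPLES_PER_GIB * headroom
    return max(4, math.ceil(needed))


if __name__ == "__main__":
    print(estimate_volume_gib(100_000_000, headroom=1.0))
```

Real disk usage depends on the data, the indexes enabled and the inference rule set, so it is worth validating the estimate with a sample load before creating the volume.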

!1.pricing_description.3.PNG|border=1! !gdbStorage.png|border=1!

{color:#4f81bd}{*}Figure 3{*}{color} {color:#4f81bd}{*}OWLIM product description on AWS (2)*{color}
* creating a _security group_ (or reusing an existing one). Two ports have to be opened: 22 (SSH) for EC2 instance management; 8080 (HTTP) for accessing the GraphDB service (Workbench UI & RESTful APIs)

Section Pricing & Billing Details provides details on the various costs for using OWLIM on the AWS Cloud. The charges displayed on the page will include:
* The one-time charge
* Data transfer charges between and out of the AWS data centers. This charge includes only the price for data transfer as set by Amazon.
* The charge per hour for using OWLIM on AWS (called "Box Usage"). This charge includes the price for OWLIM +and+ the EC2 instance running it (as set by Amazon)

4. Click on the "Place your order" button in the upper right corner of the screen.
* creating a _key pair_ (or reusing an existing one)

5. After that you will be redirected back to a confirmation page on the Ontotext website, showing the AMI IDs of the OWLIM images deployed on the various AWS regions. !2.confirmation_page.png|border=1!

{color:#4f81bd}{*}Figure 4{*}{color} {color:#4f81bd}{*}OWLIM Confirmation page{*}{color}

6. You can now launch an AMI running OWLIM on the AWS cloud

h3. EC2 Instance Launching, Setup and OWLIM Startup
* Review and Launch

Follow the steps:
1. Select the AWS region [|] where you want to launch the EC2 instance with OWLIM.
h2. GraphDB Startup

{color:#4f81bd}{*}{_}Note that currently OWLIM is available on the following AWS regions:_{*}{color}
* {color:#4f81bd}{*}{_}US East (Northern Virginia)_{*}{color}
* {color:#4f81bd}{*}{_}US West (Oregon)_{*}{color}
* {color:#4f81bd}{*}{_}US West (Northern California)_{*}{color}
* {color:#4f81bd}{*}{_}EU (Ireland)_{*}{color}
* Log in to the instance via SSH using the private key for the EC2 instance and user *{_}ec2-user{_}*


2. Locate the OWLIM AMI

The OWLIM AMIs can be directly accessed via the links on the confirmation page displayed after the purchase:
* *US East (Northern Virginia)*
** AMI ID: ami-349c0f5d
** Direct link: [|]
* *US West (Oregon)*
** AMI ID: ami-aa3fb59a
** Direct link: [|]
* *US West (Northern California)*
** AMI ID: ami-ec3a18a9
** Direct link: [|]
* *EU (Ireland)*
** AMI ID: ami-4a6d653e
** Direct link: [|]
* run the script responsible for proper mounting of the EBS data volume:

The links to the OWLIM AMIs will activate the Instance Wizard on AWS (Figure 5)
{color:#4f81bd}{*}Figure 5 Instance Request Wizard (US West / Oregon region)*{color}

\\ {code}
3. Choose the preferred EC2 instance type and Availability Zone within the region

!4.1.instance_details.png|border=1! !sshattachVolumeA.png|border=1!

The script verifies that the EBS data volume is properly attached and creates a mount point for it. If the EBS volume is not attached yet, the script prompts the user to attach it and performs several delayed retries, giving the user time to attach the volume via the AWS Management Console. If that time is not sufficient, the script should be rerun.

On successful execution, the script confirms that the volume is mounted and prints the mount point location: */data_mount/data*.

{color:#4f81bd}{*}Figure 6 Instance Type and Availability Zone{*}{color}
* Running the _GraphDB_ service:

\\ {code}
{color:#4f81bd}{*}{_}Note that OWLIM is currently available only on the following EC2 instance types{_}{*}{color} [|] {color:#4f81bd}{*}_:_{*}{color}
* {color:#4f81bd}{*}{_}M2 2XL (34GB RAM, 4 virtual cores, 64 bit, Not EBS optimised)_{*}{color}
* {color:#4f81bd}{*}{_}M2 4XL (68GB RAM, 8 virtual cores, 64 bit, EBS optimized 1000Mbps)_{*}{color}
/home/ec2-user/ start
The script will verify that the data volume is available (if not it terminates with a reminder message) and will start the service:

4. If necessary, specify additional EC2 instance parameters

For example, CloudWatch monitoring can be enabled for the EC2 instance.
h3. Workbench Configuration

{color:#4f81bd}{*}Figure 7 Additional EC2 instance settings{*}{color}\\
5. Optionally, add tags to the EC2 instance to locate it faster
* Open the GraphDB Workbench UI in your web browser under http://<instance-public-url>:8080/graphdb

!4.4.instance_details.png|border=1! !workbenchWelcomeNew.png|border=1!
{color:#4f81bd}{*}Figure 8 EC2 instance tags{*}{color}

6. Specify the key pair for connecting to the instance

The public/private key pair allows you to securely connect to the instance via SSH and manage it after it is launched. You can use an existing key pair, or create a new one.

{color:#4f81bd}{*}Figure 9 Secure key pairs{*}{color}

{color:#4f81bd}{*}{_}NOTE that you need to either use an existing key pair, or create a new one. DO NOT choose the "Proceed without a Key Pair" option._{*}{color}
7. Specify a Security Group (or create a new one)

The security group [|] should have at least the following network ports open:
* *Port 22* open to the IPs which will be administering the EC2 instance
* *Port 8080* open to the IPs which need to access OWLIM (upload data, SPARQL queries, Workbench, etc)

{color:#4f81bd}{*}Figure 10 Firewall / Security Group settings{*}{color}
If the data volume attached was used previously, the old repositories will be detected and listed under *Admin* > *repositories*.

8. Launch the AMI

Review the various AMI settings, in particular:
* Availability zone selected
* Instance type
* EBS optimization settings
* The key pair to be used for secure connection to the instance
* The security group selected

A confirmation dialog will be displayed after the instance is launched:
h3. Verifying the Configuration & Startup

You can check the status of the AWS instance from the AWS Management Console. The instance will be in "pending" state initially and in "running" state when ready to use.
* Testing the service. Back in the SSH console, test the configuration of the GraphDB instance by executing:

{color:#4f81bd}{*}Figure 11 Pending EC2 instance{*}{color}
/home/ec2-user/ test
It will perform various automated tests such as creating a repository, loading some data, querying the data and deleting the repository. The results of each test are printed in the console.

The detail page about the running EC2 instance will provide a summary of important information such as:
* The key pair that can be used to securely access and administer the instance
* The public DNS and IP of the instance which can be accessed by end users and applications

{color:#4f81bd}{*}Figure 12 Running EC2 instance{*}{color}

9. Attach an existing EBS volume to the running EC2 instance

{color:#4f81bd}{*}{_}Note that the EBS volume has to be in the same AWS availability zone as the EC2 instance that will run OWLIM._{*}{color}
{color:#4f81bd}{*}{_}Also note that the EBS volume must be in "available" state, i.e. not already attached to another EC2 instance{_}{*}{color}

After the EBS volume is successfully attached to the instance its state will be changed to _"in use"_:

10. SSH to the running EC2 instance
h2. GraphDB Shutdown & Restart

You can use the AWS Management Console (Instance Management > Connect > Connect from your browser using the Java SSH client). You will need to specify the private key corresponding to the key pair associated with the EC2 instance and *{_}ec2-user{_}* as the user account.
\\ !15.login_details.PNG|border=1!
{color:#4f81bd}{*}Figure 13 Connect to an EC2 instance via SSH{*}{color}
The termination of the GraphDB service should be done *only* via the provided shell script:

/home/ec2-user/ stop
This will perform a graceful shutdown of the service, persisting any in-memory data to the EBS volume. This operation might take some time, so make sure there is no active Java process before restarting the service or terminating the EC2 instance.

11. Mount the EBS volume to the local filesystem of the running EC2 instance
The GraphDB service can be started again at any time (only possible if the EC2 instance is *stop{*}ped rather than *terminate{*}d) with these steps:
# Mount the external EBS volume with the data:
# Start the GraphDB service:
/home/ec2-user/ start

Execute the *{_}{_}* script located in your home directory _/home/ec2-user_. The script will mount the EBS volume onto the EC2 file system at _/data_mount/owlim_data_.
\\ !17.execute_script.PNG|border=1!
{color:#4f81bd}{*}Figure 14{*}{color} {color:#4f81bd}{*}Mounting the EBS volume to the filesystem{*}{color}
h2. Stopping the EC2 Instance
{warning}Note that the GraphDB service has to be gracefully shut down as explained in the previous step{warning}
The EC2 resources can be completely or partially released depending on the use case requirements:
* stopping the instance - this operation stops the instance and preserves its filesystem state. You can use the _EC2 Management Console_ for performing this task. This scenario is appropriate when the service is not needed for a certain period of time but will be restarted later. In this case the attached EBS volume remains attached.
* terminating the instance - complete termination of the service. This terminates the EC2 machine and its file system. Only the EBS data volume remains intact and it is automatically detached.


h1. Working with GraphDB on AWS

12. Start the OWLIM server process.

Execute: *{_}sudo service tomcat6 start{_}*
{color:#4f81bd}{*}Figure 15 Starting OWLIM{*}{color}\\
h2. SPARQL Endpoint Access

13. OWLIM is now accessible by the IPs configured in your AWS security group at:
* http://<instance-public-url>:8080/graphdb/repositories/<repo-id>?query=<sparql>
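The {{<sparql>}} part of the URL above has to be URL-encoded before it is sent. The following minimal sketch builds such a GET URL with the Python standard library. The host and repository names in the example are placeholders, and the function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def sparql_query_url(host, repo_id, sparql):
    """Build the GET URL for running `sparql` against one repository.

    `host` stands for <instance-public-url> and `repo_id` for <repo-id>
    in the URL pattern above; both are supplied by the caller.
    """
    base = f"http://{host}:8080/graphdb/repositories/{quote(repo_id, safe='')}"
    # urlencode percent-encodes the query text and turns spaces into '+'.
    return f"{base}?{urlencode({'query': sparql})}"


if __name__ == "__main__":
    # Placeholder host and repository id, for illustration only.
    print(sparql_query_url("ec2-host.example.com", "myrepo", "ASK { ?s ?p ?o }"))
```

The resulting URL can be opened with any HTTP client from an IP address allowed by the security group. Long queries are usually better sent with POST, which the SPARQL protocol also allows.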

* _EC2-instance-public-DNS_:8080/openrdf-sesame
* _EC2-instance-public-DNS_:8080/openrdf-workbench

{color:#4f81bd}{*}Figure 16 Accessing the Sesame Workbench on the EC2 instance running OWLIM{*}{color}
h2. REST API Access

h3. OWLIM Shutdown and EC2 Instance Termination
* OpenRDF Sesame REST API root: http://<instance-public-url>:8080/graphdb/
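As an illustration of what sits under that root, the sketch below derives a few resource URLs following the path layout of the OpenRDF Sesame HTTP protocol ({{/protocol}}, {{/repositories}}, {{/repositories/<id>/statements}}, {{/repositories/<id>/size}}). That this particular GraphDB AMI exposes all of them unchanged is an assumption, and the function name is hypothetical.

```python
def rest_urls(host, repo_id):
    """Return Sesame-protocol style resource URLs for one repository.

    `host` stands for <instance-public-url>; the paths follow the OpenRDF
    Sesame HTTP protocol and are assumed to apply to this deployment.
    """
    root = f"http://{host}:8080/graphdb"
    repo = f"{root}/repositories/{repo_id}"
    return {
        "protocol": f"{root}/protocol",      # protocol version
        "repositories": f"{root}/repositories",  # list of repositories
        "statements": f"{repo}/statements",  # read or upload RDF data
        "size": f"{repo}/size",              # number of statements
    }


if __name__ == "__main__":
    # Placeholder host and repository id, for illustration only.
    for name, url in rest_urls("ec2-host.example.com", "myrepo").items():
        print(name, url)
```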

{color:#000000}Follow the steps:{color}
1. Shutdown the OWLIM process.

{color:#4f81bd}{*}Figure 17 OWLIM shutdown{*}{color}\\
h2. GraphDB Workbench

2. Terminate the EC2 instance from the AWS Management Console.
* [GraphDB Workbench Documentation|]


{color:#4f81bd}{*}Figure 18 EC2 instance termination{*}{color}\\
h2. 3rd Party Tools

After that the EC2 instance should appear with a "terminated" state on the AWS console.
* The [Information Workbench|] from _fluidOps_ is a Web-based open platform for Linked Data and Big Data Management Solutions
* [OpenRDF Sesame|] API & Workbench for GraphDB remote access and management

h1. Administration

h2. User management

By default, user management and security are disabled. To enable them, go to *Administration > Users* and enable the _Security_ option.


The default password for the 'admin' user is 'root'. Make sure you change the password as soon as security is enabled\!

h2. EBS Volume Expansion

* Stop the instance if it is running (do not terminate it)
* Create a snapshot of the volume to be expanded
* From the new snapshot create a new volume with the desired size (in the same availability zone)
* Detach the old volume from the instance
* Attach the new volume
* Start the instance and expand the file system on the new volume from within the instance

Detailed description is available at: []

h2. Backup & Restore

Backing up the data is a simple process of taking a snapshot of the EBS data volume. The snapshot can then be used for restoring the application data state, for replicating the data, or for migrating it to another data center.

The proper order of steps for data backup is:
* stop the _GraphDB_ service to ensure all in-memory data is persisted properly on the file system
* stop the AWS instance to ensure the file system is in consistent state
* take a snapshot of the EBS data volume
* restart the AWS instance and the _GraphDB_ service.

Data restore steps (on a running AWS instance):
* stop the _GraphDB_ service if it is running
* detach the old EBS data volume (if any)
* create a new EBS volume from the backup data snapshot
* attach the new volume on */dev/sdf* device
* run the ** script and then the _GraphDB_ service

Data restore steps (on a new AWS instance):
* in the _Launch_ instance wizard,
** remove the default blank data volume
** add the backup data snapshot as a source for the data volume
* follow the rest of the start-up and configuration procedure described above

h1. Support

The standard support channels are available for questions, feedback and general information related to GraphDB on AWS:
* email: [ |]
* Twitter: [@OntotextGraphDB|]
* Ontotext Answers forum at []