The self-managed version of GraphDB is a hosted database in the cloud providing all the power of a scalable triple store as a pay-by-the-hour service through Amazon Web Services. GraphDB (Standard Edition) can be purchased as an AMI running on EC2 instances with 4-core / 32 GB RAM or 8-core / 64 GB RAM.
Our customers often tell us that they want to develop and test in the cloud before bringing projects in-house. Now you can do that without buying GraphDB licenses or provisioning hardware first. GraphDB in the Cloud is well suited to running limited-time projects or low-volume experiments in a production-quality setting without an investment in hardware.
All GraphDB instances are designed to store data on user-supplied Amazon EBS volumes (network attached storage), so that your data is persisted and safe even if the instance is not running. GraphDB in the Cloud is accessible via standard RESTful APIs and SPARQL endpoints.
Amazon Web Services
The following Amazon Web Services concepts are relevant to running GraphDB on the AWS cloud:
AWS Marketplace is an online marketplace whose "1-Click deployment" lets customers instantly launch pre-configured software and services on the AWS cloud infrastructure and pay only for what they use, by the hour.
The GraphDB software is available as a product on the AWS Marketplace.
AMI (Amazon Machine Image) provides a virtual server image which can be instantly launched on the AWS cloud.
GraphDB provides such an AMI, and customers can provision it on virtual instances running on AWS.
EC2 (Elastic Compute Cloud) is the computing infrastructure where AMIs are launched as virtual instances. Security groups configure the firewalls controlling the network traffic to a running virtual EC2 instance. Key pairs are used to encrypt and decrypt login information and must be used for accessing a running EC2 instance.
The GraphDB AMI will be provisioned as an EC2 virtual instance, and a security group will be used to restrict network access to the instance, based on the user's preferences.
The user will use the private key of the key pair to log into the running EC2 virtual instance with GraphDB.
EBS (Elastic Block Store) provides network attached storage volumes that can be used with running EC2 instances.
The EBS volume is created and managed through the user's own AWS account. The user is responsible for data volume maintenance tasks such as volume expansion, snapshots, and backup & restore.
On-demand EC2 instances are charged by the hour with no long-term commitments or upfront payments, while reserved EC2 instances provide a cheaper alternative to on-demand instances for longer-term use. Note that GraphDB SHOULD NOT be deployed on spot instances, since they can be terminated abruptly, which can lead to database file corruption.
Pricing Details
GraphDB in the AWS cloud is available in various server configurations:
| Instance type | Virtual cores | RAM (GB) | GraphDB price ($/hour) | EC2 cost ($/hour) |
|---|---|---|---|---|
| M3-L | 2 | 8 | 0.35 | 0.10 - 0.14 (reserved/on-demand) |
| R3-L | 2 | 15 | 0.40 | 0.11 - 0.18 (reserved/on-demand) |
| R3-XL | 4 | 30 | 0.75 | 0.22 - 0.35 (reserved/on-demand) |
| R3-2XL | 8 | 61 | 1.40 | 0.44 - 0.70 (reserved/on-demand) |
The EC2 cost depends on the type of instance being used: on-demand instances are optimal only for short-term and occasional use, while reserved instances are optimal for longer-term and more frequent use.
Note that GraphDB in the AWS cloud SHOULD NOT be deployed on spot instances, since they can be abruptly terminated and this may lead to data corruption.
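The total hourly cost of a deployment is the GraphDB price plus the EC2 cost for the chosen instance type. The following is a minimal Python sketch of that arithmetic, using the figures from the table above. It assumes that the table labels M3-L, R3-L, R3-XL and R3-2XL correspond to the EC2 types m3.large, r3.large, r3.xlarge and r3.2xlarge, and it leaves out EBS storage charges, which are billed separately. The function names are illustrative only.

```python
from decimal import Decimal

# Hourly prices in USD, copied from the pricing table.
# Each entry: (GraphDB price, reserved EC2 cost, on-demand EC2 cost).
PRICES = {
    "m3.large":   (Decimal("0.35"), Decimal("0.10"), Decimal("0.14")),
    "r3.large":   (Decimal("0.40"), Decimal("0.11"), Decimal("0.18")),
    "r3.xlarge":  (Decimal("0.75"), Decimal("0.22"), Decimal("0.35")),
    "r3.2xlarge": (Decimal("1.40"), Decimal("0.44"), Decimal("0.70")),
}


def hourly_total(instance_type, reserved=False):
    """Return GraphDB price + EC2 cost per hour for one instance."""
    graphdb, ec2_reserved, ec2_on_demand = PRICES[instance_type]
    return graphdb + (ec2_reserved if reserved else ec2_on_demand)


def run_cost(instance_type, hours, reserved=False):
    """Return the cost of running one instance for a number of hours."""
    return hourly_total(instance_type, reserved) * hours


if __name__ == "__main__":
    # Compare on-demand and reserved totals for every configuration.
    for name in PRICES:
        print(name, hourly_total(name), hourly_total(name, reserved=True))
```

Decimal values are used instead of floats so that sums of prices such as 0.75 and 0.35 stay exact.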
Prerequisites
In order to use GraphDB on AWS you need the following:
An EC2 security group in the same region as the EC2 instance, configured as follows (alternatively, the security group can be created and configured when launching the instance):
Port 22 open to the IPs which will be administering the EC2 instance
Port 8080 open to the IPs which need to access GraphDB (upload data, SPARQL queries, Workbench, etc.)
An EC2 key pair in the same region, used for user authentication on the EC2 instance. Alternatively, the key pair can be created at EC2 instance launching time.
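The two security group rules listed above can be written down as data. The sketch below builds them in the dictionary shape accepted by the `IpPermissions` parameter of boto3's `authorize_security_group_ingress` call. The function name and the example address ranges are hypothetical, and nothing here contacts AWS.

```python
import ipaddress


def graphdb_ingress_rules(admin_cidrs, client_cidrs):
    """Build ingress rules: port 22 for administrators, port 8080 for GraphDB clients."""

    def rule(port, cidrs):
        for cidr in cidrs:
            # Raises ValueError if the string is not a valid network in CIDR notation.
            ipaddress.ip_network(cidr)
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr} for cidr in cidrs],
        }

    return [rule(22, admin_cidrs), rule(8080, client_cidrs)]


if __name__ == "__main__":
    # Example ranges taken from the documentation-only address blocks.
    for r in graphdb_ingress_rules(["198.51.100.7/32"], ["203.0.113.0/24"]):
        print(r)
```

Keeping the SSH rule limited to administrator addresses, rather than opening it to all sources, follows the intent of the prerequisites above.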
Setup
The process of configuring and starting GraphDB in the AWS cloud involves the following steps:
Activating the GraphDB product on the AWS Marketplace (one time step)
Starting an EC2 instance with the GraphDB AMI
Logging into the running EC2 instance via SSH
Mounting the EBS data volume on the filesystem of the running EC2 instance
Starting the GraphDB server
Creating and configuring repositories with the Workbench
Verifying that everything is correctly configured and running
The following diagram shows the sequence of steps to be followed:
After an EC2 instance with GraphDB is activated and the GraphDB server is started, the customer can access it via the public IP address of that EC2 instance through:
OpenRDF / rdf4j RESTful service, including a standard SPARQL endpoint
GraphDB Workbench web based administration tool for configuring, querying and monitoring a running GraphDB database
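As an illustration of the RESTful access described above, the following Python sketch builds a SPARQL query URL in the style of the rdf4j REST protocol, where a repository is addressed as `<server>/repositories/<repository-id>` and the query is passed in the `query` parameter. The server base URL to use for a given GraphDB AMI is an assumption here and should be checked against the running instance. The function names and the repository ID in the usage comment are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def sparql_query_url(server_url, repository_id, query):
    """Return the GET URL for running a SPARQL query against one repository."""
    base = server_url.rstrip("/")
    repo = quote(repository_id, safe="")
    return f"{base}/repositories/{repo}?{urlencode({'query': query})}"


def run_select(server_url, repository_id, query, timeout=30):
    """Execute a SELECT query and return the parsed SPARQL JSON results."""
    request = urllib.request.Request(
        sparql_query_url(server_url, repository_id, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires a running instance; the server URL is an assumption):
# run_select("http://<instance-public-url>:8080/graphdb", "my-repo",
#            "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
```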
A detailed product preview page including pricing options shows up. Select the Continue button to proceed to the purchase preview screen.
The purchase preview screen offers two options for launching the product: 1-Click Launch and Manual Launch. The following sections walk through manually launching the product via the EC2 Console, describing the various configuration options and their default values.
EC2 Instance Configuration & Startup
Choose an instance type - one of: m3.large, r3.large, r3.xlarge, r3.2xlarge
Add storage. The GraphDB AMI is bundled with a pair of EBS volumes - one for the application and one for the data storage. The latter can be reused beyond the life-cycle of the product usage and initially it contains no data. There are several important parameters which might be adjusted at this step:
Volume size - by default 4 GiB is allocated (sufficient for approximately 15 million triples), but the size should be adjusted to the estimated needs before the volume is created
Volume type - affects the IO performance (SSD vs. magnetic drives)
Delete on Termination SHOULD NOT be selected. Otherwise the data will be lost when the machine is terminated.
Device name (/dev/sdf) SHOULD NOT be changed
If a data volume already exists from previous use of the system, remove the second volume configuration row and attach the old volume manually (as /dev/sdf) once the instance is running
Creating a security group (or reusing an existing one). Two ports have to be opened: 22 (SSH) for EC2 instance management; 8080 (HTTP) for accessing the GraphDB service (Workbench UI & RESTful APIs)
Creating a key pair (or reusing an existing one)
Review and Launch
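The storage step above states that the default 4 GiB data volume is sufficient for approximately 15 million triples. A rough way to pick a larger size is to extrapolate from that ratio, as in the Python sketch below. Linear scaling and the default 50% headroom are assumptions made for illustration: actual disk usage depends on the data and on the repository configuration, so treat the result as a starting point only.

```python
# Reference point from the storage step: 4 GiB holds roughly 15 million triples.
BASE_GIB = 4
BASE_TRIPLES = 15_000_000


def estimate_volume_gib(triples, headroom_percent=50):
    """Estimate an EBS data volume size in GiB for an expected number of triples.

    Scales the 4 GiB / 15 million triples ratio linearly, adds the given
    headroom, rounds up, and never goes below the default 4 GiB.
    """
    numerator = triples * BASE_GIB * (100 + headroom_percent)
    denominator = BASE_TRIPLES * 100
    needed = -(-numerator // denominator)  # integer ceiling division
    return max(BASE_GIB, needed)


if __name__ == "__main__":
    for count in (15_000_000, 100_000_000, 1_000_000_000):
        print(count, "triples ->", estimate_volume_gib(count), "GiB")
```

Integer arithmetic is used so that the rounding up is exact and not affected by floating-point error.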
GraphDB Startup
Log into the instance via SSH using the private key for the EC2 instance and the user ec2-user
Run the script responsible for mounting the EBS data volume:
The script verifies that the EBS data volume is properly attached and creates a mount point for it. If the EBS volume is not yet attached, the script prompts the user to attach it and performs several delayed retries, giving the user time to attach the volume via the AWS Management Console. If that time is not sufficient, the script should be rerun.
On successful execution, the script confirms that the volume is mounted and prints out the mount point location: /data_mount/data.
Running the GraphDB service:
The script will verify that the data volume is available (if not it terminates with a reminder message) and will start the service:
Workbench Configuration
Open the GraphDB Workbench UI in your web browser under http://<instance-public-url>:8080/graphdb
If the data volume attached was used previously, the old repositories will be detected and listed under Admin > repositories.
Verifying the Configuration & Startup
Testing the service. Back in the SSH console, test the configuration of the GraphDB instance by executing:
It will perform various automated tests such as creating a repository, loading some data, querying the data and deleting the repository. The results of each test are printed in the console.
GraphDB Shutdown & Restart
The termination of the GraphDB service should be done only via the provided shell script:
This will perform a graceful shutdown of the service, persisting any in-memory data to the EBS volume. This operation might take some time, so make sure there is no active Java process before restarting the service or terminating the EC2 instance.
The GraphDB service can be started again at any time (only possible if the EC2 instance is stopped rather than terminated) with these steps:
Mount the external EBS volume with the data:
Start the GraphDB service:
Stopping the EC2 Instance
Note that the GraphDB service has to be gracefully shut down first, as explained in the previous step.
The EC2 resources can be completely or partially released depending on the use case requirements:
stopping the instance - this operation stops the instance and preserves its filesystem state. You can use the EC2 Management Console to perform this task. This scenario is appropriate when the service is not needed for a certain period of time but will be restarted later. In this case the EBS volume remains attached.
terminating the instance - complete termination of the service. This terminates the EC2 machine and its file system. Only the EBS data volume remains intact and it is automatically detached.
Backing up the data is a simple process of taking a snapshot of the EBS data volume. The snapshot can then be used for restoring the application data state, for replicating the data, or for migrating it to another data center.
The proper order of steps for data backup is:
stop the GraphDB service to ensure all in-memory data is persisted properly on the file system
stop the AWS instance to ensure the file system is in a consistent state
take a snapshot of the EBS data volume
restart the AWS instance and the GraphDB service.
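The backup steps above can be scripted. The following Python sketch shows one possible ordering of the calls, assuming that `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`) and that `stop_graphdb` and `start_graphdb` are hypothetical callables supplied by the caller, for instance wrappers that run the AMI's shutdown, volume mounting and startup scripts over SSH. It is an illustration of the sequence, not a tested backup tool.

```python
def backup_data_volume(ec2, instance_id, volume_id, stop_graphdb, start_graphdb):
    """Snapshot the EBS data volume following the documented order of steps.

    ec2           -- assumed to be a boto3 EC2 client
    stop_graphdb  -- hypothetical callable that gracefully stops GraphDB
    start_graphdb -- hypothetical callable that remounts the data volume
                     and starts GraphDB once the instance is running again
    """
    # 1. Stop GraphDB so all in-memory data is persisted to the file system.
    stop_graphdb()

    # 2. Stop the instance so the file system is in a consistent state.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # 3. Take a snapshot of the EBS data volume and wait for it to complete.
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description="GraphDB data volume backup",
    )
    snapshot_id = snapshot["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 4. Restart the instance and then the GraphDB service.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    start_graphdb()

    return snapshot_id
```

Waiting for the snapshot to complete before restarting is a conservative choice made in this sketch. Passing the client and the two callables as arguments keeps the function free of credentials and makes the ordering easy to check with a stand-in client.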
Data restore steps (on a running AWS instance):
stop the GraphDB service if it is running
detach the old EBS data volume (if any)
create a new EBS volume from the backup data snapshot
attach the new volume as the /dev/sdf device
run the attach_data_vol.sh script and then start the GraphDB service
Data restore steps (new AWS instance):
in the Launch instance wizard,
remove the default blank data volume
add the backup data snapshot as a source for the data volume
follow the rest of the start-up and configuration procedure described above
Support
The standard support channels are available for questions, feedback and general information related to GraphDB on AWS: