
{toc}
h1. Test life cycle
* setUp() - will show "setUp()" in the logs;
** Cluster verification (marked with prefix "CV"). If this fails, the test will be skipped.
** Start of continuous queries (marked with prefix "CQ").
* Actual test
** starts with ">>>>> testXXX";
** ends with ">>>>> done: testXXX";
** if the "done" part is missing, the test has either been skipped or has failed. This makes it possible to detect tests skipped via failed JUnit assumptions, because assumption failures are not logged anywhere.
* tearDown() - will show "tearDown()" in the logs.
* stop continuous queries (CQ). This may take some time, especially for write queries, if the whole cluster is not yet synchronized/replicated. (A code sketch of the whole life cycle follows this list.)
** show CQ statistics: R/W queries executed;
** show CQ failures: R/W queries failed (if there aren't any, this is omitted).
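
A minimal sketch of how such a life cycle could be wired up in JUnit 4 is shown below. The helpers {{verifyCluster()}}, {{startContinuousQueries()}} and {{stopContinuousQueries()}} are hypothetical stand-ins, not the tool's actual API:

{noformat}
import java.util.Date;

import org.apache.log4j.Logger;
import org.junit.After;
import org.junit.Assume;
import org.junit.Before;

public abstract class ClusterTestBase {
    protected static final Logger LOG = Logger.getLogger("test.logger");

    // Hypothetical helpers - the real tool hides these behind its own classes.
    protected abstract boolean verifyCluster();       // CV
    protected abstract void startContinuousQueries(); // CQ
    protected abstract void stopContinuousQueries();  // CQ stop + statistics

    @Before
    public void setUp() {
        System.out.println("setUp(): " + new Date());
        // CV: a failed assumption skips the test and is not logged by JUnit,
        // hence the ">>>>> testXXX" / ">>>>> done: testXXX" markers.
        Assume.assumeTrue(verifyCluster());
        startContinuousQueries();
    }

    protected void logStart(String testName) {
        LOG.info(">>>>>>>>>> " + testName);
    }

    protected void logDone(String testName) {
        LOG.info(">>>>>>>>>> done: " + testName);
    }

    @After
    public void tearDown() {
        System.out.println("tearDown() - will verify cluster statements & fingerprint - " + new Date());
        stopContinuousQueries();
        verifyCluster(); // CV again, after the test
    }
}
{noformat}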

h2. Logging of the life cycle
* The life cycle is logged to the console via Log4j.
* Example output from testM2 (the annotation lines are marked with "**" and end with "->"):
{noformat}
[junit] 2014-08-14 18:23:49,178 [main] INFO test.logger - ===========================================
** setUp->
[junit] setUp(): Thu Aug 14 18:23:49 EEST 2014
[junit] 2014-08-14 18:23:49,507 [main] INFO test.logger - nodes started
** CV is OK ->
[junit] 2014-08-14 18:23:49,547 [main] INFO test.logger - client API reinitialized
[junit] 2014-08-14 18:23:49,738 [main] INFO test.logger - CV: OK. retries = 0, stale retries = 0
[junit] 2014-08-14 18:23:49,767 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Starting background queries. Threads: 4R/1W
** Test Name ->
[junit] 2014-08-14 18:23:51,768 [main] INFO test.logger - >>>>>>>>>> testM2
[junit] 2014-08-14 18:23:51,811 [main] INFO test.logger - Stress node: master3, action: 4
** Stress event ->
[junit] 2014-08-14 18:23:51,811 [main] INFO test.logger - - stress: Disk space limit on node: master3
** Test finished OK ->
[junit] 2014-08-14 18:26:01,391 [main] INFO test.logger - >>>>>>>>>> done: testM2
[junit] 2014-08-14 18:26:01,391 [main] INFO test.logger - --------------------
** tearDown()->
[junit] tearDown() - will verify cluster statements & fingerprint - Thu Aug 14 18:26:01 EEST 2014
[junit] 2014-08-14 18:26:01,391 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Stopping background queries...
[junit] 2014-08-14 18:26:01,391 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: [WriterThread-024] stopping thread
[junit] 2014-08-14 18:26:01,391 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: [ReaderThread-025] stopping thread
[junit] 2014-08-14 18:26:01,392 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: [ReaderThread-026] stopping thread
[junit] 2014-08-14 18:26:01,392 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: [ReaderThread-027] stopping thread
[junit] 2014-08-14 18:26:01,392 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: [ReaderThread-028] stopping thread
[junit] 2014-08-14 18:26:01,394 [cq-stop-029] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Waiting for WriterThread-024...
[junit] 2014-08-14 18:26:01,395 [cq-stop-029] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Waiting for ReaderThread-025...
[junit] 2014-08-14 18:26:01,395 [cq-stop-029] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Waiting for ReaderThread-026...
[junit] 2014-08-14 18:26:01,395 [cq-stop-029] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Waiting for ReaderThread-027...
[junit] 2014-08-14 18:26:01,395 [cq-stop-029] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Waiting for ReaderThread-028...
[junit] 2014-08-14 18:26:01,395 [cq-stop-029] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Stopped background queries in 3ms
[junit] 2014-08-14 18:26:01,395 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Statistics:
** CQ Stats->
[junit] 2014-08-14 18:26:01,395 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Executed queries: 2470R, 7W in 131627 ms.
** CQ Failures->
[junit] 2014-08-14 18:26:01,396 [main] WARN com.ontotext.test.utils.ContinuousQueries - CQ: Failed continues queries: 3R/0W.
** CV Retries->
[junit] 2014-08-14 18:26:01,770 [main] DEBUG test.logger - CV: 600 retries left, 10 stale retries left.
[junit]
[junit] 2014-08-14 18:26:12,959 [main] DEBUG test.logger - CV: 590 retries left, 8 stale retries left.
[junit]
** CV finally OK ->
[junit] 2014-08-14 18:26:17,476 [main] INFO test.logger - CV: OK. retries = 13, stale retries = 5
{noformat}

h1. Continuous query execution

The test suite now continuously executes queries and updates in a number of threads while the tests run. These are the more complex queries and updates, taken from the LDBC-50M suite. We currently start 4 read threads and 1 write thread. All of the read queries from LDBC SPB are used; only the insert queries are used for writing. This is due to:
a) the fact that we use a cache of CWorks, which would be invalidated if we deleted a CWork that belongs to it;
b) the fact that the cluster validation expects lagging workers to only increase their statement counts - this is mostly to avoid waiting a long time before deciding that the cluster is broken.

The absence of DELETE/UPDATE queries is not a limitation of these tests; such queries are used as part of the load test.

The following statistics will be printed at the end of each test:

{noformat}
2014-08-14 16:16:50,122 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Statistics:
2014-08-14 16:16:50,122 [main] INFO com.ontotext.test.utils.ContinuousQueries - CQ: Executed queries: 4R, 2W in 3039 ms.
{noformat}
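
The statistics above could be produced by a runner along the following lines - a rough sketch with 4 readers and 1 writer, where {{executeRead()}} and {{executeWrite()}} are hypothetical placeholders for the actual LDBC SPB queries:

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public abstract class ContinuousQueriesSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(5);
    private final AtomicLong reads = new AtomicLong();
    private final AtomicLong writes = new AtomicLong();
    private final AtomicLong failedReads = new AtomicLong();
    private final AtomicLong failedWrites = new AtomicLong();
    private volatile boolean running = true;
    private long startedAt;

    // Hypothetical placeholders for the LDBC SPB read and insert queries.
    protected abstract void executeRead() throws Exception;
    protected abstract void executeWrite() throws Exception;

    public void start() {
        startedAt = System.currentTimeMillis();
        for (int i = 0; i < 4; i++) {                 // 4 read threads
            pool.submit(new Runnable() {
                public void run() { loop(true); }
            });
        }
        pool.submit(new Runnable() {                  // 1 write thread
            public void run() { loop(false); }
        });
    }

    private void loop(boolean reader) {
        while (running) {
            try {
                if (reader) { executeRead(); reads.incrementAndGet(); }
                else { executeWrite(); writes.incrementAndGet(); }
            } catch (Exception e) {
                (reader ? failedReads : failedWrites).incrementAndGet();
            }
        }
    }

    public void stop() throws InterruptedException {
        running = false;                              // writers may take a while
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        long elapsed = System.currentTimeMillis() - startedAt;
        System.out.printf("CQ: Executed queries: %dR, %dW in %d ms.%n",
                reads.get(), writes.get(), elapsed);
        if (failedReads.get() > 0 || failedWrites.get() > 0) {
            System.out.printf("CQ: Failed queries: %dR/%dW.%n",
                    failedReads.get(), failedWrites.get());
        }
    }
}
{noformat}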

h1. JMX Notification logging

The test tool gathers the JMX notifications from all masters and logs them in the {{notifications.log}} file, so that special events such as replication or split brain can be checked later.
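
As an illustration, subscribing to a master's notifications over standard JMX and appending them to {{notifications.log}} could look roughly like this (the service URL and the ObjectName are assumptions - the real values depend on how the masters expose their MBeans):

{noformat}
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.Date;

import javax.management.MBeanServerConnection;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NotificationLogger {
    public static void subscribe(String host, int port, String objectName) throws Exception {
        // Illustrative URL - substitute the master's actual JMX endpoint.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();

        final PrintWriter out = new PrintWriter(new FileWriter("notifications.log", true), true);
        mbs.addNotificationListener(new ObjectName(objectName),
                new NotificationListener() {
                    public void handleNotification(Notification n, Object handback) {
                        out.println(new Date(n.getTimeStamp()) + " [" + n.getSource()
                                + "] " + n.getType() + ": " + n.getMessage());
                    }
                }, null, null);
    }
}
{noformat}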

h1. Log4j

The test tool now uses Log4j to log the output. This makes it easier to control what to log and where to log it.
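
For example, a programmatic Log4j 1.x setup matching the console format in the example output might look like this (the logger names mirror the example output; the tool's actual configuration may differ):

{noformat}
import java.io.IOException;

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class LoggingSetup {
    public static void configure() throws IOException {
        // Produces lines like:
        // 2014-08-14 18:23:49,178 [main] INFO test.logger - ...
        PatternLayout layout = new PatternLayout("%d [%t] %-5p %c - %m%n");

        Logger testLogger = Logger.getLogger("test.logger");
        testLogger.setLevel(Level.INFO);
        testLogger.addAppender(new ConsoleAppender(layout));

        // Sending a category to a file is just another appender
        // ("cq.log" is an illustrative file name).
        Logger cq = Logger.getLogger("com.ontotext.test.utils.ContinuousQueries");
        cq.addAppender(new FileAppender(layout, "cq.log"));
    }
}
{noformat}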

h1. Cluster Verification: Improved cluster stabilisation detection

The cluster is verified to be in good shape at least twice per test: before the test starts and after it finishes. The old verification process was the following:
# Check if all nodes are on-line and if they have the same number of statements and fingerprints.
# If the check passes, the cluster is OK. Stop.
# If this process is repeated too many times, then the cluster is in a bad state. Stop.
# Otherwise, sleep for some time and go back to step #1.

The new verification process is similar but has two important differences (see the sketch after this list):
# You still check the states, but the waiting period is only 1 second. This lets you pick up a "good" cluster quite fast.
# You count stale states (states that are the same as the previous one). If there are too many consecutive stale states (at the moment this is set to 10), you can stop the process early. This way you do not have to wait too long if the cluster is "frozen".
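
A sketch of that loop is shown below; {{getClusterState()}} and {{isOk()}} are hypothetical helpers that snapshot and check node availability, statement counts and fingerprints, and resetting the stale counter on progress is an assumption:

{noformat}
import org.apache.log4j.Logger;

public abstract class ClusterVerifier {
    private static final Logger LOG = Logger.getLogger("test.logger");

    // Hypothetical: a comparable snapshot of the cluster state.
    protected abstract Object getClusterState();
    // Hypothetical: all nodes on-line, same statement counts and fingerprints.
    protected abstract boolean isOk(Object state);

    public boolean verifyCluster() throws InterruptedException {
        final int maxRetries = 600;      // limits mirror the example output above
        final int maxStaleRetries = 10;
        int staleLeft = maxStaleRetries;
        Object previous = null;
        for (int retries = 0; retries < maxRetries; retries++) {
            Object state = getClusterState();
            if (isOk(state)) {
                LOG.info("CV: OK. retries = " + retries
                        + ", stale retries = " + (maxStaleRetries - staleLeft));
                return true;
            }
            if (state.equals(previous)) {
                if (--staleLeft == 0) {
                    return false;            // cluster looks "frozen" - give up early
                }
            } else {
                staleLeft = maxStaleRetries; // progress - reset the stale counter (assumption)
            }
            previous = state;
            Thread.sleep(1000);              // the 1-second waiting period
        }
        return false;                        // too many retries - cluster is in a bad state
    }
}
{noformat}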