- setUp() - will show "setUp()" in the logs
- Cluster verification (marked with prefix "CV"). If this fails, the test will be skipped
- Start of continuous queries (marked with prefix "CQ")
- Actual test
- starts with ">>>>> testXXX"
- ends with ">>>>> done: testXXX"
- if the "done" part is missing, the test was either skipped or failed. This marker was added to detect tests skipped via failed JUnit assumptions, because assumptions are not logged anywhere
- tearDown() - will show "tearDown()" in the logs
- stop continuous queries (CQ). This may take some time, especially for write queries, if the whole cluster is not yet synchronized/replicated
- show CQ statistics: R/W queries executed
- show CQ failures: R/W queries failed (omitted if there aren't any)
- this is logged to the console via Log4J
- example output looks like this for testM2 (comments on the left, before "->"):
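The marker scheme above can be sketched roughly as follows. This is a minimal illustration with hypothetical class and method names (the real tool uses JUnit and Log4j); the point is that the "done" line is only logged when the test body, including assumption checks, completes normally:

```java
// Sketch (hypothetical names) of the ">>>>> testXXX" / ">>>>> done: testXXX"
// markers: a test that throws (failed assertion or assumption) never reaches
// the "done" line, so its absence in the log flags a skipped or failed test.
import java.util.ArrayList;
import java.util.List;

public class MarkerSketch {
    static final List<String> LOG = new ArrayList<>();

    static void runTest(String name, Runnable body) {
        LOG.add(">>>>> " + name);
        try {
            body.run();
            LOG.add(">>>>> done: " + name); // missing => skipped or failed
        } catch (RuntimeException skippedOrFailed) {
            // JUnit reports the failure/assumption; the log simply lacks "done"
        }
    }

    public static void main(String[] args) {
        runTest("testOk", () -> {});
        runTest("testSkipped", () -> { throw new RuntimeException("assumption failed"); });
        for (String line : LOG) System.out.println(line);
    }
}
```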
The test suite now continuously executes queries and updates in a number of threads while the tests are running. These queries and updates are more complex, taken from the LDBC-50M suite. We currently start 4 read threads and 1 write thread. All of the read queries from LDBC SPB are used, but only the insert queries are used for writing. This is because: 1) we use a cache of CWorks, which would be invalidated if we deleted a CWork that belongs to it; 2) cluster validation expects lagging workers to only increase their statement counts (mostly to avoid waiting a long time before deciding that the cluster is broken). Not using DELETE/UPDATE queries is not a limitation of these tests, as those queries are exercised as part of the load test.
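The thread setup described above can be sketched roughly like this. The class and field names are hypothetical, and the real LDBC SPB queries against the repository are stubbed out as Runnables; the sketch only shows the 4-reader/1-writer layout and the executed/failed counters used for the end-of-test statistics:

```java
// Sketch of the continuous-query harness: 4 read threads and 1 insert-only
// write thread run until stopped, counting executed and failed queries.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ContinuousQueries {
    final AtomicLong readsOk = new AtomicLong(), readsFailed = new AtomicLong();
    final AtomicLong writesOk = new AtomicLong(), writesFailed = new AtomicLong();
    private volatile boolean running = true;
    private final ExecutorService pool = Executors.newFixedThreadPool(5);

    public void start(Runnable readQuery, Runnable insertQuery) {
        for (int i = 0; i < 4; i++)
            pool.submit(() -> loop(readQuery, readsOk, readsFailed));
        pool.submit(() -> loop(insertQuery, writesOk, writesFailed));
    }

    private void loop(Runnable query, AtomicLong ok, AtomicLong failed) {
        while (running) {
            try { query.run(); ok.incrementAndGet(); }
            catch (RuntimeException e) { failed.incrementAndGet(); }
        }
    }

    // Stopping may take a while for write queries if the cluster is still
    // replicating; hence the generous termination timeout.
    public void stop() {
        running = false;
        pool.shutdown();
        try { pool.awaitTermination(60, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        ContinuousQueries cq = new ContinuousQueries();
        cq.start(() -> {}, () -> {}); // real queries would hit the repository
        try { Thread.sleep(200); } catch (InterruptedException ignored) {}
        cq.stop();
        System.out.printf("CQ statistics: %d reads, %d writes executed%n",
                cq.readsOk.get(), cq.writesOk.get());
    }
}
```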
The tool prints statistics at the end of each test, like these:
The test tool gathers the JMX notifications from all masters and logs them to the notifications.log file. The log can later be checked for special events, such as replication or split brain.
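A rough sketch of such a collector, using the standard javax.management API. The class name, the handback convention for identifying the master, and the Log4j wiring to notifications.log are assumptions; only the NotificationListener interface and Notification accessors are the real API:

```java
// Sketch of gathering JMX notifications: one listener is registered per
// master, with the master's id passed as the handback, and each notification
// is formatted into a single log line (written to notifications.log via
// Log4j in the real tool; printed to stdout here).
import javax.management.Notification;
import javax.management.NotificationListener;

public class NotificationCollector implements NotificationListener {

    // Formats one notification; the handback identifies the source master.
    public String format(Notification n, Object master) {
        return String.format("%tF %<tT [%s] %s: %s",
                n.getTimeStamp(), master, n.getType(), n.getMessage());
    }

    @Override
    public void handleNotification(Notification n, Object handback) {
        System.out.println(format(n, handback));
    }

    // Subscription would look like (hypothetical names):
    // connection.addNotificationListener(masterObjectName, this, null, "master1");
}
```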
The test tool now uses Log4j to log the output. This makes it easier to control what to log and where to log it.
We verify that the cluster is in good shape at least twice per test: before the test starts and after it finishes. The old verification worked like this:
1. Check that all nodes are online and have the same number of statements and the same fingerprints.
2. If the check passes, the cluster is OK; stop.
3. If we have repeated the process too many times, the cluster is in a bad state; stop.
4. Otherwise, sleep for some time and go to step 1.
The new verification process is similar, but with two important differences:
- We still check the state repeatedly, but wait just 1 second between checks. This ensures that we pick up a "good" cluster quickly.
- We count stale states (states identical to the previous one) and stop early if we encounter too many (currently 10) consecutive stale states. This ensures that we don't wait too long if the cluster is "frozen".
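The new loop can be sketched as follows. ClusterVerifier and its two suppliers are hypothetical; the 1-second wait and the limit of 10 consecutive stale states come from the description above, while the overall attempt budget is an assumption (the sleep is parameterized so the sketch is testable):

```java
// Sketch of the new cluster verification: poll repeatedly, succeed as soon
// as all nodes agree, and bail out early after 10 consecutive identical
// ("stale") state snapshots, which indicates a frozen cluster.
import java.util.Objects;
import java.util.function.Supplier;

public class ClusterVerifier {
    static final int MAX_ATTEMPTS = 120; // assumption: overall retry budget
    static final int MAX_STALE = 10;     // consecutive unchanged states

    // stateSnapshot: a comparable snapshot, e.g. statements + fingerprint per node
    // allNodesAgree: true when every node is online with matching statements/fingerprints
    public static boolean verify(Supplier<String> stateSnapshot,
                                 Supplier<Boolean> allNodesAgree,
                                 long sleepMillis) {
        String previous = null;
        int stale = 0;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            if (allNodesAgree.get()) return true;   // cluster is OK
            String current = stateSnapshot.get();
            stale = Objects.equals(current, previous) ? stale + 1 : 0;
            if (stale >= MAX_STALE) return false;   // cluster looks frozen
            previous = current;
            try { Thread.sleep(sleepMillis); }      // 1000 ms in the real tool
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return false; }
        }
        return false;                               // gave up: bad state
    }
}
```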