Friday 22 April 2016

Speeding Up Crate By Doing Nothing

Since I started working at Crate.IO I have, of course, spent some time getting to know the technology. For me, the overall Crate story is very compelling: distributed SQL at NoSQL scale. How the technology clusters and distributes load is also very cool. To learn a little about the technology for myself I have been playing a specific "game": what can I tweak in order to maximise the insert rate? Here's what I have found out so far...

The Setup

So I am playing this "game" on a MacBook Pro and I am deploying Crate into VMs using the laptop as the host. For completeness, I am using CentOS 7 for the VMs. There are 3 VMs and each is configured with 2GB RAM and one VCPU. Provisioning of those VMs is handled by Ansible; this includes the install and configuration of Crate.

You could do this in other ways of course:

  • Deploy 3 Docker containers
  • Deploy 3 Crate instances directly on the host machine

Deploying 3 nodes is interesting because it shows off the clustering behaviour of Crate. It is not strictly necessary in this case, however, because a single node would have absolutely no trouble maxing out the CPU with the kind of load that I was inserting.

What's The Aim Of The Game?

So the game is super-simple: for a specified schema, how quickly can I get Crate to ingest 1,000,000 rows? Nothing more to it than that. To help me along with this task, there is some tooling I need to introduce you to: Crash, the Crate Shell, and CR8, a collection of utility scripts for performing specific tasks (in this case, filling a table) with Crate clusters. I used the latest Crate release (0.54.8) and no other tooling.

In order to improve performance, I was allowed to tweak any configuration option that is documented (even if such a tweak is not advisable in the real world). Simple as that.

Playing The Game

The first thing I did was to deploy my VMs, provision Crate and then attach Crash to start firing in my SQL instructions, starting with creating my simple table...

Nothing particularly exciting here, but one thing worth noting is the sharding. Crate, by default, will create 5 shards per table, with each shard being a Lucene index. With only 3 VCPUs available in my cluster, there is actually a performance increase to be gained by reducing to just 3 shards.
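As a rough illustration of the sharding point, here is a minimal sketch that builds a `CREATE TABLE` statement with an explicit shard count. The table name `users` and its two columns are assumptions made up for this example (they are not the schema used in the game); only the `CLUSTERED INTO ... SHARDS` clause is the part that matters.

```python
def create_table_sql(table: str, shards: int) -> str:
    """Build a CREATE TABLE statement with an explicit shard count.

    The column list is a hypothetical placeholder, not the original schema.
    """
    if shards < 1:
        raise ValueError("a table needs at least one shard")
    return (
        f"CREATE TABLE {table} (\n"
        "    id STRING PRIMARY KEY,\n"
        "    name STRING\n"
        f") CLUSTERED INTO {shards} SHARDS"
    )


# One shard per VCPU in the 3-node cluster described above.
STATEMENT = create_table_sql("users", shards=3)
print(STATEMENT)
```

The printed statement can be pasted into Crash as-is. Leaving off the `CLUSTERED INTO` clause would fall back to the default shard count mentioned above.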

Now I used the CR8 utility to load 1,000,000 rows of auto-generated data into my table and I pay attention to the load on the cluster while I do it...

The total insert takes just over 3 minutes at 7.77 inserts/second (7k rows/second). Not exactly super fast. But that's OK... now it's time to optimise for speed. So I followed the documentation and adjusted my table to prepare it for a faster insert. Again, I run CR8 and monitor the load on the cluster...

The second peak, above, is the second insert. The data also speaks for itself: the insert took 33s at 29.57 inserts/second. 3.8x improvement. Not bad. But the best we can do? Actually no. I repeated the CR8 command a total of 7 times. The results:

  1. 3:08 @ 7.77 it/s
  2. 0:33 @ 29.57 it/s
  3. 0:32 @ 31.05 it/s
  4. 0:33 @ 29.78 it/s
  5. 0:18 @ 53.75 it/s
  6. 0:19 @ 50.77 it/s
  7. 0:19 @ 50.64 it/s
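To put those seven runs side by side, here is a small sketch that computes each run's speed-up relative to the first, untuned insert. The rates are copied from the list above (reading the third run as 31.05 it/s); the variable names are just for illustration.

```python
# Reported rate in iterations/second for each of the seven runs,
# in order. Run 3 is read as 31.05 it/s.
RATES = [7.77, 29.57, 31.05, 29.78, 53.75, 50.77, 50.64]

# Speed-up of every run relative to the first (untuned) run.
BASELINE = RATES[0]
SPEEDUPS = [rate / BASELINE for rate in RATES]

for run, (rate, speedup) in enumerate(zip(RATES, SPEEDUPS), start=1):
    print(f"run {run}: {rate:6.2f} it/s  ({speedup:.1f}x the first run)")
```

Runs 2 to 4 sit at roughly 4x the baseline, while runs 5 to 7 jump to about 6.5x to 7x, even though nothing was changed between them.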

Now we're talking! But where did this performance improvement come from? I did nothing! I ponder this issue while doing some queries:

Out of the 7 million "users" in my setup, I found 512 called "Dr. Paul..." in 0.102s. Then I ran some subsequent queries (actually, the same query over and over). This produced some very interesting results:

See what happened there? 10 extra results, returned in 1/10th of the time! Both are easily explained. When I set up my table, I set the refresh interval to 0 in order to speed up the insert. However, I had not yet reset this or manually refreshed the table! For further detail, see this piece of documentation. The same document also (implicitly) explains the query speed-up... the entire "name" column was cached the first time I queried it. You probably could have guessed this.
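For reference, the refresh-interval dance described here can be sketched as three SQL statements: switch periodic refresh off before the bulk load, refresh manually afterwards, then restore an interval. The table name `users` is an assumption, and so is the 1000 ms value used for restoring (I believe that is Crate's default, but check the documentation for your version).

```python
def bulk_load_statements(table: str, restore_interval_ms: int = 1000) -> dict:
    """Return the SQL to run before and after a bulk insert.

    The table name and restore_interval_ms are assumptions for
    illustration; adjust both to your own setup.
    """
    return {
        # Disable periodic refresh so inserts are not slowed down by it.
        "before_load": f"ALTER TABLE {table} SET (refresh_interval = 0)",
        # Make all newly inserted rows visible to queries.
        "after_load": f"REFRESH TABLE {table}",
        # Switch periodic refresh back on.
        "restore": (
            f"ALTER TABLE {table} "
            f"SET (refresh_interval = {restore_interval_ms})"
        ),
    }


STATEMENTS = bulk_load_statements("users")
for step, sql in STATEMENTS.items():
    print(f"{step}: {sql}")
```

Skipping the last two statements is exactly what produced the "missing" rows in my first query: the data was in the table, but not yet visible to searches.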

So, we are brought back to my first question about insert speed... why did it suddenly, dramatically speed up? One guess: this might be a runtime optimisation by the Java HotSpot VM. I've not asked, but I am confident the engineers at Crate.IO know the answer. Got a suggestion? Great... submit your answer in a comment or follow this link.
