Red Hat
Oct 11, 2012
by Shane K Johnson

On Monday, Tristan and I delivered a webinar.

NoSQL, No Sweat with JBoss Data Grid

We discussed:

  • How data grids fit into the NoSQL landscape.
  • The NoSQL influence on JBoss Data Grid.
  • The future of data grids in Java.
  • Big Data with JBoss Data Grid.
  • Real world use cases for data grids.

Update: The links to play or download the webinar have been fixed.

We planned for Q & A, and a number of questions were submitted. Unfortunately, we ran out of time, so I decided to publish the answers here.

The webinar is available to play or download.

FYI

I’ll be spending October and November working on JBoss Data Grid, and I plan to publish a number of posts during that time.

And now on to the questions…

Is JBoss Data Grid an in-memory database?

No. JBoss Data Grid is an in-memory data grid (IMDG). It is a key / value store, a distributed hash table (DHT), with query capabilities (e.g. map / reduce).

If a node goes down, is its data mirrored on another node?

Yes, depending on the configuration. If the data is replicated, then every node holds a copy of every entry. If the data is distributed, it depends on the configured number of owners. If the number of owners is greater than one, then a node’s data is in fact mirrored on one or more other nodes.
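
To make the role of the number of owners concrete, here is a minimal sketch in plain Java. It is a simplified model, not the JBoss Data Grid implementation: the class OwnersSketch, its methods, and the "pick consecutive nodes" placement rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model for illustration only; not the JBoss Data Grid implementation.
final class OwnersSketch {

    // Returns the nodes that hold a copy of the key: one primary owner plus backups.
    // Placement here is "numOwners consecutive nodes", an assumption for this sketch.
    static List<String> owners(String key, List<String> nodes, int numOwners) {
        int copies = Math.min(numOwners, nodes.size());
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }

    // True if at least one copy of the key remains after failedNode is lost.
    static boolean survives(String key, List<String> nodes, int numOwners, String failedNode) {
        List<String> holders = owners(key, nodes, numOwners);
        holders.remove(failedNode);
        return !holders.isEmpty();
    }
}
```

In this model, setting the number of owners equal to the number of nodes behaves like replication, two owners tolerate the loss of any single node, and one owner means an entry is lost when its only owner goes down.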

How do we decide what the key is?

That is up to you. However, JBoss Data Grid is responsible for the internal hash of the key. It uses a MurmurHash3 implementation.
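
As a rough illustration, a custom key usually needs consistent equals and hashCode implementations, and in a clustered setup it also has to be transferable between nodes (Serializable is assumed here). The CustomerKey and KeyHashing classes below are hypothetical. The spread method applies the 32-bit finalization step of MurmurHash3 on top of hashCode, purely as a stand-in for whatever the grid does internally.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical key type for illustration; not part of JBoss Data Grid.
final class CustomerKey implements Serializable {
    private final String region;
    private final long customerId;

    CustomerKey(String region, long customerId) {
        this.region = region;
        this.customerId = customerId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerKey)) return false;
        CustomerKey other = (CustomerKey) o;
        return customerId == other.customerId && region.equals(other.region);
    }

    @Override
    public int hashCode() {
        return Objects.hash(region, customerId);
    }
}

final class KeyHashing {
    // The 32-bit finalizer from MurmurHash3, used here only as a stand-in
    // for the grid's internal hash; the real internals may differ.
    static int spread(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Two keys that are equal produce the same hashCode and therefore the same internal hash, which is what lets every node agree on where an entry lives.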

How do we ensure that the data is distributed evenly?

JBoss Data Grid implements consistent hashing. Virtual nodes can be used to improve the distribution of data.

Resource #1: Consistent Hashing – Tom White (link)
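
For readers who prefer code, here is a minimal sketch of a consistent hash ring with virtual nodes in plain Java. It is not the JBoss Data Grid implementation. The HashRing class and its method names are assumptions, and the hash is String.hashCode followed by the MurmurHash3 32-bit finalizer, used only as a stand-in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative consistent hash ring; not the JBoss Data Grid implementation.
final class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Places the node on the ring once per virtual node.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // The owner is the first ring position at or after the key's hash, wrapping around.
    String ownerOf(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes on the ring");
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Counts how many of keyCount sample keys ("key-0", "key-1", ...) land on each node.
    Map<String, Integer> distribution(int keyCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keyCount; i++) {
            counts.merge(ownerOf("key-" + i), 1, Integer::sum);
        }
        return counts;
    }

    // Stand-in hash: String.hashCode mixed with the MurmurHash3 32-bit finalizer.
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Comparing distribution(10000) for a ring built with one virtual node per node against one built with, say, 64 is a quick way to see why more ring positions per node tend to smooth out the spread.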

When we add a new node, do we need to redistribute the data?

Not all of it. That is the benefit of consistent hashing: rather than rebalancing the entire data grid, the new node takes ownership of a subset of entries from its neighbor.

Resource #2: Programmer’s Toolbox Part 3: Consistent Hashing (link)
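
The effect of a node joining can be checked with a small experiment. The sketch below is plain Java and not the JBoss Data Grid implementation. RebalanceSketch and its methods are hypothetical names, and the hash is a stand-in built from String.hashCode and the MurmurHash3 32-bit finalizer. It records the owner of each sample key, adds a node, and verifies that every key that changed owner moved to the new node.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative experiment; not the JBoss Data Grid implementation.
final class RebalanceSketch {

    // Stand-in hash: String.hashCode mixed with the MurmurHash3 32-bit finalizer.
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Adds a node to the ring at virtualNodes positions.
    static void join(TreeMap<Integer, String> ring, String node, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // The owner is the first ring position at or after the key's hash, wrapping around.
    static String ownerOf(TreeMap<Integer, String> ring, String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Returns how many of keyCount sample keys change owner when newNode joins.
    // Throws if any key moves to a node other than the new one.
    static int keysMovedByJoin(TreeMap<Integer, String> ring, String newNode,
                               int virtualNodes, int keyCount) {
        String[] before = new String[keyCount];
        for (int i = 0; i < keyCount; i++) {
            before[i] = ownerOf(ring, "key-" + i);
        }
        join(ring, newNode, virtualNodes);
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String after = ownerOf(ring, "key-" + i);
            if (!after.equals(before[i])) {
                if (!after.equals(newNode)) {
                    throw new IllegalStateException("key moved between existing nodes");
                }
                moved++;
            }
        }
        return moved;
    }
}
```

Entries that stay put are never touched, and the only transfers are from existing owners to the node that just joined.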

What about our stored procedures?

The JBoss Data Grid equivalent would be distributed tasks and map / reduce tasks.

In the context of the CAP Theorem, can separate caches within the same container be configured with different guarantees such that one cache guarantees Consistency and Partition Tolerance while another guarantees Availability and Partition Tolerance?

Yes. One cache can be configured to use asynchronous communication (AP) while another is configured to use synchronous communication (CP).
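
The difference between the two modes can be sketched in plain Java. This is a toy model and not the JBoss Data Grid API: ReplicationSketch, putSync, and putAsync are hypothetical names, and a single background thread stands in for the network link to a backup node. A synchronous put returns only after the backup has applied the write, while an asynchronous put returns immediately and the backup catches up later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of one primary and one backup copy; not the JBoss Data Grid API.
final class ReplicationSketch {
    final Map<String, String> primary = new ConcurrentHashMap<>();
    final Map<String, String> backup = new ConcurrentHashMap<>();

    // A single daemon thread stands in for the network link to the backup node.
    private final ExecutorService link = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "replication-link");
        t.setDaemon(true);
        return t;
    });

    // Synchronous: the caller waits until the backup has applied the write.
    void putSync(String key, String value) throws InterruptedException, ExecutionException {
        primary.put(key, value);
        replicate(key, value).get();
    }

    // Asynchronous: the caller continues at once; the backup may briefly lag behind.
    Future<?> putAsync(String key, String value) {
        primary.put(key, value);
        return replicate(key, value);
    }

    private Future<?> replicate(String key, String value) {
        return link.submit(() -> { backup.put(key, value); });
    }
}
```

The synchronous path trades latency for the guarantee that both copies agree when the call returns. The asynchronous path keeps accepting writes quickly, at the cost of a window in which the backup can serve stale data.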

How do distributed tasks differ from map / reduce tasks?

A distributed task is executed on multiple nodes, and it returns multiple values. A map / reduce task is executed on multiple nodes, but it returns a single value. A distributed task is executed as multiple independent functions whereas a map / reduce task is executed in the context of a single function.

Update

Another way to look at it is that a distributed task is executed once per node whereas a map / reduce task is executed once per entry.
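
The distinction can be sketched in plain Java. This is a simplified model, not the JBoss Data Grid API: TaskSketch and its methods are hypothetical, and the "cluster" is just a map from node name to the entries stored on that node. The distributed task runs once per node and yields one result per node, whereas the map / reduce task runs the mapper once per entry and reduces everything to a single value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Simplified model for illustration; not the JBoss Data Grid API.
final class TaskSketch {

    // Runs the task once per node, against that node's local entries.
    // The result contains one value per node.
    static <R> Map<String, R> distributedTask(
            Map<String, Map<String, String>> cluster,
            Function<Map<String, String>, R> task) {
        Map<String, R> resultsPerNode = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> node : cluster.entrySet()) {
            resultsPerNode.put(node.getKey(), task.apply(node.getValue()));
        }
        return resultsPerNode;
    }

    // Runs the mapper once per entry across all nodes, then reduces
    // the mapped values into a single result.
    static <T> T mapReduce(
            Map<String, Map<String, String>> cluster,
            Function<Map.Entry<String, String>, T> mapper,
            T identity,
            BinaryOperator<T> reducer) {
        T result = identity;
        for (Map<String, String> entries : cluster.values()) {
            for (Map.Entry<String, String> entry : entries.entrySet()) {
                result = reducer.apply(result, mapper.apply(entry));
            }
        }
        return result;
    }
}
```

For example, a distributed task could report the number of entries held by each node, while a map / reduce task could count the words in every value and add them up into one total.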

How do you see Hibernate OGM fitting into the JBoss Data Grid roadmap?

To be honest, we are still evaluating the role of Hibernate OGM as it relates to JBoss Data Grid.