We, Data Grid

Oct 25, 2012 4:43 PM, Shane K Johnson

This post gives an overview of the standard features, functionality, and configuration options of a data grid. It does so by establishing a baseline with two data grids: JBoss Data Grid (6.0.1) and Oracle Coherence (3.7.1). It begins with core concepts and proceeds directly to the intermediate and advanced concepts implemented by both data grids.

  • Core
  • Distributed
  • Concurrency
  • Processing
  • Remote
  • Storage
  • Management

Core

TOPOLOGY

Feature        JBoss Data Grid   Oracle Coherence
Local          Y                 Y
Invalidation   Y                 Y
Replicated     Y                 Y
Distributed    Y                 Y

BASIC

Feature      JBoss Data Grid   Oracle Coherence
Eviction     Y                 Y
Expiration   Y                 Y
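
Eviction and expiration are easiest to see on a single node. The Java class below is a hypothetical sketch, not the JBoss Data Grid or Oracle Coherence API: it evicts the least recently used entry once a maximum entry count is exceeded (eviction), and it treats an entry as gone once a fixed time-to-live has elapsed since the entry was written (expiration). The current time is passed in by the caller so the behaviour is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-node cache illustrating eviction and expiration.
class EvictingExpiringCache<K, V> {
    private static final class Timed<V> {
        final V value;
        final long expiresAt; // absolute time (ms) at which the entry expires

        Timed(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Timed<V>> map;

    EvictingExpiringCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true: iteration runs from least to most recently used
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries; // eviction: drop the LRU entry when over capacity
            }
        };
    }

    // "now" is supplied by the caller instead of reading the system clock
    void put(K key, V value, long now) {
        map.put(key, new Timed<>(value, now + ttlMillis));
    }

    // Returns null if the key was never stored, has been evicted, or has expired
    V get(K key, long now) {
        Timed<V> t = map.get(key);
        if (t == null) return null;
        if (now >= t.expiresAt) { // expiration: removed lazily when read
            map.remove(key);
            return null;
        }
        return t.value;
    }
}
```

LinkedHashMap in access order does the LRU bookkeeping. Expired entries are removed only when read here; a production grid may also purge them in the background and offers more eviction strategies than LRU.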

Distributed

A data grid is a distributed system with a single system image (SSI): although it runs on multiple nodes, it presents itself as a single system. Behind that single image, a data grid should load balance data by distributing it to highly available (HA) partitions on multiple nodes via dynamic partitioning. When a node is added, a new HA partition is created on it. As a result, a data grid should support state transfer: the process of transferring ownership of a subset of the entries within the grid to a new node when it is added. While the data is distributed, it may be preferable to store related entries on the same node. This is known as data affinity. For example, it may be desirable to ensure that a department and all of its employees are stored on the same node.

A data grid is elastic; nodes can be added or removed on demand. Further, a data grid should support node discovery and failure detection. Because a data grid is elastic and because the data is distributed, it scales linearly: when a node is added, its full capacity is added to the data grid.
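
Dynamic partitioning, HA partitions, state transfer, and data affinity can be illustrated with a consistent-hash ring. The Java below is a toy sketch, not the partitioning algorithm of either product. Each node is placed at 100 points on a ring; a key's owners are the first numOwners distinct nodes clockwise from the key's hash (a primary plus backups, i.e. the HA partition); and a hypothetical AffinityKey is routed by its group alone (e.g. a department id) so that related entries share owners. The names HashRing, AffinityKey, and numOwners are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical composite key: routed by group only, so entries of the same
// group (e.g. all employees of one department) share owners (data affinity).
final class AffinityKey {
    final Object group;
    final Object id;

    AffinityKey(Object group, Object id) {
        this.group = group;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AffinityKey
                && group.equals(((AffinityKey) o).group)
                && id.equals(((AffinityKey) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(group, id);
    }
}

// Toy consistent-hash ring (single JVM, no networking).
class HashRing {
    private static final int VIRTUAL_NODES = 100; // ring positions per physical node
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // murmur3 32-bit finalizer: spreads similar hash codes around the ring
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(mix((node + "#" + i).hashCode()), node);
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    // Primary owner first, then backups: the first numOwners distinct nodes clockwise
    List<String> owners(Object key, int numOwners) {
        Object routingKey = key instanceof AffinityKey ? ((AffinityKey) key).group : key;
        int h = mix(routingKey.hashCode());
        List<String> clockwise = new ArrayList<>(ring.tailMap(h, true).values());
        clockwise.addAll(ring.headMap(h, false).values()); // wrap around the ring
        List<String> owners = new ArrayList<>();
        for (String node : clockwise) {
            if (owners.size() == numOwners) break;
            if (!owners.contains(node)) owners.add(node);
        }
        return owners;
    }
}
```

Because existing nodes keep their ring positions, adding a node changes the primary owner only for keys whose position now falls to the new node. Those entries are the ones state transfer would move; every other key stays put.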

A data grid should support both broadcast and peer-to-peer communication over both UDP/IP and TCP/IP, using both multicast and unicast messaging. Node discovery, node failure detection via heartbeats, and group-wide communication rely on multicast messaging, whereas peer-to-peer communication relies on unicast messaging. JBoss Data Grid relies on JGroups for its group membership and communication protocol, whereas Oracle Coherence relies on the Tangosol Cluster Management Protocol (TCMP).
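
Heartbeat-based failure detection can be sketched without any networking. In this hypothetical Java class (not JGroups or TCMP), each call to onHeartbeat stands in for a heartbeat multicast by a member; a member heard from for the first time is thereby discovered, and a member not heard from within the timeout is suspected of having failed. The timeout value and the method names are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical heartbeat failure detector; time is passed in explicitly.
class HeartbeatFailureDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>(); // member -> last heartbeat time

    HeartbeatFailureDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a heartbeat arrives; the first heartbeat from a member also discovers it
    void onHeartbeat(String member, long now) {
        lastHeard.put(member, now);
    }

    // Members whose last heartbeat is older than the timeout
    Set<String> suspects(long now) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (now - e.getValue() > timeoutMillis) result.add(e.getKey());
        }
        return result;
    }

    Set<String> members() {
        return lastHeard.keySet();
    }
}
```

The choice of timeout is a trade-off: a short one detects failures quickly but risks suspecting a node that is merely slow or paused.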

Feature                              JBoss Data Grid   Oracle Coherence
Single System Image                  Y                 Y
Data Load Balancing                  Y                 Y
Distributed Data                     Y                 Y
Highly Available Partitions          Y                 Y
Dynamic Partitioning                 Y                 Y
State Transfer                       Y                 Y
Data Affinity                        Y                 Y
Elastic                              Y                 Y
Node Discovery & Failure Detection   Y                 Y
UDP/IP & TCP/IP                      Y                 Y
Multicast & Unicast Messaging        Y                 Y
Peer-to-Peer Communication           Y                 Y
Linear Scaling                       Y                 Y

Concurrency

A data grid should support transactions with ACID properties and concurrent access. It should be able to participate in Java Transaction API (JTA) compliant transactions. In addition, it should be able to participate in XA (distributed) transactions to guarantee transaction consistency across resources and to support transaction recovery.
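
XA transactions are coordinated with the two-phase commit protocol: every participant is first asked to prepare (vote), and only if all vote yes is each one told to commit. The Java below is a toy coordinator under that assumption. Participant and StagedWrite are hypothetical names, not javax.transaction.xa.XAResource, and recovery (a coordinator failing between the two phases) is not modelled.

```java
import java.util.List;
import java.util.Map;

// Hypothetical participant interface for the sketch.
interface Participant {
    boolean prepare(); // phase 1: vote yes (true) only if this participant can commit
    void commit();     // phase 2: everyone voted yes
    void rollback();   // phase 2: someone voted no
}

// Hypothetical participant: a write staged on one node, applied only on commit.
class StagedWrite implements Participant {
    private final Map<String, String> node;
    private final String key;
    private final String value;
    private final boolean canCommit; // false simulates, e.g., a lock conflict on that node

    StagedWrite(Map<String, String> node, String key, String value, boolean canCommit) {
        this.node = node;
        this.key = key;
        this.value = value;
        this.canCommit = canCommit;
    }

    public boolean prepare() { return canCommit; }
    public void commit() { node.put(key, value); }
    public void rollback() { /* nothing was applied, so nothing to undo */ }
}

// Toy two-phase commit coordinator: all participants commit, or none do.
class TwoPhaseCommit {
    static boolean run(List<? extends Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                for (Participant q : participants) q.rollback(); // one "no" aborts everywhere
                return false;
            }
        }
        for (Participant p : participants) p.commit();
        return true;
    }
}
```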

A data grid should support distributed locking and Multi Version Concurrency Control (MVCC) to provide transactions with ACID properties. Distributed locking ensures that concurrent transactions do not write to the same entry at the same time: when a transaction locks an entry for writing, other transactions writing to that same entry either fail or are blocked. MVCC ensures that transactions reading an entry do not block transactions writing that same entry, and vice versa; reads are not blocked by writes, and writes are not blocked by reads.
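
The write-locking rule can be sketched as a lock table keyed by cache key. This hypothetical Java class runs in one JVM, whereas in a real grid the lock for a key would be managed on the node that owns it. It shows both behaviours mentioned above: tryLock fails at once when another transaction holds the lock, and lock blocks (polls) until the lock is free or a timeout expires. The transaction ids are plain strings for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

// Hypothetical lock table: key -> id of the transaction holding its write lock.
class LockTable {
    private final ConcurrentMap<Object, String> owners = new ConcurrentHashMap<>();

    // Fail-fast: false if another transaction holds the lock.
    // A transaction may re-acquire a lock it already holds (no hold count is kept).
    boolean tryLock(Object key, String txId) {
        String current = owners.putIfAbsent(key, txId);
        return current == null || current.equals(txId);
    }

    // Blocking variant: retries until the lock is free or the timeout expires.
    boolean lock(Object key, String txId, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!tryLock(key, txId)) {
            if (System.nanoTime() - deadline >= 0) return false;
            Thread.sleep(1); // simple polling keeps the sketch short
        }
        return true;
    }

    // Only the owning transaction can release the lock; other callers are ignored.
    void unlock(Object key, String txId) {
        owners.remove(key, txId);
    }
}
```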

A data grid should support both optimistic and pessimistic locking. With pessimistic locking, the lock is acquired immediately, before the entry is written. With optimistic locking, lock acquisition is deferred until the prepare phase. In addition, a data grid should support explicit locking, in which an entry is locked manually via the API. Finally, a data grid should support deadlock detection in the event that concurrent transactions block each other.
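
The difference between the two locking modes is only *when* the lock is taken, which the hypothetical Java sketch below isolates (it is not either product's transaction API). A pessimistic transaction locks each key in put and fails right there on a conflict; an optimistic transaction buffers its writes and takes all of its locks in prepare, so the same conflict surfaces as a failed prepare instead. Deadlock detection is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store whose transactions differ only in when they take write locks.
class LockingStore {
    final Map<String, String> data = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, String> lockOwners = new ConcurrentHashMap<>();

    Tx begin(String txId, boolean optimistic) {
        return new Tx(txId, optimistic);
    }

    class Tx {
        private final String id;
        private final boolean optimistic;
        private final Map<String, String> writes = new LinkedHashMap<>(); // buffered until commit
        private final List<String> held = new ArrayList<>();

        Tx(String id, boolean optimistic) {
            this.id = id;
            this.optimistic = optimistic;
        }

        // Pessimistic: lock now, throw on conflict. Optimistic: only buffer the write.
        void put(String key, String value) {
            if (!optimistic) acquire(key);
            writes.put(key, value);
        }

        // Optimistic locks are acquired here; a conflict makes prepare fail.
        boolean prepare() {
            try {
                for (String key : writes.keySet()) acquire(key);
                return true;
            } catch (IllegalStateException conflict) {
                rollback();
                return false;
            }
        }

        void commit() {
            data.putAll(writes);
            release();
        }

        void rollback() {
            writes.clear();
            release();
        }

        private void acquire(String key) {
            if (held.contains(key)) return;
            String owner = lockOwners.putIfAbsent(key, id);
            if (owner != null && !owner.equals(id)) {
                throw new IllegalStateException(key + " is locked by " + owner);
            }
            held.add(key);
        }

        private void release() {
            for (String key : held) lockOwners.remove(key, id);
            held.clear();
        }
    }
}
```

Pessimistic locking reports a conflict early but holds locks for the whole transaction; optimistic locking holds them only from prepare to commit, at the cost of discovering conflicts late.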

A data grid should support two isolation levels: read committed and repeatable read. With read committed, a transaction always sees the latest committed version of an entry. If a transaction reads an entry and a second transaction then updates and commits that same entry, the first transaction will see the updated version if it reads the entry again before committing. With repeatable read, a transaction always sees the same version of an entry, even if a separate transaction updates and commits that entry before the first transaction has committed.
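
MVCC and the two isolation levels can be shown together with a multi-version store. In this single-threaded, hypothetical Java sketch, each committed write adds a new version stamped with a commit counter and old versions are kept, so a reader never waits for a writer. A read-committed reader looks up the newest committed version on every read; a repeatable-read reader pins the snapshot that existed when it began. Pinning at begin is one way to provide repeatable reads; an implementation may instead pin the version on first read. A real grid would also publish versions atomically under concurrency and discard versions no reader can see.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-threaded multi-version store.
class MvccStore {
    enum Isolation { READ_COMMITTED, REPEATABLE_READ }

    private long lastCommit = 0;
    // key -> (commit number -> value); old versions are retained
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    void commitWrite(String key, String value) {
        lastCommit++;
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(lastCommit, value);
    }

    // Newest value committed at or before the given commit number, or null
    String readAt(String key, long commit) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) return null;
        Map.Entry<Long, String> e = history.floorEntry(commit);
        return e == null ? null : e.getValue();
    }

    Reader begin(Isolation isolation) {
        return new Reader(isolation, lastCommit);
    }

    class Reader {
        private final Isolation isolation;
        private final long snapshot; // last commit visible when the transaction began

        Reader(Isolation isolation, long snapshot) {
            this.isolation = isolation;
            this.snapshot = snapshot;
        }

        String read(String key) {
            // READ_COMMITTED sees the latest commit; REPEATABLE_READ sees its snapshot
            long visible = isolation == Isolation.REPEATABLE_READ ? snapshot : lastCommit;
            return readAt(key, visible);
        }
    }
}
```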

Feature                             JBoss Data Grid   Oracle Coherence
Java Transaction API                Y                 Y
ACID Properties                     Y                 Y
XA Compliant Resource               Y                 Y
Transaction Consistency             Y                 Y
Distributed Locking                 Y                 Y
Optimistic / Pessimistic Locking    Y                 ?
Explicit Locking                    Y                 Y
Deadlock Detection                  Y                 Y
Multi Version Concurrency Control   Y                 Y
Read Committed Isolation            Y                 Y
Repeatable Read Isolation           Y                 Y

Note

I am unable to state whether or not optimistic and / or pessimistic locking can be configured with Oracle Coherence. The locking strategy is described in neither the NamedCache documentation nor the OptimisticNamedCache documentation. While the interface name OptimisticNamedCache implies optimistic locking, there is no reference to pessimistic locking in the transaction documentation.

Processing

A data grid should support distributed tasks. It should be able to execute a task on some or all of the nodes in parallel. Rather than passing data to the task, a data grid passes the task to the data. To that end, a data grid should be able to determine which nodes the task should be passed to, i.e. the nodes that own the relevant data.
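
Passing the task to the data can be simulated in one JVM. In the hypothetical Java sketch below (TaskGrid, Node, and execute are my names, not either product's API), each Node holds its own partition; execute groups the requested keys by owning node, sends the task only to those nodes, runs the copies in parallel on a thread pool, and returns one partial result per node for the caller to combine. Placement is a simple hash modulo node count, chosen only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical in-process simulation of sending a task to the data.
class TaskGrid {
    static final class Node {
        final String name;
        final Map<String, Integer> localData = new HashMap<>(); // this node's partition

        Node(String name) { this.name = name; }
    }

    private final List<Node> nodes = new ArrayList<>();

    void addNode(String name) { nodes.add(new Node(name)); }

    // Placement rule for this sketch only: hash modulo node count
    Node ownerOf(String key) { return nodes.get(Math.floorMod(key.hashCode(), nodes.size())); }

    void put(String key, int value) { ownerOf(key).localData.put(key, value); }

    // Runs the task on each node owning at least one requested key, in parallel
    <R> List<R> execute(Collection<String> keys, BiFunction<Map<String, Integer>, List<String>, R> task) {
        Map<Node, List<String>> keysByOwner = new HashMap<>();
        for (String key : keys) keysByOwner.computeIfAbsent(ownerOf(key), n -> new ArrayList<>()).add(key);

        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, keysByOwner.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Map.Entry<Node, List<String>> e : keysByOwner.entrySet()) {
                Node node = e.getKey();
                List<String> localKeys = e.getValue();
                // Only the task travels; it reads the owning node's local data
                futures.add(pool.submit(() -> task.apply(node.localData, localKeys)));
            }
            List<R> partials = new ArrayList<>();
            for (Future<R> f : futures) partials.add(f.get());
            return partials;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```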

Feature               JBoss Data Grid   Oracle Coherence
Distributed Tasks     Y                 Y
Parallel Processing   Y                 Y
Grid Processing       Y                 Y

Remote

A data grid should support remote access via a Java API and it should include a Java client. In addition, it should provide an HTTP / REST API to support remote access from clients written in languages other than Java (e.g. Ruby / Python).

Feature       JBoss Data Grid   Oracle Coherence
Java API      Y                 Y
Java Client   Y                 Y
REST API      Y                 Y

Storage

A data grid should support reading data from a cache loader and reading and writing data with a cache store. A cache loader may be used for a read-only data grid, whereas a cache store may be used for a read / write data grid. A data grid should support writing entries to a cache store, and it should support reading entries from a cache loader and / or cache store when they are not in the data grid. A cache loader and / or cache store may be implemented with a file system or a database.
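
Reading an entry from a cache loader when it is not in the data grid is commonly called read-through. The Java below is a minimal sketch under that assumption: Loader is a hypothetical single-method interface standing in for a file system or database loader (it is not the CacheLoader interface of either product), and the loads counter exists only to show how often the backing store is hit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader: returns null if the backing store has no value for the key.
interface Loader<K, V> {
    V load(K key);
}

// Read-through sketch: on a miss, ask the loader and keep what it returns.
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Loader<K, V> loader;
    int loads; // trips to the backing store, for illustration

    ReadThroughCache(Loader<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V v = memory.get(key);
        if (v == null) {
            v = loader.load(key); // miss: fall back to the cache loader
            loads++;
            if (v != null) memory.put(key, v);
        }
        return v;
    }
}
```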

In addition, a data grid should support both write-through and write-behind persistence with a cache store. With write-through persistence, the entry is written to both the data grid and the cache store in the same call, and the call does not return until both writes have completed. Whereas write-through persistence is synchronous, write-behind persistence is asynchronous: the write call returns after the entry has been written to the data grid, and the entry is written to the cache store afterwards.
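
The two persistence modes differ only in when the cache store is written. In this hypothetical Java sketch, a plain Map stands in for the file system or database store. In write-through mode put writes the store before returning; in write-behind mode put hands the store write to a single background thread (which preserves write order) and returns immediately. Real write-behind implementations typically also delay, batch, and coalesce writes, which this sketch omits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical cache with a pluggable persistence mode.
class PersistentCache {
    enum Mode { WRITE_THROUGH, WRITE_BEHIND }

    final Map<String, String> grid = new ConcurrentHashMap<>();
    private final Map<String, String> store; // stands in for a file system or database
    private final Mode mode;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    PersistentCache(Map<String, String> store, Mode mode) {
        this.store = store;
        this.mode = mode;
    }

    void put(String key, String value) {
        grid.put(key, value);
        if (mode == Mode.WRITE_THROUGH) {
            store.put(key, value); // synchronous: returns after both writes
        } else {
            writer.execute(() -> store.put(key, value)); // asynchronous: store written later
        }
    }

    // Waits for pending write-behind writes to reach the store, then stops the writer
    void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The store passed in should be thread-safe (e.g. a ConcurrentHashMap), because in write-behind mode it is written from the background thread.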

Finally, a data grid should support activation and passivation via a cache store. Passivation occurs when an entry is evicted from the data grid: it is written to the cache store and deleted from the data grid. Activation occurs when an entry is read after having been passivated: it is read from the cache store and written back to the data grid.
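
Passivation and activation can be sketched by hooking the eviction step of an LRU map. In this hypothetical Java class, a plain HashMap stands in for the cache store. When memory exceeds maxInMemory, the least recently used entry is passivated (written to the store, then evicted); reading a passivated key activates it (read from the store and put back in memory, which may passivate another entry). As an assumption of this sketch, an entry lives in exactly one tier, so activation also removes it from the store.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier cache illustrating passivation and activation.
class PassivatingCache {
    final Map<String, String> store = new HashMap<>(); // stands in for the cache store
    final LinkedHashMap<String, String> memory;

    PassivatingCache(final int maxInMemory) {
        // accessOrder = true: the eldest entry is the least recently used one
        memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxInMemory) {
                    store.put(eldest.getKey(), eldest.getValue()); // passivate
                    return true;                                   // then evict from memory
                }
                return false;
            }
        };
    }

    void put(String key, String value) {
        store.remove(key); // a newer in-memory value replaces any passivated copy
        memory.put(key, value);
    }

    String get(String key) {
        String v = memory.get(key);
        if (v == null && store.containsKey(key)) {
            v = store.remove(key); // activate: read from the store...
            memory.put(key, v);    // ...and write back to memory
        }
        return v;
    }
}
```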

Feature                            JBoss Data Grid   Oracle Coherence
File System Cache Loader / Store   Y                 Y
Database Cache Loader / Store      Y                 Y
Read-Through                       Y                 Y
Write-Through                      Y                 Y
Write-Behind                       Y                 Y

Management

A data grid should support management and monitoring via the Java Management Extensions (JMX) API, so that it can be managed with tools such as JConsole or VisualVM.

In addition, a data grid should support an eventing model. For example, when an entry is created, updated, or deleted, the data grid should be able to treat the call as an event. Further, applications should be able to register both synchronous and asynchronous listeners with a data grid so that they can be notified of events. If a synchronous listener has been registered with the data grid, a call (e.g. put / delete) will not return until the listener has been notified. If an asynchronous listener has been registered with the data grid, a call (e.g. put / delete) will return before the listener has been notified; the listener is notified afterwards.
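
The synchronous / asynchronous distinction comes down to which thread delivers the event. In the hypothetical Java sketch below (NotifyingCache, Event, and addListener are my names, not either product's listener API), synchronous listeners are called inline, so put and remove do not return until they have run; asynchronous listeners are handed to a single background thread, so the call returns first and the listener is notified afterwards, in order.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical cache that raises an event for every create, update, and remove.
class NotifyingCache {
    enum Type { CREATED, UPDATED, REMOVED }

    static final class Event {
        final Type type;
        final String key;

        Event(Type type, String key) {
            this.type = type;
            this.key = key;
        }
    }

    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final List<Consumer<Event>> syncListeners = new CopyOnWriteArrayList<>();
    private final List<Consumer<Event>> asyncListeners = new CopyOnWriteArrayList<>();
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();

    void addListener(Consumer<Event> listener, boolean async) {
        (async ? asyncListeners : syncListeners).add(listener);
    }

    void put(String key, String value) {
        String old = data.put(key, value);
        fire(new Event(old == null ? Type.CREATED : Type.UPDATED, key));
    }

    void remove(String key) {
        if (data.remove(key) != null) fire(new Event(Type.REMOVED, key));
    }

    private void fire(Event e) {
        for (Consumer<Event> l : syncListeners) l.accept(e);                       // caller waits
        for (Consumer<Event> l : asyncListeners) notifier.execute(() -> l.accept(e)); // caller does not wait
    }

    // Delivers any pending asynchronous notifications, then stops the notifier thread
    void shutdown() throws InterruptedException {
        notifier.shutdown();
        notifier.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

A slow synchronous listener slows down every write, which is the usual reason to prefer an asynchronous listener for non-critical work.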

Feature                                  JBoss Data Grid   Oracle Coherence
Distributed Java Management Extensions   Y                 Y
Events / Notification                    Y                 Y
Synchronous Listeners                    Y                 Y
Asynchronous Listeners                   Y                 Y

This post is the second in a series introducing the concepts of data grids and JBoss Data Grid itself.

  1. Data Grid – Cache Evolved
  2. We, Data Grid
  3. Data Grid, JBoss Data Grid