How It Works
Splice Machine: A Parallel SQL Database with the Massive Scalability of Hadoop
Splice Machine marries two proven technology stacks: Apache Derby and HBase/Hadoop.
Apache Derby: Java-Based, ANSI SQL Database
Apache Derby is a Java-based SQL database with over 15 years of development behind it. Splice Machine chose Derby because it is a full-featured ANSI SQL database, is lightweight (under 3 MB), and is easy to embed into the HBase/Hadoop stack.
HBase/Hadoop: Proven, Distributed Computing Infrastructure
HBase and Hadoop have become the leading platforms for distributed computing. HBase is based on Google’s Bigtable design and uses the Hadoop Distributed File System (HDFS) for reliable, highly available persistent storage. HBase provides auto-sharding and failover technology for scaling database tables across multiple servers. It also enables real-time, incremental writes on top of the immutable Hadoop file system.
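Supporting real-time writes on top of write-once HDFS files works by buffering new writes in memory and periodically flushing them as new immutable files, with reads merging both layers (newest data wins). The following is a simplified, illustrative Python sketch of that idea; all class and method names are hypothetical, and this is not HBase's actual implementation.

```python
# Toy sketch of layering mutable writes over immutable files, the
# LSM-tree-style design HBase uses on HDFS. Hypothetical names only.

class MiniRegionStore:
    """An in-memory write buffer (a "memstore") plus a list of
    immutable flushed segments (standing in for HFiles on HDFS)."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}            # mutable, in-memory writes
        self.segments = []            # immutable flushed snapshots, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Real-time write: lands in memory immediately.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memstore as a new immutable segment; old segments
        # are never rewritten, mirroring HDFS's write-once files.
        if self.memstore:
            self.segments.append(dict(self.memstore))
            self.memstore = {}

    def get(self, key):
        # Read path: check the memstore first, then segments from
        # newest to oldest, so the most recent write wins.
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.segments):
            if key in segment:
                return segment[key]
        return None

store = MiniRegionStore()
store.put("row1", "a")
store.put("row1", "b")    # an overwrite is simply a newer entry
print(store.get("row1"))  # -> b
```

An update never modifies a flushed file; it just shadows the old value with a newer one, which is what makes incremental writes compatible with an immutable file system.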
HBase and Hadoop are the only technologies proven to scale to dozens of petabytes on commodity servers and are used by companies such as Facebook, Twitter, Adobe and Salesforce.com. Splice Machine chose HBase and Hadoop because of their proven auto-sharding, replication, and failover technology.
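The auto-sharding described above can be pictured as range partitioning: a table is divided into contiguous row-key ranges ("regions"), each served independently, and a region that grows too large is automatically split in two. Below is a hypothetical Python sketch of that scheme; real HBase splits regions by on-disk size rather than row count, and all names here are invented for illustration.

```python
# Illustrative sketch of range-based auto-sharding, the scheme HBase
# uses to spread a table across region servers. Hypothetical names;
# the split threshold is artificially small to show a split happening.
import bisect

class ShardedTable:
    def __init__(self, max_rows_per_shard=4):
        # Start with one shard covering the entire row-key space.
        self.shards = [{}]        # one dict per shard, ordered by key range
        self.split_keys = []      # shard i holds keys < split_keys[i]
        self.max_rows = max_rows_per_shard

    def _locate(self, key):
        # Route a row key to its shard via binary search on split points.
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, value):
        i = self._locate(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Auto-split: divide an oversized shard at its median key,
        # as a region server does when a region grows too large.
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]
        low = {k: v for k, v in self.shards[i].items() if k < mid}
        high = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [low, high]
        self.split_keys.insert(i, mid)

    def get(self, key):
        return self.shards[self._locate(key)].get(key)

table = ShardedTable()
for k in ["a", "b", "c", "d", "e"]:   # fifth row triggers a split
    table.put(k, k.upper())
print(len(table.shards))              # -> 2
print(table.get("d"))                 # -> D
```

Because shards are key ranges, any row can be routed with a single lookup, and splitting is a local operation on one shard, which is what lets the table scale out without global coordination.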
Splice Machine: Best of Apache Derby and HBase/Hadoop
Splice Machine integrates these technology stacks by replacing the storage engine in Apache Derby with HBase. Splice Machine retains the Apache Derby parser but redesigned the planner, optimizer, and executor to leverage the distributed HBase computation engine.
Splice Machine uses HBase coprocessors to embed itself in each distributed HBase region (i.e., data shard). This enables Splice Machine to achieve massive parallelization by pushing the computation down to each distributed data shard.
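The pushdown idea can be illustrated with a scatter-gather sketch: instead of shipping every row to one coordinator, each shard filters and partially aggregates its own data locally (as a coprocessor would inside an HBase region), and only the small partial results travel over the network. This is a conceptual Python model with invented names and data, not Splice Machine's actual code.

```python
# Conceptual sketch of coprocessor-style computation pushdown.
# Each shard runs the filter + partial SUM locally; the coordinator
# only merges per-shard partial results. Hypothetical data and names.
from concurrent.futures import ThreadPoolExecutor

# Three hypothetical shards of an "orders" table: (customer, amount) rows.
shards = [
    [("alice", 30), ("bob", 20)],
    [("alice", 50), ("carol", 10)],
    [("bob", 40), ("alice", 5)],
]

def local_aggregate(shard, customer):
    """Runs inside a shard: filter the rows and return one partial sum,
    not the rows themselves."""
    return sum(amount for cust, amount in shard if cust == customer)

def query_sum(customer):
    # Coordinator: scatter the work to every shard in parallel,
    # then gather and combine the partial sums.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(local_aggregate, shards, [customer] * len(shards))
    return sum(partials)

# Roughly: SELECT SUM(amount) FROM orders WHERE customer = 'alice'
print(query_sum("alice"))  # -> 85
```

The work scales with the number of shards because each one computes independently, while the coordinator's cost depends only on the number of partial results, not the number of rows.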
Compatible with Standard Hadoop Distributions
Since Splice Machine does not modify HBase, it can be used with any standard Hadoop distribution that has HBase. Supported Hadoop distributions include Cloudera, MapR and Hortonworks.