How It Works
Splice Machine DBaaS has been designed from the ground up to be portable. Leveraging a technology stack that includes DC/OS, Marathon, and ELK, applications and storage are containerized, secured, and monitored with guaranteed availability. This architecture is portable across public clouds such as AWS, Azure, and Google, as well as on-premises infrastructure.
Splice Machine is currently available for on-premises deployments and as a database service on AWS. Other cloud platforms will be added during the remainder of 2017.
Splice Machine has an integrated Zeppelin Notebook interface. Zeppelin notebooks are like text documents, but they contain code that can be executed, with output rendered as tables, reports, and graphs. This enables you to prepare and run SQL DDL and DML, stored procedures, and Java, Scala, Python, and Spark SQL programs against Splice Machine data.
Splice Machine comes pre-configured with a set of notebooks that help you get started, load data, and explore examples of the work that can be done with the RDBMS.
Scale-Out Hybrid SQL RDBMS
Splice Machine can dynamically scale from a few to thousands of nodes to enable applications at every scale.
SQL Parser, Planner, Cost-Based Optimizer, Executor
Splice Machine runs on each node of a cluster. The Splice Machine optimizer automatically evaluates each query, routing OLTP queries to distributed HBase regions and OLAP queries to distributed Spark workers.
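As a rough sketch of this routing idea, the toy router below sends cheap lookups to HBase and expensive scans to Spark. The cost model, threshold, and function names here are hypothetical illustrations, not Splice Machine's actual optimizer internals.

```python
# Toy cost-based router in the spirit of the optimizer described above.
# The cost model and threshold are hypothetical, not Splice Machine internals.

def estimate_cost(query_plan):
    """Toy cost estimate: rows scanned times columns touched."""
    return query_plan["rows_scanned"] * query_plan["columns_touched"]

OLAP_COST_THRESHOLD = 1_000_000  # hypothetical cutoff

def route(query_plan):
    """Route cheap lookups to HBase and expensive scans to Spark."""
    if estimate_cost(query_plan) < OLAP_COST_THRESHOLD:
        return "hbase"   # OLTP: short-range reads/writes on region servers
    return "spark"       # OLAP: large scans and aggregations on Spark workers

print(route({"rows_scanned": 10, "columns_touched": 5}))         # hbase
print(route({"rows_scanned": 2_000_000, "columns_touched": 8}))  # spark
```

In a real cost-based optimizer, the estimate would come from table statistics and the chosen access path rather than a single product of two numbers.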
HBase/Hadoop: Proven, Distributed Database Technology
HBase uses the Hadoop Distributed File System (HDFS) for reliable and replicated storage. HBase/HDFS provides auto-sharding and failover technology for scaling database tables across multiple servers. It is the only technology proven to scale to dozens of petabytes on commodity servers.
HBase co-processors are used to embed Splice Machine in each distributed HBase region (i.e., data shard). This enables Splice Machine to achieve massive parallelization by pushing the computation down to each distributed data shard.
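The push-down pattern can be illustrated with a small sketch: run the computation next to each shard in parallel, then combine the partial results. The shard lists and helper functions below are simplified stand-ins for HBase regions and co-processor logic, not the actual API.

```python
# Sketch of shard-level push-down parallelism. Each shard computes its own
# partial result (as a co-processor would, next to the data), and only the
# small partials are combined centrally.

from concurrent.futures import ThreadPoolExecutor

def shard_sum(shard):
    """Work executed alongside the data shard, co-processor style."""
    return sum(shard)

def parallel_sum(shards):
    """Fan the computation out to every shard, then merge partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(shard_sum, shards))
    return sum(partials)

shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(parallel_sum(shards))  # 45
```

The key design point is that only small partial results cross the network; the bulk of the data never leaves its shard.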
Spark: Powerful, In-Memory Computation Engine
Spark has very efficient in-memory processing that can spill to disk (instead of dropping the query) if query processing exceeds available memory. Spark is also unique in its resilience to node failures, which may occur in a commodity cluster. Other in-memory technologies will drop all queries associated with a failed node, while Spark uses lineage (as opposed to replicating data) to regenerate its in-memory Resilient Distributed Datasets (RDDs) on another node.
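The lineage idea can be sketched in a few lines: instead of replicating a partition's data, record how it was derived, so a lost partition can be recomputed from its parent. The class below is a hypothetical illustration of the concept, not Spark's API.

```python
# Toy illustration of lineage-based recovery: record the derivation of a
# partition rather than replicating its data, and recompute on failure.

class LineagePartition:
    def __init__(self, source, transform):
        self.source = source        # parent data this partition derives from
        self.transform = transform  # how this partition was derived
        self.data = None            # materialized in memory on demand

    def compute(self):
        """(Re)materialize the partition from its lineage."""
        self.data = [self.transform(x) for x in self.source]
        return self.data

part = LineagePartition([1, 2, 3], lambda x: x * 10)
part.compute()               # [10, 20, 30] held in memory
part.data = None             # simulate losing the node holding this partition
recovered = part.compute()   # regenerated from lineage, not from a replica
print(recovered)             # [10, 20, 30]
```

Recovery cost is a recomputation rather than a replica fetch, which is why lineage avoids the memory and network overhead of keeping redundant in-memory copies.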
Splice Machine accelerates generation of Spark RDDs by reading HBase HFiles in HDFS and augmenting them with any changes in the MemStore that have not yet been flushed to HFiles. Splice Machine then uses the RDDs and Spark operators to distribute processing across Spark workers.
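The read-merge step can be sketched as a simple overlay: start from the flushed on-disk rows and let the newer, unflushed in-memory changes shadow them. The dictionaries below are simplified stand-ins for HFiles and the MemStore, not the actual storage formats.

```python
# Sketch of merging flushed HFile rows with unflushed MemStore changes to
# produce a consistent view. Simplified stand-in structures, not real HBase
# storage formats.

def merged_view(hfile_rows, memstore_rows):
    """Latest value per key: start from flushed HFiles, overlay the MemStore."""
    view = dict(hfile_rows)
    view.update(memstore_rows)  # unflushed changes shadow older flushed values
    return view

hfile_rows = {"row1": "a", "row2": "b"}          # already flushed to disk
memstore_rows = {"row2": "b2", "row3": "c"}      # row2 updated, row3 new
print(merged_view(hfile_rows, memstore_rows))
# {'row1': 'a', 'row2': 'b2', 'row3': 'c'}
```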
ACID Transactions on a Scale-Out Architecture
Splice Machine provides full ACID (Atomicity, Consistency, Isolation, Durability) transactions across rows and tables, using a snapshot isolation design based on Multi-Version Concurrency Control (MVCC), which creates a new version of a record every time it is updated.
Because each transaction operates on its own virtual “snapshot” of the data, transactions can execute concurrently without any locking. This yields very high throughput and avoids troublesome deadlock conditions.
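A minimal MVCC sketch makes the snapshot idea concrete: each write appends a new version stamped with a commit timestamp, and a reader sees only versions committed at or before its snapshot timestamp. This is a deliberately simplified illustration, not Splice Machine's actual transaction design.

```python
# Minimal MVCC sketch: writes create timestamped versions instead of
# overwriting in place; reads see only versions visible to their snapshot.

class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)

    def write(self, key, value, commit_ts):
        """Append a new version rather than overwriting the old one."""
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [(ts, v) for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

store = MVCCStore()
store.write("balance", 100, commit_ts=1)
store.write("balance", 80, commit_ts=3)
print(store.read("balance", snapshot_ts=2))  # 100 - older snapshot unaffected
print(store.read("balance", snapshot_ts=5))  # 80  - sees the later commit
```

Because readers never block writers (each reads its own consistent snapshot), no locks are needed for concurrent reads and writes, which is the property the paragraph above describes.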