The Lambda Architecture enables continuous processing of real-time data. Implementing it is painful: it gets the job done, but at great cost. Splice Machine offers a simplified solution, Lambda-in-a-Box, that delivers the benefits of Lambda without the “enterprise duct tape” of other approaches.
Lambda Architectures are ubiquitous in machine learning and data science applications. They enable continuous processing of real-time data without the ETL lag that plagues traditional operational (OLTP) and analytical (OLAP) implementations. Typically, OLTP databases are normalized for performance, and extensive ETL pipelines then denormalize this data, most often into star schemas on OLAP engines. This process usually takes at least a day.
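To make the lag concrete, here is a minimal sketch in Python that uses the standard-library `sqlite3` module as a stand-in for both the OLTP database and the OLAP engine. The table names (`customers`, `orders`, `sales_report`) and the `run_etl` helper are hypothetical, and the single wide reporting table is a simplification of a real star schema. It only illustrates that analytical results stay stale until the next ETL run.

```python
import sqlite3


def run_etl(conn):
    """Batch ETL step: join the normalized tables into one wide reporting table."""
    conn.execute("DROP TABLE IF EXISTS sales_report")
    conn.execute(
        """
        CREATE TABLE sales_report AS
        SELECT o.order_id, c.name AS customer_name, c.region, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        """
    )
    conn.commit()


def region_totals(conn):
    """Analytical query that reads only the denormalized reporting table."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales_report GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


# Normalized "OLTP" side (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'EU'), (2, 'Bob', 'US');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 20.0);
    """
)

run_etl(conn)
before = region_totals(conn)

# A new order arrives on the operational side after the ETL run.
conn.execute("INSERT INTO orders VALUES (12, 1, 25.0)")
stale = region_totals(conn)  # the report does not see the new order yet

run_etl(conn)
fresh = region_totals(conn)  # visible only after the next batch run
```

In a production pipeline the gap between `stale` and `fresh` is the interval between ETL runs, which is the lag a Lambda Architecture is meant to remove.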
Companies implement a Lambda Architecture to circumvent this lag. For the batch layer, they typically use a batch analytics processing engine on Hadoop, like MapReduce, Hive or Spark. For the serving layer, they use a NoSQL key-value store like Cassandra or HBase, or a low-latency query engine like Impala or Druid. For the speed layer, there is typically a queuing system like Kafka and a streaming system like Storm, Spark Streaming or Flink.
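The division of labor among the three layers can be sketched in a few lines of Python. This is a toy, in-memory illustration, not how any of the systems named above are wired together: the `LambdaSketch` class and its method names are hypothetical, a Python list stands in for the master dataset, and event counts stand in for the analytics.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the batch, speed, and serving layers (all names hypothetical)."""

    def __init__(self):
        self.master_log = []            # append-only master dataset
        self.batch_view = Counter()     # recomputed from scratch by the batch layer
        self.realtime_view = Counter()  # updated incrementally by the speed layer

    def ingest(self, key):
        """New event: append to the master dataset and update the speed layer."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Batch layer: rebuild the view from all data, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, key):
        """Serving layer: merge the batch view with the recent real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


pipeline = LambdaSketch()
for event in ["click", "click", "view"]:
    pipeline.ingest(event)
clicks_before_batch = pipeline.query("click")

pipeline.run_batch()
pipeline.ingest("click")
clicks_after_batch = pipeline.query("click")
```

Even in this toy form, every query has to merge two views that are kept consistent by separate code paths. In a real deployment each path is a separate system, which is where the operational complexity comes from.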
Splice Machine offers a better solution to the complexity of Lambda Architectures. We call it Lambda-in-a-Box. With a new generation of scale-out RDBMSs, you can now get all the benefits of Lambda with a much simpler architecture.
For example, here’s how a machine learning application can use Lambda-in-a-Box:
Splice Machine is the open-source SQL RDBMS, powered by Apache Hadoop® and Apache Spark™.
The Splice Machine RDBMS provides:
By centralizing on a relational Lambda Architecture on Splice Machine, teams can build machine learning applications very quickly, maintain them with standard operational personnel, and tightly integrate machine learning into the application without extensive use of “enterprise duct tape”.