Operational Data Lakes…a Contradiction in Terms?
The Data Lake was introduced as an answer to the problem of important data being locked up in silos of production applications and departmental databases. The assumption was that loading all that data into a single repository would reveal important insights about the operation of a company. The trend was amplified by the fact that Hadoop made it possible to store large amounts of data on affordable hardware.
Hadoop Data Lakes welcome data of all types, and there is an abundance of analysis tools to retrieve value from the assembled data. Companies apply exploratory analytics to uncover new insights that can be used to improve processes. Companies can also use the Data Lake as a staging area for the Data Warehouse, transforming and aggregating the data to prepare it for loading.
But in pursuit of cheap scale-out infrastructure, the “schema-on-read” approach sacrifices the ability to run operational workloads on the Data Lake. What if you could use the same scale-out infrastructure to capture both structured and unstructured data, and access that information through a full implementation of SQL, with millisecond response times on complex queries across petabytes of data? That is what Operational Data Lakes do.
Powering Operational Data Lakes with Splice Machine
Splice Machine is a relational DBMS that leverages HDFS, HBase and Spark to deliver the economics and horizontal scaling of a Hadoop Data Lake, while offering full ANSI SQL, ACID transactions, and real-time analytics to power even the most demanding operational applications.
The result is that Splice Machine can continuously and concurrently ingest large amounts of data from source systems, while supporting transactional applications such as customer service operations and operational reporting, as well as real-time analytical workloads to discover trends that require immediate action.
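The mixed workload described above — transactional writes landing alongside real-time aggregate queries over the same tables — can be sketched with any ANSI SQL database. The sketch below uses Python's stdlib sqlite3 purely as a stand-in engine; against Splice Machine itself you would connect through its JDBC/ODBC driver instead, and the `orders` table and its columns are hypothetical names chosen for illustration.

```python
import sqlite3

# sqlite3 stands in here for any ANSI SQL store; Splice Machine would be
# reached via JDBC/ODBC. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Transactional write path: an atomic, all-or-nothing batch insert (ACID).
with conn:  # commits on success, rolls back on any exception
    cur.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
    )

# Analytical read path: an aggregate query over the same live table,
# with no ETL step between the operational and analytical views.
cur.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
)
rows = cur.fetchall()
print(rows)  # [('acme', 200.0), ('globex', 50.0)]
```

The design point is that both statements run against one copy of the data; in an operational data lake the engine, not a downstream warehouse, is responsible for keeping the analytical view transactionally consistent with the writes.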
For a detailed description of the Splice Machine architecture, see “How It Works”.
Cetera Financial Group is building a single source of truth for 10,000 distributed users. Splice Machine has replaced a traditional RDBMS solution and consolidated multiple disparate legacy databases into a single Enterprise Data Hub serving a range of applications and use cases.
"As an expanding company, our wealth management technology platform experienced rapid data growth, and we needed additional tools to quickly access our growth in analytic data to guide strategic decisions and optimize our business processes. Moving to an Enterprise Data Hub powered by Splice Machine resulted in significant performance improvements."
Mohan Gurupackiam, CTO, Cetera Financial Group
Sample Operational Data Lake Use Cases
Replace Operational Data Stores
An operational data lake has the following additional benefits over an ODS:
Offloading Reporting and Analytics Tasks from SQL Databases
As the amount of data in traditional databases grows, their performance on reporting and analytical workloads suffers, which in turn degrades their transactional duties.
Complementing an Existing Hadoop Data Lake
For an existing Hadoop-based data lake, Splice Machine becomes a powerful and flexible repository for structured data: