A data lake is a storage repository that holds data in its native format, whether that data is unstructured, semi-structured, or structured. Advanced offerings such as the SAP data lake go beyond basic storage, adding features like high data compression and tiered storage. This is why organizations are increasingly adding data lakes to their existing IT infrastructure to improve database performance at a lower cost.
The Launch of the SAP data lake
SAP launched HANA Data Lake in April 2020. The goal was to give clients a cost-effective yet high-performance data storage system. The full package included a native storage extension as well as a relational SAP data lake that was available out of the box. This put the SAP data lake on par with other leaders in the cloud ecosystem, such as Microsoft Azure and Amazon S3 (Simple Storage Service), in data processing capability and other functionality.
The SAP data lake was launched with a host of new features, the most important being its 10x data compression capability. Because data is compressed to about one-tenth of its original volume before storage, storage costs drop substantially. Further, users can either keep the SAP data lake in their existing HANA Cloud instance or move it to a new one. Whichever method they adopt, users can add storage capacity whenever required and get the usual benefits of the cloud, such as data encryption, audit logging, and data access tracking.
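To make the 10x compression claim concrete, here is a minimal arithmetic sketch. The function names, the 50 TB volume, and the price per terabyte are all hypothetical values chosen for illustration; they are not SAP figures, and real compression ratios depend on the data.

```python
def compressed_size_tb(raw_tb: float, ratio: float = 10.0) -> float:
    """Return the stored size after compressing raw_tb by the given ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_tb / ratio


def monthly_storage_cost(size_tb: float, price_per_tb: float) -> float:
    """Return the monthly cost of storing size_tb at a flat price per TB."""
    return size_tb * price_per_tb


# Hypothetical inputs: 50 TB of raw data at an assumed 20 currency units per TB.
raw_tb = 50.0
price_per_tb = 20.0

stored_tb = compressed_size_tb(raw_tb)  # 10x ratio, as stated in the text
saving = (monthly_storage_cost(raw_tb, price_per_tb)
          - monthly_storage_cost(stored_tb, price_per_tb))

print(f"Stored volume: {stored_tb} TB")
print(f"Monthly saving: {saving}")
```

With a flat per-terabyte price, a 10x ratio cuts the storage bill by 90 percent regardless of the price chosen.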
The Architecture of the SAP data lake
The architecture of the SAP data lake differs from that of other data lakes. It lets businesses keep data that is used frequently and needs regular access (hot data) in the highest-performing tier, and move data that is used less often (warm data) to the Native Storage Extension (NSE) of SAP HANA.
Visualize the SAP data lake as a pyramid with three layers.
The top part of the pyramid is used to store data that is regularly accessed and which is critical for the daily workings of an organization. Because of the high data usage, the cost of storage here is the highest among the three segments in the pyramid.
The middle of the pyramid holds data that is less important than the data in the top layer but not so insignificant that it should be deleted from the system. This is known as warm data: it is not accessed regularly, and it is served with lower performance than the top tier. Because access rates are lower, the cost of storage is also lower than in the top layer.
At the bottom of the pyramid lies data that is rarely used and would have been deleted in traditional databases. Because the SAP data lake offers its lowest storage prices at this layer, companies prefer to hold on to the data.
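The three-layer pyramid can be sketched as a simple rule that assigns a dataset to a tier based on how often it is read. This is an illustration only, not SAP's placement logic: the thresholds, the Dataset class, and the choose_tier function are hypothetical, and "cold" is used here merely as a label for the bottom layer.

```python
from dataclasses import dataclass

# Hypothetical thresholds; a real deployment would tune these to its workload.
HOT_MIN_READS_PER_DAY = 100
WARM_MIN_READS_PER_DAY = 1


@dataclass
class Dataset:
    name: str
    reads_per_day: float


def choose_tier(ds: Dataset) -> str:
    """Map a dataset to a pyramid layer by its access frequency."""
    if ds.reads_per_day >= HOT_MIN_READS_PER_DAY:
        return "hot"   # top layer: fastest access, highest storage cost
    if ds.reads_per_day >= WARM_MIN_READS_PER_DAY:
        return "warm"  # middle layer: slower access, lower cost
    return "cold"      # bottom layer: rarely read, lowest cost


datasets = [
    Dataset("open_orders", 5000),
    Dataset("last_year_invoices", 12),
    Dataset("archived_logs_2015", 0.01),
]

for ds in datasets:
    print(f"{ds.name}: {choose_tier(ds)}")
```

Access frequency is only one possible criterion; age of the data or business criticality could drive the same kind of rule.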