SAP HANA Vora on Apache Spark

Industry use cases of Big Data in general, and Apache Spark in particular, can roughly be divided into two categories:
1. Faster and better analytics on existing enterprise data
2. Finding insights using as-yet untapped data

Though Hadoop’s claim to fame has been finding insights in as-yet untapped unstructured and semi-structured data, its role in merging existing enterprise data silos into an enterprise data lake or hub is equally, if not more, disruptive.

Big Enterprise Data

Traditionally, enterprise data is stored in ERP and CRM systems and is moved to an Enterprise Data Warehouse for reporting. To serve reports with low latency, this data is then moved to OLAP cubes, which are denormalized dimensional representations of the data.
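To make "denormalized dimensional representation" concrete, here is a minimal sketch in plain Python. The tables, column names, and the helper functions `denormalize` and `build_cube` are all hypothetical and exist only for illustration; a real OLAP engine does far more, but the basic idea of joining facts to their dimensions and pre-aggregating a measure is the same.

```python
from collections import defaultdict

# Hypothetical normalized tables, as they might sit in an ERP or CRM system.
customers = {
    1: {"name": "Acme", "region": "West"},
    2: {"name": "Globex", "region": "East"},
}
products = {
    10: {"name": "Widget", "category": "Hardware"},
    11: {"name": "Support", "category": "Services"},
}
orders = [
    {"customer_id": 1, "product_id": 10, "amount": 100.0},
    {"customer_id": 1, "product_id": 11, "amount": 40.0},
    {"customer_id": 2, "product_id": 10, "amount": 60.0},
    {"customer_id": 2, "product_id": 10, "amount": 25.0},
]


def denormalize(orders, customers, products):
    """Join each order to its dimensions so one row carries what a report needs."""
    rows = []
    for order in orders:
        rows.append({
            "region": customers[order["customer_id"]]["region"],
            "category": products[order["product_id"]]["category"],
            "amount": order["amount"],
        })
    return rows


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimension values seen."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


flat = denormalize(orders, customers, products)
cube = build_cube(flat, ("region", "category"), "amount")
```

A report that needs hardware sales in the East region now reads `cube[("East", "Hardware")]` directly instead of joining three tables at query time, which is where the low latency comes from.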

For a long time, Big Data was a wild west for enthusiasts. A few open-source experts would get funding to set up a small cluster and pick a use case to test it on. It would work fine for that use case but would never grow beyond a hobby project, ending up as yet another silo. This approach was important and it created buzz, but it fell far short of the wide enterprise adoption one could hope for.

What paved the way for enterprise adoption were the following recent developments:

  • Comprehensive security features in Hadoop
  • Comprehensive compliance and governance features in Hadoop
  • Spark and the low latency it brought along
  • Schema-aware file formats like Parquet
  • Schema-aware data structures like DataFrames

Enterprise Data <=> SAP HANA <=> Big Data

From a big data frame of reference, SAP HANA has served as the gateway to the SAP world. Inside the SAP world, it is becoming the database of choice for running applications like SAP BW. It can be seen as SAP’s way of reducing latency in the SAP ecosystem.

SAP HANA <=> SAP Vora <=> Big Data

SAP Vora is SAP’s way of combining the latency benefits of HANA with the low latency of Apache Spark.

If you need help integrating your SAP ecosystem with Big Data, please contact us at bigdata@infoobjects.com and we would be glad to help.